Et tu Apple Watch: Integrating Siri with an AI engine to reduce patient wait time

The Patient is in apps integrate with Siri to provide a voice interface for messaging between the doctor and the charge nurse. By voice, the doctor can listen to patient assignments, reject an assignment, accept an assignment and give an estimated time of arrival, and notify the charge nurse that an assignment is complete. That last notification lets the staff start the cleaning process sooner, which reduces the wait time for the next patient.

On both the iPhone and Apple Watch, this enables the doctor to use AirPods and a few other Bluetooth headsets to manage patient assignments remotely using only her voice. Over shorter distances, the “Hey Siri” voice trigger also works well, and it is the only technique for hands-free voice control on Apple Watch as of watchOS 3.2. Click on this link to learn about Siri support in the Patient is in.

In this article, we will look at the underlying natural language processing (NLP) engine built for Siri integration with the Patient is in messaging features.

Natural Language Processing Theory

NLP is one of the core AI speech technologies, along with text-to-speech (TTS) and speech-to-text, which is also known as speech recognition. Advances in text-to-speech have led to more natural-sounding computer voices, and advances in speech recognition have led to better audio transcription.

NLP attempts to understand the meaning or intent of a sentence. To do that, an NLP engine must first determine which language the sentence is in. Next, it must identify the part of speech of each word in the sentence, as well as other components of the language such as word stems and contractions. For example, the doctor’s sentence “I’ll go to the front office” would be decomposed, or parsed, into the pronoun “I”, the verb “will”, the verb “go”, the preposition “to”, the determiner “the”, the adjective “front”, and the noun “office”. So even though the doctor spoke the word “I’ll”, the NLP engine had to understand the concept of American English contractions and process the two words “I” and “will”. When the doctor enunciates clearly, everything works well, as we see in these screen shots:
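The decomposition of the example sentence can be sketched in a few lines of Python. This is only an illustration of the classic steps (expand contractions, split into words, look up parts of speech), not the engine used in the app; the lookup tables and the function names `expand_contractions` and `tag` are hypothetical and cover just the words in the example.

```python
import re

# Hypothetical lookup tables, just large enough for the example sentence.
CONTRACTIONS = {"i'll": ["I", "will"], "we'll": ["we", "will"], "can't": ["can", "not"]}
PARTS_OF_SPEECH = {
    "i": "pronoun", "will": "verb", "go": "verb", "to": "preposition",
    "the": "determiner", "front": "adjective", "office": "noun",
}

def expand_contractions(sentence):
    """Split a sentence into words, replacing each known contraction with its parts."""
    words = []
    # Normalize the typographic apostrophe so that I’ll and I'll hit the same key.
    for token in re.findall(r"[A-Za-z']+", sentence.replace("’", "'")):
        words.extend(CONTRACTIONS.get(token.lower(), [token]))
    return words

def tag(sentence):
    """Pair every word with its part of speech, or "unknown" if it is not in the table."""
    return [(w, PARTS_OF_SPEECH.get(w.lower(), "unknown")) for w in expand_contractions(sentence)]

for word, part in tag("I’ll go to the front office"):
    print(f"{word:8} {part}")
```

A production engine would replace the hand-written tables with a trained tagger, but the shape of the output, one part of speech per word with the contraction already split, is the same.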

Siri in Healthcare: Combinators and the Patient is in

Beyond contractions, advanced natural language processing algorithms must also consider the accent with which a user speaks. From the NLP engine’s point of view, everyone has an accent, whether it’s the slow-talking Southerner’s accent, the neutral Mid-Atlantic accent, the fast-talking New Yorker’s accent, a Bostonian accent, or perhaps the accent of a partially deaf therapist. And of course, when we are tired, we all tend to mumble a bit, which makes us sound inarticulate. This also happens to surgeons and anesthesiologists after a middle-of-the-night emergency surgery. So it should not be surprising that the intended word “I’ll” can be misspoken just enough to be transcribed as the word “all”, as we see in the following screen shot:


The NLP engine used in the Patient is in apps on both Apple Watch and iPhone compensates for these common transcription errors because the algorithms were tailored for a doctor’s use of Siri.

Other transcription errors can make the doctor’s intent seem impossible to deduce. For example, can you fix the phrase “go to low or one”? The app’s NLP engine correctly determined that the doctor intended “I’ll go to OR 1”, as we see in these screen shots:

You should also notice that the NLP engine understands American English homophones such as “8, eight, and ate”, “4, for, and four”, and “2, to, too, and two”.
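One simple way to handle such homophones is to rewrite them as digits only where a room number is expected. The sketch below assumes that a number follows a room keyword; the table `NUMBER_HOMOPHONES`, the set `ROOM_WORDS`, and the function `normalize_room_numbers` are hypothetical names for illustration and do not describe the app's actual algorithm.

```python
# Hypothetical table of words that sound like a digit.
NUMBER_HOMOPHONES = {
    "one": "1", "won": "1",
    "two": "2", "to": "2", "too": "2",
    "four": "4", "for": "4",
    "eight": "8", "ate": "8",
}
# Hypothetical keywords after which a room number is expected.
ROOM_WORDS = {"or", "room", "bay"}

def normalize_room_numbers(transcript):
    """Replace a number homophone with its digit when it directly follows a room keyword."""
    words = transcript.split()
    result = []
    for i, word in enumerate(words):
        previous = words[i - 1].lower() if i > 0 else ""
        key = word.lower()
        if previous in ROOM_WORDS and key in NUMBER_HOMOPHONES:
            result.append(NUMBER_HOMOPHONES[key])
        else:
            result.append(word)
    return " ".join(result)

print(normalize_room_numbers("go to room ate"))
print(normalize_room_numbers("go to low or one"))
```

Because the replacement is tied to the preceding keyword, the “to” in “go to” is left alone while the “one” after “or” becomes a digit. Recovering the rest of the phrase, such as the stray “low”, needs additional rules beyond this sketch.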

If we reexamine the classic NLP pipeline described at the start of this article, we see that it first identifies the language of the text and then parses the text into parts of speech. When we applied this to the doctor’s message, the phrase “front office” was decomposed and identified as the adjective “front” and the noun “office”. While linguistically correct, that does not help us, because the app has to notify the charge nurse that the doctor has accepted the assignment to go to the specific room named Front Office to treat a patient or perform surgery.

Unlike the Siri messaging support in apps such as WhatsApp or Messages, which simply relay the transcription provided by Siri’s speech recognition, the Patient is in must determine which room the doctor is discussing and whether the doctor is rejecting an assignment, completing an assignment, or accepting an assignment and providing an estimated time of arrival, which may be given in hours or minutes.
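To make these requirements concrete, here is a minimal sketch of extracting the three pieces of information from a transcript: the intent, the room, and the arrival time in minutes. The room list, the keyword tuples, and the names `Message`, `find_room`, `find_eta_minutes`, and `interpret` are all assumptions made for this example; the real engine is described as using far more tolerant techniques than keyword matching.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class Message:
    intent: str                  # "accept", "reject", "complete", or "unknown"
    room: Optional[str]
    eta_minutes: Optional[int]

# Hypothetical room names and trigger phrases.
ROOMS = ["Front Office", "Exam Room", "OR 1"]
REJECT_WORDS = ("can't", "cannot", "unable", "decline")
COMPLETE_WORDS = ("done", "finished", "completed")
ACCEPT_WORDS = ("i'll go", "i will go", "on my way", "be there")

def find_room(text):
    """Return the first known room named in the text, matched on word boundaries."""
    lowered = text.lower()
    for room in ROOMS:
        if re.search(r"\b" + re.escape(room.lower()) + r"\b", lowered):
            return room
    return None

def find_eta_minutes(text):
    """Return an arrival time in minutes, converting hours when necessary."""
    match = re.search(r"(\d+)\s*(hours?|hrs?|minutes?|mins?)\b", text.lower())
    if match is None:
        return None
    amount = int(match.group(1))
    return amount * 60 if match.group(2).startswith("h") else amount

def interpret(text):
    """Classify the doctor's message and pull out the room and arrival time."""
    lowered = text.lower().replace("’", "'")
    if any(w in lowered for w in REJECT_WORDS):
        intent = "reject"
    elif any(w in lowered for w in COMPLETE_WORDS):
        intent = "complete"
    elif any(w in lowered for w in ACCEPT_WORDS):
        intent = "accept"
    else:
        intent = "unknown"
    return Message(intent, find_room(text), find_eta_minutes(text))

print(interpret("I’ll go to the front office in 10 minutes"))
print(interpret("I’ll be there in 2 hours"))
```

The word-boundary match in `find_room` matters: without it, the room name “OR 1” would be found inside a phrase like “for 10 minutes”.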

The NLP engine uses multiple algorithms and text processing techniques to best ensure that the doctor’s intent is correctly captured even if the doctor’s words were incorrectly transcribed. This approach of combining multiple small parsers and extraction techniques into a larger one is known as parser combinators. The NLP engine used in the Patient is in apps on both the iPhone and Apple Watch adds many Siri-specific and doctor-specific algorithms to the classic NLP approach.
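For readers unfamiliar with the term, the following is a minimal sketch of parser combinators, not the app's implementation. Each parser is a function that takes a list of tokens and a position and returns either a value with the new position or `None`; the combinators `seq`, `alt`, `optional`, and `mapped` build bigger parsers from smaller ones. The small grammar, including the treatment of “all” as a mis-heard “I’ll” and “low” as a mis-heard “the”, is an assumption made for the example.

```python
def word(*choices):
    """Parser that accepts one token if it is among the choices (case-insensitive)."""
    allowed = {c.lower() for c in choices}
    def parse(tokens, pos):
        if pos < len(tokens) and tokens[pos].lower() in allowed:
            return tokens[pos], pos + 1
        return None
    return parse

def seq(*parsers):
    """Run the parsers one after another; fail if any of them fails."""
    def parse(tokens, pos):
        values = []
        for parser in parsers:
            result = parser(tokens, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse

def alt(*parsers):
    """Return the result of the first parser that succeeds."""
    def parse(tokens, pos):
        for parser in parsers:
            result = parser(tokens, pos)
            if result is not None:
                return result
        return None
    return parse

def optional(parser):
    """Succeed with None, consuming nothing, when the wrapped parser fails."""
    def parse(tokens, pos):
        result = parser(tokens, pos)
        return result if result is not None else (None, pos)
    return parse

def mapped(parser, fn):
    """Transform the value of a successful parse."""
    def parse(tokens, pos):
        result = parser(tokens, pos)
        if result is None:
            return None
        value, pos = result
        return fn(value), pos
    return parse

# Hypothetical grammar for an "accept" message.
subject = alt(word("I'll", "all"), seq(word("I"), word("will")))
article = optional(word("the", "low"))
room = alt(
    mapped(seq(word("front"), word("office")), lambda _: "Front Office"),
    mapped(seq(word("or"), word("1", "one")), lambda _: "OR 1"),
)
accept = mapped(
    seq(optional(subject), word("go"), word("to"), article, room),
    lambda values: ("accept", values[-1]),
)

def parse_accept(text):
    """Return ("accept", room) if the whole text matches the grammar, else None."""
    tokens = text.replace("’", "'").split()
    result = accept(tokens, 0)
    if result is not None and result[1] == len(tokens):
        return result[0]
    return None

print(parse_accept("I’ll go to the front office"))
print(parse_accept("go to low or one"))
```

The appeal of the technique is that tolerance for a new transcription error is added by composing one more alternative into the grammar rather than by rewriting the parser.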

Due to the uncertainty in processing speech, a new user interface idiom has evolved. Called the conversational user interface, it enables Siri to mediate a conversation between the user and the app and more abstractly between the doctor and the charge nurse.

Introducing Conversational Interfaces

As in real life, information is rarely unambiguous. Suppose you ask a taxi driver to take you from John Wayne Airport in Santa Ana, California to your office “on Main and MacArthur in the next town over”, which is a few blocks away in the city of Irvine. He may instead drive you a few miles further to Main and MacArthur in the city of Costa Mesa. Both cities are adjacent to Santa Ana, and the two intersections not only share the same street names but are formed by the very same streets, which cross each other in two different cities a few miles apart.

With voice interfaces, the app has to support a conversation to clarify the user’s words by asking for more information from the user either because the user has not provided enough information or has provided ambiguous information. The app must also confirm its understanding before taking action on behalf of the user.

Siri mediates this conversation between the user and the app. Siri handles the speech recognition and passes the text transcript to the app, which processes it with natural language algorithms and other text processing technologies. If the app needs additional information, it gives Siri the text of a question, and Siri uses text-to-speech to ask the user. Once the app is satisfied that it understands the user’s request, it asks Siri to have the user confirm or reject its interpretation. With the user’s permission, the app finally processes the request.
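The flow described above can be sketched as a small decision function that the app consults after every turn of the conversation: ask for whatever is missing, then ask for confirmation, then act. This is a simplified model under assumed names (`Assignment`, `next_step`) with made-up prompts; it does not use or reproduce Apple's Siri APIs.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Assignment:
    room: Optional[str] = None
    eta_minutes: Optional[int] = None
    confirmed: bool = False

def next_step(assignment):
    """Return (kind, text): a question for the assistant to speak, or the final message."""
    if assignment.room is None:
        return "ask", "Which room are you going to?"
    if assignment.eta_minutes is None:
        return "ask", "When will you arrive?"
    if not assignment.confirmed:
        return "confirm", (
            f"Tell the charge nurse you will be in {assignment.room} "
            f"in {assignment.eta_minutes} minutes?"
        )
    return "send", f"Doctor accepted {assignment.room}, arriving in {assignment.eta_minutes} minutes."

# Simulated conversation: the doctor named the room but gave no arrival time.
request = Assignment(room="OR 1")
print(next_step(request))      # the app asks for the missing arrival time
request.eta_minutes = 15       # the doctor answers
print(next_step(request))      # the app asks for confirmation
request.confirmed = True       # the doctor confirms
print(next_step(request))      # the app sends the message to the charge nurse
```

Keeping the decision in one function makes the order explicit: nothing is sent to the charge nurse until every required detail is present and the doctor has confirmed it.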

What’s next in Conversational Interfaces?

Voice technologies and conversational interfaces empower app developers to use AI to provide alternative ways for users to access their apps. These technologies also extend access to a larger set of users. Through a voice interface, perhaps app developers will finally realize that accessibility and usability are fundamentally intertwined or, as Tim Cook said in an interview marking Global Accessibility Awareness Day 2017, that accessibility is a human right.
