Watson Goes to Medical School

Designing IBM’s breakthrough first foray into using AI in a clinical setting

In 2011, IBM's Watson made headlines when it competed on Jeopardy! and won. Afterwards, IBM researchers put the technology to work, creating WatsonPaths, a cognitive system that uses natural language processing and machine learning to solve medical problems.

My challenge was to design an interface that helps medical students at Cleveland Clinic learn how to make better decisions. As WatsonPaths interacts with users, it also learns and becomes more accurate by incorporating their responses into its immense repository of medical knowledge.

At the core of WatsonPaths is a diagram that visualizes the system's decision-making process. In it, users can see how Watson builds multiple levels of inferences from an initial problem statement, ending on the right with the answers in which it has the most confidence.

Visualization of Watson's solution to a medical problem

A typical interaction

The first screen shows that Watson has high confidence that hypokinesia is a finding of Parkinson's disease. The student then has the option to agree, disagree, or view evidence that supports this assertion.

Watson asks if hypokinesia is a finding of Parkinson’s disease

The next screen shows that the student has disagreed, and as a result, Watson's confidence is reduced from high to medium.

The user disagrees that hypokinesia is a finding of Parkinson’s disease, and Watson's confidence drops

Problem statements and evidence

Below is the full text of a scenario. Watson uses natural language processing to extract the factors it believes are most relevant to the solution. Although Watson uses all of the information in the scenario as input, the details it identifies as significant, which are underlined, have the greatest influence on its diagram of the solution and its final answer.

Each inference that Watson asserts is based on evidence from a corpus of medical texts it has ingested. The engine is not perfect, however: Watson might misunderstand a phrase, or the source text itself may be incorrect. In this view, the student can see that Watson has 36 pieces of evidence supporting an assertion it has made. They can view each citation along with Watson's confidence that it is corroborative. They can also confirm or refute its relevance, thereby reinforcing or eliminating its impact on the final answer.

Exploring evidence for an assertion Watson has made

“Through our research collaboration with Cleveland Clinic, we’ve been able to significantly advance technologies that Watson can leverage to handle more and more complex problems in real time and partner with medical experts in a much more intuitive fashion. These are breakthrough technologies intended to assist future versions of Watson products.”

— Eric Brown, IBM Research Director of Watson Technologies

(Left) Michael Barborak, Lead WatsonPaths Engineer, gives a demo. (Right) Interactive installation at the IBM Thomas J. Watson Research Center in Yorktown Heights, NY

Getting started

Inspired by the lean startup movement, my team aimed to build an MVP and put it in front of users as quickly as possible. The project was managed using the Scrum methodology and began with a two-week "Sprint 0" discovery phase that included:

Interviews with medical students at Cleveland Clinic, NYU School of Medicine, and IBM Research

Design Thinking workshop to create personas and a story map (below) that describes our initial assumptions about the activities and tasks users will want to perform

Creation of a product backlog and initial set of user stories

Story map that defines the features of the product

How to role-play your users

Once we were confident that the main visualization was a clear and elegant representation of Watson's decision-making process, we began imagining ways for users to interact with it.

Based on our initial discussions with medical students, as well as our own internal brainstorming sessions, I superimposed thought bubbles on top of the areas that could generate valuable interactions. For each bubble, I imagined what a user might be thinking, what action their thought would trigger, and how that would provide value back to them.

Illustration of user needs superimposed on an example screenshot

Next, I converted the thought bubbles and their corresponding actions into epics and added them to the backlog. Each epic specified the object on the graph on which the action would be initiated, as well as an estimated availability date based on the design and engineering effort involved.

List of epics and their availability date

User testing at Cleveland Clinic

Once enough of the system was designed, I flew to Cleveland to conduct in-person user testing. Each one-on-one session included an interview, hands-on testing, and an open discussion that allowed us to gather additional ideas from the subjects. Because different parts of the product were at different stages of design and development, the tests spanned multiple fidelities, from fully implemented code to digital prototypes to paper prototypes.

To keep the subjects relaxed and focused, I was the only person in the room with them. Members of the WatsonPaths team observed the users' screens via web conference and the room itself via webcam. To avoid juggling multiple tools during the sessions, I used only a rainbow spreadsheet (below) to track whether each user demonstrated 41 different quantitative and qualitative conditions focused on the usefulness and usability of WatsonPaths.

Rainbow spreadsheet used in user testing

In a rainbow spreadsheet, each observation is assigned a row and each subject is assigned a column. As a session progresses, I mark each row with a check if the subject fulfilled the condition being assessed, or an 'x' if they did not. If a row is filled with checks, I can be reasonably certain that the design's intent has been achieved. If a row includes mostly 'x's, I've identified an opportunity for improvement.

For each row that had a high failure rate, I created a new story in the backlog. A story prompts teams to perform work, and the ultimate outcome of user testing should be new stories that address the issues that were uncovered. The Cleveland Clinic user tests resulted in 10 new stories. Below is an example subset.

Findings and recommendations from user testing
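The rainbow-spreadsheet tally can be sketched in a few lines of Python to make the row-by-column bookkeeping concrete. This is a hypothetical illustration, not the tool used in the Cleveland Clinic sessions: the `RainbowSheet` class, the condition labels, and the 50% failure threshold standing in for "mostly 'x's" are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical model of a rainbow spreadsheet: rows are observed conditions,
# columns are test subjects. True records a check, False records an 'x'.
@dataclass
class RainbowSheet:
    subjects: list[str]
    rows: dict[str, dict[str, bool]] = field(default_factory=dict)

    def mark(self, condition: str, subject: str, met: bool) -> None:
        """Record a check (met=True) or an 'x' (met=False) for one cell."""
        if subject not in self.subjects:
            raise ValueError(f"unknown subject: {subject}")
        self.rows.setdefault(condition, {})[subject] = met

    def failure_rate(self, condition: str) -> float:
        """Fraction of marked cells in a row that are 'x's (0.0 if unmarked)."""
        marks = self.rows.get(condition, {})
        if not marks:
            return 0.0
        failures = sum(1 for met in marks.values() if not met)
        return failures / len(marks)

    def stories_needed(self, threshold: float = 0.5) -> list[str]:
        """Conditions whose failure rate exceeds the (assumed) threshold.

        Each returned condition is a candidate for a new backlog story.
        """
        return [c for c in self.rows if self.failure_rate(c) > threshold]


# Example usage with made-up subjects and conditions.
sheet = RainbowSheet(["S1", "S2", "S3"])
for s in sheet.subjects:
    sheet.mark("Opens the evidence view", s, True)
sheet.mark("Understands confidence levels", "S1", False)
sheet.mark("Understands confidence levels", "S2", False)
sheet.mark("Understands confidence levels", "S3", True)
print(sheet.stories_needed())  # ['Understands confidence levels']
```

Keeping each row keyed by subject mirrors the spreadsheet layout, so a single pass over the rows yields the list of conditions that should become new stories.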

Videos

A demonstration of WatsonPaths and how it works

IBM Senior Vice President and Director of IBM Research, Dr. John E. Kelly discusses the significance of WatsonPaths

Results

Called a "breakthrough in interactive cognitive technology," WatsonPaths represents IBM's first step toward using AI in a clinical setting. The design patterns invented for WatsonPaths remain in use today.

Received in-depth coverage in The Verge, Engadget, Fast Company, and dozens of additional publications

Demonstrated by CEO and President Ginni Rometty to over 40 CEOs at Greenbriar, VA. Until then, Mrs. Rometty had never exhibited working software during a presentation.

“The best non-advertisement advertisement.”

— Ginni Rometty after her presentation at Greenbriar