WHAT WE’RE READING
Stuff we read this week that made us think.
Artificial intelligence roundup: What healthcare needs is “slow AI”
Artificial intelligence (AI) was back in the spotlight this week, with CMS announcing the launch of the AI Health Outcomes Challenge, a contest to develop AI-driven applications that predict adverse events and patient outcomes. Co-sponsored by the American Academy of Family Physicians and the Laura and John Arnold Foundation, the initiative will award $1.65M in prize money to organizations that devise AI methodologies using Medicare Part A and Part B claims data to accurately predict unplanned hospital and skilled nursing facility admissions and adverse events, with the goal of putting a tool in the hands of frontline physicians. The timeline is tight: initial submissions are due June 18, CMS will select first-stage finalists in July, and those finalists will gain access to broader data sets and continue developing their products over the following year, with final awards to be announced in April 2020.
The CMS initiative comes amid a growing debate about how and when AI will make a definitive impact on healthcare. We read two pieces this week that illustrate radically different approaches to developing AI tools, and the implications for patient outcomes. Many AI tools to date have been developed by the private sector, fueled by investor capital hoping to profit from a transformative blockbuster application. Wired magazine profiled the work of Babylon Health, a consumer-directed telemedicine and triage platform that has expanded rapidly in England’s National Health Service (NHS), despite growing concerns about its accuracy. Founded by a healthcare entrepreneur and “app-maker”, Babylon is a technology platform integrated into a novel London-based primary care offering, GP at Hand. GP at Hand is a general practice (GP) “surgery”, akin to an American primary care practice, that offers virtual care as well as in-person visits. Patients in the NHS are required to register with a GP as their main source of care. Taking advantage of an NHS rule change allowing patients to register with practices outside their local area, GP at Hand marketed its virtual care offering to attract thousands of new patients, mostly healthy young adults, from across London.
Central to GP at Hand’s platform is a “chatbot” diagnostic engine powered by Babylon Health. Patients input symptoms and are led through an AI-driven diagnostic tree of questions that determines a likely diagnosis and risk level for the patient’s condition, then recommends whether the patient should go to the hospital, see their GP, or monitor their symptoms at home. Clinicians from GP at Hand, as well as doctors who used the platform as a patient might, have questioned the diagnostic accuracy of the Babylon engine. Concerned GP at Hand doctors performed an internal audit of the system, finding that Babylon missed warning signs of a serious condition or was “flat out wrong” 10 to 15 percent of the time.
A look at Babylon’s development process raised serious questions about whether the tool has been properly tested and vetted. At a basic level, Babylon’s methodology was created by physicians who took a hypothetical symptom, estimated the probabilities of a range of underlying diagnoses, and created guiding questions to triage among the options. While this may be a decent start to an algorithm, Babylon has not worked to refine the tool using feedback from actual patient outcomes. Nor has the company submitted its data for peer review, with Babylon’s head doctor saying that “the methods for peer review are quite limiting” and that it would take “several months or a year to produce”. Despite these concerns, Babylon’s reach continues to expand. The NHS contracted with Babylon to supplement its phone-based advice line, and earlier this month the company launched a partnership to bring the chatbot to Canada, where patients in British Columbia will now have access.
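To make the contrast with more rigorous approaches concrete, here is a minimal sketch, in Python, of what a hand-built, probability-weighted symptom triage table of the general kind described above might look like. Every condition, weight, question, and recommendation in it is invented for illustration; this is not Babylon’s model, not medical advice, and exactly the sort of tool that would need validation against actual patient outcomes before being trusted.

```python
# Toy illustration only: a hand-built, probability-weighted symptom triage
# table. All conditions, probabilities, follow-up questions, and
# recommendations are invented for this example.

# Rough likelihood of each candidate diagnosis for a single presenting
# symptom ("chest pain"), as a physician panel might estimate it.
CANDIDATES = {
    "muscle strain": {"prob": 0.45, "advice": "monitor at home"},
    "acid reflux": {"prob": 0.30, "advice": "see your GP"},
    "angina": {"prob": 0.15, "advice": "see your GP"},
    "heart attack": {"prob": 0.10, "advice": "go to the hospital"},
}

# Follow-up questions; a "yes" answer multiplies the weight of the listed diagnoses.
FOLLOW_UPS = {
    "Does the pain spread to your arm or jaw?": {"angina": 3.0, "heart attack": 5.0},
    "Is it worse after eating?": {"acid reflux": 2.5},
    "Did it start after exercise or heavy lifting?": {"muscle strain": 2.0},
}

def triage(answers: dict) -> tuple:
    """Return (most likely diagnosis, normalized probability, recommended action)."""
    scores = {dx: info["prob"] for dx, info in CANDIDATES.items()}
    for question, weights in FOLLOW_UPS.items():
        if answers.get(question):
            for dx, factor in weights.items():
                scores[dx] *= factor
    total = sum(scores.values())
    best = max(scores, key=scores.get)
    return best, round(scores[best] / total, 2), CANDIDATES[best]["advice"]

if __name__ == "__main__":
    # A patient reporting chest pain that radiates to the arm.
    print(triage({"Does the pain spread to your arm or jaw?": True}))
    # -> ('heart attack', 0.29, 'go to the hospital') under these invented weights
```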
While Babylon is focused on direct-to-consumer AI technology, a study in npj Digital Medicine evaluated the efficacy of an AI application that measures the activity level of critically ill ICU patients. Researchers from Stanford University and Intermountain LDS Hospital in Salt Lake City, UT used “machine vision” to continuously monitor ICU patients during day-to-day tasks. To maintain privacy, the researchers used depth sensors to collect 3-D silhouette data of patients and staff, then developed algorithms to analyze the footage and determine whether patients were moving and how many staff were involved in supporting them. The technology accurately characterized patient activity 87 percent of the time. Accuracy in determining the number of medical personnel in the room was lower (68 percent), but investigators felt this could easily be improved by increasing the number of sensors. Healthcare AI experts hailed these results as groundbreaking, potentially laying the foundation for refining patient care in the ICU and furthering automated analysis and optimization of patient activity and staff workflow in other settings.
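The study’s actual system relies on neural networks trained on annotated depth-sensor video, which is well beyond a snippet; but as a rough intuition for how depth data can reveal activity without capturing identifiable images, here is a toy sketch (our own, not the researchers’ code) that flags movement by differencing consecutive depth frames. The thresholds, frame sizes, and helper name are arbitrary assumptions for the example.

```python
import numpy as np

# Toy sketch, not the study's method: flag "movement" between two depth frames
# by counting pixels whose depth changed by more than a threshold. The real
# system used trained neural networks on annotated 3-D silhouette video.

DEPTH_CHANGE_MM = 50     # per-pixel depth change (mm) counted as motion -- assumed value
MOVING_FRACTION = 0.02   # share of changed pixels that counts as activity -- assumed value

def frame_shows_movement(prev_frame: np.ndarray, curr_frame: np.ndarray) -> bool:
    """Return True if enough of the scene changed between two depth frames."""
    changed = np.abs(curr_frame.astype(float) - prev_frame.astype(float)) > DEPTH_CHANGE_MM
    return changed.mean() > MOVING_FRACTION

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    still = rng.integers(500, 3000, size=(240, 320))  # simulated depth frame, in mm
    moved = still.copy()
    moved[100:140, 150:220] += 200                    # a region of the scene changes depth
    print(frame_shows_movement(still, still))         # False: nothing changed
    print(frame_shows_movement(still, moved))         # True: a patch moved
```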
The Stanford-Intermountain study is also notable for its comprehensive approach and design. Researchers worked with clinicians in advance to analyze the complex ICU environment and get input on workflow challenges. The process was thorough and slow, but critical to designing a system that can scale safely and effectively. This pace of development might not be tolerated by the culture and financial model of private-sector AI research. Babylon’s developers, by contrast, espoused a Silicon Valley-like ethic of (to paraphrase Facebook founder Mark Zuckerberg) “moving fast and breaking things”. Releasing partially tested technology and later refining it based on user feedback may be a fine development process for a shopping or social media app, but deploying clinical guidance technology that hasn’t been thoroughly vetted carries huge risks for patients.
With industry interest in AI growing, healthcare would benefit more from “slow AI”, developed through clinical and scientific collaboration with rigorous academic study design and testing, than from “fast AI”, where pressure to generate returns for private investors pushes entrepreneurs to develop and deploy technology rapidly. We are also convinced that the greatest early impact from AI in healthcare will come not from aiding clinical diagnosis or producing whiz-bang consumer apps, but from improving staff workflow and non-clinical processes. We spent time recently with a group of experts in robotic process automation and came away optimistic that large portions of back-office tasks like claims processing, pre-authorization, and revenue cycle management could be performed by machine-based algorithms. Another recent example: voice-activated technology like Amazon’s Alexa can replace the nurse call button, delivering faster response times that both lower staffing burden and increase patient satisfaction.
Even with a $1M top prize, the new CMS AI Challenge provides a payoff that’s meager compared to the lofty aspirations of many healthcare entrepreneurs. But we’re hopeful that the initiative will support the development of vetted frameworks that distill predictive insight from Medicare claims data (although it would be much better if this information could be combined with actual clinical data). That approach, like projects to study patient movement in the ICU or automate claims processing, may seem dull next to AI-driven technology that purports to replace doctors or help consumers diagnose themselves, but we believe healthcare should prioritize slow over sexy on the path to integrating AI into care delivery.