The landscape of AI technology is evolving at an unprecedented pace, and the future remains largely unpredictable. Despite being an early adopter and frequent user of the OpenAI sandbox (an early precursor to ChatGPT), I confess that I didn’t foresee the explosive growth of AI’s capabilities.
However, it’s vital to strike a note of caution. While the advancements of generative AI (GenAI) platforms like ChatGPT are astonishing and surpass previous expectations for AI, we must stay grounded when predicting their near-term implications. We must also differentiate between types of AI as they are applied in clinical settings.
Working with ChatGPT and similar foundation models could create the false expectation that, very soon, any medical condition could be diagnosed from an image or medical record by a highly generalizable foundation model, without the additional work of tuning it to a specific task. While these models can prove invaluable for tasks like reducing administrative burdens, the fact is that GenAI models currently lack the diagnostic accuracy needed in high-stakes clinical settings.
Because of these challenges in adapting foundation models such as ChatGPT to clinical tasks, the dominant form of AI in clinical practice will remain what we call “Precision AI”: models trained to solve specific tasks and, above all, to achieve the diagnostic accuracy that makes them valuable in clinical practice.
First, it’s essential to highlight the fundamental question that, while intuitively understood, warrants explicit mention: what is the cost of an AI error? The risk profile of drafting a consumer email, for instance, is vastly different from that of making a medical decision.
A recent study at Johns Hopkins revealed that every year in the US, 795,000 patients either die or suffer permanent disability due to medical errors. Clearly, when it comes to creating AI for clinical use, the stakes are remarkably high. If AI is to act as an aid to physicians, and medical decisions in clinical environments can have such a drastic impact on patient outcomes, then clinical AI must prove its accuracy.
Let’s also consider the complexity of healthcare data itself: it is largely unstructured, multimodal (images, waveforms, free-text reports, structured records) and fragmented across siloed systems.
Furthermore, the tasks you need AI to perform carry their own complexity, and they fall into two broad categories: Detection and Extraction. Current AI systems, including ChatGPT, are used mainly to extract insights from the text they are given or were trained on. Detection, however, particularly of subtle anomalies, is far more challenging than extraction.
Consider a radiologist reading a CT scan and detecting a subtle brain aneurysm. This requires “detection” at “diagnostic” accuracy. Once the radiologist writes this finding into the report, anyone reading the report needs only “extractive” accuracy to understand that the patient has a brain aneurysm. This is the key difference that makes “Precision AI” necessary for clinical relevance, rather than the extractive accuracy found in foundation models like ChatGPT.
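To make the distinction concrete, here is a minimal illustrative sketch in Python. The report text and the helper names (`load_ct_volume`, `aneurysm_detector`, `validated_threshold`) are hypothetical placeholders, not any actual product pipeline:

```python
import re

# Extraction: the finding already exists as text, because a radiologist
# has written it into the report. Simple pattern matching often suffices.
report = "Impression: 4 mm saccular aneurysm at the left MCA bifurcation."

def mentions_aneurysm(report_text: str) -> bool:
    """Return True if the report explicitly states an aneurysm."""
    return re.search(r"\baneurysm\b", report_text, re.IGNORECASE) is not None

print(mentions_aneurysm(report))  # True: extractive accuracy suffices here

# Detection: no text exists yet; the signal is buried in raw pixels, and
# a purpose-trained, clinically validated model must score the volume.
# These calls are hypothetical placeholders for an imaging pipeline and a
# task-specific network:
#
# volume = load_ct_volume("study_001")          # e.g. shape (512, 512, 400)
# probability = aneurysm_detector(volume)       # trained for this one task
# flagged = probability > validated_threshold   # threshold tuned clinically
```

The extraction half operates on a few dozen characters of text; the detection half must surface a weak signal from an entire volume, which is why it demands task-specific training and validation.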
Achieving accuracy in AI, particularly in healthcare applications, is a more intricate challenge than it might initially seem. Despite constant advancement, we may still be some distance from the level of accuracy necessary for effective clinical use of GenAI models. Most GenAI models, like ChatGPT, have been trained and validated on problems significantly different from diagnostic-level detection. Consider, for example, the difference in complexity between answering a question about a text and detecting a subtle brain hemorrhage in a CT scan. The latter is a task of immense precision and subtlety, which might require detecting a subtle change in a 15-pixel needle within a 100-million-pixel haystack. It’s a vastly different problem, and its dimensionality is immense.

Recent research tested ChatGPT on detection in long text, an easier variant of the needle-in-a-haystack problem. It found that as the input size grew (the number of words ChatGPT is given to search), the model became less capable of answering questions about that input, with accuracy falling below 60%.
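To put both versions of that haystack into numbers, here is a back-of-the-envelope sketch. The CT dimensions are an assumed typical volume, and `ask_model` is a hypothetical stand-in for an LLM API call, not the cited study’s actual harness:

```python
import random

# Part 1: the imaging haystack. An assumed 512 x 512 x 400 CT volume
# holds roughly 100 million voxels; a subtle finding may occupy ~15.
voxels_in_scan = 512 * 512 * 400
voxels_in_finding = 15
print(f"{voxels_in_scan:,} voxels; the finding is "
      f"{voxels_in_finding / voxels_in_scan:.8%} of the scan")
# -> 104,857,600 voxels; the finding is 0.00001431% of the scan

# Part 2: the long-text variant the research describes. Bury one relevant
# sentence in a large body of filler text, then ask the model about it.
needle = "The patient's aneurysm measures 4 mm."
filler = ("Routine follow-up note with no acute findings. " * 5000).split()
insert_at = random.randrange(len(filler))
haystack = " ".join(filler[:insert_at] + [needle] + filler[insert_at:])

# answer = ask_model(context=haystack,
#                    question="What size is the patient's aneurysm?")
# As the input grows, reported accuracy on questions like this degrades,
# in the cited research to below 60%.
```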
In short, ChatGPT is not great at finding a needle in a haystack, which is exactly what clinical AI needs to do.