Use of AI in Medical Education, Medical Diagnostics, Clinical Support - and Challenger
What that quote conveys is the reality that generative AI is improving rapidly, and every month brings new and better capabilities. Within weeks or months, what you are using today will be replaced by better and faster AIs, and, within some technological and theoretical constraints, AIs will just keep getting better.
What does that mean for the practice of medicine and for medical education? To explore that, we need to agree on some definitions.
Large Language Models (LLMs) like the ones Challenger uses aren’t what computer scientists in the 1950s envisioned when they coined the term 'Artificial Intelligence.' LLMs do not possess understanding, reasoning, or consciousness. They excel in one area — nuanced search and summarization.
To call them a hyperactive search engine does not really do the algorithmic technique justice, but it’s a handy way for humans to think about them right now.
The AIs Challenger uses are forms of neural networks. A neural network is a program design technique modeled loosely on the human brain. While not exactly analogous, neural-net models simulate the connections in a human brain to make a pattern-identification machine. Today's largest LLMs may have over a trillion parameters, the rough digital analogue of synaptic connections; the human brain is estimated to have 100 trillion such connections.
The differences don't end there. Neurons in the human brain have far more (and still somewhat poorly understood) feedback mechanisms and biochemical processes, and they operate as a massively parallel computing device; their digital analogues do not. But it turns out that a trillion connection analogues are sufficient to do one thing with superhuman capability: process language and the structures of information.
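To make the "pattern identification machine" idea concrete, here is a minimal sketch of a tiny feed-forward network in Python. It is illustrative only, not anything Challenger ships: each "neuron" is just a weighted sum of its inputs passed through a nonlinearity, and a modern LLM simply chains an enormous number of these connections.

```python
# Illustrative sketch only: a tiny feed-forward neural network.
# A "neuron" is a weighted sum of inputs passed through a nonlinearity;
# production LLMs chain roughly a trillion such connections.
import numpy as np

rng = np.random.default_rng(0)

def layer(inputs, weights, biases):
    # Each output "neuron" sums its weighted inputs, then applies a
    # nonlinearity (tanh here), a simplified stand-in for synapse-and-fire.
    return np.tanh(inputs @ weights + biases)

# 4 input features -> 8 hidden "neurons" -> 3 output scores.
w1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
w2, b2 = rng.normal(size=(8, 3)), np.zeros(3)

x = rng.normal(size=(1, 4))   # one example with 4 input features
hidden = layer(x, w1, b1)     # hidden layer acts as pattern detectors
scores = hidden @ w2 + b2     # combine detected patterns into output scores
print(scores)                 # untrained weights, so the scores are arbitrary
```

Training adjusts those weights until the patterns the network detects become useful; scale that process up by many orders of magnitude and you have the machinery behind an LLM.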
You are going to see amazing things out of these AIs, limited though they may be by their design and by technology. Natural language voice processing that makes Siri or Alexa look like a bicycle next to a jet plane. Conversational capabilities that make them almost indistinguishable from humans. Understanding of almost any language that has a sufficient quantity of source material available for training. Visual processing and image generation, advanced problem-solving, simulation, and modeling; it turns out many things can be achieved with larger and better-trained neural networks.
But they are adjuncts to human capabilities, not replacements. What mechanical engines did for mechanical power, AIs will do for brain power. Engines amplified muscle power. Suddenly you didn’t need sixty workers on the farm, you needed a tractor and ten people, and eventually, a bunch of equipment and just one person. Similarly, AIs can be used to supplement and enhance human attention capabilities.
An AI might carry on a conversation with a patient for an hour, explaining follow-up care and expectations in great detail, and remain available 24/7 for further follow-up questions at the dial of a phone — the physician cannot. An AI can monitor a patient's vital signs in a hospital room (or dozens of rooms) 24/7 without distraction, but a human can’t.
The first steam engines used for cotton-spinning in factories appeared in the 1780s. The first gasoline-powered tractor was deployed in 1892. We didn't know what kind of world industrial engine power would usher in, and we don't know what kind of world the AIs will enable. But it will still be a human world, because these machines will still have inherent limitations in reasoning, understanding, consciousness, and will.
What We Can Do Now
Neural network AI development has been around since the 1960s, but generative AI like what we're seeing today sprang up in the early 2020s. The algorithms and structures have enabled many different types of generative AI: Large Language Models (LLMs) for natural language processing; Generative Adversarial Networks (GANs) for image, video, and data creation; and Convolutional Neural Networks (CNNs) for visual pattern recognition. These techniques have in turn spawned a host of other network types and models.
Challenger primarily uses AI in the form of LLMs. We currently use them for two things the customer sees. The first is generative AI creating supplementary material adjacent to the course: glossaries and structured data about the diseases and conditions referenced in our Q&A. The second is the automatic translation of some supplementary information into 21 other languages.
Here is also where the cautions come in. The types of AI we're using have been commercial products for barely a year; they haven't even reached their second birthday. Just as Sam Altman's quote is a guide to what the future of AI holds, Dr. Coiera's sobering quote is a guide to the stakes of what we're dealing with. Understanding the limitations and likeliest types of errors that current AIs will generate helps us make the best use of them.
Training Lag
Error rates in printed medical textbooks were 3% to 5%. The normal cycle for authoring, review, and publication — and getting into the students' hands — was three years. Slippage in standards of care and current guidelines was normal. The internet and electronic publishing sped up the cycle, increasing accuracy, but 2% to 4% was still considered normal, and electronic edits and supplements helped fill in that gap.
The internet brought new types of errors, and new types of problems, into medical training information. LLMs share these same problems.
On December 5th, 2023, the American Heart Association issued new atrial fibrillation guidelines, elevating catheter ablation to a first-line treatment for certain patients in place of drug therapy. Had you asked an LLM about catheter ablation guidelines on December 7, 2023, it would have responded incorrectly. If you asked again in March 2024, it probably still would have responded incorrectly. Ask it today, and it will get it right.
In drug-resistant HIV treatment, the protocol is genotypic resistance testing and optimizing antiretroviral therapy (ART), including integrase inhibitors. But since 2023, capsid inhibitors such as lenacapavir have become more common. If you ask an LLM for the protocol for drug-resistant HIV, it gives you ART. If you ask about newer FDA-approved methodologies, capsid inhibitors are mentioned as approved for use when other therapies have not been successful.
Ask an LLM for guidelines on acute appendicitis, and you will get a perfect response. Ask for information where complex treatments exist and change rapidly, and you will see both training lag and the problem we describe next.
The AI's limitations will be reflected in Challenger's supplementary information. We always use the most recently trained versions, and our supplementary material will be as accurate as the current version of the AI's understanding allows.
Garbage In Garbage Out (GIGO)
Talking to an LLM successfully means the human needs to specify carefully what information is wanted and how it should be requested. In the HIV example above, because I talked to the LLM about all available treatments and gave it as much information as I had, it presented capsid inhibitors as a follow-on to other unsuccessful treatments. Had I simply asked for the protocol for HIV treatment, it would not have. The human working with the AI needs extensive knowledge of their own and must guide the AI toward the goal of the response.
Tell the LLM “rlq pain wbc 14” and you get appendicitis at the top of the differential diagnosis.
Tell the LLM “f, 31, +hcg, rlq pain wbc 14” and you get ruptured ectopic pregnancy at the top.
The more and better the information provided, the better the result will be. Along the way, the AI can do some useful things: coding, talking to patients, recording hideous EHR information, even documenting for thoroughness: “No hCG test performed by admitting despite RLQ pain, WBC 14.”
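To illustrate the point, here is a hedged sketch of how those two prompts differ when assembled programmatically. The function names and wording are illustrative, and `query_llm` is a placeholder for whichever LLM API is actually in use, not a real Challenger function.

```python
# Illustrative only: more (and better-structured) clinical context in,
# better differential out. `query_llm` is a placeholder, not a real API.
def build_prompt(findings: dict) -> str:
    # Spell out every known finding rather than relying on terse abbreviations.
    details = "; ".join(f"{k}: {v}" for k, v in findings.items())
    return (
        "Given the following findings, list the top differential diagnoses "
        f"with brief reasoning. Findings: {details}"
    )

sparse = {"complaint": "RLQ pain", "WBC": "14 x 10^9/L"}
rich = {**sparse, "sex": "female", "age": 31, "hCG": "positive"}

print(build_prompt(sparse))  # tends to surface appendicitis at the top
print(build_prompt(rich))    # the added facts push ectopic pregnancy to the top
# response = query_llm(build_prompt(rich))  # placeholder call to the LLM
```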
That’s why you see the structured calls in the AI supplementary information we use. We’re carefully prompting for currency, authority, and learning objectives relevant to the topic.
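As an example of what such a structured call can look like, here is a hedged sketch of a prompt template that asks for currency, authority, and learning objectives. The field names and wording are assumptions for illustration, not Challenger's production prompt.

```python
# Hypothetical template, for illustration only; the production prompts are
# more involved. This just shows the shape of prompting for currency,
# authority, and learning objectives.
SUPPLEMENT_PROMPT = """\
You are generating supplementary material for a medical education course.
Topic: {topic}
Learning objectives: {objectives}
Requirements:
- Reflect clinical guidelines current as of {as_of_date}, and state the guideline year.
- Name the issuing authority (e.g., AHA, IDSA) for every recommendation.
- Flag any recommendation that changed within the last 24 months.
"""

prompt = SUPPLEMENT_PROMPT.format(
    topic="Catheter ablation in atrial fibrillation",
    objectives="Identify patients for whom ablation is now first-line therapy",
    as_of_date="2024-06-01",
)
print(prompt)
```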
Training Bias
Bias here refers to the LLMs' lack of awareness of seasons, geography, regions, environment, population groups — pretty much everything. If the LLM training data is heavily urban, or regional, it may not have a good picture of possible diagnoses. Hantavirus, for instance, is twice as likely in rural forested areas as in urban areas. Coccidioidomycosis (Valley Fever) is more likely in Arizona and the arid regions of California, but can be easily misdiagnosed as a cold or bronchitis.
And here is another aspect of the GIGO problem - those are cases where the LLM is likely to respond more correctly if it has information about the patient’s residence or travel. But it can also throw false positives in the same way.
In medicine, the problem is that patients rarely show up with just one problem. They show up with a problem and a set of comorbidities. Our learning material does tend to focus on standard textbook cases, and a simple presentation of a standard disease pathophysiology is something the LLM does well. But when relying on an AI's response as a diagnostic backstop or scanner, be aware of the potential for a biased response.
Challenger Corporation has a long history in AI. The company's first acquisition, way back in the 1990s, was an AI company. It's an area where we have some broad expertise. We produce 28,000 Q&As, written and curated by humans and backed up by 18,000 images. If we weren't confident that LLMs had progressed to the point of producing usable supplementary information at a quality level higher than standard publication, we wouldn't be including it in our products.
As it is, we believe they have progressed well beyond standard error rates for the types of applications we have in education. They are also progressing rapidly to the point where interaction with the learning material becomes interesting. It's fascinating to watch an LLM narrow down a diagnosis, and the support for it, as you throw disparate data at it. We look forward to releasing more supplementary features in the near future.
"The integration of AI in healthcare should be about augmenting human capacity and reducing the burden on healthcare professionals.”
- Nathan Pinskier, MBBS