March 27, 2023

Can GPT-4 Analyze Medical Questions?

By David Beard · 5 minute read

OpenAI's GPT-3 chatbot came out in November 2022. It was fun, interesting, and had a personality. It responded to one of my questions with an answer that began, "If you're not a complete idiot...".

It performed pretty well on my appendicitis questions. I picked appendicitis because the model was trained on the internet, and the internet has millions of consumer-grade pages about appendicitis.

GPT-3.5 and GPT-4 were introduced this week.

GPT-3.5 is fast and doesn't call me an idiot, which I think is kind of sad; I liked the sassy AI. GPT-4 has some significant differences from GPT-3. The Challenger Team has been testing GPT-4 with medical questions today, primarily questions that our users have reported problems with. We've also been running large medical papers through the system to see how it handles natural-language queries about them.

Let's test out GPT-4 on a medical education question.

Here is one of our user-reported questions:

Question ID: 156401 (yes, we have a lot of Q&A)

Question Name: Prescribing Controlled Substances

Question: You have agreed to consider prescribing opioids to a 47-year-old man with severe neck pain following a motor vehicle accident 2 years ago. You complete a thorough risk assessment for opioid misuse and you determine that he is at low risk. You perform a random urine drug screen prior to initiating treatment. Which of the following statements is true?

Answers:

If the urine immunoassay is negative for benzodiazepines, this excludes recent use of all benzodiazepines.
If the urine immunoassay is negative for opiates, this excludes recent use of all opiates.
Urine drug testing is not indicated as he is low risk for opioid misuse.
The urine immunoassay for amphetamines may be positive if the patient has recently taken pseudoephedrine. (correct answer)

Incorrect Remediation:

Educational Objective: Recognize the need for urine drug testing prior to initiating opioids and interpret the results appropriately. Key Point: All patients initiating opioid therapy should undergo urine drug testing prior to starting therapy to ensure there is no use of other substances which may increase the risk of harm from opioids. Interpreting the results of the test can be challenging and clinicians need to understand the implications of negative and positive results.

Explanation: According to the 2016 CDC guidelines, all patients should undergo urine drug testing prior to initiating opioid therapy, regardless of their apparent risk of a subsequent misuse disorder, as currently available risk assessment tools remain inaccurate. In some studies, more than half of patients had abnormal urine test results while receiving opioid therapy. Testing should then be carried out at least annually for low-risk patients and more frequently for patients deemed to be at higher risk. Interpretation of the results can be difficult because clinicians need a thorough knowledge of potential false positives and negatives.

Screening immunoassays for benzodiazepines have a false negative rate of almost 30%. Most screening immunoassays do not detect clonazepam or Similarly, the current urine immunoassay for opiates can accurately detect the natural opiates, codeine and morphine, although it cannot distinguish which of these 2 is present. It will also detect the presence of heroin. There is some cross-reactivity with the semi-synthetic opioids and these may show as positive on the urine opioid immunoassay. Semi-synthetic opioids include hydrocodone, hydromorphone, oxycodone, and oxymorphone. The urine opiate immunoassay cannot detect the synthetic opioids such as methadone, fentanyl, and meperidine.

Amphetamine immunoassays are highly cross-reactive, so caution should be taken when interpreting the results. The immunoassay may detect other sympathomimetic amines, such as ephedrine, pseudoephedrine, or selegiline. Whenever there is an unexpected urine drug testing result, the patient should be given the opportunity to provide an explanation prior to confirmatory testing. In addition, the absence of a positive result for a prescribed opioid is concerning.

References:

Dowell D, Haegerich TM, Chou R. CDC Guideline for Prescribing Opioids for Chronic Pain - United States, 2016. MMWR Recomm Rep. 2016;65(1):1-49.

Gudin JA, Mogali S, Jones JD, Comer SD. Risks, Management, and Monitoring of Combination Opioid, Benzodiazepines, and/or Alcohol Use. Postgrad Med. 2013;125(4):115-130.

Washington State Agency Medical Directors' Group. Interagency Guideline on Prescribing Opioids for Pain. June 2015. Accessed September 4, 2018.

I've cut out the correct answer remediation and a few other fields. The question was reported as having an incomplete sentence.

Asking GPT-4 for a medical education question revision.

Me: Is there an incomplete sentence?

GPT-4: Yes, there is an incomplete sentence in the original response. The incomplete sentence is:

"Screening immunoassays for benzodiazepines have a false negative rate of almost 30%. Most screening immunoassays do not detect clonazepam or..."

The corrected sentence should be:

"Screening immunoassays for benzodiazepines have a false negative rate of almost 30%. Most screening immunoassays do not detect clonazepam or certain other benzodiazepines, so a negative result does not necessarily exclude their use."

GPT-4 found the incomplete sentence and completed it for me.

Me: What can cause a positive urine drug screen for amphetamines?

GPT-4: A positive urine drug screen for amphetamines can be caused by various factors, including the use of prescription medications, over-the-counter medications, dietary supplements, or illicit substances. Some of the common causes are:

Prescription medications: Some medications prescribed for attention deficit hyperactivity disorder (ADHD), narcolepsy, or obesity, such as amphetamine salts (Adderall), dextroamphetamine (Dexedrine), lisdexamfetamine (Vyvanse), or methylphenidate (Ritalin, Concerta), can result in a positive test for amphetamines.
Over-the-counter medications: Certain over-the-counter medications, especially those containing pseudoephedrine or ephedrine (found in some cold, allergy, and sinus medications), can cause a false-positive result for amphetamines.
Dietary supplements: Some weight loss or workout supplements may contain ingredients like synephrine, phenylethylamine, or other stimulants that can cross-react in the amphetamines immunoassay, leading to a false-positive result.
Illicit substances: The use of illicit drugs such as methamphetamine (crystal meth), MDMA (Ecstasy or Molly), and other designer drugs or stimulants can cause a positive test for amphetamines.
Laboratory error: Although rare, laboratory errors or specimen contamination may lead to a false-positive result.

If a urine drug screen result is positive for amphetamines and unexpected, confirmatory testing using a more specific method, such as gas chromatography-mass spectrometry (GC-MS) or liquid chromatography-tandem mass spectrometry (LC-MS/MS), is recommended to rule out false positives and accurately identify the substance(s) present.

Reasonable accuracy, but still problems.

Within limits, GPT-4 can take a structured medical question or a published paper and answer questions about it with reasonable accuracy, including pulling in external information that is related to the subject but not mentioned in the text.

We've had two problems so far.

In one instance, the external information it pulled in was not the current standard of care but the prior recommendation. In another, a complex scientific paper reported confidence intervals (CIs) for five study cohorts; GPT-4 found four of them but missed the fifth, and it could not correctly answer our questions about the studies' CIs.

For common standard-of-care questions and current guidelines on common diagnostics, this will be a highly significant tool. It is important to understand a bit about what an LLM does. It doesn't know medicine; it predicts how a conversation should go based on its training. An LLM digests complex datasets, whether that is the internet or all of the patient records in a system, and lets you ask questions about them in natural language.

We're going to start testing it on cases with high co-morbidity rates...