Triple-Threat strategy for using LLMs to answer a medical question

We’ve all heard a lot about using large language models (LLMs) in medicine, but it’s generally in the abstract. I’d like to explore concretely how to apply these models to answer a specific medical question. I make no claim to be an expert in this topic, but rather I would like to propose an approach and see what folks think about it (please comment below!).


#1/3: Formulate your own answer

  • Don’t simply outsource all the work to the LLMs.
  • The goal is to use LLMs in addition to — not instead of — what you would normally do to answer the question.
  • Formulating your own answer involves consulting conventional resources (e.g., pharmacopoeias, textbooks, UpToDate).
  • If you’re using LLMs, then you probably won’t be completely sure of the answer. However, it’s still important to at least formulate an opinion.
  • Formulating your own opinion helps avoid blindly following advice from the LLM.

#2/3: Narrow-spectrum medical literature LLM with dense references (OpenEvidence)

  • A narrow-spectrum LLM is based entirely on high-quality medical literature (e.g., PubMed, JAMA, NEJM).
  • OpenEvidence currently seems to be the best narrow-spectrum medical LLM.
  • The advantage of a narrow-spectrum LLM is that it focuses the search on the highest-quality sources. The disadvantage is that it has blinders on and will ignore other high-quality information sources.
  • OpenEvidence seems to have an extremely low rate of hallucinations.

#3/3: Broad-spectrum LLM with dense references (e.g., Perplexity)

  • A broad-spectrum LLM examines a wider range of sources. Many of these sources are excellent and highly reliable (e.g., hospital protocols, health department websites, FDA drug package inserts). Other sources are of lower quality.
  • For more cutting-edge or esoteric questions, the broad-spectrum LLM may be more effective in finding relevant information sources.
  • Dense referencing is essential for any LLM, but it is particularly critical for a broad-spectrum LLM. To be useful, the LLM must provide frequent references to the sources it uses (ideally, line-by-line citations). This allows you to evaluate each information source and go deeper as needed.
  • I’ve found Perplexity to be the most useful broad-spectrum LLM, but I’d be interested in others’ experiences (many LLMs fail to provide dense enough referencing to be useful).
  • The LLM may allow you to choose a specific AI model that it uses. Different AI models have different hallucination rates. Currently, Claude Sonnet 4.0 appears to have a relatively low hallucination rate; however, this is expected to change over time as newer models evolve.
  • Ensure that you formulate the question in a neutral manner. For example, if you’re determining whether a drug is useful for a certain condition, don’t ask “What are the benefits of drug X?” Instead, ask “Is drug X beneficial?” LLMs tend to provide users with the answers they want to hear, so asking the question in a leading manner will bias the output.

Integration

  • Compare your own answer with the answers from the two LLMs.
  • If all three answers are consistent, this will increase your confidence in the accuracy of the answer.
  • If OpenEvidence and Perplexity disagree, don’t assume that OpenEvidence is necessarily correct. Sometimes, OpenEvidence is “too” evidence-based (i.e., it will equate an absence of evidence with evidence of absence).
  • Any inconsistencies can be investigated by following the references provided by the LLMs back to the primary information sources.

Josh Farkas