Large language models are increasingly used in healthcare: summarizing notes, answering patient questions, supporting clinical decisions. But when they encounter fabricated medical claims, how often do they push back? And how often do they simply go along?
We tested this at scale. We embedded false medical recommendations into realistic prompts across three source types and measured whether 20 leading LLMs accepted or rejected them. We also tested whether wrapping claims in logical fallacies changed the outcome.
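To make the setup concrete, here is a minimal sketch of what such an evaluation loop could look like. The templates, wrapper phrasings, and the accept/reject heuristic below are illustrative stand-ins, not the actual harness used in the study.

```python
# Illustrative sketch: embed a fabricated claim in one of three source
# formats, optionally wrapped in a fallacy framing, then score the reply.
# All templates and the judge heuristic are hypothetical examples.

SOURCE_TEMPLATES = {
    "discharge_note": "DISCHARGE SUMMARY\nPlan: {claim}\nFollow up in 2 weeks.",
    "reddit_post": "saw this on a health subreddit -- {claim} ... is that true??",
    "clinical_vignette": "A 54-year-old presents for follow-up. The note states: {claim}",
}

FALLACY_WRAPPERS = {
    None: "{text}",
    "appeal_to_authority": "A famous doctor claims: {text}",
    "bandwagon": "Everyone says that {text}",
    "slippery_slope": "If we don't act on this now, things will only get worse: {text}",
}

def build_prompt(claim: str, source: str, fallacy: str | None) -> str:
    """Embed a fabricated claim in a source-format template,
    optionally wrapped in a fallacy framing."""
    framed = FALLACY_WRAPPERS[fallacy].format(text=claim)
    return SOURCE_TEMPLATES[source].format(claim=framed)

def classify_response(response: str) -> str:
    """Toy accept/reject heuristic; a real study would use a calibrated judge."""
    pushback = ("not recommended", "no evidence", "incorrect", "contraindicated")
    return "reject" if any(p in response.lower() for p in pushback) else "accept"
```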
Format Matters More Than You Think
The single biggest predictor of susceptibility was not model size or architecture. It was the source format. A fabricated claim written in the formal, declarative tone of a clinical discharge note was accepted nearly half the time. The same claim framed as a Reddit post triggered far more skepticism.
Discharge Notes: Formal clinical language bypasses safety filters most effectively.
Reddit Posts: Informal, emotional tone triggers more built-in skepticism.
Clinical Vignettes: Controlled scenarios produced the lowest acceptance rates.
"Quiet, authoritative falsehoods slip through safety filters far more easily than the rhetorical tricks models have been trained to catch."
Susceptibility Across Models and Fallacy Types
The heatmap below shows how each model responded to each fallacy type, measured two ways: susceptibility (how often models accepted false claims) and detection (how often they correctly identified the fallacy).
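Both rates reduce to simple per-cell averages over the trial results. A minimal sketch, assuming a results table with hypothetical columns `model`, `fallacy`, `accepted`, and `detected_fallacy`:

```python
# Derive the two model x fallacy rate matrices behind the heatmap.
# Column names are assumptions for illustration, not the study's schema.
import pandas as pd

def rate_matrices(results: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Return (susceptibility, detection) as model x fallacy rate tables."""
    susceptibility = results.pivot_table(
        index="model", columns="fallacy", values="accepted", aggfunc="mean"
    )
    detection = results.pivot_table(
        index="model", columns="fallacy", values="detected_fallacy", aggfunc="mean"
    )
    return susceptibility, detection
```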
The Fallacy Paradox
Counterintuitively, wrapping misinformation in logical fallacies generally reduced susceptibility: eight of ten framings lowered acceptance rates. A likely explanation is that safety fine-tuning has exposed models to adversarial dialogues prefaced with rhetorical markers like "everyone says" or "a famous doctor claims." Models recognize the template of a trick, but miss the quiet lie stated in plain, clinical language.
The two exceptions: slippery slope (+2.2 pp) and false dilemma (+0.4 pp). These framings present false urgency rather than false evidence, and appear underrepresented in current safety training data.
How the Models Stack Up
Composite Robustness Score (Top 7)
Notably, gpt-oss-20b achieved the lowest susceptibility of any model tested (0.7%) despite its moderate size. Medically fine-tuned models consistently underperformed their general-purpose counterparts, suggesting that domain specialization can come at the cost of safety robustness.
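The post does not spell out how the composite score is constructed. Purely as an assumption, one plausible composite simply averages robustness to false claims with fallacy detection, both on a 0-to-1 scale:

```python
# Hypothetical composite: NOT the study's definition, just one way to
# combine the two measured rates into a single higher-is-better score.
def composite_robustness(susceptibility: float, detection: float) -> float:
    """Both inputs are rates in [0, 1]; higher output = more robust."""
    return 0.5 * (1.0 - susceptibility) + 0.5 * detection

# Example with the reported 0.7% susceptibility and an arbitrary 90% detection:
# composite_robustness(0.007, 0.90) -> 0.9465
```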
Safety will not come from scale alone. It requires context-sensitive guardrails tuned to clinical language, grounding strategies that verify claims against trusted sources, and targeted immunization against the quiet misinformation that current safety training misses.
Systems that surface discharge recommendations or generate after-visit summaries need safeguards designed specifically for formal medical text, the format where models are most vulnerable.
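What such a safeguard might look like in outline: detect when text is in the high-risk formal register, and refuse to surface a recommendation unless a trusted source corroborates it. The markers and the `trusted_lookup` callable below are hypothetical placeholders for a real classifier and a vetted guideline index.

```python
# Sketch of a format-aware guardrail. `trusted_lookup` is a hypothetical
# stand-in for a check against a curated, trusted medical source.
FORMAL_MARKERS = ("discharge summary", "plan:", "assessment:", "follow up")

def looks_like_formal_medical_text(text: str) -> bool:
    """Crude heuristic: formal clinical documents get the strictest checks."""
    t = text.lower()
    return sum(marker in t for marker in FORMAL_MARKERS) >= 2

def gate_recommendation(text: str, recommendation: str, trusted_lookup) -> str:
    """Surface a recommendation only if a trusted source corroborates it
    when the surrounding text is in the high-risk formal format."""
    if looks_like_formal_medical_text(text) and not trusted_lookup(recommendation):
        return "WITHHELD: recommendation not corroborated by a trusted source"
    return recommendation
```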