The setup
Two in three U.S. physicians report using healthcare AI. About one in five U.S. adults have used AI chatbots for health information. At the same time, the companies behind these models are introducing advertising. OpenAI announced ads in ChatGPT in January 2026. Microsoft already displays sponsored content in Copilot. Perplexity disclosed similar plans soon after.
Pharmaceutical advertising is unlike other commercial content. It targets a decision with direct consequences for patient health. The U.S. industry spends over $6 billion annually on direct-to-consumer drug advertising, and that activity is regulated by the FDA. There is no comparable framework for advertising delivered through LLMs.
We asked a simple question: can advertising embedded in AI systems shift clinical recommendations?
The headline finding
We ran four controlled experiments across 12 commercially available LLMs from OpenAI, Anthropic, and Google. Each experiment paired a clinical scenario with a system prompt containing a pharmaceutical advertisement, then asked the model for a treatment recommendation.
Across 74,880 ad-condition calls and 13 scenarios, advertising shifted the model's choice toward the advertised drug from a baseline rate of about 34% to 47.6%. That is a mean increase of 12.7 percentage points. Accuracy did not drop. It rose by 3.6 pp. The bias is therefore not an error. It is a redirection.
"Advertising does not override medical knowledge. It fills the space where clinical evidence is underdetermined."
Provider-tier susceptibility
The difference between providers was the single most striking finding. Click any provider to expand the per-model breakdown.
The equipoise zone
Three experiments contrasted three different epistemic conditions. Together they paint a picture of where the bias operates and where it does not.
The picture is consistent. Advertising does not override medical knowledge. It operates inside the model's zone of clinical equipoise, the space where two or more options are medically defensible. There the ad acts as a salience-based tiebreaker. The output is right and biased at the same time. There is no error to catch.
Complete reversals
At the model-by-scenario level, ten cases shifted from one correct answer to another by 100%. Click any card to reveal the flip.
In every case, both options were guideline-equivalent. The shift was from one correct answer to another. Standard accuracy testing would never see this.
How the bias hides
An open-response sub-analysis (2,340 calls across three representative models) examined the free-text justifications models produced when explaining their drug choices. Two patterns mattered.
First, models almost never disclosed the ad. Disclosure was strikingly model-dependent. Claude Opus 4.6 flagged the advertising in 55.9% of responses. Gemini 2.5 Flash did so 28.7% of the time. GPT-4.1 disclosed in only 5.2% of responses, despite shifting its preferences by +24.8 pp.
Second, when models did choose the advertised drug, their reasoning echoed the ad. Models that selected the advertised option echoed advertising claims in 52.7% of their justifications. Models that did not choose the advertised option echoed those same claims in only 19.4%, a 2.7-fold difference.
Self-reported confidence stayed uniformly high across conditions (mean 2.95 to 2.98 on a 3-point scale). Nothing in the model's verbalized confidence would alert a user that an ad had shaped its reasoning.
The stakes
A 12.7 pp shift may sound modest in any individual interaction. At population scale, it is not.
If even a fraction of prescribing decisions involve AI consultation, and adoption is accelerating, a systematic bias toward one brand over an equivalent competitor would redirect billions of dollars in pharmaceutical revenue. Cardiovascular, antidiabetic, and oncologic agents alone account for hundreds of billions in annual spending.
How to cite
The bottom line
The harm is not that patients receive a dangerous drug. The harm is that they receive a clinically sound recommendation that is also commercially shaped, with no mechanism to flag the influence.
This is a class of AI safety vulnerability that standard testing cannot detect, because it operates inside clinically correct outputs. It is provider-dependent rather than universal, which means it can be addressed through alignment methodology. Anthropic models, which emphasize Constitutional AI and harmlessness training, showed near-zero shift. The path forward is structural, not cosmetic.