Identical Cases, Different Decisions
Every case had the same chief complaint, vital signs, and clinical details. The only element that changed was a sociodemographic label: race, gender identity, sexual orientation, socioeconomic status, or an intersectional combination. Each of the 1,000 cases was rendered in 32 such variations, and every variation was run through nine models, producing over 1.7 million total responses.
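The design is a full cross product: every clinical vignette is paired with every demographic variation and sent to every model. Here is a minimal sketch of how such a grid might be assembled; the label list, prompt wording, and model names are illustrative assumptions, not the study's actual materials:

```python
from itertools import product

# Illustrative labels only: the study used 32 sociodemographic variations
# per case, including intersectional combinations and an unlabelled control.
LABELS = [
    "control",                      # no sociodemographic label
    "Black", "low income", "high income", "unhoused",
    "transgender woman", "Black transgender woman",
    "Black and unhoused",
    # ...remaining variations omitted in this sketch
]

MODELS = ["model_a", "model_b"]     # stand-ins for the nine models evaluated

def build_prompt(case: dict, label: str) -> str:
    """Hold every clinical fact fixed; vary only the sociodemographic label."""
    prefix = "" if label == "control" else f"The patient is {label}. "
    return (f"{prefix}Chief complaint: {case['complaint']}. "
            f"Vitals: {case['vitals']}.")

cases = [
    {"complaint": "acute chest pain", "vitals": "BP 150/90, HR 104"},
]

# One prompt per (case, label, model) cell of the grid.
grid = [(build_prompt(c, lab), m) for c, lab, m in product(cases, LABELS, MODELS)]
print(len(grid), "prompts in this toy grid")
```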
The models answered four clinical questions for each case: triage priority, further diagnostic testing, treatment approach (outpatient vs. inpatient), and whether a mental health assessment was needed. Across all four, demographic labels shifted recommendations in consistent, clinically unjustified directions.
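One way to picture the per-case output is a small record with one field per question. The field names and value sets below are assumptions for illustration, since the paper defines its own response categories:

```python
from dataclasses import dataclass

@dataclass
class CaseRecommendation:
    """One model's answers to the four clinical questions for one case variant."""
    triage_priority: str            # e.g. "immediate", "urgent", "non-urgent"
    diagnostic_testing: str         # e.g. "none", "basic", "advanced imaging (CT/MRI)"
    treatment_setting: str          # "outpatient" or "inpatient"
    mental_health_assessment: bool  # whether an evaluation was recommended
```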
Who Gets Flagged, Who Gets Tested
Mental health assessment showed the largest disparities. Cases labelled Black transgender women, Black transgender men, or Black and unhoused all drew mental health evaluation recommendations at rates above 79%; the control group sat far below. Two expert physicians found many of these referrals unwarranted, with LLM recommendation rates reaching approximately seven times the physician-derived baseline.
More broadly, cases labelled as belonging to LGBTQIA+ subgroups were recommended mental health evaluations at six to seven times the rate two board-certified physicians judged clinically appropriate.
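To make the magnitude concrete, the gap is a simple rate ratio against the physician-derived baseline. The numbers below are hypothetical, chosen only to be consistent with the reported 79%+ rates and the roughly sevenfold excess:

```python
llm_rate = 0.79             # observed referral rate for a flagged subgroup
physician_baseline = 0.11   # hypothetical physician-judged appropriate rate

ratio = llm_rate / physician_baseline
print(f"LLM referral rate is {ratio:.1f}x the physician baseline")  # ~7.2x
```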
For diagnostic testing, the pattern inverted along socioeconomic lines. High-income cases received significantly more recommendations for advanced imaging such as CT and MRI (P < 0.001), while low- and middle-income cases were more often limited to basic testing or to no testing at all. For treatment approach, cases labelled as unhoused or as Black and unhoused received the highest rates of inpatient recommendations.
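A significance claim like P < 0.001 for the imaging gap could come from a contingency-table test. Below is a sketch with fabricated counts (the study's actual test and data may differ), using SciPy's chi-square test of independence:

```python
from scipy.stats import chi2_contingency

# Hypothetical counts of advanced imaging vs. basic/no testing, by income label.
#                  advanced  basic/none
contingency = [
    [620, 380],   # high-income case variants
    [480, 520],   # low-/middle-income case variants
]

chi2, p, dof, expected = chi2_contingency(contingency)
print(f"chi2 = {chi2:.1f}, p = {p:.2e}")  # a gap this size yields p << 0.001
```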
Across All Nine Models
The biases were not isolated to a single architecture. Both proprietary and open-source models showed the same directional patterns. Variability scores, measuring how much each model's outputs shifted with demographic labels, ranged from 14% (GPT-4o) to 40% (Qwen2-7B).
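The paper's exact formula isn't given here, but one plausible reading of a variability score is the fraction of demographic variants whose recommendation differs from the same case's unlabelled control. A sketch under that assumption (the column names are mine):

```python
import pandas as pd

def variability_score(df: pd.DataFrame) -> float:
    """Fraction of labelled variants whose recommendation differs from the
    control (no-label) answer for the same case.

    Expects columns: case_id, label, recommendation, with exactly one
    'control' row per case_id. This is an assumed reading of the metric.
    """
    control = df[df["label"] == "control"].set_index("case_id")["recommendation"]
    variants = df[df["label"] != "control"]
    changed = (variants["recommendation"].to_numpy()
               != control.loc[variants["case_id"]].to_numpy())
    return float(changed.mean())

# Toy example: one case, control says outpatient, one of two variants shifts.
toy = pd.DataFrame({
    "case_id":        [1, 1, 1],
    "label":          ["control", "unhoused", "high income"],
    "recommendation": ["outpatient", "inpatient", "outpatient"],
})
print(variability_score(toy))  # 0.5
```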
Can Models Self-Correct?
When confronted with evidence of bias in their own outputs, models revised 66.7% of recommendations containing explicit bias (where the demographic label was directly cited as a reason). For implicit bias, where the label was not mentioned but the recommendation still shifted, only about 40% of cases were revised. Subtler forms of bias proved harder to correct, even with direct feedback.
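The explicit/implicit split suggests a simple operationalization: a shift is explicit if the model's rationale cites the demographic label, implicit if the recommendation moved without any mention of it. A rough heuristic sketch (the paper's actual coding procedure is likely more careful):

```python
def classify_shift(variant_rec: str, control_rec: str,
                   rationale: str, label: str) -> str:
    """Classify a recommendation shift as explicit or implicit bias."""
    if variant_rec == control_rec:
        return "no shift"
    if label.lower() in rationale.lower():
        return "explicit"   # label cited as a reason; 66.7% of these were revised
    return "implicit"       # silent shift; only ~40% of these were revised

print(classify_shift("inpatient", "outpatient",
                     "Given the patient is unhoused, admit for safety.",
                     "unhoused"))  # -> "explicit"
```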
LLM clinical recommendations shift with patient demographics, not clinical facts
Across 1.7 million responses from nine models, marginalized groups consistently received more urgent, more invasive, and more mental health-focused recommendations than clinically warranted. These patterns appeared in both proprietary and open-source models, exceeded physician-derived baselines severalfold, and persisted after statistical correction. Robust bias evaluation frameworks are needed before LLMs inform real clinical decisions.