BIDMC · DFCI · Harvard Medical School

Safe AI. Real Medicine. True Impact.

We research, evaluate, and build generative AI that works — for radiologists, physicians, and the patients who depend on them.

Our Mission
What We Do

Research That Reaches the Clinic

We evaluate generative AI where it matters most: in radiology, at the bedside, and across clinical workflows. Not just whether models can answer — but whether they should be trusted.

Our focus is on AI that's safe enough to deploy and useful enough to matter. We study failure modes, demographic biases, and the gap between benchmark performance and real-world value.

The goal isn't just papers. It's AI that helps physicians make better decisions — and helps patients get better care.

Research Pillars

Three areas. One mission.

Every project maps to a pillar. Together they cover the full lifecycle of clinical AI — from bias detection to safe deployment.

Research Pillar 01
Bias & Equity
Detecting and measuring demographic biases in LLM-generated clinical recommendations — across race, gender, and socioeconomic status — at population scale.
Research Pillar 02
AI Safety & Reliability
Testing how clinical AI fails. Hallucination attacks, adversarial inputs, misinformation susceptibility — mapping vulnerabilities before they reach patients.
Research Pillar 03
Clinical Translation
Bridging benchmarks and bedside reality. Rigorous evaluation of AI in diagnostic workflows, building the frameworks that make deployment responsible.
Core Team

Clinicians. Researchers. Builders.

Eyal Klang MD
Co-Director
A radiologist at Beth Israel Deaconess Medical Center and Co-Director of the BRIDGE GenAI Lab. He is a leading expert in the application of generative AI and machine learning in medical imaging and clinical care. He has authored hundreds of peer-reviewed publications, including studies in Nature Medicine, The Lancet, and JAMA, with a focus on translating AI methods into real-world clinical practice.
Yiftach Barash MD
Co-Director
Co-Director of the BRIDGE GenAI Lab and an interventional radiologist at Beth Israel Deaconess Medical Center, currently in a clinical fellowship at Harvard Medical School. He is an AI healthcare researcher with an MSc in engineering and a background in industry. His work focuses on machine learning and advanced imaging to improve clinical decision-making, combining technical expertise, clinical training, and cross-disciplinary experience to drive innovation in healthcare.
Mahmud Omar MD
Head of Research
A family physician and medical AI researcher focused on the evaluation, safety, and equity of large language models in clinical medicine. He has authored hundreds of peer-reviewed publications, including studies in Nature Medicine, other Nature journals, The Lancet Digital Health, and JAMA Network Open. His work spans diagnostic reasoning, multimodal medical AI, and bias assessment.
Alon Gorenshtein MD
Head of AI Engineering
A physician-scientist training in AI and neurology at Harvard Medical School and a postdoctoral fellow at BIDMC's Epilepsy + Data Science Lab. He builds AI-neurotechnology tools for clinical neurophysiology and decision support.
Selected Work

Selected Publications

High-impact research in Nature Medicine, The Lancet Digital Health, JAMA Network Open, and leading medical AI venues.

Nature Medicine
2025
Sociodemographic biases in medical decision making by large language models
Systematic evaluation of demographic biases in LLM-generated clinical recommendations across millions of responses.
Read
The Lancet Digital Health
2026
Mapping LLM Susceptibility to Medical Misinformation Across Clinical Notes and Social Media
LLMs absorb and amplify medical misinformation — and the format it comes in changes how easily they're fooled.
Read
Communications Medicine
2025
Multi-model assurance analysis showing LLMs are highly vulnerable to adversarial hallucination attacks during clinical decision support
Slip a fake lab value into a clinical prompt and LLMs will confidently elaborate on it — 50–82% of the time.
Read
Nature Health
2026
Socio-demographic gaps in pain management guided by large language models
3.4 million AI responses reveal systematic racial and socioeconomic bias in LLM opioid recommendations — at a scale no manual review could catch.
Read
Additional publications
JAMA Network Open · 2025
Sociodemographic Bias in Large Language Model–Assisted Gastroenterology
Sociodemographic factors influenced LLM-generated gastroenterology recommendations; mental health referrals varied by demographics. Levartovsky, Omar, Nadkarni et al.
npj Breast Cancer · 2023
Large language model (ChatGPT) as a support tool for breast tumor board
Evaluation of ChatGPT as decision support in breast tumor board; recommendations compared to board decisions. Proof-of-concept with 70% agreement.
American Journal of Medicine · 2026
Impact of patient communication style on agentic AI-generated clinical advice in E-medicine
Patient tone alters AI triage, sick-leave, and prescribing decisions in e-medicine; tone-sensitive biases in agentic LLMs across 120,000 agent runs.
JAMIA · 2025
A scalable framework for benchmarking embedding models in semantic health-care tasks
Benchmarking 39 embedding models for healthcare semantic tasks and RAG; scalable evaluation across 3.28 million model assessments.
International Journal for Equity in Health · 2025
Evaluating and addressing demographic disparities in medical large language models: a systematic review
Systematic review of demographic biases in medical LLMs; gender and racial bias prevalent; mitigation strategies still developing.
JMIR Medical Informatics · 2025
Benchmarking the Confidence of Large Language Models in Answering Clinical Questions
Worse-performing LLMs showed paradoxically higher confidence; minimal confidence gap between correct and incorrect answers across 12 models.
Computers in Biology and Medicine · 2025
Refining LLM outputs with iterative consensus ensemble (ICE)
Iterative consensus among multiple LLMs improved accuracy by up to 27% on medical and reasoning benchmarks; no reward models required.
Therapeutic Advances in Ophthalmology · 2025
Multimodal LLMs for retinal disease diagnosis via OCT: few-shot versus single-shot learning
GPT-4o and Claude 3.5 Sonnet for OCT diagnosis; few-shot prompting improved accuracy (e.g., 56% to 73% for GPT-4o).
NEJM AI · 2024
If Machines Exceed Us: Health Care at an Inflection Point
AGI/ASI and ethics in health care; preparing for AI that may meet or surpass expert human capability in treatment and reasoning.
Mayo Clinic Proceedings: Digital Health · 2025
Identifying Bias at Scale in Clinical Notes Using Large Language Models
GPT-4 detects and revises biased language in ED notes with 97.6% sensitivity; physician-endorsed revisions; modifiable factors identified.

For our full body of work, see the Google Scholar and LinkedIn profiles of our core team below.

Core team scholar profiles
Eyal Klang MD · Co‑Director
Yiftach Barash MD · Co‑Director
Mahmud Omar MD · Head of Research
Alon Gorenshtein MD · Head of AI Engineering
Research Network

We're a resourceful, creative group.

If you have an idea worth exploring, let's figure it out together. Our network spans clinicians, data scientists, engineers, and policy researchers across institutions.

Radiology · Internal Medicine · Emergency Medicine · Data Science · NLP Engineering · Machine Learning · Health Policy · Bioethics · Oncology · Neurology
Collaborators
Dr. Reem Agbareia
Ophthalmology Resident · Hadassah Medical Center
Ophthalmology resident at Hadassah Medical Center and a physician-researcher working at the intersection of clinical ophthalmology and artificial intelligence. She received her MD with honors from the Hebrew University of Jerusalem and completed her internship with honors at Tel Aviv Sourasky Medical Center, including clinical training at UCLA. She has co-authored peer-reviewed publications in journals including Nature Medicine, Nature Health, and Pediatrics, with ongoing work in ophthalmic therapeutics and AI-driven diagnostic systems.
Google Scholar →
Dr. Vera Sorin
Assistant Professor of Radiology · Mayo Clinic
Assistant Professor of Radiology at Mayo Clinic in Rochester, MN. She completed her radiology residency at Sheba Medical Center in Israel, followed by fellowships in Radiology Informatics and AI and in Cardiothoracic Imaging at Mayo Clinic. Her research interests include large language models, red teaming, ethics in AI, and post-deployment monitoring of AI in radiology.
Google Scholar →

Stay tuned for the full research network.

Let's Build Something Together

Have a research idea? Want to collaborate on clinical AI evaluation? We're always looking for creative partnerships.

Affiliated With
Harvard Medical School
Beth Israel Deaconess Medical Center
Dana-Farber Cancer Institute