AI in Emergency Medicine

AI Triage in the ED: A 2026 Field Report

Chester "Chet" Shermer, MD • May 2, 2026

Why this matters

AI triage in the ED works for throughput in 2026 but fails on rare high-acuity cases. Three safeguards every department needs.


A patient walked into my triage bay two months ago with chest pain. Mid-fifties, comfortable vitals, a clean initial EKG. The AI triage tool embedded in our intake workflow scored her ESI-3: stable, not high-risk, expected to need multiple resources. The number sat there on the screen, confidently low. Two hours later she was on the cath lab table with a fully occluded LAD.

The model was wrong. The algorithm scored her against a million prior chest pains and missed the one in front of me.

I tell that story up front because the conversation around AI triage in the emergency department has moved past whether to use it. The tools are deployed. The vendors are paid. The question now is how to use them without letting one kill someone. After thirty months of running AI triage across my own ED and reviewing failure modes from peer departments, I want to cover three things: what the prospective 2025–2026 data actually shows, the three-layer safeguard framework I run on every deployment, and the medicolegal pattern most ED leaders are walking into blind.

What the Prospective Data Actually Shows

The evidence has finally caught up with the marketing, and it is more sobering than the brochures suggested.

A 2026 JMIR Medical Informatics analysis tested three AI triage models against nurse triage at a major academic ED, using senior physician consensus as the gold standard. The best-performing model, an LLM-based system, hit an AUC of 0.879 against 0.776 for nurse triage, a real discrimination gain. Sensitivity for the highest acuity classes (ESI 1 and 2 equivalents) was 87.8%. That sensitivity number is the entire ballgame, and I want to be honest about it: 87.8% sensitivity on the cases where a miss can kill someone is not yet acceptable for hands-off deployment. It is supervisory-grade, not autonomous-grade.
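To put the sensitivity figure in patient terms, here is the arithmetic as a short Python sketch. Only the 87.8% figure comes from the analysis above; the 1,000-patient volume is an illustrative assumption, not a number from the study.

```python
def expected_misses(sensitivity: float, high_acuity_patients: int) -> float:
    """Expected count of truly high-acuity patients the model scores below ESI 1-2."""
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must be between 0 and 1")
    return high_acuity_patients * (1.0 - sensitivity)


# 0.878 is the reported ESI 1-2 sensitivity; 1,000 patients is an illustrative volume.
misses = expected_misses(0.878, 1000)
print(f"About {misses:.0f} of 1,000 high-acuity patients under-triaged")
```

At 87.8% sensitivity, roughly 122 of every 1,000 truly high-acuity patients, about one in eight, would be under-triaged if nobody checked the model.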

A 2025 Cureus systematic review of six AI-triage deployments confirmed the throughput story. Voice-AI documentation tools shaved 19% off triage time. ML models reduced mis-triage rates by 0.3 to 8.9 percentage points depending on the system and the population. Real, durable gains. None of that data overrides the long-tail problem.

The long-tail problem is the atypical chest pain, the silent AAA, the early sepsis without fever, the posterior circulation stroke presenting as isolated nausea and dizziness. Models excel at common presentations because that is what they were trained on. They fail on the rare high-acuity case, and the rare high-acuity case is where the malpractice file lives.

A second concern, and the one the 2026 literature documents most clearly, is calibration drift. Models trained on 2021–2022 clinical data are already drifting on 2026 populations. The post-pandemic illness mix, demographic shifts, and the long tail of post-viral presentations have moved the underlying distribution the model was fit against. In my own thirty-month deployment audit, the primary tool we run had drifted far enough by month eighteen that retraining was clinically necessary, not optional. If your AI triage vendor is not publishing quarterly drift metrics, demand them. If they cannot produce them, that is your answer.
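One way to make "drift metrics" concrete when talking to a vendor is a quarterly observed-to-expected ratio (calibration-in-the-large) on audited cases. The sketch below is a minimal illustration, not any vendor's method. It assumes the tool exposes a probability of ESI 1-2 acuity and that each case has a blinded physician label, for example from a monthly chart audit; the `AuditedCase` fields and the 0.8 to 1.25 alert band are hypothetical choices.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class AuditedCase:
    quarter: str                  # e.g. "2026-Q1"
    predicted_high_acuity: float  # model probability of ESI 1-2 (assumed to be exposed by the tool)
    reviewed_high_acuity: bool    # blinded physician reference label


def observed_to_expected(cases: list[AuditedCase]) -> float:
    """Calibration-in-the-large: observed high-acuity rate divided by mean predicted probability."""
    expected = mean(c.predicted_high_acuity for c in cases)
    if expected == 0:
        raise ValueError("model predicts zero high-acuity probability; ratio undefined")
    observed = mean(1.0 if c.reviewed_high_acuity else 0.0 for c in cases)
    return observed / expected


def drift_report(cases: list[AuditedCase], low: float = 0.8, high: float = 1.25) -> dict:
    """Per-quarter O/E ratio and whether it falls outside the (hypothetical) alert band."""
    by_quarter: dict[str, list[AuditedCase]] = {}
    for c in cases:
        by_quarter.setdefault(c.quarter, []).append(c)
    report = {}
    for quarter in sorted(by_quarter):
        ratio = observed_to_expected(by_quarter[quarter])
        report[quarter] = (round(ratio, 2), not (low <= ratio <= high))
    return report


cases = (
    [AuditedCase("2026-Q1", 0.5, flag) for flag in (True, False)]
    + [AuditedCase("2026-Q2", 0.2, flag) for flag in (True, True, False, False)]
)
print(drift_report(cases))
```

A ratio above 1 means more high-acuity patients are arriving than the model expects, which is the under-triage direction. This covers one metric only; real drift monitoring would also track discrimination and shifts in the input distribution.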

The third issue is rarely on the procurement deck: subgroup performance variance. Aggregate AUROC numbers hide significant gaps across age, race, and language-preference subgroups. Two of the three commercial tools I have evaluated showed materially worse calibration on patients over 75 and on patients whose primary language is not English. Those are exactly the populations who already carry the heaviest under-triage burden in the human-only system. The 2025 Clinical and Experimental Emergency Medicine ethics review names this as one of the top unresolved AI-triage governance problems globally. If your vendor cannot produce subgroup performance breakdowns on demand, that is a dealbreaker, not a negotiation point.
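A subgroup breakdown is not hard to compute once audit data exist. This sketch computes high-acuity (ESI 1-2) sensitivity within each subgroup; the record fields (`age`, `language`, `model_esi`, `reference_esi`) are hypothetical names, and the same grouping could be applied to a calibration ratio instead of sensitivity.

```python
from collections import defaultdict
from typing import Callable


def subgroup_sensitivity(records: list[dict], key: Callable[[dict], str]) -> dict[str, float]:
    """High-acuity (ESI 1-2) sensitivity within each subgroup defined by key(record).

    Each record is assumed to carry 'model_esi' (AI score) and 'reference_esi'
    (blinded physician review); lower ESI numbers are more acute.
    """
    hits: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for r in records:
        if r["reference_esi"] <= 2:      # truly high acuity per the reference standard
            group = key(r)
            totals[group] += 1
            if r["model_esi"] <= 2:      # model also placed the patient at ESI 1-2
                hits[group] += 1
    return {g: hits[g] / totals[g] for g in totals}


records = [
    {"age": 80, "language": "es", "model_esi": 3, "reference_esi": 2},
    {"age": 82, "language": "en", "model_esi": 2, "reference_esi": 1},
    {"age": 50, "language": "en", "model_esi": 2, "reference_esi": 2},
    {"age": 45, "language": "en", "model_esi": 3, "reference_esi": 3},
]
by_age = subgroup_sensitivity(records, lambda r: "over 75" if r["age"] > 75 else "75 and under")
by_language = subgroup_sensitivity(
    records, lambda r: "English" if r["language"] == "en" else "non-English"
)
print(by_age)
print(by_language)
```

With four toy records the numbers mean nothing clinically; the point is that the aggregate figure can look fine while one subgroup sits far lower.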

A Three-Layer Safeguard Framework

No algorithm should triage a patient without human backstops. Here is the framework I deploy on every AI triage integration. It is simple by design — complexity fails under night-shift load.

Layer One: A Hard, Non-Overridable Acuity Floor

Any patient who meets specific clinical criteria bypasses AI triage entirely and gets immediate physician eyeball regardless of model output. My criteria: chest pain over age 40, dyspnea, abdominal pain over age 60, any neurological complaint, any pediatric patient under 90 days with fever, and any altered mental status. The model does not get a vote on these patients. The floor is wired into the EHR so the triage nurse cannot accidentally override it and the model cannot under-score a patient out of it.
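The floor logic is simple enough to express as a rule function. The sketch below encodes the criteria above in Python for illustration only; in practice the rule would live in the EHR's own decision-support tooling, and the complaint codes (`chest_pain`, `dyspnea`, and so on) and the `Intake` fields are hypothetical. The model's score is never consulted when a floor criterion matches.

```python
from dataclasses import dataclass, field


@dataclass
class Intake:
    age_years: float
    complaints: set[str] = field(default_factory=set)  # hypothetical normalized complaint codes
    febrile: bool = False
    altered_mental_status: bool = False


def floor_reasons(p: Intake) -> list[str]:
    """Every hard-floor criterion the patient meets; any match bypasses AI triage."""
    reasons = []
    if "chest_pain" in p.complaints and p.age_years > 40:
        reasons.append("chest pain, age over 40")
    if "dyspnea" in p.complaints:
        reasons.append("dyspnea")
    if "abdominal_pain" in p.complaints and p.age_years > 60:
        reasons.append("abdominal pain, age over 60")
    if "neurological" in p.complaints:
        reasons.append("neurological complaint")
    if p.febrile and p.age_years * 365.25 < 90:
        reasons.append("fever, age under 90 days")
    if p.altered_mental_status:
        reasons.append("altered mental status")
    return reasons


def route(p: Intake, model_esi: int) -> str:
    """The model score is never consulted when a floor criterion matches."""
    if floor_reasons(p):
        return "immediate physician evaluation"
    return f"AI-assisted triage at model ESI-{model_esi}; RN may override"


print(route(Intake(55, {"chest_pain"}), model_esi=3))
```

Run against the patient from the opening story (mid-fifties, chest pain), `route` returns the physician pathway no matter what ESI the model assigned.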

Layer Two: Continuous Calibration Audit

Monthly random sample of 100 AI-triaged patients. Blinded physician review of the assigned acuity. Missed high-acuity events get flagged, fed back to the vendor, and tracked on a departmental dashboard. Budget roughly eight physician hours per month. This is the only way to catch drift before it hurts someone, and the AMA's 2024 AI principles explicitly call for this kind of continuous post-deployment surveillance as a condition of safe clinical use.
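The sampling step should be reproducible so a second reviewer can re-create exactly which charts were pulled. A minimal sketch, assuming encounter IDs and ESI assignments can be exported from the EHR as plain Python collections; the function names are hypothetical. The last lines show the budget arithmetic: eight hours across 100 charts is about 4.8 minutes per chart.

```python
import random


def monthly_audit_sample(encounter_ids, month: str, n: int = 100) -> list:
    """Reproducible simple random sample of AI-triaged encounters for blinded review.

    Seeding with the month string means the same draw can be re-created later.
    """
    rng = random.Random(f"triage-audit-{month}")
    population = sorted(encounter_ids)
    return rng.sample(population, min(n, len(population)))


def missed_high_acuity(sampled, model_esi: dict, reviewer_esi: dict) -> list:
    """Sampled encounters the reviewer placed at ESI 1-2 but the model scored ESI 3-5."""
    return [e for e in sampled if reviewer_esi[e] <= 2 and model_esi[e] >= 3]


sample = monthly_audit_sample(range(1, 2001), "2026-04")
minutes_per_chart = 8 * 60 / len(sample)  # eight physician hours spread over the sample
print(len(sample), minutes_per_chart)
```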

Layer Three: Frictionless Override and Visible Concordance Tracking

When a triage RN disagrees with the model, the RN wins. Full stop. No documentation friction. Make the override one click. Then track at the departmental level when overrides should have happened but did not.

Regardless of which vendor's tool you are running, I have seen more AI-triage harm come from RN learned helplessness — the model always wins, why argue — than from any single model failure. If your override interface requires more than two clicks, you have built a system that will erode clinical judgment within ninety days. The 2025 ethics review I cited above puts the same finding in formal language: automation dependency is now a recognized failure mode in AI-augmented clinical workflows. Build the override pathway like nursing autonomy depends on it, because it does.
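Concordance tracking can start with two numbers per period: how often RNs override the model, and how often they accepted a score that blinded review later showed was too low. This sketch assumes each event carries the model ESI, the final RN ESI, and a reference ESI from the monthly audit; the `TriageEvent` fields are hypothetical. Reading a falling override rate alongside rising missed overrides as a learned-helplessness signal is a plausible interpretation, not a validated threshold.

```python
from dataclasses import dataclass


@dataclass
class TriageEvent:
    model_esi: int      # AI-assigned ESI
    rn_esi: int         # final ESI after RN review
    reference_esi: int  # blinded physician ESI from the monthly audit


def concordance_summary(events: list[TriageEvent]) -> dict:
    """Override rate plus count of overrides that should have happened but did not."""
    if not events:
        return {"override_rate": 0.0, "missed_overrides": 0}
    overrides = [e for e in events if e.rn_esi != e.model_esi]
    # RN accepted the model's score, but the reference places the patient
    # at higher acuity (a lower ESI number): an override that never happened.
    missed = [e for e in events if e.rn_esi == e.model_esi and e.reference_esi < e.model_esi]
    return {"override_rate": len(overrides) / len(events), "missed_overrides": len(missed)}


events = [TriageEvent(3, 3, 3), TriageEvent(3, 2, 2), TriageEvent(3, 3, 2), TriageEvent(4, 4, 4)]
print(concordance_summary(events))
```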

The Medicolegal Pattern Most ED Leaders Miss

Here is the part most directors I talk to have wrong. They worry about being sued because the AI was wrong. That is not where the real risk lives in 2026. The real risk is being sued because the AI was right and you ignored it.

The AMA's 2024 AI principles document lays the groundwork plainly: physicians remain liable under existing medical-liability theory, and selection of an AI tool — including use outside its intended population or without clinical validation evidence — is itself a discrete liability decision. More to the point, when AI directly impacts clinical decision-making, the use must be disclosed and documented.

In practice, that has translated into a discoverable record. Plaintiff attorneys in several high-verdict jurisdictions have begun routinely subpoenaing the full AI risk-score record from the EHR: not just the final human triage assignment, but every intermediate algorithmic output the system produced during the encounter. If the model flagged ESI-2 with a high deterioration risk score and the documentation shows the patient was treated as ESI-3 without any charted reasoning for the divergence, that gap is now a discovery item. I have reviewed three peer-department cases where this exact pattern drove settlement numbers materially upward.

The inverse is also true and underappreciated. If the model said low-acuity and the physician escalated based on clinical gestalt, the physician who documented the reasoning is substantially protected. The algorithm becomes a data input. Documented physician clinical reasoning supersedes algorithmic output when the two diverge.
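Because divergence in either direction leaves a discoverable trail, a department can look for undocumented gaps before a plaintiff does. The sketch below flags encounters where any intermediate AI output differs from the final human ESI and no reasoning was charted. The `EncounterRecord` fields are hypothetical; what an EHR actually exports, and whether it retains every intermediate score, varies by vendor.

```python
from typing import NamedTuple


class EncounterRecord(NamedTuple):
    encounter_id: str
    model_esi_outputs: tuple[int, ...]  # every intermediate AI output, in time order
    final_esi: int                      # human triage assignment
    divergence_note_charted: bool       # clinician documented why they diverged


def discovery_gaps(records: list[EncounterRecord]) -> list[tuple[str, int, int]]:
    """Encounters where any AI output differs from the final ESI and no reasoning was charted.

    Each gap is reported with the most acute (lowest-numbered) model output for context.
    """
    gaps = []
    for r in records:
        diverged = any(esi != r.final_esi for esi in r.model_esi_outputs)
        if diverged and not r.divergence_note_charted:
            gaps.append((r.encounter_id, min(r.model_esi_outputs), r.final_esi))
    return gaps


records = [
    EncounterRecord("A", (3, 2), 3, False),  # model flagged ESI-2 mid-encounter, chart silent
    EncounterRecord("B", (2,), 3, True),     # same divergence, reasoning documented
    EncounterRecord("C", (3, 3), 3, False),  # no divergence
]
print(discovery_gaps(records))
```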

My personal charting protocol now includes a single sentence on every chart where I override the AI in either direction:

"AI triage score reviewed; clinical assessment proceeds independent of algorithmic output."

That one line has saved me more legal grief than I want to admit. The model is now a discoverable witness in your chart. Make it your ally, not your accuser.

One more pattern worth naming: do not let your nursing or physician staff treat the AI risk score as a clinical fact to be charted as observed. It is an algorithmic output. Chart it as such. "Patient assigned ESI-3 by AI triage tool; clinician reassessed and upgraded to ESI-2 based on examination findings as below" is a defensible note. "Patient is low acuity per AI" is not. The wording matters because the discovery process will weight your charted clinical reasoning against the model's output, and any chart that conflates the two collapses your defense before it begins. Train your team on this distinction once a quarter and audit a sample of charts to confirm it is sticking.
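For the quarterly chart audit, a simple text check can catch the worst phrasing, and a small template can make the defensible wording the default. This is a hypothetical sketch: the regular expression catches only a few obvious variants such as "low acuity per AI", and it is no substitute for human review of the sampled charts.

```python
import re

# Hypothetical red-flag pattern: the AI output charted as if it were a clinical finding.
AI_AS_FACT = re.compile(
    r"\b(?:low|high)[\s-]+acuity\s+(?:per|according\s+to)\s+(?:the\s+)?AI\b", re.IGNORECASE
)


def attributed_note(model_esi: int, final_esi: int, basis: str) -> str:
    """Chart the AI score as an attributed output, then the clinician's own assessment."""
    if final_esi < model_esi:
        action = f"upgraded to ESI-{final_esi}"
    elif final_esi > model_esi:
        action = f"downgraded to ESI-{final_esi}"
    else:
        action = f"concurred with ESI-{final_esi}"
    return (f"Patient assigned ESI-{model_esi} by AI triage tool; "
            f"clinician reassessed and {action} based on {basis}.")


def flags_ai_as_fact(note: str) -> bool:
    """True when the note charts the AI score as an observed fact."""
    return AI_AS_FACT.search(note) is not None


note = attributed_note(3, 2, "examination findings as below")
print(note)
print(flags_ai_as_fact(note), flags_ai_as_fact("Patient is low acuity per AI"))
```

The template reproduces the defensible wording quoted above; the check returns `False` for it and `True` for the "per AI" phrasing.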

In Summary

AI triage is real and useful for throughput in 2026. The long-tail miss rate on rare high-acuity presentations is not yet acceptable for hands-off deployment, so human backstops are mandatory. Wrap every integration in three safeguards — a hard acuity floor the model cannot override, monthly drift audits, and a frictionless override pathway. Document your clinical reasoning every time you diverge from the AI in either direction.

The model is now a discoverable witness in every chart you write. Make it work for you.

AI Won't Wait. Neither Should You.

If you are running AI triage in your ED right now without a written safeguard framework, a calibration audit cadence, and a documentation standard, you are exposed. The course walks through the exact EHR override-protocol language we use in my ED, the calibration audit dashboard template, the medicolegal documentation phrases vetted against current AMA guidance, and the full vendor evaluation checklist. Consider enrolling in AI in Emergency Medicine: Becoming AI Bulletproof.


If you're an emergency physician (or any clinician treating patients daily) trying to understand how AI will actually impact your clinical practice — not just the hype — I put together a free practical guide. You can download it here: AI in EM Survival Guide.

Dr. Chester "Chet" Shermer, MD, FACEP is a Professor of Emergency Medicine, TeleHealth, HEMS and Critical Care Transport, and State Surgeon for the Army National Guard. He is the founder of Global MedOps Command and the creator of AI in Emergency Medicine: Becoming AI Bulletproof.

Sources

JMIR Medical Informatics, "Artificial Intelligence Models for Predicting Triage in Emergency Departments," https://medinform.jmir.org/2026/1/e83318
Cureus, "Clinical Impact of Artificial Intelligence-Based Triage Systems in Emergency Departments," https://pmc.ncbi.nlm.nih.gov/articles/PMC12241827/
Clinical and Experimental Emergency Medicine, "Ethical considerations of artificial intelligence in emergency medicine triage," https://pmc.ncbi.nlm.nih.gov/articles/PMC12824544/
American Medical Association, "AMA Principles for Augmented Intelligence (AI) Development, Deployment, and Use," https://www.ama-assn.org/system/files/ama-ai-principles.pdf
