A.I. Outperformed Doctors at Diagnosing Real-World E.R. Patients in a New Study. That Doesn’t Mean Computers Will Replace Clinicians

The A.I. model outperformed two doctors when presented with data from dozens of real E.R. patients.
Harrison Keely via Wikimedia Commons under CC BY 4.0

Since the 1950s, scientists have been comparing human doctors to computers to see if the machines’ algorithms can accurately diagnose complex health conditions. In a standard test, computers attempt to puzzle out challenging case studies from the New England Journal of Medicine.

Machines have recently improved at the task, primarily because of artificial intelligence built on large language models, like the one powering OpenAI’s ChatGPT. But so far, A.I. has done well only when it comes to curated case studies.

Now, researchers have put an A.I. model—a preview version of OpenAI’s o1—to the diagnostic test with real hospital records. Based on written documentation alone, the technology outperformed practicing physicians, according to a study published April 30 in the journal Science. The findings suggest that A.I. could be a powerful aid to health care workers during stressful, time-sensitive situations.

“This is the big conclusion for me: It works with the messy real-world data of the emergency department,” says study co-author Adam Rodman, a clinical researcher at Beth Israel Deaconess Medical Center in Boston, to NPR’s Will Stone. “It works for making diagnoses in the real world.”

In one test, Rodman and his colleagues presented the A.I. and two doctors with E.R. health records of 76 Beth Israel patients at three stages of care: initial triage with a health worker, first interaction with a doctor and admission to the hospital. The experiment had no impact on actual patient care.

These records typically include just a few sparse details, like vital signs, demographic information and a brief description written by staff, per the Guardian’s Robert Booth. Initial interactions are critical points in a patient’s care, as sometimes life-or-death decisions must be made quickly in chaotic situations.

Analyses revealed that the two physicians produced exact or near-exact diagnoses in 50 percent and 55 percent of the cases, respectively, while the A.I. was close or exactly right 67 percent of the time.

This test was the “most important” of the study’s six experiments, says co-author Thomas Buckley, a computer scientist at Harvard Medical School, to Science’s Perri Thaler. A.I. did well in the others too—so well that the researchers feared people wouldn’t believe the results, Rodman tells the outlet.

Clinicians are increasingly incorporating A.I. into their daily work, using the programs for tasks like transcribing notes from patient interactions, reviewing health scans and detecting early signs of disease. The new findings suggest that A.I. reasoning models like o1, which can execute and explain step-by-step logical thinking, could soon regularly help doctors formulate diagnoses.

Did you know? A.I. in breast cancer detection

In a study published in January, researchers found that A.I. could help doctors find hard-to-detect signs of breast cancer. These can be missed during routine mammography screenings, which are generally recommended once every one to two years.

The study is especially notable given that o1 was first released at the end of 2024. “That’s kind of like ancient history now in machine learning time,” Buckley tells Science.

Still, the researchers, as well as doctors and scientists not involved in the new study, caution that these promising results don’t mean physicians are about to be replaced by A.I. For one thing, purely logical reasoning excludes human aspects of a doctor’s work.

“When we say clinical reasoning, it doesn’t mean the same thing as moral reasoning,” Arya Rao, a biomedical informaticist at Harvard Medical School who wasn’t involved in the study, tells Science News’ Kathryn Hulick. “These models have been optimized to do this kind of sequential thought that we call reasoning, but it’s not at all the same thing as how we teach medical students to reason.”

Additionally, the researchers say the A.I. might not perform as well with larger amounts of patient data, such as from someone who’s admitted to the hospital for a days-long stay.

Going forward, the team plans to conduct clinical trials to figure out how best to integrate A.I. into patient care, reports Science News.

This matches the view of physician Nour Khatib, of Oak Valley Health in Canada, who was not involved in the study. “It’s just another tool to help us give the patient the highest quality care possible,” she tells the CBC’s Nick Logan.
