Stanford AI CheXNeXt can spot 14 conditions on chest X-ray with accuracy, speed

November 27, 2018
by Thomas Dworetzky, Contributing Reporter
Stanford scientists have reported new success in diagnosing 14 different conditions with their artificial intelligence algorithm, CheXNeXt.

The algorithm scans chest X-rays at high speed, checking for all 14 pathologies at the same time, with results nearly as good as those of human radiologists: it matched them on 10 diseases, performed worse on three, and performed better on one, researchers reported in the journal PLOS Medicine.

“Usually, we see AI algorithms that can detect a brain hemorrhage or a wrist fracture – a very narrow scope for single-use cases,” Dr. Matthew Lungren, assistant professor of radiology, said in a Stanford report of the research. “But here we’re talking about 14 different pathologies analyzed simultaneously, and it’s all through one algorithm.”

Chest X-rays are “critical for the detection of thoracic diseases, including tuberculosis and lung cancer, which affect millions of people worldwide each year,” the researchers stated in their paper, observing that “this time-consuming task typically requires expert radiologists to read the images, leading to fatigue-based diagnostic error and lack of diagnostic expertise in areas of the world where radiologists are not available.”

The challenge is huge. Citing the World Health Organization's estimates that more than 4 billion people lack access to medical imaging expertise, the researchers point out that even in those countries with advanced healthcare, automated chest X-ray interpretations “could be used for work-list prioritization, allowing the sickest patients to receive quicker diagnoses and treatment, even in hospital settings in which radiologists are not immediately available.”

The idea driving the present CheXNeXt efforts, said Lungren, is that, eventually, such a system could provide quality diagnostic support or “consultations” to healthcare providers who lack the interpretive expertise of a radiologist.

“We’re seeking opportunities to get our algorithm trained and validated in a variety of settings to explore both its strengths and blind spots,” graduate student Pranav Rajpurkar noted in the Stanford report. “The algorithm has evaluated over 100,000 X-rays so far, but now we want to know how well it would do if we showed it a million X-rays – and not just from one hospital, but from hospitals around the world.”

Lungren and co-senior author Andrew Ng, adjunct professor of Computer Science at Stanford, have been developing their diagnostic algorithm for over a year – the present iteration is built on earlier versions that beat radiologists when diagnosing pneumonia on X-ray.

In 2017, the pair unveiled their earlier effort, CheXNet, which could “detect pneumonia from chest X-rays at a level exceeding practicing radiologists,” in a paper appearing in the online research archive arXiv.

“The motivation behind this work is to have a deep-learning model to aid in the interpretation task that could overcome the intrinsic limitations of human perception and bias, and reduce errors,” Lungren explained in a Stanford report on his work at the time, adding, “more broadly, we believe that a deep-learning model for this purpose could improve health care delivery across a wide range of settings.”

Their latest algorithm, CheXNeXt, is a neural network trained using ChestX-ray14, a public dataset of more than 112,000 chest X-rays released by the National Institutes of Health.

Of the set, roughly 112,000 X-rays were used to teach the algorithm, and a separate set of 420 X-rays, not part of that group, was then used to test it on the 14 pathologies against experienced human radiologists.
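The earlier CheXNet work described a 121-layer DenseNet trained on these images. As a rough, hedged sketch of that kind of multi-label setup (illustrative only, not the authors' actual code), a minimal PyTorch version might look like the following, with the learning rate, image size, and decision threshold assumed:

```python
# Minimal sketch (not the authors' code): a multi-label chest X-ray
# classifier in PyTorch, assuming a DenseNet-121 backbone as in the
# earlier CheXNet paper and 14 binary pathology labels per image.
import torch
import torch.nn as nn
from torchvision import models

NUM_PATHOLOGIES = 14  # the 14 ChestX-ray14 labels

# DenseNet-121 backbone with a 14-way output head
model = models.densenet121()
model.classifier = nn.Linear(model.classifier.in_features, NUM_PATHOLOGIES)

# Multi-label training: one independent binary decision per pathology
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # assumed value

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """images: (N, 3, 224, 224) tensor; labels: (N, 14) float tensor of 0/1."""
    optimizer.zero_grad()
    logits = model(images)            # one logit per pathology
    loss = criterion(logits, labels)  # sum of 14 binary cross-entropies
    loss.backward()
    optimizer.step()
    return loss.item()

def predict(images: torch.Tensor, threshold: float = 0.5):
    """Returns per-pathology probabilities and thresholded predictions."""
    with torch.no_grad():
        probs = torch.sigmoid(model(images))
    return probs, (probs >= threshold).int()
```

The key design point is that each of the 14 findings is treated as its own yes/no decision on the same image, which is what lets a single network check for all pathologies simultaneously.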

“We treated the algorithm like it was a student; the NIH data set was the material we used to teach the student, and the 420 images were like the final exam,” Lungren recounted.

The comparison with the results of human diagnosticians, not just other AI approaches, was also of great importance, he stressed.

“That’s another factor that elevates this research,” he advised. “We weren’t just comparing this against other algorithms out there; we were comparing this model against practicing radiologists.”
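One common way to make such a comparison concrete is to score the model's per-pathology probabilities on the held-out test images with ROC AUC. The sketch below is an illustration only, using a subset of label names and simulated data standing in for the 420-image test set:

```python
# Illustrative sketch (not the paper's evaluation code): scoring a
# multi-label model per pathology with ROC AUC on a held-out test set.
# Pathology names are a subset of the 14 labels; data here is simulated.
import numpy as np
from sklearn.metrics import roc_auc_score

PATHOLOGIES = ["Atelectasis", "Cardiomegaly", "Effusion", "Pneumonia"]

def per_pathology_auc(y_true: np.ndarray, y_prob: np.ndarray) -> dict:
    """y_true, y_prob: (n_images, n_pathologies) arrays of 0/1 labels and probabilities."""
    return {name: roc_auc_score(y_true[:, i], y_prob[:, i])
            for i, name in enumerate(PATHOLOGIES)}

# Simulated stand-in for a 420-image test set and model outputs
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=(420, len(PATHOLOGIES)))
probs = np.clip(labels * 0.6 + rng.random((420, len(PATHOLOGIES))) * 0.4, 0, 1)
print(per_pathology_auc(labels, probs))
```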

The algorithm also has a clear speed advantage beyond its accuracy: the radiologists took an average of 240 minutes to read the test X-rays, while the algorithm took 1.5 minutes.

Where CheXNeXt goes next has yet to be determined.

“I could see this working in a few ways. The algorithm could triage the X-rays, sorting them into prioritized categories for doctors to review, like normal, abnormal or emergent,” Lungren said.

It could also end up as a diagnostic helpmate to a primary care physician, alerting practitioners when it is time to call a radiologist.
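As a hypothetical illustration of that triage idea (not something described in the study), per-pathology probabilities could be mapped to review categories and used to order a work list. The category names, thresholds, and "emergent" findings below are assumptions:

```python
# Hypothetical sketch of the triage concept Lungren describes (not an
# implementation from the study): mapping per-pathology probabilities
# to review categories and ordering a work list accordingly.
from typing import Dict, List, Tuple

EMERGENT_FINDINGS = {"Pneumothorax", "Effusion", "Pneumonia"}  # assumed subset

def triage(probs: Dict[str, float],
           abnormal_threshold: float = 0.5,
           emergent_threshold: float = 0.7) -> str:
    """Assigns a single study to 'normal', 'abnormal', or 'emergent'."""
    if any(p >= emergent_threshold and name in EMERGENT_FINDINGS
           for name, p in probs.items()):
        return "emergent"
    if any(p >= abnormal_threshold for p in probs.values()):
        return "abnormal"
    return "normal"

def prioritize(worklist: List[Tuple[str, Dict[str, float]]]) -> List[str]:
    """Orders studies so the most urgent cases reach a radiologist first."""
    rank = {"emergent": 0, "abnormal": 1, "normal": 2}
    return [study_id for study_id, probs in
            sorted(worklist, key=lambda item: rank[triage(item[1])])]
```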

“We should be building AI algorithms to be as good or better than the gold standard of human, expert physicians. Now, I’m not expecting AI to replace radiologists any time soon, but we are not truly pushing the limits of this technology if we’re just aiming to enhance existing radiologist workflows,” Lungren said. “Instead, we need to be thinking about how far we can push these AI models to improve the lives of patients anywhere in the world.”