Study reveals why AI models that analyze medical images can be biased

July 02, 2024

By Anne Trafton

Artificial intelligence models often play a role in medical diagnoses, especially when it comes to analyzing images such as X-rays. However, studies have found that these models don’t always perform well across all demographic groups, usually faring worse on women and people of color.

These models have also been shown to develop some surprising abilities. In 2022, MIT researchers reported that AI models can make accurate predictions about a patient’s race from their chest X-rays — something that the most skilled radiologists can’t do.

That research team has now found that the models that are most accurate at making demographic predictions also show the biggest “fairness gaps” — that is, discrepancies in their ability to accurately diagnose images of people of different races or genders. The findings suggest that these models may be using “demographic shortcuts” when making their diagnostic evaluations, which lead to incorrect results for women, Black people, and other groups, the researchers say.
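As a rough illustration of what a "fairness gap" measures, the sketch below computes a model's accuracy separately for each demographic group and reports the difference between the best- and worst-served groups. This is a minimal example under stated assumptions, not the researchers' method: the function names, the use of plain accuracy as the metric, and the toy data are all hypothetical, and the study itself may define the gap with a different metric.

```python
from collections import defaultdict


def per_group_accuracy(labels, predictions, groups):
    """Return the fraction of correct predictions for each group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, p, g in zip(labels, predictions, groups):
        total[g] += 1
        correct[g] += int(y == p)
    return {g: correct[g] / total[g] for g in total}


def fairness_gap(labels, predictions, groups):
    """Assumed definition: best group accuracy minus worst group accuracy."""
    acc = per_group_accuracy(labels, predictions, groups)
    return max(acc.values()) - min(acc.values())


# Toy, made-up data: 1 = disease present, 0 = absent.
labels      = [1, 0, 1, 1, 0, 1, 0, 0]
predictions = [1, 0, 1, 0, 0, 0, 1, 0]
groups      = ["A", "A", "A", "A", "B", "B", "B", "B"]

print(per_group_accuracy(labels, predictions, groups))
print(fairness_gap(labels, predictions, groups))
```

A gap of zero would mean the model is equally accurate for every group in the sample; a larger value means some group is being diagnosed less reliably than another.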

“It’s well-established that high-capacity machine-learning models are good predictors of human demographics such as self-reported race or sex or age. This paper re-demonstrates that capacity, and then links that capacity to the lack of performance across different groups, which has never been done,” says Marzyeh Ghassemi, an MIT associate professor of electrical engineering and computer science, a member of MIT’s Institute for Medical Engineering and Science, and the senior author of the study.

The researchers also found that they could retrain the models in a way that improves their fairness. However, their approach to “debiasing” worked best when the models were tested on the same types of patients they were trained on, such as patients from the same hospital. When these models were applied to patients from different hospitals, the fairness gaps reappeared.

“I think the main takeaways are, first, you should thoroughly evaluate any external models on your own data because any fairness guarantees that model developers provide on their training data may not transfer to your population. Second, whenever sufficient data is available, you should train models on your own data,” says Haoran Zhang, an MIT graduate student and one of the lead authors of the new paper.

MIT graduate student Yuzhe Yang is also a lead author of the paper, which will appear in Nature Medicine. Judy Gichoya, an associate professor of radiology and imaging sciences at Emory University School of Medicine, and Dina Katabi, the Thuan and Nicole Pham Professor of Electrical Engineering and Computer Science at MIT, are also authors of the paper.


