“For in-distribution data, you can use existing state-of-the-art methods to reduce fairness gaps without making significant trade-offs in overall performance,” Ghassemi says. “Subgroup robustness methods force models to be sensitive to mispredicting a specific group, and group adversarial methods try to remove group information completely.”
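To make those two families of methods concrete, below is a minimal sketch (not the researchers' code) of how each idea is commonly implemented, assuming a PyTorch classifier with per-example features, task labels, and a demographic group label; the function and variable names are illustrative.

```python
# Minimal sketch of the two debiasing ideas described above, assuming a PyTorch
# model that produces `feats`/`logits` with task labels `y` and group labels `g`.
import torch
import torch.nn as nn
import torch.nn.functional as F


def worst_group_loss(logits, y, g, num_groups):
    """Subgroup robustness (group-DRO flavor): penalize the worst-performing group."""
    losses = F.cross_entropy(logits, y, reduction="none")
    group_losses = []
    for k in range(num_groups):
        mask = g == k
        if mask.any():
            group_losses.append(losses[mask].mean())
    # Optimizing the maximum group loss forces the model to be sensitive to
    # mispredicting any specific group.
    return torch.stack(group_losses).max()


class GradReverse(torch.autograd.Function):
    """Gradient reversal layer used for group-adversarial training."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output


def adversarial_loss(feats, y, g, task_head, group_head):
    """Group adversarial: a group head tries to predict the group from the
    features, while the reversed gradient pushes the encoder to strip that
    group information out of its representation."""
    task_loss = F.cross_entropy(task_head(feats), y)
    adv_loss = F.cross_entropy(group_head(GradReverse.apply(feats)), g)
    return task_loss + adv_loss
```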
Not always fairer
However, those approaches only worked when the models were tested on data from the same types of patients they were trained on; for example, only patients from the Beth Israel Deaconess Medical Center (BIDMC) data set.

When the researchers tested the models that had been “debiased” using the BIDMC data to analyze patients from five other hospital data sets, they found that the models’ overall accuracy remained high, but some of them exhibited large fairness gaps.
“If you debias the model in one set of patients, that fairness does not necessarily hold as you move to a new set of patients from a different hospital in a different location,” Zhang says.
This is worrisome because hospitals often deploy models that were developed using data from other hospitals, especially when they purchase an off-the-shelf model, the researchers say.
“We found that even state-of-the-art models, which are optimally performant in data similar to their training sets, are not optimal. That is, they do not make the best trade-off between overall and subgroup performance, in novel settings,” Ghassemi says. “Unfortunately, this is actually how a model is likely to be deployed. Most models are trained and validated with data from one hospital, or one source, and then deployed widely.”
The researchers found that models debiased using group adversarial approaches showed slightly more fairness when tested on new patient groups than those debiased with subgroup robustness methods. They now plan to develop and test additional methods to see whether they can create models that make fairer predictions on new data sets.
The findings suggest that hospitals using these types of AI models should evaluate them on their own patient populations before deployment, to make sure they aren't giving inaccurate results for certain groups.
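The kind of local check this implies can be as simple as computing per-group accuracy and the gap between the best- and worst-served groups. Below is a small illustrative sketch under that assumption; the array names and toy data are hypothetical, not drawn from the study.

```python
# Illustrative check of subgroup performance on a hospital's own patients,
# given model predictions `preds`, true labels `labels`, and group labels `groups`.
import numpy as np


def subgroup_accuracy_gap(preds, labels, groups):
    """Return per-group accuracy and the gap between the best and worst group."""
    accs = {}
    for g in np.unique(groups):
        mask = groups == g
        accs[g] = float((preds[mask] == labels[mask]).mean())
    gap = max(accs.values()) - min(accs.values())
    return accs, gap


# Toy usage example:
preds = np.array([1, 0, 1, 1, 0, 1])
labels = np.array([1, 0, 0, 1, 0, 1])
groups = np.array(["A", "A", "B", "B", "B", "A"])
per_group, gap = subgroup_accuracy_gap(preds, labels, groups)
print(per_group, gap)
```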
The research was funded by a Google Research Scholar Award, the Robert Wood Johnson Foundation Harold Amos Medical Faculty Development Program, RSNA Health Disparities, the Lacuna Fund, the Gordon and Betty Moore Foundation, the National Institute of Biomedical Imaging and Bioengineering, and the National Heart, Lung, and Blood Institute.
Written by Anne Trafton, MIT News