First public benchmark for AI radiology report generation launched by Rajpurkar Lab and Gradient Health
December 18, 2024
December 18, 2024 BOSTON, MA — The Rajpurkar Lab at Harvard Medical School's Department of Biomedical Informatics today announced the launch of ReXrank (rexrank.ai), the first standardized public leaderboard for evaluating AI models that generate radiology reports from chest X-rays.
With almost 1,000 FDA-cleared AI-enabled medical devices already in clinical use in the USA, it can be challenging for health systems, physicians, and researchers to tell these tools apart and evaluate their performance against one another. To address this, the Rajpurkar Lab set out to create ReXrank, a public leaderboard and challenge built on objective, state-of-the-art assessments of performance. Gradient Health, the healthcare data company, supported the initiative by providing the ReXGradient dataset, the largest test dataset of its kind, with 10,000 chest X-ray studies.
To feature on the ReXrank leaderboard, creators of AI chest X-ray tools must first test their models on a series of large public datasets and then submit them for testing on the private ReXGradient dataset. The results are made publicly available.
Initial benchmarking results show the MedVersa system, developed by the Rajpurkar Lab, in the top position on the leaderboard, significantly outperforming other models including OpenAI's GPT-4V, which ranked 15th. MedVersa demonstrated superior performance across multiple metrics, including RadCliQ, RadGraph, and RaTEScore.
"Many commercial and research groups are making claims about AI-powered radiology report generation without benchmarking against each other. Rather than rely on anecdotes, having standardized benchmarks allows us to separate scientific progress from hype," said Pranav Rajpurkar, Assistant Professor of Biomedical Informatics at Harvard Medical School. "The significant performance gap between specialized medical AI models and general-purpose AI reveals both the progress we've made and the challenges that remain in medical AI development."
"ReXrank provides the medical AI community with a much-needed standardized way to evaluate and compare automated report generation systems," said Xiaoman Zhang, lead researcher at the Rajpurkar Lab. "By incorporating multiple datasets and eight different evaluation metrics, we can better understand how these AI models perform in real-world scenarios."
"The ability to benchmark AI models against real-world, diverse radiology datasets is crucial for advancing the field," added Ouwen Huang, co-founder of Gradient Health. "ReXrank’s large-scale test set provides the comprehensive evaluation framework needed to validate these technologies before clinical deployment and accelerate the development of reliable AI systems."
The ReXrank challenge evaluates AI models on their ability to generate both findings sections and complete reports with impressions. The surprising performance gap between specialized medical AI models and general-purpose vision-language models like GPT-4V highlights the importance of domain-specific development in healthcare AI.
ReXrank will expand to include more AI solutions and to encourage more open and rigorous debate about the performance of this new generation of medical technology.
About Gradient Health:
Gradient Health is a medical technology company headquartered in North Carolina, USA. It provides easy access to the foundation-model-scale medical imaging datasets needed to train and validate technologies, helping bring more equitable innovations to market faster.
Gradient was founded to power better medical research by accelerating AI development. Health innovators from around the world use Gradient Health’s platform to improve their products, without compromising speed, quality of research or data privacy.
About The Rajpurkar Lab:
The Rajpurkar Lab, part of Harvard Medical School's Department of Biomedical Informatics, is committed to pioneering the development of advanced medical artificial intelligence. Its mission is to scale the expertise of top medical professionals globally through innovative AI solutions. The lab approaches this scientific challenge through innovation in algorithmic development, large-scale data curation, and clinician impact studies to transform radiology, emergency medicine, and beyond.