By Gus Iversen, Editor in Chief | November 19, 2024
A new study from researchers at the Icahn School of Medicine at Mount Sinai has outlined strategies for using large language models (LLMs), such as GPT-4, in healthcare systems while balancing cost efficiency and performance.
One key strategy identified in the study is grouping up to 50 clinical tasks — such as matching patients to clinical trials, extracting data for research, and identifying candidates for preventive health screenings — into a single batch. This approach allows models to handle tasks simultaneously without significant accuracy loss, reducing API costs by as much as 17-fold. For large health systems, this could translate to substantial annual savings.
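As a rough illustration of the grouping idea, the sketch below splits a list of tasks into batches of at most 50 and builds one combined prompt per batch. It is not the study's code. The function names, the prompt wording, and the placeholder task strings are all hypothetical, and no real LLM API is called. It only shows how batching reduces the number of requests.

```python
from typing import List

# Upper bound on tasks per batch, taken from the figure cited in the article.
MAX_BATCH_SIZE = 50


def chunk_tasks(tasks: List[str], batch_size: int = MAX_BATCH_SIZE) -> List[List[str]]:
    """Split tasks into consecutive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [tasks[i:i + batch_size] for i in range(0, len(tasks), batch_size)]


def build_batch_prompt(batch: List[str]) -> str:
    """Combine one batch into a single numbered prompt (hypothetical wording)."""
    numbered = "\n".join(f"{n}. {task}" for n, task in enumerate(batch, start=1))
    return "Answer each numbered task separately, keeping the same numbering.\n" + numbered


# Placeholder tasks for illustration only; these are not real clinical questions.
tasks = [f"Hypothetical clinical task {i}" for i in range(1, 121)]

batches = chunk_tasks(tasks)
prompts = [build_batch_prompt(batch) for batch in batches]

# One request per batch instead of one request per task.
print(f"{len(tasks)} tasks -> {len(prompts)} batched requests")
```

In this toy setup, 120 tasks become three requests instead of 120. Actual cost savings depend on how a given provider prices tokens and on how much shared context each request carries, which this sketch does not model.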
The research team, led by Dr. Girish Nadkarni and Dr. Eyal Klang, tested 10 LLMs using real patient data, conducting over 300,000 experiments to assess performance under increasing task loads. They explored how the models responded to a range of clinical questions and measured accuracy, adherence to clinical instructions, and operational stability.
“Our findings provide a road map for healthcare systems to integrate advanced AI tools to automate tasks efficiently, potentially cutting costs while ensuring stable performance under heavy workloads,” said Dr. Nadkarni, chief of the Division of Data-Driven and Digital Medicine at Mount Sinai.
The study also revealed challenges, such as cognitive limits in LLMs. Advanced models like GPT-4 showed signs of strain under high-demand scenarios, with occasional performance drops. By recognizing these thresholds, health systems can ensure reliable AI use while minimizing risks.
“This research has significant implications for how AI can be integrated into healthcare systems,” said Dr. David Reich, president of The Mount Sinai Hospital. “Grouping tasks not only reduces costs but also allows resources to be directed toward patient care.”
The team plans to test these models in real-world clinical environments to further refine their strategies. Their goal is to establish a framework for healthcare AI that maximizes efficiency and accuracy while remaining cost-effective.
The study was published in npj Digital Medicine under the title “A Strategy for Cost-effective Large Language Model Use at Health System-scale” and was supported by the National Institutes of Health.