Artificial intelligence (AI) is currently unable to pass one of the qualifying radiology examinations, suggesting that this promising technology is not yet ready to replace doctors, finds a study in the Christmas issue of The BMJ.
AI is increasingly being used for some tasks that doctors do, such as interpreting radiographs (x-rays and scans) to help diagnose a range of conditions.
But can AI pass the Fellowship of the Royal College of Radiologists (FRCR) examination, which UK trainees must pass to qualify as radiology consultants?
To find out, researchers compared the performance of a commercially available AI tool with that of 26 radiologists (mostly aged between 31 and 40 years; 62% female), all of whom had passed the FRCR exam the previous year.
They developed 10 ‘mock’ rapid reporting exams, based on one of the three modules that make up the qualifying FRCR examination. This module is designed to test candidates for speed and accuracy.
Each mock exam consisted of 30 radiographs at a level of difficulty and breadth of knowledge equal to or greater than that expected for the real FRCR exam. To pass, candidates had to correctly interpret at least 27 (90%) of the 30 images within 35 minutes.
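The pass rule can be written as a simple check. The Python sketch below is illustrative only: the function name `passes_mock_exam` and its parameters are assumptions made for this example, not part of the study, and the 35-minute time limit is assumed to be enforced separately.

```python
def passes_mock_exam(correct: int, total: int = 30, pass_mark: int = 27) -> bool:
    """Return True if the number of correct interpretations meets the pass mark.

    Defaults follow the article: 30 images per exam, at least 27 correct (90%).
    Assumption: the 35-minute time limit is checked elsewhere.
    """
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")
    return correct >= pass_mark


print(passes_mock_exam(27))  # True: exactly on the pass mark
print(passes_mock_exam(26))  # False: one image short
```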
The AI candidate had been trained to assess chest and bone (musculoskeletal) radiographs for several conditions including fractures, swollen and dislocated joints, and collapsed lungs.
Allowances were made for images of body parts that the AI candidate had not been trained to assess; these were deemed “uninterpretable.”
When uninterpretable images were excluded from the analysis, the AI candidate achieved an average overall accuracy of 79.5% and passed two of 10 mock FRCR exams, while the radiologists achieved an average accuracy of 84.8% and passed, on average, four of 10 mock examinations.
The sensitivity (ability to correctly identify patients with a condition) for the AI candidate was 83.6% and the specificity (ability to correctly identify patients without a condition) was 75.2%, compared with 84.1% and 87.3%, respectively, across all radiologists.
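For readers unfamiliar with these two measures, the short Python sketch below shows how they are conventionally calculated from counts of correct and incorrect calls. The function names and the example counts are hypothetical, chosen only to illustrate the formulas. They are not data from the study.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of cases WITH a condition that were correctly flagged."""
    if true_pos + false_neg == 0:
        raise ValueError("no positive cases")
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of cases WITHOUT a condition that were correctly cleared."""
    if true_neg + false_pos == 0:
        raise ValueError("no negative cases")
    return true_neg / (true_neg + false_pos)


# Hypothetical counts for illustration only (not from the study).
tp, fn = 90, 10   # abnormal images: 90 caught, 10 missed
tn, fp = 80, 20   # normal images: 80 cleared, 20 wrongly flagged

print(f"sensitivity: {sensitivity(tp, fn):.1%}")  # 90.0%
print(f"specificity: {specificity(tn, fp):.1%}")  # 80.0%
```

A lower specificity, as reported for the AI candidate, means that more normal images were wrongly flagged as abnormal.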
Across 148 out of 300 radiographs that were correctly interpreted by more than 90% of radiologists, the AI candidate was correct in 134 (91%) and incorrect in the remaining 14 (9%).
In 20 out of 300 radiographs that over half of radiologists interpreted incorrectly, the AI candidate was incorrect in 10 (50%) and correct in the remaining 10.
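The percentages quoted for these two subsets follow directly from the counts. As a quick arithmetic check, here is a minimal Python sketch; the helper name `share` is an assumption introduced for this example.

```python
def share(count: int, total: int) -> int:
    """Percentage of total, rounded to the nearest whole number."""
    return round(100 * count / total)


# Radiographs that more than 90% of radiologists got right (148 of 300).
print(share(134, 148))  # 91 -> AI candidate correct
print(share(14, 148))   # 9  -> AI candidate incorrect

# Radiographs that over half of radiologists got wrong (20 of 300).
print(share(10, 20))    # 50 -> AI candidate incorrect (and 50 correct)
```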
Interestingly, the radiologists slightly overestimated the likely performance of the AI candidate, assuming that it would perform almost as well as themselves on average and outperform them in at least three of the 10 mock exams.