OAK BROOK, Ill. — Ensembles created using models submitted to the RSNA Pediatric Bone Age Machine Learning Challenge convincingly outperformed single-model prediction of bone age, according to a study published in the journal Radiology: Artificial Intelligence.
Ensemble learning is a method in machine learning in which different models designed to accomplish the same task are combined into a single model.
Model heterogeneity is an important aspect of ensemble learning. Ensembles tend to perform best when each of the individual models performs well in its own right and the correlation among individual model predictions is relatively low.
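As a minimal sketch of the idea, the Python snippet below averages the predictions of two models and computes the mean pairwise correlation among them. It is illustrative only: the article does not say how the study combined predictions, so simple averaging is an assumption, and the function names and toy numbers are hypothetical.

```python
import itertools
import numpy as np

def mean_absolute_deviation(pred, truth):
    """Mean absolute difference between predicted and reference bone ages (months)."""
    return float(np.mean(np.abs(np.asarray(pred, dtype=float) - np.asarray(truth, dtype=float))))

def ensemble_mean(preds):
    """Combine models by simple averaging (an assumption); preds has shape (n_models, n_cases)."""
    return np.mean(np.asarray(preds, dtype=float), axis=0)

def mean_pairwise_correlation(preds):
    """Average Pearson correlation over all distinct pairs of models."""
    preds = np.asarray(preds, dtype=float)
    corr = np.corrcoef(preds)  # each row is one model's predictions
    pairs = itertools.combinations(range(preds.shape[0]), 2)
    return float(np.mean([corr[i, j] for i, j in pairs]))

# Toy data, not from the study: two models whose errors point in opposite directions.
truth = np.array([100.0, 120.0, 140.0, 160.0])
model_a = np.array([104.0, 116.0, 144.0, 156.0])
model_b = np.array([96.0, 124.0, 136.0, 164.0])

combined = ensemble_mean([model_a, model_b])
print(mean_absolute_deviation(model_a, truth))     # error of one model alone
print(mean_absolute_deviation(combined, truth))    # error of the averaged ensemble
print(mean_pairwise_correlation([model_a, model_b]))
```

In this contrived case the two models' errors cancel exactly, which is the extreme form of the benefit that low correlation provides. Real models would cancel only partially.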

Because ensembles benefit from low correlation between model predictions, greater differences in the underlying approaches yield greater improvement, as long as the models achieve similar performance. A competition, in which participants are encouraged to submit their best models, is therefore an ideal source of high-performing models that use different techniques.
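A standard textbook identity helps explain why low correlation matters. It is not taken from the study and rests on simplifying assumptions: each of the n models has a prediction error e_i with the same variance σ², every pair of errors has the same correlation ρ, and the ensemble is a simple average.

```latex
\operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} e_i\right)
  = \frac{\sigma^2}{n} + \frac{n-1}{n}\,\rho\,\sigma^2
  = \sigma^2\left(\rho + \frac{1-\rho}{n}\right)
```

Under these assumptions, adding models shrinks only the (1 − ρ)/n term, so the error variance of the average cannot fall below ρσ². Lower correlation lowers that floor. The identity describes error variance rather than MAD, so it is a guide to the mechanism, not a prediction of the reported figures.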
“Competitions provide a unique opportunity to study the effects of combining predictions from heterogenous models,” said study author Ian Pan, a medical student at The Warren Alpert Medical School of Brown University in Providence, R.I.
To investigate improvements in performance for automatic bone age estimation that can be gained through model ensembling, Pan and colleagues used 48 submissions from the 2017 RSNA Pediatric Bone Age Machine Learning Challenge.
Participants were provided with 12,611 pediatric hand X-rays, with bone ages determined by a pediatric radiologist, to develop models for bone age determination. The final results were determined using a test set of 200 X-rays labeled with the weighted average of six ratings. Using the mean absolute deviation (MAD) as the performance measure, the researchers evaluated the performance and mean pairwise model correlation of all possible model combinations for ensembles of up to 10 models. To estimate the true generalization MAD, they conducted a bootstrap analysis using the 200 test X-rays.
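The evaluation described above can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the study's code: the ensemble is taken to be a simple average, the bootstrap resamples test cases with replacement and rescores a fixed set of predictions, and all function names and toy data are hypothetical. The article does not describe the exact bootstrap procedure the researchers used.

```python
import itertools
import numpy as np

def mad(pred, truth):
    """Mean absolute deviation between predictions and reference values."""
    return float(np.mean(np.abs(pred - truth)))

def best_ensemble(preds, truth, max_size=10):
    """Score every combination of up to max_size models; return (lowest MAD, model indices)."""
    preds = np.asarray(preds, dtype=float)
    truth = np.asarray(truth, dtype=float)
    n_models = preds.shape[0]
    best = (float("inf"), ())
    for size in range(1, min(max_size, n_models) + 1):
        for combo in itertools.combinations(range(n_models), size):
            # Assumed combination rule: unweighted mean of the selected models.
            score = mad(preds[list(combo)].mean(axis=0), truth)
            if score < best[0]:
                best = (score, combo)
    return best

def bootstrap_mad(pred, truth, n_resamples=1000, seed=0):
    """Resample test cases with replacement; return mean and standard deviation of the MAD."""
    rng = np.random.default_rng(seed)
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    n_cases = truth.shape[0]
    scores = np.empty(n_resamples)
    for b in range(n_resamples):
        idx = rng.integers(0, n_cases, size=n_cases)
        scores[b] = mad(pred[idx], truth[idx])
    return float(scores.mean()), float(scores.std(ddof=1))

# Toy data, not from the study: three models scored on four cases.
truth = np.array([100.0, 120.0, 140.0, 160.0])
preds = np.array([
    [104.0, 116.0, 144.0, 156.0],  # model 0
    [96.0, 124.0, 136.0, 164.0],   # model 1
    [110.0, 130.0, 150.0, 170.0],  # model 2, consistently 10 months high
])

score, members = best_ensemble(preds, truth, max_size=3)
print(score, members)
print(bootstrap_mad(preds[2], truth))
```

Exhaustive search is practical only for small inputs like this one. With 48 models and ensembles of up to 10, the number of combinations runs into the billions, so a plain Python loop would need vectorization or considerable compute.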
The estimated generalization MAD of a single model was 4.55 months. The best-performing ensemble consisted of four models and had a MAD of 3.79 months. The mean pairwise correlation of models within this ensemble was 0.47. In comparison, the lowest MAD achievable by combining the highest-ranking models based on individual scores was 3.93 months, using eight models with a mean pairwise model correlation of 0.67.