Disparities in Dermatology AI Performance on a Diverse, Curated Clinical Image Set. (arXiv:2203.08807v1 [eess.IV])

Access to dermatological care is a major issue, with an estimated 3 billion
people lacking access to care globally. Artificial intelligence (AI) may aid in
triaging skin diseases. However, most AI models have not been rigorously
assessed on images of diverse skin tones or uncommon diseases. To ascertain
potential biases in algorithm performance in this context, we curated the
Diverse Dermatology Images (DDI) dataset, the first publicly available, expertly
curated, and pathologically confirmed image dataset with diverse skin tones.
Using this dataset of 656 images, we show that state-of-the-art dermatology AI
models perform substantially worse on DDI, with the area under the receiver
operating characteristic curve (ROC-AUC) dropping by 27-36 percent compared to the models’
original test results. All the models performed worse on dark skin tones and
uncommon diseases, which are represented in the DDI dataset. Additionally, we
find that dermatologists, who typically provide visual labels for AI training
and test datasets, also perform worse on images of dark skin tones and uncommon
diseases compared to ground truth biopsy annotations. Finally, fine-tuning AI
models on the well-characterized and diverse DDI images closed the performance
gap between light and dark skin tones. Moreover, algorithms fine-tuned on
diverse skin tones outperformed dermatologists on identifying malignancy on
images of dark skin tones. Our findings identify important weaknesses and
biases in dermatology AI that need to be addressed to ensure reliable
application to diverse patients and diseases.
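
The subgroup comparison described above can be illustrated with a minimal sketch that computes ROC-AUC separately for each skin-tone group. This is not the authors' evaluation code. The function names, the group labels "light" and "dark", and the toy scores are all hypothetical, chosen only to show how a per-group gap would surface.

```python
from collections import defaultdict


def roc_auc(labels, scores):
    """ROC-AUC as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_by_group(records):
    """Compute ROC-AUC per subgroup.

    records: iterable of (group, label, score) tuples, where label is
    1 for malignant and 0 for benign, and score is the model output.
    """
    grouped = defaultdict(lambda: ([], []))
    for group, label, score in records:
        grouped[group][0].append(label)
        grouped[group][1].append(score)
    return {g: roc_auc(ys, ss) for g, (ys, ss) in grouped.items()}


# Hypothetical toy data, NOT taken from the DDI dataset.
toy_records = [
    ("light", 1, 0.9), ("light", 1, 0.8), ("light", 0, 0.3), ("light", 0, 0.1),
    ("dark", 1, 0.6), ("dark", 1, 0.2), ("dark", 0, 0.4), ("dark", 0, 0.1),
]

per_group = auc_by_group(toy_records)
print(per_group)
```

On this toy input the "light" group is ranked perfectly while one malignant "dark" case scores below a benign one, so the per-group values differ even though a pooled ROC-AUC would hide where the errors fall. The pairwise loop is quadratic in group size, which is acceptable for a dataset of a few hundred images; a rank-based formulation would be preferable for larger sets.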

Source: https://arxiv.org/abs/2203.08807

