DDXPlus: A new Dataset for Medical Automatic Diagnosis. (arXiv:2205.09148v1 [cs.CL])

There has been rapidly growing interest in Automatic Diagnosis (AD) and
Automatic Symptom Detection (ASD) systems in the machine learning research
literature, aiming to assist doctors in telemedicine services. These systems
are designed to interact with patients, collect evidence relevant to their
concerns, and make predictions about the underlying diseases. Doctors would
review the interaction, including the evidence and the predictions, before
making their final decisions. Despite the recent progress, an important piece
of doctors’ interactions with patients is missing in the design of AD and ASD
systems, namely the differential diagnosis. Its absence is largely due to the
lack of datasets that include such information for models to train on. In this
work, we present a large-scale synthetic dataset that includes a differential
diagnosis, along with the ground truth pathology, for each patient. In
addition, this dataset includes more pathologies, as well as types of symptoms
and antecedents. As a proof-of-concept, we extend several existing AD and ASD
systems to incorporate differential diagnosis, and provide empirical evidence
that using differentials as training signals is essential for such systems to
learn to predict differentials. Dataset available at

Source: https://arxiv.org/abs/2205.09148
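
To make concrete the idea of a differential diagnosis serving as a training signal, the following is a minimal sketch, not the authors' implementation. It assumes the differential is represented as a probability distribution over pathologies and that the model is scored with cross-entropy. The pathology names, probabilities, and logits are hypothetical values chosen for illustration and are not taken from the dataset.

```python
import math


def softmax(logits):
    """Convert raw model scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a target distribution and a predicted one."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))


# Hypothetical patient: names and numbers are illustrative assumptions.
pathologies = ["Pneumonia", "Bronchitis", "URTI"]
differential = [0.6, 0.3, 0.1]   # soft target: ranked plausible diseases
ground_truth = [1.0, 0.0, 0.0]   # hard target: the single true pathology

logits = [2.0, 1.0, 0.0]         # hypothetical model outputs
predicted = softmax(logits)

# Training on the ground truth only rewards mass on the true pathology.
loss_ground_truth = cross_entropy(ground_truth, predicted)

# Training on the differential also rewards mass on the alternatives.
loss_differential = cross_entropy(differential, predicted)

print(f"ground-truth loss: {loss_ground_truth:.4f}")
print(f"differential loss: {loss_differential:.4f}")
```

Under these assumptions, the differential loss is smallest when the predicted distribution matches the differential itself, whereas the ground-truth loss keeps pushing all probability onto one pathology. This is one way to see why a model trained only on ground-truth labels has no direct incentive to reproduce a differential.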


