Towards Highly Expressive Machine Learning Models of Non-Melanoma Skin Cancer. (arXiv:2207.05749v1 [cs.LG])

Pathologists have a rich vocabulary with which they can describe all the
nuances of cellular morphology. In their world, there is a natural pairing of
images and words. Recent advances show that machine learning models can now
be trained to learn high-quality image features and represent them as discrete
units of information. Because natural language is also discrete, it can then be
modelled jointly with the images, producing descriptions of their contents.
Here we present experiments in
applying discrete modelling techniques to the problem domain of non-melanoma
skin cancer, specifically, histological images of Intraepidermal Carcinoma
(IEC). We implemented a VQ-GAN model to reconstruct high-resolution (256×256)
images of IEC and trained a sequence-to-sequence transformer to generate
natural language descriptions using pathologist terminology. By combining this
with the interactive concept vectors that continuous generative methods make
available, we demonstrate an additional angle of interpretability. The result is
a promising step towards highly expressive machine learning systems that are
useful not only as predictive/classification tools but also as a means of
furthering our scientific understanding of disease.
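The key idea that lets images and words be modelled together is vector quantization: each continuous image feature vector is replaced by the index of its nearest entry in a codebook, so an image becomes a sequence of discrete tokens. The sketch below shows only that nearest-neighbour lookup step, assuming a small fixed codebook and plain Python lists. The names `quantize` and `dequantize` are hypothetical; a real VQ-GAN learns its codebook jointly with an encoder, decoder and adversarial loss, none of which is shown here.

```python
from typing import List, Sequence

Vector = Sequence[float]


def quantize(features: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Map each feature vector to the index of its nearest codebook entry.

    Hypothetical helper: in a VQ-GAN the features would come from an encoder's
    spatial grid, flattened (e.g. row-major) into a sequence before this step.
    """
    indices = []
    for f in features:
        # Squared Euclidean distance from this feature to every codebook entry
        dists = [sum((a - b) ** 2 for a, b in zip(f, c)) for c in codebook]
        # The discrete token is the index of the closest entry
        indices.append(min(range(len(codebook)), key=dists.__getitem__))
    return indices


def dequantize(indices: Sequence[int], codebook: Sequence[Vector]) -> List[List[float]]:
    """Look the tokens back up in the codebook (what a decoder would consume)."""
    return [list(codebook[i]) for i in indices]


if __name__ == "__main__":
    # Toy 2-D codebook with three entries (illustrative values only)
    codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
    features = [[0.1, -0.2], [0.9, 1.2], [0.2, 0.8]]
    tokens = quantize(features, codebook)
    print(tokens)  # [0, 1, 2]
    print(dequantize(tokens, codebook))
```

Under this framing, the resulting token indices could serve as the source sequence for a sequence-to-sequence transformer whose target sequence is the tokenized pathologist description.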


