Understanding invariance via feedforward inversion of discriminatively trained classifiers. (arXiv:2103.07470v1 [cs.LG])

A discriminatively trained neural net classifier achieves optimal performance
if all information about its input other than class membership has been
discarded prior to the output layer. Surprisingly, past research has discovered
that some extraneous visual detail remains in the output logits. This finding
is based on inversion techniques that map deep embeddings back to images.
Although the logit inversions seldom produce coherent, natural images or
recognizable object classes, they do recover some visual detail. We explore
this phenomenon further using a novel synthesis of methods, yielding a
feedforward inversion model that produces remarkably high fidelity
reconstructions, qualitatively superior to those of past efforts. When applied
to an adversarially robust classifier model, the reconstructions contain
sufficient local detail and global structure that they might be confused with
the original image at a quick glance, and the object category can clearly be
gleaned from the reconstruction. Our approach is based on BigGAN (Brock et al., 2019),
with conditioning on logits instead of one-hot class labels. We use our
reconstruction model as a tool for exploring the nature of representations,
including: the influence of model architecture and training objectives
(specifically robust losses), the forms of invariance that networks achieve,
representational differences between correctly and incorrectly classified
images, and the effects of manipulating logits and images. We believe that our
method can inspire future investigations into the nature of information flow in
a neural net and can provide diagnostics for improving discriminative models.
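The key modification the abstract describes is feeding the classifier's logit vector, rather than a one-hot class label, into the generator's conditioning pathway. In BigGAN-style generators, conditioning enters through class-conditional batch normalization, where the conditioning vector predicts per-channel scales and shifts. A minimal NumPy sketch of that mechanism, with logits as the conditioning input (all sizes and weight names here are hypothetical, chosen for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_CLASSES = 10   # hypothetical class count, small for illustration
COND_DIM = 8       # conditioning-vector width (assumption)
FEAT_CH = 4        # feature channels modulated by conditional batch norm

# In a standard class-conditional BigGAN the conditioning input would be a
# one-hot label embedding; here it is the classifier's logit vector instead,
# mirroring the substitution the abstract describes.
W_embed = rng.normal(size=(NUM_CLASSES, COND_DIM)) * 0.1  # logits -> conditioning
W_gamma = rng.normal(size=(COND_DIM, FEAT_CH)) * 0.1      # conditioning -> BN scale
W_beta = rng.normal(size=(COND_DIM, FEAT_CH)) * 0.1       # conditioning -> BN shift

def conditional_batchnorm(features, logits):
    """Normalize features, then modulate them with scales and shifts
    predicted from the logit-derived conditioning vector."""
    cond = logits @ W_embed                  # (batch, COND_DIM)
    gamma = 1.0 + cond @ W_gamma             # per-channel scale, centered at 1
    beta = cond @ W_beta                     # per-channel shift
    mu = features.mean(axis=0, keepdims=True)
    sigma = features.std(axis=0, keepdims=True) + 1e-5
    normed = (features - mu) / sigma
    return gamma * normed + beta

batch = 16
feats = rng.normal(size=(batch, FEAT_CH))        # stand-in generator features
logits = rng.normal(size=(batch, NUM_CLASSES))   # stand-in classifier logits
out = conditional_batchnorm(feats, logits)
print(out.shape)
```

Because the logits carry more than a hard class decision, this conditioning channel gives the generator access to whatever residual visual detail survives in the classifier's output layer, which is what makes the high-fidelity reconstructions possible.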

Source: https://arxiv.org/abs/2103.07470

