Catching Out-of-Context Misinformation with Self-supervised Learning. (arXiv:2101.06278v1 [cs.CV])

Despite the recent attention to DeepFakes and other forms of image
manipulations, one of the most prevalent ways to mislead audiences is the use
of unaltered images in a new but false context. To address these challenges and
support fact-checkers, we propose a new method that automatically detects
out-of-context image and text pairs. Our core idea is a self-supervised
training strategy where we only need images with matching (and non-matching)
captions from different sources. At train time, our method learns to
selectively align individual objects in an image with textual claims, without
explicit supervision. At test time, given a pair of captions for an image, we
check whether both texts correspond to the same object(s) in the image while
semantically conveying different descriptions, which allows us to make fairly
accurate out-of-context predictions. Our method achieves 82% out-of-context detection accuracy. To
facilitate training our method, we created a large-scale dataset of 203,570
images which we match with 456,305 textual captions from a variety of news
websites, blogs, and social media posts; i.e., for each image, we obtained
several captions.
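
The test-time decision rule described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the embedding vectors, thresholds, and the cosine-similarity scoring are all assumptions introduced here for clarity. The idea is that an image-caption pair is flagged as out-of-context when both captions ground to the same object region but are semantically dissimilar to each other.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_out_of_context(object_embeddings, cap1_emb, cap2_emb,
                      align_thresh=0.5, sem_thresh=0.5):
    """Hypothetical decision rule: flag an (image, caption-pair) as
    out-of-context when both captions align to the SAME object region
    but convey semantically DIFFERENT descriptions.

    object_embeddings : list of per-object region embeddings (assumed given)
    cap1_emb, cap2_emb: embeddings of the two textual claims (assumed given)
    """
    # Score each caption against every detected object region.
    s1 = np.array([cosine(o, cap1_emb) for o in object_embeddings])
    s2 = np.array([cosine(o, cap2_emb) for o in object_embeddings])

    # Both captions must ground confidently to the same object.
    same_object = (s1.argmax() == s2.argmax()
                   and s1.max() > align_thresh
                   and s2.max() > align_thresh)

    # ...while disagreeing with each other semantically.
    semantically_different = cosine(cap1_emb, cap2_emb) < sem_thresh

    return same_object and semantically_different
```

For example, with two object regions and two unit-norm caption embeddings that share the first object's direction but differ in their remaining (semantic) dimensions, the rule fires; feeding the same caption twice does not, since the captions then agree semantically.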
