Contrastive self-supervised representation learning methods maximize the
similarity between positive pairs while minimizing the similarity between
negative pairs. However, they generally ignore the interplay among negative
pairs, as they provide no mechanism for treating negative pairs differently
according to their specific differences and similarities. In this paper, we
present Extended Momentum
Contrast (XMoCo), a self-supervised representation learning method that builds
on the momentum encoder introduced in the MoCo family of methods. To this end,
we introduce a cross consistency regularization
loss, with which we extend the transformation consistency to dissimilar images
(negative pairs). Under the cross consistency regularization rule, we argue
that semantic representations associated with any pair of images (positive or
negative) should preserve their cross-similarity under pretext transformations.
We further regularize the training loss by enforcing a uniform
distribution of similarity over the negative pairs across a batch. The proposed
regularization can easily be added to existing self-supervised learning
algorithms in a plug-and-play fashion. Empirically, we report competitive
performance on the standard ImageNet-1K linear-head classification benchmark.
In addition, by transferring the learned representations to common downstream
tasks, we show that XMoCo with commonly used augmentations can improve
performance on these tasks. We hope these findings motivate researchers to
consider the important interplay among negative examples in self-supervised
learning.
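
The two regularizers described above can be illustrated with a minimal NumPy
sketch. It is not the formulation used in XMoCo, which the text above does not
spell out. It assumes that cross consistency is measured as the mean squared
change in pairwise cosine similarity between two augmented views of a batch,
and that uniformity is measured as the KL divergence between the softmax over
negative similarities and a uniform distribution. The function names, the
temperature value, and both loss forms are illustrative assumptions.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def cross_consistency_loss(z_a, z_b):
    """Assumed form: mean squared change in the similarity of every
    distinct image pair (i, j) between view A and view B."""
    z_a, z_b = l2_normalize(z_a), l2_normalize(z_b)
    sim_a = z_a @ z_a.T  # pairwise similarities within view A
    sim_b = z_b @ z_b.T  # pairwise similarities within view B
    off_diag = ~np.eye(len(z_a), dtype=bool)  # skip self-similarity
    return float(np.mean((sim_a[off_diag] - sim_b[off_diag]) ** 2))


def negative_uniformity_loss(z_a, z_b, temperature=0.2):
    """Assumed form: KL(p || uniform), where p is the softmax over the
    similarities between each query and its in-batch negatives."""
    z_a, z_b = l2_normalize(z_a), l2_normalize(z_b)
    n = len(z_a)
    logits = (z_a @ z_b.T) / temperature
    off_diag = ~np.eye(n, dtype=bool)
    # Each row keeps its n - 1 negatives; the diagonal holds the positives.
    neg = logits[off_diag].reshape(n, n - 1)
    neg = neg - neg.max(axis=1, keepdims=True)  # numerical stability
    log_p = neg - np.log(np.exp(neg).sum(axis=1, keepdims=True))
    p = np.exp(log_p)
    kl = (p * (log_p + np.log(n - 1))).sum(axis=1)
    return float(kl.mean())


# Toy usage: random arrays stand in for encoder outputs of two views.
rng = np.random.default_rng(0)
view_a = rng.normal(size=(8, 16))
view_b = view_a + 0.1 * rng.normal(size=(8, 16))
regularizer = (cross_consistency_loss(view_a, view_b)
               + negative_uniformity_loss(view_a, view_b))
```

In a plug-and-play setting, a term such as `regularizer` would be added, with
some weighting, to the contrastive loss of the base method. The weighting is
left unspecified here.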