Disentangling the Roles of Curation, Data-Augmentation and the Prior in the Cold Posterior Effect. (arXiv:2106.06596v1 [cs.LG])

The “cold posterior effect” (CPE) in Bayesian deep learning describes the
uncomforting observation that the predictive performance of Bayesian neural
networks can be significantly improved if the Bayes posterior is artificially
sharpened using a temperature parameter T<1. The CPE is problematic in theory
and practice and since the effect was identified many researchers have proposed
hypotheses to explain the phenomenon. However, despite this intensive research
effort the effect remains poorly understood. In this work we provide novel and
nuanced evidence relevant to existing explanations for the cold posterior
effect, disentangling three hypotheses: 1. The dataset curation hypothesis of
Aitchison (2020): we show empirically that the CPE does not arise in a real
curated data set but can be produced in a controlled experiment with varying
curation strength. 2. The data augmentation hypothesis of Izmailov et al.
(2021) and Fortuin et al. (2021): we show empirically that data augmentation is
sufficient but not necessary for the CPE to be present. 3. The bad prior
hypothesis of Wenzel et al. (2020): we use a simple experiment evaluating the
relative importance of the prior and the likelihood, strongly linking the CPE
to the prior. Our results demonstrate how the CPE can arise in isolation from
synthetic curation, data augmentation, and bad priors. Cold posteriors observed
“in the wild” are therefore unlikely to arise from a single simple cause; as a
result, we do not expect a simple “fix” for cold posteriors.

Source: https://arxiv.org/abs/2106.06596


Related post