Vector Quantized Models for Planning. (arXiv:2106.04615v1 [cs.LG])

Recent developments in the field of model-based RL have proven successful in
a range of environments, especially ones where planning is essential. However,
such successes have been limited to deterministic fully-observed environments.
We present a new approach that handles stochastic and partially-observable
environments. Our key insight is to use discrete autoencoders to capture the
multiple possible effects of an action in a stochastic environment. We use a
stochastic variant of Monte Carlo tree search to plan over both the
agent’s actions and the discrete latent variables representing the
environment’s response. Our approach significantly outperforms an offline
version of MuZero on a stochastic interpretation of chess where the opponent is
considered part of the environment. We also show that our approach scales to
DeepMind Lab, a first-person 3D environment with large visual
observations and partial observability.
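
As a rough illustration of the two ideas in the abstract, the sketch below first maps a continuous vector to a discrete code by nearest-neighbour lookup in a codebook (the usual vector-quantization step), and then scores an action by averaging over the discrete codes that stand in for the environment's possible responses. This is not the paper's implementation: a one-step expectation is swapped in for the stochastic tree search, and the codebook, prior, values, and the names `quantize`, `action_value`, and `value_of` are all hypothetical toy choices.

```python
def quantize(vector, codebook):
    """Return the index of the codebook entry nearest to `vector`
    (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(vector, codebook[k]))


def action_value(action, code_prior, value_of):
    """Expected value of `action`, averaged over the discrete latent codes.

    `code_prior[k]` is the assumed probability that the environment responds
    with code k; `value_of(action, k)` is the assumed value of that outcome.
    """
    return sum(p * value_of(action, k) for k, p in enumerate(code_prior))


# Hypothetical toy codebook with three 2-D code vectors.
codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
code_index = quantize([0.9, 1.2], codebook)

# Hypothetical prior over codes and toy outcome values.
code_prior = [0.5, 0.25, 0.25]
toy_values = {
    ("left", 0): 1.0, ("left", 1): 0.0, ("left", 2): 0.0,
    ("right", 0): 0.0, ("right", 1): 1.0, ("right", 2): 0.0,
}


def value_of(action, code):
    return toy_values[(action, code)]


# Pick the action with the highest expected value under the code prior.
best_action = max(["left", "right"],
                  key=lambda a: action_value(a, code_prior, value_of))
print(code_index, best_action)
```

In a full planner the averaging step would be applied recursively, alternating between nodes where the agent picks an action and nodes where a discrete code is sampled for the environment's response.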



