K-level Reasoning for Zero-Shot Coordination in Hanabi. (arXiv:2207.07166v1 [cs.AI])

The standard problem setting in cooperative multi-agent settings is self-play
(SP), where the goal is to train a team of agents that works well together.
However, optimal SP policies commonly contain arbitrary conventions
(“handshakes”) and are not compatible with other, independently trained agents
or humans. This latter desiderata was recently formalized by Hu et al. 2020 as
the zero-shot coordination (ZSC) setting and partially addressed with their
Other-Play (OP) algorithm, which showed improved ZSC and human-AI performance
in the card game Hanabi. OP assumes access to the symmetries of the environment
and prevents agents from breaking these in a mutually incompatible way during
training. However, as the authors point out, discovering symmetries for a given
environment is a computationally hard problem. Instead, we show that through a
simple adaption of k-level reasoning (KLR) Costa Gomes et al. 2006,
synchronously training all levels, we can obtain competitive ZSC and ad-hoc
teamplay performance in Hanabi, including when paired with a human-like proxy
bot. We also introduce a new method, synchronous-k-level reasoning with a best
response (SyKLRBR), which further improves performance on our synchronous KLR
by co-training a best response.

Source: https://arxiv.org/abs/2207.07166


Related post