The Quality-Diversity Transformer: Generating Behavior-Conditioned Trajectories with Decision Transformers. (arXiv:2303.16207v1 [cs.NE])

In the context of neuroevolution, Quality-Diversity algorithms have proven
effective in generating repertoires of diverse and efficient policies by
relying on the definition of a behavior space. A natural goal induced by the
creation of such a repertoire is trying to achieve behaviors on demand, which
can be done by running the corresponding policy from the repertoire. However,
in uncertain environments, two problems arise. First, policies can lack
robustness and repeatability, meaning that multiple episodes under slightly
different conditions often result in very different behaviors. Second, due to
the discrete nature of the repertoire, solutions vary discontinuously. Here we
present a new approach to achieve behavior-conditioned trajectory generation
based on two mechanisms: first, MAP-Elites Low-Spread (ME-LS), which constrains
the selection of solutions to those that are the most consistent in the
behavior space; second, the Quality-Diversity Transformer (QDT), a
Transformer-based model conditioned on continuous behavior descriptors, which
trains on a dataset generated by policies from an ME-LS repertoire and learns to
autoregressively generate sequences of actions that achieve target behaviors.
Results show that ME-LS produces consistent and robust policies, and that its
combination with the QDT yields a single policy capable of achieving diverse
behaviors on demand with high accuracy.
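To make the low-spread selection idea concrete, here is a minimal sketch in Python of how a MAP-Elites-style cell update could favor consistent policies. The spread measure (mean per-dimension standard deviation of behavior descriptors over several evaluation episodes) and the replacement rule are illustrative assumptions, not the paper's exact ME-LS formulation; the dict keys `fitness` and `descriptors` are hypothetical names.

```python
import statistics

def behavior_spread(descriptors):
    """Mean per-dimension standard deviation of behavior descriptors
    collected over several evaluation episodes. Lower spread means the
    policy reproduces its behavior more consistently. (Illustrative
    measure; the paper's exact spread definition may differ.)"""
    dims = zip(*descriptors)
    return statistics.mean(statistics.pstdev(d) for d in dims)

def me_ls_replace(incumbent, candidate):
    """Sketch of an ME-LS-style cell update: the candidate replaces the
    incumbent only if it is fitter AND at least as consistent (spread no
    larger than the incumbent's). Each solution is a dict with the
    hypothetical keys 'fitness' and 'descriptors'."""
    if incumbent is None:
        return candidate
    better_fitness = candidate["fitness"] > incumbent["fitness"]
    not_less_consistent = (behavior_spread(candidate["descriptors"])
                           <= behavior_spread(incumbent["descriptors"]))
    if better_fitness and not_less_consistent:
        return candidate
    return incumbent
```

Compared with vanilla MAP-Elites, which keeps whichever solution in a cell has the highest single-episode fitness, this rule additionally penalizes policies whose behavior varies across episodes, which is the property the abstract attributes to ME-LS repertoires.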


