Outcome-Driven Reinforcement Learning via Variational Inference. (arXiv:2104.10190v1 [cs.LG])

While reinforcement learning algorithms provide automated acquisition of
optimal policies, practical application of such methods requires a number of
design decisions, such as manually designing reward functions that not only
define the task, but also provide sufficient shaping to accomplish it. In this
paper, we discuss a new perspective on reinforcement learning, recasting it as
the problem of inferring actions that achieve desired outcomes, rather than a
problem of maximizing rewards. To solve the resulting outcome-directed
inference problem, we establish a novel variational inference formulation that
allows us to derive a well-shaped reward function that can be learned directly
from environment interactions. From the corresponding variational objective, we
also derive a new probabilistic Bellman backup operator reminiscent of the
standard Bellman backup operator and use it to develop an off-policy algorithm
to solve goal-directed tasks. We empirically demonstrate that this method
eliminates the need to design reward functions and leads to effective
goal-directed behaviors.
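To make the central idea concrete, here is a minimal, hypothetical sketch (not the paper's exact algorithm): instead of a hand-designed reward, the reward is taken to be the log-likelihood that an action leads to the desired outcome under a learned dynamics model, and a standard off-policy Bellman backup is then run on that reward. The chain environment, smoothing constant, and tabular setting are all illustrative assumptions.

```python
# Hedged sketch: outcome-driven reward on a tabular chain task.
# NOT the authors' algorithm -- it only illustrates replacing a manually
# designed reward with log p(outcome | s, a) from a learned dynamics
# model, then applying an ordinary Bellman backup to that reward.
import numpy as np

n_states, n_actions, goal, gamma = 5, 2, 4, 0.9

# Ground-truth chain dynamics: action 0 moves left, action 1 moves right.
def step(s, a):
    return max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)

# Fit an empirical dynamics model p(s' | s, a) from samples
# (here: one exhaustive sweep, with additive smoothing).
counts = np.full((n_states, n_actions, n_states), 1e-3)
for s in range(n_states):
    for a in range(n_actions):
        counts[s, a, step(s, a)] += 1.0
p = counts / counts.sum(axis=2, keepdims=True)

# Outcome-driven reward: log-likelihood that the next state is the goal.
# No manual shaping -- the reward falls out of the learned model.
r = np.log(p[:, :, goal])

# Standard off-policy Bellman backup (value iteration) on that reward.
Q = np.zeros((n_states, n_actions))
for _ in range(200):
    Q = r + gamma * p @ Q.max(axis=1)  # (S,A,S') @ (S',) -> (S,A)

policy = Q.argmax(axis=1)
print(policy)  # "move right" in every state, i.e. head for the goal
```

Even far from the goal, where log p(goal | s, a) is uniformly low, the backup propagates value from states nearer the goal, so the greedy policy still points toward it; this is the shaping effect the abstract attributes to the variational reward.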

Source: https://arxiv.org/abs/2104.10190
