Derivative-Free Reinforcement Learning: A Review. (arXiv:2102.05710v1 [cs.LG])

Reinforcement learning is about learning agent models that make the best
sequential decisions in unknown environments. In an unknown environment, the
agent needs to explore the environment while exploiting the collected
information, which usually forms a sophisticated problem to solve.
Derivative-free optimization, meanwhile, is capable of solving sophisticated
problems. It commonly uses a sampling-and-updating framework to iteratively
improve the solution, where exploration and exploitation are also needed to be
well balanced. Therefore, derivative-free optimization deals with a similar
core issue as reinforcement learning, and has been introduced in reinforcement
learning approaches, under the names of learning classifier systems and
neuroevolution/evolutionary reinforcement learning. Although such methods have
been developed for decades, recently, derivative-free reinforcement learning
exhibits attracting increasing attention. However, recent survey on this topic
is still lacking. In this article, we summarize methods of derivative-free
reinforcement learning to date, and organize the methods in aspects including
parameter updating, model selection, exploration, and parallel/distributed
methods. Moreover, we discuss some current limitations and possible future
directions, hoping that this article could bring more attentions to this topic
and serve as a catalyst for developing novel and efficient approaches.



Related post