Knowledge-Guided Exploration in Deep Reinforcement Learning. (arXiv:2210.15670v1 [cs.LG])

This paper proposes a new method to drastically speed up deep reinforcement
learning (deep RL) training for problems that have the property of state-action
permissibility (SAP). Two types of permissibility are defined under SAP. Under
the first type, the agent can decide whether an action $a_t$ was permissible
in a state $s_t$ after performing it and reaching the new state $s_{t+1}$.
Under the second type, the agent can decide whether $a_t$ is permissible in
$s_t$ even without performing it. An action is not permissible in a state if
it can never lead to an optimal solution, and thus should not be tried
repeatedly. We incorporate the proposed SAP property and
encode action permissibility knowledge into two state-of-the-art deep RL
algorithms to guide their state-action exploration together with a virtual
stopping strategy. Results show that the SAP-based guidance can markedly speed
up RL training.
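
To make the two permissibility types concrete, here is a minimal sketch on a
toy task that is not taken from the paper: an agent walks along a 1-D line
toward a fixed goal. The names GOAL, ACTIONS, step, permissible_type1,
permissible_type2, guided_explore and run_episode are all hypothetical, and
the hand-written permissibility rules are assumptions chosen for illustration.
The sketch does not reproduce the paper's deep RL algorithms or its virtual
stopping strategy; it only shows how permissibility knowledge could restrict
exploration.

```python
import random

GOAL = 10            # hypothetical target position on a 1-D line
ACTIONS = (-1, +1)   # step left or step right


def step(state, action):
    """Toy deterministic transition: move one cell along the line."""
    return state + action


def permissible_type1(state, action, next_state):
    """Type 1: judged after a_t was performed and s_{t+1} was observed.

    Assumed rule for this toy task: an action that does not bring the
    agent closer to GOAL can never be part of an optimal path.
    """
    return abs(GOAL - next_state) < abs(GOAL - state)


def permissible_type2(state, action):
    """Type 2: judged from (s_t, a_t) alone, before acting.

    Assumed rule: the action must point toward GOAL.
    """
    return (GOAL - state) * action > 0


def guided_explore(state, rng):
    """Sample an exploratory action among the permissible ones only."""
    candidates = [a for a in ACTIONS if permissible_type2(state, a)]
    # Fall back to the full action set if nothing is judged permissible.
    return rng.choice(candidates or ACTIONS)


def run_episode(start=0, seed=0, max_steps=100):
    """Explore with type-2 guidance and check each step with type 1."""
    rng = random.Random(seed)
    state, steps = start, 0
    while state != GOAL and steps < max_steps:
        action = guided_explore(state, rng)
        next_state = step(state, action)
        # In this toy task the two permissibility types agree.
        assert permissible_type1(state, action, next_state)
        state, steps = next_state, steps + 1
    return state, steps


if __name__ == "__main__":
    print(run_episode())  # → (10, 10)
```

In this toy setting, guided exploration never wastes a step on an action that
moves away from the goal, which is the intuition behind restricting
exploration to permissible actions. In a realistic problem the type-1 signal
would presumably have to come from domain knowledge about the task rather
than from a closed-form rule like the one assumed here.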



