Online POMDP Planning via Simplification. (arXiv:2105.05296v1 [cs.AI])

In this paper, we consider online planning in partially observable domains.
Solving the corresponding POMDP problem is a very challenging task,
particularly in an online setting. Our key contribution is a novel algorithmic
approach, Simplified Information Theoretic Belief Space Planning (SITH-BSP),
which aims to speed up POMDP planning with belief-dependent rewards
without compromising the solution's accuracy. We do so by mathematically
relating the simplified elements of the problem to the corresponding
counterparts of the original problem. Specifically, we focus on belief
simplification and use it to formulate bounds on the corresponding original
belief-dependent rewards. These bounds in turn are used to perform branch
pruning over the belief tree, in the process of calculating the optimal policy.
We further introduce the notion of adaptive simplification, re-using
calculations between different simplification levels, and exploit it to prune,
at each level of the belief tree, all branches but one. Therefore, our approach
is guaranteed to find the optimal solution of the original problem but with
substantial speedup. As a second key contribution, we derive novel analytical
bounds for differential entropy, considering a sampling-based belief
representation, which we believe are of interest in their own right. We
validate our approach in simulation using these bounds, with simplification
corresponding to reducing the number of belief samples, and demonstrate a
significant computational speedup while still recovering the optimal solution.
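The pruning idea described above can be sketched generically: maintain an interval [lower, upper] on each action's belief-dependent value, discard any action whose upper bound falls below the best lower bound, and refine (use more samples) only while several actions survive. The sketch below is illustrative only; `reward_bounds` uses a simple negative-variance surrogate with an ad-hoc slack term, not the paper's analytic entropy bounds, and all names are hypothetical.

```python
def reward_bounds(belief_samples, level):
    """Hypothetical bounds on a belief-dependent reward.

    Evaluates a negative-variance surrogate on a subsample of the belief
    (coarser level -> fewer samples) and widens the estimate by an ad-hoc
    slack term. This stands in for the paper's analytic entropy bounds,
    which are not reproduced here.
    """
    k = max(1, len(belief_samples) // (2 ** level))  # simplification: subsample
    sub = belief_samples[:k]
    mean = sum(sub) / k
    est = -sum((x - mean) ** 2 for x in sub) / k     # surrogate reward estimate
    slack = 1.0 / k                                  # looser bounds at coarser levels
    return est - slack, est + slack

def prune_actions(beliefs_per_action, max_level=4):
    """Adaptive simplification: start with coarse bounds and refine only
    while more than one candidate action survives interval-dominance pruning."""
    alive = set(beliefs_per_action)
    for level in range(max_level, -1, -1):           # coarse -> fine
        bounds = {a: reward_bounds(b, level)
                  for a, b in beliefs_per_action.items() if a in alive}
        best_lower = max(lo for lo, _ in bounds.values())
        # An action whose upper bound is below the best lower bound
        # cannot be optimal and is pruned.
        alive = {a for a, (lo, hi) in bounds.items() if hi >= best_lower}
        if len(alive) == 1:
            break
    return alive
```

For example, with two candidate actions whose beliefs are represented by sample sets of different spread, `prune_actions` eliminates the dominated branch at the coarsest level for which the intervals separate, without ever evaluating it at full resolution.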


