Finding Counterfactually Optimal Action Sequences in Continuous State Spaces. (arXiv:2306.03929v1 [cs.LG])

Humans performing tasks that involve a series of dependent
actions over time often learn from experience by reflecting on specific cases
and points in time where different actions could have led to significantly
better outcomes. While recent machine learning methods to retrospectively
analyze sequential decision making processes promise to aid decision makers in
identifying such cases, they have focused on environments with finitely many
discrete states. However, in many practical applications, the state of the
environment is inherently continuous in nature. In this paper, we aim to fill
this gap. We start by formally characterizing a sequence of discrete actions
and continuous states using finite horizon Markov decision processes and a
broad class of bijective structural causal models. Building upon this
characterization, we formalize the problem of finding counterfactually optimal
action sequences and show that, in general, we cannot expect to solve it in
polynomial time. Then, we develop a search method based on the $A^*$ algorithm
that, under a natural form of Lipschitz continuity of the environment’s
dynamics, is guaranteed to return the optimal solution to the problem.
Experiments on real clinical data show that our method is very efficient in
practice, and it has the potential to offer interesting insights for sequential
decision making tasks.
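The search procedure the abstract describes can be sketched generically: A* over partial action sequences, where an admissible lower bound on the remaining counterfactual cost guarantees optimality of the first complete sequence popped. The `step_cost` and `lower_bound` callbacks below are hypothetical placeholders; in the paper's setting the bound would be derived from the Lipschitz continuity of the environment's dynamics.

```python
import heapq

def a_star_action_sequences(actions, horizon, step_cost, lower_bound):
    """Generic A* over discrete action sequences of fixed length `horizon`.

    `step_cost(prefix, a)` is the cost of appending action `a` to the
    partial sequence `prefix`; `lower_bound(prefix)` must be an admissible
    (never over-estimating) bound on the cost still to come. With such a
    bound, the first complete sequence popped from the frontier is optimal.
    """
    # Frontier entries: (g + h, g, prefix), ordered by the f-value g + h.
    frontier = [(lower_bound(()), 0.0, ())]
    while frontier:
        f, g, prefix = heapq.heappop(frontier)
        if len(prefix) == horizon:
            return prefix, g  # admissible h => this completion is optimal
        for a in actions:
            new_prefix = prefix + (a,)
            new_g = g + step_cost(prefix, a)
            heapq.heappush(
                frontier,
                (new_g + lower_bound(new_prefix), new_g, new_prefix),
            )
    return None, float("inf")

# Toy illustration: minimize sum of (a - t)^2 over t = 0..2. A trivial zero
# lower bound reduces A* to uniform-cost search but keeps the interface.
seq, cost = a_star_action_sequences(
    actions=[0, 1, 2],
    horizon=3,
    step_cost=lambda prefix, a: (a - len(prefix)) ** 2,
    lower_bound=lambda prefix: 0.0,
)
# seq == (0, 1, 2), cost == 0.0
```

A tighter, Lipschitz-derived `lower_bound` would prune far more of the exponentially large space of action sequences, which is what makes the method efficient in practice despite the worst-case hardness noted above.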
