ReLACE: Reinforcement Learning Agent for Counterfactual Explanations of Arbitrary Predictive Models. (arXiv:2110.11960v1 [cs.LG])

The demand for explainable machine learning (ML) models has been growing
rapidly in recent years. Amongst the methods proposed to associate ML model
predictions with human-understandable rationale, counterfactual explanations
are one of the most popular. They consist of post-hoc rules derived from
counterfactual examples (CFs), i.e., modified versions of input samples that
result in alternative output responses from the predictive model to be
explained. However, existing CF generation strategies either exploit the
internals of specific models (e.g., random forests or neural networks), or
depend on each sample’s neighborhood, which makes them hard to generalize
to more complex models and inefficient on larger datasets. In this work, we
aim to overcome these limitations and introduce a model-agnostic algorithm to
generate optimal counterfactual explanations. Specifically, we formulate the
problem of crafting CFs as a sequential decision-making task and then find the
optimal CFs via deep reinforcement learning (DRL) with a discrete-continuous
hybrid action space. Unlike other techniques, our method is easily
applied to any black-box model, since the model acts as the environment that
the DRL agent interacts with. In addition, we develop an algorithm to extract
explainable decision rules from the DRL agent’s policy, so as to make the
process of generating CFs itself transparent. Extensive experiments
on several datasets show that our method outperforms existing CF
generation baselines.
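
The sequential formulation described above can be pictured with a small sketch. It is not the paper's implementation: the class name `CounterfactualEnv`, the reward shape, and the `distance_weight` and `max_steps` parameters are all illustrative assumptions. It only shows how a black-box model can serve as the environment, with each action pairing a discrete choice (which feature to edit) with a continuous one (the new value).

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class CounterfactualEnv:
    """Hypothetical environment wrapping a black-box classifier."""

    predict: Callable[[Sequence[float]], int]  # black box: features -> class label
    original: List[float]                      # input sample to be explained
    target: int                                # desired alternative prediction
    max_steps: int = 10                        # assumed episode length limit
    distance_weight: float = 0.1               # assumed penalty on L1 distance

    def reset(self) -> List[float]:
        # Each episode starts from an unmodified copy of the original sample.
        self.state = list(self.original)
        self.steps = 0
        return list(self.state)

    def step(self, action: Tuple[int, float]) -> Tuple[List[float], float, bool]:
        # Hybrid action: discrete feature index plus continuous new value.
        index, value = action
        self.state[index] = value
        self.steps += 1

        # Only the model's output is queried; its internals are never inspected.
        flipped = self.predict(self.state) == self.target
        distance = sum(abs(a - b) for a, b in zip(self.state, self.original))

        # Assumed reward: bonus for reaching the target class, minus a
        # penalty for moving far from the original sample.
        reward = (1.0 if flipped else 0.0) - self.distance_weight * distance
        done = flipped or self.steps >= self.max_steps
        return list(self.state), reward, done


def toy_model(x: Sequence[float]) -> int:
    # Stand-in black box used only for this example.
    return int(x[0] + x[1] > 1.0)


env = CounterfactualEnv(predict=toy_model, original=[0.2, 0.3], target=1)
env.reset()
state, reward, done = env.step((0, 0.9))  # set feature 0 to 0.9
print(state, reward, done)
```

A DRL agent would replace the hand-picked action with one sampled from its policy. Any model exposing a prediction function could be plugged in as `predict` without changing the environment.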



