Synthesizing Pareto-Optimal Interpretations for Black-Box Models. (arXiv:2108.07307v1 [cs.LG])

We present a new multi-objective optimization approach for synthesizing
interpretations that “explain” the behavior of black-box machine learning
models. Constructing human-understandable interpretations for black-box models
often requires balancing conflicting objectives. A simple interpretation may be
easier for humans to understand but less precise in its predictions than a
more complex interpretation. Existing methods for synthesizing
interpretations use a single objective function and are often optimized for a
single class of interpretations. In contrast, we provide a more general and
multi-objective synthesis framework that allows users to choose (1) the class
of syntactic templates from which an interpretation should be synthesized, and
(2) quantitative measures on both the correctness and explainability of an
interpretation. For a given black-box model, our approach yields a set of
Pareto-optimal interpretations with respect to the correctness and
explainability measures. We show that the underlying multi-objective
optimization problem can be solved via a reduction to quantitative constraint
solving, such as weighted maximum satisfiability. To demonstrate the benefits
of our approach, we have applied it to synthesize interpretations for black-box
neural-network classifiers. Our experiments show that there often exists a rich
and varied set of choices for interpretations that are missed by existing
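
To make the notion of a Pareto-optimal set concrete, the following is a minimal sketch that assumes both measures are scores where higher is better. The candidate names and scores are hypothetical, invented purely for illustration, and the brute-force dominance filter stands in for, and is not, the reduction to weighted maximum satisfiability described in the abstract.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """A hypothetical interpretation with two scores (higher is better)."""
    name: str
    correctness: float
    explainability: float


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on both measures and strictly better on one."""
    at_least_as_good = (a.correctness >= b.correctness
                        and a.explainability >= b.explainability)
    strictly_better = (a.correctness > b.correctness
                       or a.explainability > b.explainability)
    return at_least_as_good and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep every candidate that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Illustrative, made-up scores; not results from the paper.
candidates = [
    Candidate("depth-1 tree", 0.71, 0.95),
    Candidate("depth-2 tree", 0.83, 0.80),
    Candidate("depth-3 tree", 0.82, 0.60),  # worse than depth-2 on both measures
    Candidate("depth-4 tree", 0.91, 0.40),
]

front = pareto_front(candidates)
print([c.name for c in front])
```

In this sketch the depth-3 candidate is dropped because the depth-2 candidate is better on both measures, while the remaining three each represent a different trade-off that a user could choose between.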