Neural Networks beyond explainability: Selective inference for sequence motifs. (arXiv:2212.12542v1 [q-bio.GN])

Over the past decade, neural networks have been successful at making
predictions from biological sequences, especially in the context of regulatory
genomics. As in other fields of deep learning, tools have been devised to
extract features such as sequence motifs that can explain the predictions made
by a trained network. Here we intend to go beyond explainable machine learning
and introduce SEISM, a selective inference procedure to test the association
between these extracted features and the predicted phenotype. In particular, we
discuss how training a one-layer convolutional network is formally equivalent
to selecting motifs maximizing some association score. We adapt existing
sampling-based selective inference procedures by quantizing this selection over
an infinite set to a large but finite grid. Finally, we show that sampling
under a specific choice of parameters is sufficient to characterize the
composite null hypothesis typically used for selective inference, a result that
goes well beyond our particular framework. We illustrate the behavior of our
method in terms of calibration, power and speed, and discuss its power/speed
trade-off compared with a simpler data-split strategy. SEISM paves the way to an easier
analysis of neural networks used in regulatory genomics, and to more powerful
methods for genome-wide association studies (GWAS).
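
The claim that training a one-layer convolutional network amounts to selecting a motif that maximizes an association score can be illustrated with a toy sketch. Several choices here are assumptions, not taken from the abstract: the candidate filters are restricted to a hypothetical grid of all DNA k-mers, each one-hot encoded; the network is a convolution followed by a ReLU with bias -(k-1) and global max pooling, so each pooled activation is simply a k-mer presence indicator; and the association score is the absolute Pearson correlation with the phenotype. The helper names (`one_hot`, `kmer_grid`, `activations`, `select_motif`, `make_toy_data`) are hypothetical, and this is not SEISM's implementation.

```python
import itertools

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

ALPHABET = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a (length, 4) one-hot matrix."""
    x = np.zeros((len(seq), 4))
    x[np.arange(len(seq)), [ALPHABET.index(c) for c in seq]] = 1.0
    return x


def kmer_grid(k):
    """Hypothetical finite grid of filters: every DNA k-mer, one-hot encoded (M, k, 4)."""
    motifs = ["".join(p) for p in itertools.product(ALPHABET, repeat=k)]
    return motifs, np.stack([one_hot(m) for m in motifs])


def activations(seqs, filters):
    """One-layer CNN: convolution, ReLU with bias -(k-1), global max pooling.

    With one-hot filters the pooled output is 1 if the k-mer occurs exactly
    somewhere in the sequence and 0 otherwise. Returns an (n_seqs, M) matrix.
    """
    k = filters.shape[1]
    rows = []
    for s in seqs:
        win = sliding_window_view(one_hot(s), k, axis=0)  # (L-k+1, 4, k)
        conv = np.einsum("pcj,mjc->mp", win, filters)  # (M, L-k+1)
        rows.append(np.maximum(conv - (k - 1), 0.0).max(axis=1))
    return np.array(rows)


def select_motif(seqs, y, k):
    """'Training' restricted to the grid: pick the filter whose pooled activation
    has the largest absolute Pearson correlation with the phenotype y."""
    motifs, filters = kmer_grid(k)
    A = activations(seqs, filters)
    Ac, yc = A - A.mean(axis=0), y - y.mean()
    num = np.abs(Ac.T @ yc)
    den = np.linalg.norm(Ac, axis=0) * np.linalg.norm(yc)
    # Filters that never (or always) fire have zero variance: give them score 0.
    score = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    best = int(np.argmax(score))
    return motifs[best], score[best]


def make_toy_data(seed=0, n=20, length=30):
    """Hypothetical toy data: C/T background, 'GATA' planted in the n positives."""
    rng = np.random.default_rng(seed)

    def background(m):
        return "".join(rng.choice(list("CT"), size=m))

    pos = [background(10) + "GATA" + background(length - 14) for _ in range(n)]
    neg = [background(length) for _ in range(n)]
    return pos + neg, np.array([1.0] * n + [0.0] * n)


seqs, y = make_toy_data()
print(select_motif(seqs, y, k=4))
```

On this toy data the selected filter should be `GATA`, with a score close to 1. Testing that same motif on the same data with an ordinary test would ignore the selection step and give an overly optimistic p-value, which is the problem selective inference is meant to address.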

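The contrast between a data-split strategy and sampling-based selective inference can be sketched on a generic feature matrix. This is a simplified stand-in, not SEISM: it assumes `X` is an (n, p) matrix playing the role of motif activations on a finite grid, that selection is the argmax of the absolute correlation with `y`, and that the null hypothesis is the simple one that `y` is exchangeable and independent of `X`, so permutations of `y` can be sampled. Conditioning on the selection event is done here by rejection sampling of permutations. SEISM itself addresses a composite null, and all function names below are hypothetical.

```python
import numpy as np


def abs_corr(X, y):
    """Absolute Pearson correlation between each column of X and y (0 for constant columns)."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    num = np.abs(Xc.T @ yc)
    den = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)


def select(X, y):
    """Selection step: index of the feature most associated with y."""
    return int(np.argmax(abs_corr(X, y)))


def data_split_pvalue(X, y, rng, n_perm=999):
    """Select on the first half, then run an ordinary permutation test on the second half.

    Assumes the rows are in random order (not sorted by phenotype).
    """
    half = len(y) // 2
    j = select(X[:half], y[:half])
    Xt, yt = X[half:, j:j + 1], y[half:]
    obs = abs_corr(Xt, yt)[0]
    null = [abs_corr(Xt, rng.permutation(yt))[0] for _ in range(n_perm)]
    return j, (1 + sum(t >= obs for t in null)) / (1 + n_perm)


def selective_pvalue(X, y, rng, n_accept=999, max_draws=100_000):
    """Use all samples, but condition on the selection event: keep only the
    permutations of y that would have selected the same feature."""
    j = select(X, y)
    obs = abs_corr(X[:, j:j + 1], y)[0]
    accepted = []
    for _ in range(max_draws):
        yp = rng.permutation(y)
        if select(X, yp) == j:  # rejection step: same selection as observed
            accepted.append(abs_corr(X[:, j:j + 1], yp)[0])
            if len(accepted) == n_accept:
                break
    return j, (1 + sum(t >= obs for t in accepted)) / (1 + len(accepted))
```

Both functions return the selected index and a p-value. The data-split test spends half of the samples on selection so that an ordinary permutation test stays valid on the other half; the conditional test uses all samples but discards every permutation that selects a different feature, so its cost can grow with the number of candidate features. This is a miniature version of the power/speed trade-off discussed above.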