# Boosting Convolutional Neural Networks’ Protein Binding Site Prediction Capacity Using SE(3)-invariant transformers, Transfer Learning and Homology-based Augmentation. (arXiv:2303.08818v1 [q-bio.QM])

Identifying small-molecule binding sites in target proteins, at either pocket
or residue resolution, is critical in many virtual and real
drug-discovery scenarios. Since it is not always easy to find such binding
sites based on domain knowledge or traditional methods, different deep learning
methods that predict binding sites out of protein structures have been
developed in recent years. Here we present a new deep learning algorithm of
this kind that significantly outperformed all state-of-the-art baselines at
both resolutions, pocket and residue. This good performance was
also demonstrated in a case study involving the protein human serum albumin and
its binding sites. Our algorithm included new ideas both in the model
architecture and in the training method. For the model architecture, it
incorporated SE(3)-invariant geometric self-attention layers that operate on
top of residue-level CNN outputs. This residue-level processing
allowed transfer learning between the two resolutions, which turned out to
significantly improve binding pocket prediction. Moreover, we developed a
novel augmentation method based on protein homology, which prevented our model
from over-fitting. Overall, we believe that our contribution to the literature
is twofold. First, we provided a new computational method for binding site
prediction that is relevant to real-world applications, as shown by the good
performance on different benchmarks and a case study. Second, the novel ideas
in our method (the model architecture, the transfer learning and the
homology augmentation) could serve as useful components in
future work.
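
To make the SE(3)-invariance property mentioned above concrete, here is a minimal sketch in pure Python. It is not the architecture from the paper, whose layer details the abstract does not give. It assumes a toy self-attention in which residue coordinates enter only through pairwise distances, so rotating or translating the whole structure leaves the output unchanged. All names (`distance_biased_attention`, `rigid_motion`, `length_scale`) are hypothetical, and the per-residue feature vectors stand in for the residue-level CNN outputs.

```python
import math


def pairwise_distances(coords):
    """Euclidean distance between every pair of residue positions."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def distance_biased_attention(features, coords, length_scale=8.0):
    """Toy self-attention over residues (an illustrative assumption).

    Attention logits are a scaled dot product of residue features minus a
    squared-distance penalty. Coordinates are used only via distances, which
    is what makes the output invariant to rigid motions.
    """
    dim = len(features[0])
    dists = pairwise_distances(coords)
    outputs = []
    for i, query in enumerate(features):
        logits = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            - (dists[i][j] / length_scale) ** 2
            for j, key in enumerate(features)
        ]
        top = max(logits)  # subtract the max for a numerically stable softmax
        weights = [math.exp(value - top) for value in logits]
        total = sum(weights)
        outputs.append([
            sum(w * f[c] for w, f in zip(weights, features)) / total
            for c in range(dim)
        ])
    return outputs


def rigid_motion(coords, angle, shift):
    """Rotate all points about the z-axis by `angle`, then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
        for x, y, z in coords
    ]
```

Calling `distance_biased_attention` on a set of coordinates and on the same coordinates passed through `rigid_motion` should give matching outputs up to floating-point error, which is the invariance being illustrated.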