An efficient graph generative model for navigating ultra-large combinatorial synthesis libraries. (arXiv:2211.04468v1 [q-bio.QM])

Virtual, make-on-demand chemical libraries have transformed early-stage drug
discovery by unlocking vast, synthetically accessible regions of chemical
space. Recent years have witnessed rapid growth in these libraries from
millions to trillions of compounds, hiding undiscovered, potent hits for a
variety of therapeutic targets. However, they are quickly approaching a size
beyond that which permits explicit enumeration, presenting new challenges for
virtual screening. To overcome these challenges, we propose the Combinatorial
Synthesis Library Variational Auto-Encoder (CSLVAE). The proposed generative
model represents such libraries as a differentiable, hierarchically-organized
database. Given a compound from the library, the molecular encoder constructs a
query for retrieval, which is utilized by the molecular decoder to reconstruct
the compound by first decoding its chemical reaction and subsequently decoding
its reactants. Our design minimizes autoregression in the decoder, facilitating
the generation of large, valid molecular graphs. Our method performs fast and
parallel batch inference for ultra-large synthesis libraries, enabling a number
of important applications in early-stage drug discovery. Compounds proposed by
our method are guaranteed to be in the library, and thus synthetically and
cost-effectively accessible. Importantly, CSLVAE can encode out-of-library
compounds and search for in-library analogues. In experiments, we demonstrate
the capabilities of the proposed method in the navigation of massive
combinatorial synthesis libraries.



Related post