Abstract: Recommender systems have become a widely used technique in social networks and e-commerce services. In this work, we applied a recommender system in the drug discovery setting, and our first test case was in virtual chemical biology. Chemical biology studies interactions between chemical matter and biological targets, typically proteins, with an eye toward answering questions about new biological pathways, their potential role in disease treatment, and their potential liabilities as drug targets. Most small-molecule drugs are known to interact with multiple proteins, which effectively reflects a lack of fine-tuning with respect to the biological system at large. This phenomenon is sometimes referred to as polypharmacology. Unfortunately, experimentally available bioactivity data is often limited to only a few of those proteins.
First, we assembled a large knowledge base of small molecule–protein interactions that covers millions of compounds and thousands of proteins; despite our best efforts, the resulting interaction matrix is very sparse. By analogy to e-commerce, we can view small molecules and proteins as users and products, respectively, and treat drug-target interaction (DTI) prediction as a recommendation task: predicting billions of connections between drugs and possible targets within the DTI network.
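To make the users-and-products analogy concrete, here is a minimal sketch of a sparse compound-by-protein interaction store in Python. It is only an illustration: the identifiers (`cmpd_A`, `prot_1`), the labels, and the helper names `density` and `unobserved_pairs` are all hypothetical and are not taken from the knowledge base described above.

```python
# Toy sparse interaction store: only measured (compound, protein) pairs are kept.
# All identifiers and labels below are made up for illustration.
observed = {
    ("cmpd_A", "prot_1"): 1,  # 1 = measured as active
    ("cmpd_A", "prot_2"): 0,  # 0 = measured as inactive
    ("cmpd_B", "prot_1"): 1,
    ("cmpd_C", "prot_3"): 1,
}

compounds = sorted({c for c, _ in observed})
proteins = sorted({p for _, p in observed})


def density(pairs, n_rows, n_cols):
    """Fraction of the full compound x protein matrix that has a measurement."""
    return len(pairs) / (n_rows * n_cols)


def unobserved_pairs(pairs, rows, cols):
    """Pairs with no measurement: the cells a recommender would be asked to score."""
    return [(r, c) for r in rows for c in cols if (r, c) not in pairs]


matrix_density = density(observed, len(compounds), len(proteins))
candidates = unobserved_pairs(observed, compounds, proteins)
```

In this toy case 4 of the 9 cells are measured, leaving 5 candidate pairs to score. At the scale of millions of compounds and thousands of proteins, the unmeasured cells number in the billions, which is why the full matrix is never stored densely.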
The resulting hybrid model used content-based filtering to narrow the search space, while collaborative filtering modeled the probability of interaction between a drug and a target through a neighborhood-regularized logistic function of drug-specific and target-specific latent vectors that represent their properties. The model significantly outperformed our previous non-network-based benchmarks on large-scale holdout data sets.
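The collaborative-filtering component can be sketched as follows. This is one common way to write a neighborhood-regularized logistic factorization, not necessarily the formulation used in this work: the squared-distance penalty, the `weight` parameter, and all names and numbers are assumptions chosen for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def interaction_probability(drug_vec, target_vec):
    """Logistic function of the inner product of two latent vectors."""
    return sigmoid(sum(a * b for a, b in zip(drug_vec, target_vec)))


def neighborhood_penalty(vectors, neighbors, weight):
    """Assumed regularizer: pulls each latent vector toward its listed neighbors."""
    total = 0.0
    for item, nbrs in neighbors.items():
        for nbr in nbrs:
            total += sum((a - b) ** 2 for a, b in zip(vectors[item], vectors[nbr]))
    return weight * total


def loss(drug_vecs, target_vecs, labels, drug_nbrs, target_nbrs, weight=0.1):
    """Negative log-likelihood of observed labels plus both neighborhood penalties."""
    nll = 0.0
    for (drug, target), y in labels.items():
        p = interaction_probability(drug_vecs[drug], target_vecs[target])
        nll -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return (nll
            + neighborhood_penalty(drug_vecs, drug_nbrs, weight)
            + neighborhood_penalty(target_vecs, target_nbrs, weight))


# Hypothetical toy data: two similar drugs and one target.
drug_vecs = {"d1": [0.5, -0.2], "d2": [0.4, -0.1]}
target_vecs = {"t1": [1.0, 0.3]}
labels = {("d1", "t1"): 1, ("d2", "t1"): 0}
drug_nbrs = {"d1": ["d2"]}   # e.g. neighbors chosen by chemical similarity
target_nbrs = {}

total_loss = loss(drug_vecs, target_vecs, labels, drug_nbrs, target_nbrs)
```

Training would minimize `total_loss` over the latent vectors, for example by gradient descent. The penalty term is what lets a compound with few or no measurements borrow information from similar compounds.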
Bio: Ambrish is a data scientist at Vertex Pharmaceuticals, working with business partners to reshape their analytic capabilities and implement innovative technology and solutions. Ambrish holds a Ph.D. in Bioinformatics and has developed multiple scientific applications in the field of computational structural biology and chemistry.