Building Hypothesis-Driven Virtual Screening Pipelines for Millions of Molecules


This talk will focus on a novel, hypothesis-driven filtering strategy and open-source toolkit for virtual screening, Screenlamp, which has been successfully applied to identify potent inhibitors of G protein-coupled receptor-mediated signaling in vertebrates.

The Screenlamp project allows researchers to test hypotheses about the importance of specific functional groups or pharmacophores (3D spatial relationships between functional groups) that lead to high ligand activity. Given a large database of arbitrary molecules, the probability of identifying a promising lead molecule increases with the number of molecules screened. Thus, one of our goals was to make the organization and screening of very large databases, containing millions of commercially available compounds, accessible to a typical research lab using open-source tools developed in Python. With Screenlamp, we developed a toolkit that achieves the best of both worlds: it focuses on functional group matches that are likely to enhance activity, while also searching as many molecules as possible to discover those that match the volume and 3D distribution of charge in a known ligand.
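The idea of filtering on a pharmacophore hypothesis can be illustrated with a minimal sketch in plain Python. This is not Screenlamp's API: the `Atom` record, the function names, the group labels ("keto", "sulfate"), the distance window, and the three toy molecules are all assumptions made up for illustration. The sketch keeps a molecule only if it contains both functional groups and at least one pair of them lies within the hypothesized distance range.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class Atom:
    group: str   # hypothetical functional-group label, e.g. "keto"
    xyz: tuple   # 3D coordinates in angstroms


def matches_hypothesis(atoms, group_a, group_b, min_dist, max_dist):
    """Return True if any group_a/group_b pair lies inside the distance window."""
    a_sites = [a.xyz for a in atoms if a.group == group_a]
    b_sites = [a.xyz for a in atoms if a.group == group_b]
    return any(min_dist <= dist(p, q) <= max_dist
               for p in a_sites for q in b_sites)


def screen(molecules, **hypothesis):
    """Lazily yield the IDs of molecules that satisfy the hypothesis.

    A generator avoids holding the whole library in memory, which matters
    when the input is a stream of millions of records.
    """
    for mol_id, atoms in molecules:
        if matches_hypothesis(atoms, **hypothesis):
            yield mol_id


# Toy library (made-up coordinates, not real compounds).
library = [
    ("mol-1", [Atom("keto", (0.0, 0.0, 0.0)), Atom("sulfate", (14.0, 0.0, 0.0))]),
    ("mol-2", [Atom("keto", (0.0, 0.0, 0.0)), Atom("sulfate", (3.0, 0.0, 0.0))]),
    ("mol-3", [Atom("keto", (1.0, 1.0, 1.0))]),
]

hits = list(screen(library, group_a="keto", group_b="sulfate",
                   min_dist=13.0, max_dist=20.0))
print(hits)  # ['mol-1']
```

Only `mol-1` passes: `mol-2` has both groups but they are too close together, and `mol-3` lacks the second group entirely. A real pipeline would derive the group labels and coordinates from 3D structure files rather than hand-written tuples.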

While virtual screening makes it possible to analyze large databases of millions of molecules and select a subset likely to be active, the goal of this talk is to introduce tools that overcome the data management and analysis challenges posed by working with millions of molecules, and that allow researchers to build custom pipelines for hypothesis-driven experiments.

Going beyond the mere identification of potent protein inhibitors, this talk will present techniques to integrate the computational predictions with experimental knowledge. Leveraging experimental data, supervised feature selection and extraction techniques will be used to identify the discriminants of biological activity using open-source machine learning libraries such as scikit-learn.
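As a rough illustration of supervised feature selection with scikit-learn, the sketch below ranks binary functional-group descriptors by their association with an activity label. It is not the analysis from the talk: the descriptor names, the toy data set (every on/off combination of five descriptors), and the rule that generates the labels are assumptions chosen so the outcome is easy to verify. It uses `SelectKBest` with the `chi2` score function, one of several selection methods scikit-learn provides.

```python
from itertools import product

import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

# Hypothetical functional-group descriptors (1 = group present in the molecule).
feature_names = ["keto", "sulfate", "hydroxyl", "amine", "halogen"]

# Toy data: all 32 on/off combinations of the five descriptors.
X = np.array(list(product([0, 1], repeat=len(feature_names))))

# Toy activity label: a molecule counts as active only if it has
# both a keto and a sulfate group (an assumption for this example).
y = X[:, 0] & X[:, 1]

# Keep the two descriptors most strongly associated with activity.
selector = SelectKBest(score_func=chi2, k=2).fit(X, y)
selected = [feature_names[i] for i in selector.get_support(indices=True)]
print(selected)  # ['keto', 'sulfate']
```

Because the other three descriptors are statistically independent of the label in this toy set, their chi-squared scores are zero and the two descriptors that define activity are recovered. With experimental assay data, the labels would come from measured activity, and the ranking would be a hypothesis to test rather than a known answer.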


Sebastian Raschka is the author of the bestselling book Python Machine Learning. As a Ph.D. candidate at Michigan State University, he is developing novel computational methods in the field of computational biology. Among other things, his research includes the development of new deep learning architectures to solve problems in the field of biometrics.
