Is Machine Learning Necessary to Solve Problems in Biology?


One of the fundamental goals of biology is to understand how the genetic information encoded in an organism's DNA translates into its unique physical characteristics.

The fundamental unit of life is the cell, the smallest entity capable of independent existence. A multicellular organism, such as a human being, comprises several organ systems; each organ consists of multiple tissues, and each tissue consists of several distinct cell types. All the cell types in an organism share the same genetic code but differ in their gene expression profiles, which lets them carry out their specialized functions. These differences in gene expression are determined by a second DNA-based code, the cis-regulatory code. The fundamental goal of biology therefore translates into deciphering the cis-regulatory code of the cell.

The cis-regulatory code is highly dependent on cellular context, and epigenetic marks such as DNA methylation or histone modifications can alter the functional state of its regulatory elements. This is the mechanism that enables distinct cell types to express genes at different levels and perform their diverse functions. The holy grail of computational biology is to decipher the cis-regulatory code well enough to predict gene expression levels in these distinct cell types accurately.

The cis-regulatory code acts nonlinearly: minor changes in a regulatory element can lead to drastic changes in the expression of its target gene, and vice versa. This nonlinearity shapes the relationship between the elements of the cis-regulatory code and transcription levels, a relationship best described as a probabilistic, multi-step process. An integrated quantitative model that accounts for all these features, together with more extensive quantitative measurements of the state of cis-regulatory elements in diverse contexts, should lead us toward a comprehensive and predictive model of transcriptional regulation.
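To make the nonlinearity concrete, here is a minimal sketch using a Hill function, a common textbook model for how the transcription rate of a target gene responds to the concentration of an activator bound at a cis-regulatory element. The choice of the Hill function and all parameter values (`k`, `n`, `vmax`) are assumptions for illustration, not a model named in the talk. With a steep Hill coefficient, a 20% increase in activator below the threshold nearly doubles the output, while doubling the activator in the saturated regime changes the output by well under 1%.

```python
def hill_activation(x: float, k: float = 1.0, n: float = 4.0, vmax: float = 1.0) -> float:
    """Transcription rate for activator concentration x (Hill activation).

    k:    concentration giving the half-maximal rate (hypothetical value)
    n:    Hill coefficient, i.e. steepness/cooperativity (hypothetical value)
    vmax: maximal transcription rate (hypothetical value)
    """
    return vmax * x**n / (k**n + x**n)


def fold_change(x_before: float, x_after: float, **params: float) -> float:
    """Ratio of transcription rates after vs. before a change in activator level."""
    return hill_activation(x_after, **params) / hill_activation(x_before, **params)


if __name__ == "__main__":
    # Small (+20%) change below the threshold k = 1: large effect on output.
    small_change = fold_change(0.5, 0.6)
    # Large (+100%) change in the saturated regime: almost no effect on output.
    large_change = fold_change(5.0, 10.0)
    print(f"+20% below threshold: {small_change:.2f}x")
    print(f"+100% when saturated: {large_change:.4f}x")
```

The same relative perturbation can therefore have very different consequences depending on where the system sits on the response curve, which is one reason a linear model of regulatory input and expression output tends to fail.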

It follows that the fundamental goal of biology requires a computational model that is predictive. The dynamics of regulatory mechanisms can be captured mathematically as nonlinear partial differential equations. Solving these models analytically can be hard, because almost no general techniques exist that work for all such equations; usually each equation must be studied as a separate problem. A computational model of the cis-regulatory code may offer an alternative route to a predictive model, but the nonlinearity of the regulatory mechanisms imposes a fundamental limit on that effort. Machine learning has emerged as a promising alternative, as exemplified by several recent successes.
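Because such equations rarely have closed-form solutions, they are usually integrated numerically. The sketch below is a deliberately simplified, hypothetical example: a single ordinary differential equation for a gene that activates its own transcription (Hill-type production plus first-order degradation), a much simpler case than the partial differential equations mentioned above. The model and all parameter values are assumptions chosen for illustration. Integrating it with a hand-written fourth-order Runge-Kutta method shows that two starting concentrations settle into two different steady states, one low and one high, so the long-term outcome depends on the initial state and not only on the equation's parameters.

```python
def dxdt(x: float, basal: float = 0.05, vmax: float = 1.0, k: float = 0.5,
         n: float = 4.0, gamma: float = 1.0) -> float:
    """Rate of change of a self-activating gene product (hypothetical model).

    Production = basal rate + Hill activation by the gene's own product;
    degradation is first order with rate constant gamma.
    """
    production = basal + vmax * x**n / (k**n + x**n)
    degradation = gamma * x
    return production - degradation


def integrate_rk4(f, x0: float, t_end: float = 50.0, dt: float = 0.01) -> float:
    """Integrate dx/dt = f(x) from x0 to t_end with classical 4th-order Runge-Kutta."""
    x = x0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        k1 = f(x)
        k2 = f(x + 0.5 * dt * k1)
        k3 = f(x + 0.5 * dt * k2)
        k4 = f(x + dt * k3)
        x += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return x


if __name__ == "__main__":
    # Two initial concentrations on either side of the unstable threshold.
    for x0 in (0.3, 0.6):
        print(f"start {x0:.1f} -> final state {integrate_rk4(dxdt, x0):.3f}")
```

In practice a library solver such as SciPy's `solve_ivp` would be the usual choice; the plain-Python integrator keeps the sketch dependency-free.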

In this talk, I will explain the challenges of developing computational models of nonlinear systems and present success stories that use machine learning algorithms. Biology and medicine offer many problems that practicing data scientists can address.


Joshy George is a bioinformatics researcher with a Ph.D. in Bioinformatics from the University of Melbourne, Australia, and a Master's in Computer Science from the Indian Institute of Science. With his background in data science and machine learning, Dr. George has co-authored over 100 peer-reviewed scientific articles, showcasing expertise in developing principled methods to solve complex biological problems. In his current role, he leads a team that is focused on building predictive models for cancer precision medicine and understanding the molecular mechanisms leading to diseases.

Open Data Science




