Sumit Mukherjee

Staff Machine Learning Scientist at insitro

    Sumit Mukherjee is a Staff Machine Learning Scientist at insitro. He holds a Ph.D. in Electrical & Computer Engineering from the University of Washington. At insitro, he works on machine learning models that derive disease-relevant traits from clinical data and on tools that evaluate the utility of such traits for drug discovery. Previously, he was a Senior Applied Scientist at Microsoft's AI for Good Research Lab, where he developed novel generative AI tools to enable privacy-preserving data sharing in healthcare.

    All Sessions by Sumit Mukherjee

    Day 2 04/24/2024
    3:30 pm - 4:00 pm

    Machine Learning Across Multiple Imaging and Biomarker Modalities in the UK Biobank Improves Genetic Discovery for Liver Fat Accumulation

    Machine Learning

    Metabolic Dysfunction-Associated Steatotic Liver Disease (MASLD), a condition in which the liver contains more than 5.5% fat, is a major risk factor for chronic liver disease and affects an estimated 30% of people worldwide. Although MASLD is genetically complex, large-scale case-control studies based on MASLD diagnosis have had only limited success in discovering the genes responsible for it. This is largely because accurately measuring the disease's characteristics is often expensive, time-consuming, and inconsistent. In this study, we show how machine learning (ML) can address these challenges.

    We used ML to predict the amount of fat in the liver from three types of UK Biobank data: body composition data from dual-energy X-ray absorptiometry (DXA), plasma metabolites, and a combination of anthropometric and blood-based biochemical markers (biomarkers). For DXA-based predictions, we used deep learning models, specifically EfficientNet-B0, to predict fat content from DXA scans. For predictions based on metabolites and biomarkers, we used the gradient boosting model XGBoost.

    Our ML models estimated that up to 29% of UK Biobank participants met the criteria for MASLD, while fewer than 10% had received a clinical diagnosis. We then used these estimates to identify regions of the genome associated with liver fat, finding 321 unique regions, 312 of them new, which substantially expands our understanding of the genetic determinants of liver fat accumulation. The genetic associations from our ML-derived traits showed a high genetic correlation with clinically diagnosed MASLD, suggesting that the regions we identified are also likely to be relevant for understanding and diagnosing the disease in a clinical setting. This strong correlation underscores the potential of our approach to contribute to real-world medical applications.
Our findings highlight the value of ML in identifying disease-related genes and predicting disease risk, demonstrating its potential to deepen our understanding of complex diseases like MASLD. This study illustrates how data science can help transform healthcare research and improve patient outcomes.
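The prevalence estimate in the abstract comes from applying the 5.5% liver-fat definition of MASLD to model predictions. A minimal sketch of that thresholding step follows, assuming predicted liver fat is available as a percentage per participant and clinical diagnosis as a per-participant boolean. The function names are hypothetical and the toy values are illustrative, not UK Biobank data.

```python
from typing import Sequence, Tuple

# MASLD definition used in the abstract: liver fat strictly above 5.5%.
MASLD_FAT_THRESHOLD_PCT = 5.5


def estimated_masld_fraction(predicted_fat_pct: Sequence[float],
                             threshold: float = MASLD_FAT_THRESHOLD_PCT) -> float:
    """Share of participants whose predicted liver fat is strictly above the threshold."""
    if len(predicted_fat_pct) == 0:
        raise ValueError("need at least one prediction")
    above = sum(1 for fat in predicted_fat_pct if fat > threshold)
    return above / len(predicted_fat_pct)


def compare_with_diagnosis(predicted_fat_pct: Sequence[float],
                           diagnosed: Sequence[bool]) -> Tuple[float, float]:
    """Return (ML-estimated MASLD fraction, clinically diagnosed fraction)."""
    if len(predicted_fat_pct) != len(diagnosed):
        raise ValueError("predictions and diagnoses must be aligned per participant")
    diagnosed_fraction = sum(diagnosed) / len(diagnosed)  # True counts as 1
    return estimated_masld_fraction(predicted_fat_pct), diagnosed_fraction
```

For example, predictions of 2.0, 6.1, 4.9, and 12.3% give an estimated fraction of 0.5, because only 6.1 and 12.3 exceed 5.5; a value of exactly 5.5 is not counted.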
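For the metabolite and biomarker models, the abstract names XGBoost. To illustrate only the underlying idea, here is plain gradient boosting for regression with the squared-error loss 0.5 * (y - f)**2 and depth-1 trees (stumps), in standard-library Python. This is not XGBoost, which additionally uses second-order gradient information, regularized deeper trees, and many engineering optimizations; every name and parameter below is hypothetical.

```python
from statistics import mean
from typing import List, Sequence, Tuple

Row = Sequence[float]
# A stump is (feature index, split value, left leaf value, right leaf value).
Stump = Tuple[int, float, float, float]
Model = Tuple[float, float, List[Stump]]  # (base prediction, learning rate, stumps)


def fit_stump(X: Sequence[Row], residuals: Sequence[float]) -> Stump:
    """Find the single split that minimizes squared error on the residuals."""
    m = mean(residuals)
    best: Stump = (0, float("inf"), m, m)  # fallback: one leaf, used if no split helps
    best_sse = sum((r - m) ** 2 for r in residuals)
    for j in range(len(X[0])):
        for split in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= split]
            right = [r for row, r in zip(X, residuals) if row[j] > split]
            if not left or not right:
                continue
            lv, rv = mean(left), mean(right)
            sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if sse < best_sse:
                best_sse, best = sse, (j, split, lv, rv)
    return best


def predict_stump(stump: Stump, row: Row) -> float:
    j, split, left_value, right_value = stump
    return left_value if row[j] <= split else right_value


def fit_boosting(X: Sequence[Row], y: Sequence[float],
                 n_rounds: int = 100, learning_rate: float = 0.1) -> Model:
    """Each round fits a stump to the residuals y - f, the negative gradient of the loss."""
    base = mean(y)
    preds = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        # Shrink each stump's contribution by the learning rate.
        preds = [p + learning_rate * predict_stump(stump, row) for p, row in zip(preds, X)]
    return base, learning_rate, stumps


def predict_boosting(model: Model, row: Row) -> float:
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)
```

The small learning rate means each stump corrects only a fraction of the remaining error, which is the usual trade of more rounds for a smoother fit.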
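The genetic-discovery step tests genomic variants for association with the ML-predicted liver fat trait. The core of such a scan is a per-variant regression of the trait on genotype dosage. The sketch below shows only that core under simplifying assumptions: no covariates (such as age, sex, or ancestry principal components), no correction for relatedness, and a normal approximation for the p-value. It is not the authors' pipeline, and the function and constant names are hypothetical; the 5e-8 cutoff is the conventional genome-wide significance threshold.

```python
import math
from statistics import mean
from typing import Sequence, Tuple

# Conventional genome-wide significance threshold.
GENOME_WIDE_ALPHA = 5e-8


def single_variant_test(dosages: Sequence[float],
                        trait: Sequence[float]) -> Tuple[float, float, float]:
    """Regress a quantitative trait on genotype dosage (0, 1, or 2 effect alleles).

    Returns (beta, standard error, two-sided p-value) using a normal approximation.
    """
    n = len(dosages)
    if n != len(trait) or n < 3:
        raise ValueError("need at least 3 aligned observations")
    g_mean, y_mean = mean(dosages), mean(trait)
    sxx = sum((g - g_mean) ** 2 for g in dosages)
    if sxx == 0:
        raise ValueError("monomorphic variant: dosage has no variance")
    if sum((y - y_mean) ** 2 for y in trait) == 0:
        raise ValueError("trait has no variance")
    sxy = sum((g - g_mean) * (y - y_mean) for g, y in zip(dosages, trait))
    beta = sxy / sxx                       # ordinary least-squares slope
    intercept = y_mean - beta * g_mean
    rss = sum((y - intercept - beta * g) ** 2 for g, y in zip(dosages, trait))
    se = math.sqrt(rss / (n - 2) / sxx)    # standard error of the slope
    if se == 0.0:
        return beta, se, 0.0               # exact nonzero fit: maximally significant
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return beta, se, p_value


def is_genome_wide_significant(p_value: float, alpha: float = GENOME_WIDE_ALPHA) -> bool:
    return p_value < alpha
```

A full scan would run this test for every variant and would typically merge nearby significant hits into loci (clumping) before counting associated regions; that step is omitted here.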


    Open Data Science
    One Broadway
    Cambridge, MA 02142
    info@odsc.com
