Methods to Derive Computable Information from Sparse Electronic Medical Record Data: a Guide for the Data Analyst in Biomedicine


Biological and molecular studies can now identify the underlying causes of disease on an industrial scale. As a result, data is available to analyze the genetic and cellular function of diseases in large populations. Over the last twenty years, it has become relatively cheap to create computable data for studying the roots of disease in our species, and even in the species of microbiota living in and on us. The data describing these analyses (the annotations) have also become uniform across genomic, molecular, and biological studies, which makes analysis based on these complex tests scalable. This is in sharp contrast to our current ability to analyze the characteristics of populations and individuals (the phenotype), which is needed to translate these genomic discoveries from the laboratory into actionable interventions in healthcare.

Data from patients’ medical records is rarely ready to be analyzed or understood in its raw form. Electronic medical record (EMR) data is usually derived from transactional systems that adhere to standards and support billing for an institution, hospital, or agency. Though this data is critical for defining outcome measures for individualized medical interventions and for making discoveries, it is limited in utility and expensive to obtain for complex biomedical research. Deriving information from the EMR also becomes more difficult as the study population shrinks; obtaining machine-computable information for rare diseases or for smaller at-risk populations such as children is therefore almost impossible from the source systems and expensive to procure. Unfortunately, this process is not becoming cheaper as our computational abilities increase.

How do data professionals work in an environment with incredibly sparse data that comes with challenging integration issues across similar populations? How does a data professional answer questions from data sets in a medical domain in which they have no experience? How does a data professional earn a respected seat at the table with medical experts? In this session, participants will hear about cases in pediatric rare disease research in which steps are taken to establish repeatable processes in clinical data analysis. These processes are split into a threefold pipeline of exploratory data analysis, data transformation, and ontology-based categorization, which derives sequence-based and temporally based descriptive and computable matrices from patient to patient. This session will cover the spectrum of the data analyst's role in biomedicine and propose specific professional approaches and technical solutions that address the gap in EMR-driven phenotypic data features by using large, harmonized observational clinical data.
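The threefold pipeline described above can be illustrated with a minimal sketch. It is not the speaker's implementation: the event records, the codes (`CODE_A`, `CODE_B`, `CODE_X`), the `CATEGORY_OF` mapping that stands in for a real ontology, and all function names are hypothetical, chosen only to show how sparse events might become a patient-by-category matrix of onset times.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sparse EMR events: (patient_id, encounter_date, code).
EVENTS = [
    ("p1", date(2015, 1, 10), "CODE_A"),
    ("p1", date(2015, 3, 2), "CODE_B"),
    ("p2", date(2016, 6, 5), "CODE_B"),
    ("p2", date(2016, 6, 5), "CODE_X"),  # code with no category in the mapping
    ("p3", date(2014, 9, 1), "CODE_A"),
]

# Hypothetical code-to-category mapping standing in for a real ontology.
CATEGORY_OF = {"CODE_A": "neoplasm", "CODE_B": "seizure_disorder"}


def explore(events):
    """Step 1, exploratory analysis: count events per code, list unmapped codes."""
    counts = defaultdict(int)
    for _, _, code in events:
        counts[code] += 1
    unmapped = sorted(c for c in counts if c not in CATEGORY_OF)
    return dict(counts), unmapped


def transform(events):
    """Step 2, transformation: replace dates with days since each patient's first event."""
    first_seen = {}
    for pid, day, _ in events:
        first_seen[pid] = min(day, first_seen.get(pid, day))
    return [(pid, (day - first_seen[pid]).days, code) for pid, day, code in events]


def categorize(rows):
    """Step 3, categorization: patient-by-category matrix of earliest onset day.

    A cell is None when the patient has no event in that category,
    which keeps the sparsity of the source data visible.
    """
    categories = sorted(set(CATEGORY_OF.values()))
    onset = defaultdict(dict)
    for pid, offset, code in rows:
        cat = CATEGORY_OF.get(code)
        if cat is None:
            continue  # unmapped codes are reported by explore(), not guessed here
        onset[pid][cat] = min(offset, onset[pid].get(cat, offset))
    patients = sorted({pid for pid, _, _ in rows})
    matrix = [[onset[pid].get(cat) for cat in categories] for pid in patients]
    return patients, categories, matrix


counts, unmapped = explore(EVENTS)
patients, categories, matrix = categorize(transform(EVENTS))
```

Each row of `matrix` describes one patient and each column one category, so patients can be compared on which categories occur and in what temporal order. Using offsets from the first event rather than calendar dates is one possible design choice for aligning patients whose records start at different times.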


Alex leads a team of software developers and data analysts at the Department of Biomedical and Health Informatics at the Children’s Hospital of Philadelphia, where he is focused on the development, implementation, and integration of applications designed to manage the complex nature of institutional and study-specific biorepositories and multi-modal data projects in biomedical research. His team created a novel use of a toolkit for multidisciplinary, data-intensive informatics initiatives that includes CHOP’s electronic honest brokering system. From 2008 through 2012, he served as the primary liaison for biomedical informatics in the Research Institute's Center for Childhood Cancer Research and the Division of Oncology, where he directed the long-term development and implementation of a fully integrated cancer research informatics infrastructure, as well as the implementation of the Cancer Center’s Clinical Trials Management System (CTMS) and its subsequent integration with hospital clinical systems. Alex’s research uses signals from large clinical data sets to lower the human data-entry burden in preparation for machine learning data analytics in rare diseases. Alex holds an undergraduate degree from Temple University and completed his master's and doctoral studies in Information Science at Drexel University.

Open Data Science




