Building Data Narratives: An End-to-End Machine Learning Practicum


This workshop walks attendees through the steps of an end-to-end machine learning campaign: data retrieval; data curation; model construction, evaluation, selection, and interpretation; and reporting. Particular attention will be paid to reporting, i.e., building a narrative. Examples will demonstrate how one might generate multiple output formats (e.g., HTML pages, presentation slides, PDF documents) from the same code base.

The practice of literate computing underpins this workshop. Literate computing weaves a narrative directly into an interactive computation: text, code, and results are combined into a narrative that relies equally on textual explanations and computational components. Insights are extracted from data using computational tools, then communicated in the form of a narrative that resonates with the audience. Literate computing also lends itself to reproducible research. One may re-run the analyses, run them with new data sets, or modify the code for other purposes.

The workshop will have three main foci:
1. infrastructure: instantiating the computational environment; loading packages; loading data
2. computation: data curation, transformation, and analysis; model construction, evaluation, and interpretation
3. communication: creating tables, charts, and graphs; weaving all components into a data narrative
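
The three foci above can be sketched in miniature as follows. This is an illustrative sketch only, not the workshop's material. It uses only the Python standard library, the data are synthetic values made up for the example (not real toxicity measurements), the nearest-centroid classifier is a deliberately simple stand-in for whatever models the workshop builds, and all function and column names are hypothetical.

```python
import csv
import io
from html import escape
from statistics import mean

# Synthetic, made-up data for illustration only (NOT real toxicity measurements).
RAW_CSV = """descriptor_a,descriptor_b,toxic
1.0,1.2,0
1.1,0.9,0
0.9,1.0,0
1.2,1.1,0
3.0,3.2,1
3.1,2.9,1
2.9,3.0,1
3.2,3.1,1
"""


def load_rows(text):
    """Infrastructure: parse CSV text into (features, label) pairs."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        ((float(r["descriptor_a"]), float(r["descriptor_b"])), int(r["toxic"]))
        for r in reader
    ]


def fit_centroids(rows):
    """Computation: nearest-centroid model, one mean vector per class."""
    centroids = {}
    for label in sorted({y for _, y in rows}):
        points = [x for x, y in rows if y == label]
        centroids[label] = tuple(mean(col) for col in zip(*points))
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist(centroids[label]))


def evaluate(centroids, rows):
    """Fraction of rows whose predicted label matches the true label."""
    hits = sum(predict(centroids, x) == y for x, y in rows)
    return hits / len(rows)


def render_markdown(results):
    """Communication: the results as a Markdown table."""
    lines = ["| metric | value |", "| --- | --- |"]
    lines += [f"| {k} | {v} |" for k, v in results.items()]
    return "\n".join(lines)


def render_html(results):
    """Communication: the same results as an HTML table."""
    body = "".join(
        f"<tr><td>{escape(str(k))}</td><td>{escape(str(v))}</td></tr>"
        for k, v in results.items()
    )
    return f"<table><tr><th>metric</th><th>value</th></tr>{body}</table>"


rows = load_rows(RAW_CSV)
train, test = rows[0::2], rows[1::2]  # deterministic alternating split
model = fit_centroids(train)
results = {
    "n_train": len(train),
    "n_test": len(test),
    "test_accuracy": evaluate(model, test),
}
print(render_markdown(results))
print(render_html(results))
```

The point of the two `render_*` functions is that a single `results` object feeds every output format, so re-running the analysis on new data updates all reports at once.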

As a specific example, a data narrative will be built showing how one might build predictive models for the toxicity of organic molecules. Reports will be presented as (1) an HTML file, (2) a PDF document (in a format acceptable for journal submission), and (3) a slide presentation.

While the workshop’s example comes from the field of cheminformatics, the computational tools used and the exercises presented are applicable to any field where an investigator is interested in building predictive models and describing these models to colleagues and associates.

The R and Python ecosystems will be used throughout. All data, code, and text will be made available.

At the workshop’s conclusion, attendees will have worked through exercises that may serve as templates for building data narratives with their own data.

● Jupyter notebooks
● RStudio


Paul Kowalczyk is a Senior Data Scientist at Solvay. There, Paul uses a variety of toolchains and machine learning workflows to visualize, analyze, mine, and report data, and to generate actionable insights from them. Paul is particularly interested in democratizing data science, working to put data products into the hands of his colleagues. His experience includes using computational chemistry, cheminformatics, and data science in the biopharmaceutical and agrochemical industries. Paul received his PhD from Rensselaer Polytechnic Institute and was a Postdoctoral Research Fellow with IBM’s Data Systems Division.

Open Data Science




