Statistics for Data Science and Measurement


Statistics and statistical inference form the core of making sense of data. Inference is what allows us to extrapolate and generalize from our data to the populations we are trying to understand. Underlying inference is measurement: how we attach numbers to the phenomena we would like to understand. In this tutorial we consider measurement and inference, especially as they pertain to scientific repeatability. We pay particular attention to the use of artificial intelligence and machine learning as methods of measurement and to the fundamental role that inference plays in them. Specifically, we focus on validation.

Session Outline:

1. What is statistical inference?
2. Measurement foundations.
3. Scientific repeatability and reproducibility.
4. Inference for ML and AI.

After this session, students will have:

1. A basic understanding of statistical inference, including generalizability.
2. A basic understanding of the role that inference plays in validating ML and AI.
3. Techniques for assessing reproducibility and replicability.
4. An understanding of statistical sampling assumptions and how they factor into summary measures.

Background Knowledge:

Basic facility with data science, basic algebra, and a small amount of statistics.


Brian Caffo, PhD, is a professor in the Department of Biostatistics with a secondary appointment in the Department of Biomedical Engineering at Johns Hopkins University. He graduated from the University of Florida Department of Statistics in 2001. He has worked in statistical computing, statistical modeling, computational statistics, multivariate and decomposition methods, and statistics in neuroimaging and neuroscience. He led teams that won the ADHD 200 prediction competition. He co-directs the SMART statistical group. With other faculty at JHU, he created and co-directs the Coursera Data Science Specialization, a 10-course specialization on statistical data analysis. He co-directs the JHU Data Science Lab, a group dedicated to open educational innovation and data science. He is the former director of the Biostatistics graduate programs and admissions committees. He is currently the co-director of the Johns Hopkins High Performance Computing Exchange supercomputing service center and past president of the Bloomberg School of Public Health faculty senate.

Open Data Science




