Abstract: The emergence of data science as a discipline has affected businesses in many ways. One major impact has been the growing role of data in decision-making, as companies apply statistical methods to the ever-larger datasets they collect. This workshop will review and introduce statistical techniques, then touch on more advanced methods for handling noisy data and applying real-world constraints to analyses. It assumes a working knowledge of standard statistical methods and aims to connect theory to practice through real-world examples.
Lesson 1: Descriptive statistics and exploring data statistically
- (Re)familiarize yourself with basic descriptive statistics
- Use simple data exploration techniques to identify problems and limitations of a new dataset
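As a rough illustration of the Lesson 1 topics, here is a minimal sketch of a first pass over a new numeric column: basic descriptive statistics plus two quick data-quality checks (missing entries and IQR-based outliers). It assumes Python 3.8+ and uses only the standard library `statistics` module, not whatever libraries the workshop notebooks use. The `describe` helper, the 1.5 × IQR outlier rule, and the `response_days` data are hypothetical choices made for illustration.

```python
import statistics


def describe(values):
    """Summarize a numeric column, treating None as a missing entry (hypothetical helper)."""
    present = [v for v in values if v is not None]
    # Quartiles via the statistics module's default ("exclusive") method.
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    # Common rule of thumb: flag points more than 1.5 * IQR outside the quartiles.
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "stdev": statistics.stdev(present),
        "q1": q1,
        "q3": q3,
        "outliers": [v for v in present if v < low or v > high],
    }


# Hypothetical column with two missing entries and one suspicious value.
response_days = [2, 3, 3, 4, None, 5, 4, 3, 41, None]
summary = describe(response_days)
for key, value in summary.items():
    print(f"{key:>8}: {value}")
```

Here the single value of 41 pulls the mean (8.125) far above the median (3.5), which is the kind of discrepancy a quick exploration pass should surface before any modeling.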
Lesson 2: Statistical analyses
- Review of statistical tests to compare datasets and groups within those data
- Assessing correlations and other properties of the data with an eye toward modeling
Lesson 3: More advanced analyses and methods
- Linear modeling and interpreting its statistical outputs
- Stats -> ML: connections and methodologies
● Jupyter notebooks
● A setup doc: https://docs.google.com/document/d/1LjaQXflpNIKNOcbvXoB9AT-gq4mBgEryNE9rEFQdZd4/edit?usp=sharing
● A doc for the eventual syllabus and links to other resources: https://docs.google.com/document/d/19PWp_GzrAa11bQ3E4Ge0kKyfQQNG4jPCRgzzSdJeVAY/edit?usp=sharing
Bio: Andrew is a Ph.D. astrophysicist who made the switch from academia to data science in 2014, via the Insight Data Science program. He was the first data scientist hired at Greenhouse Software, where he has worked on many internal data science projects and a few customer-facing, data-powered product features. Andrew lives in New Jersey with his wife and son.