Abstract: In the real world, data scientists and machine learning engineers don’t operate in a vacuum of data munging and model construction. There are many other significant challenges that they face, for example:
- Determining Whether to Use Machine Learning - Machine learning systems cost an enormous amount of effort to construct and maintain, but they are not necessary to achieve “good enough” performance on many tasks, especially ones where large performance improvements will not significantly impact business metrics.
- Curating Labeled Data - For situations where high-quality labeled data is hard to come by (such as predicting the next financial crash), practitioners of machine learning need to be creative in finding the right data to fit and evaluate their models.
- Determining Metrics - Data scientists and machine learning engineers frequently employ metrics that directly measure model performance, such as ROC AUC, mean squared error, R^2, etc. However, there is often a disconnect between these metrics and business metrics, such as click-through rate, monthly active users or revenue.
- Maintaining Pipelines - The world changes over time, and data changes with it. World events, data collection improvements, user demographic drift and other changes in the distribution of model input data can significantly degrade the performance of machine learning models.
In this workshop, participants will learn how to frame business problems as machine learning problems by working through a set of case studies. For each case, participants will design strategies for collecting data, extracting features, constructing training pipelines, measuring performance, and detecting bugs. Participants will learn about common pitfalls such as scarce labeled data, label uncertainty, prediction-intention mismatch, abstraction degradation and covariate shift.
Bio: Dan works at Twitter Cortex, where he develops tools and algorithms to make it easier for teams at Twitter to use machine learning. Previously, Dan was a Senior Data Scientist at TrueMotion, where he built machine learning algorithms that use smartphone sensors to understand and score driving behaviors. In addition, Dan regularly presents and publishes at industry and academic machine learning and data science conferences.