Abstract: If you've never heard of the 'good, fast, cheap' dilemma, it goes something like this: you can have something good and fast, but it won't be cheap; you can have something good and cheap, but it won't be fast; you can have something fast and cheap, but it won't be good. In short, you can pick two of the three, but you can't have all three. If you've worked on a data science problem before, I can all but guarantee that you've run into missing data. How do we handle it? We can avoid it, ignore it, or try to account for it. The problem is that none of these strategies is good, fast, and cheap. We'll start by visualizing the impact of missing data, then get into strategies for doing data science with missing data and discuss the advantages and disadvantages of each approach. We'll cover the different types of missing data and why the type influences which strategy you select, and we'll discuss how best to integrate these strategies into your existing workflow. Above all, we're focusing on practical, meaningful recommendations!
Bio: Matt currently leads instruction for GA’s Data Science Immersive in Washington, D.C., and most enjoys bridging the gap between theoretical statistics and real-world insights. Matt is a recovering politico, having worked as a data scientist for a political consulting firm through the 2016 election. Before his work in politics, he earned his Master’s degree in statistics from The Ohio State University. Matt is passionate about making data science more accessible and putting the revolutionary power of machine learning into the hands of as many people as possible. When he isn’t teaching, he’s thinking about how to be a better teacher, falling asleep to Netflix, and/or cuddling with his pug.