Abstract: What can we learn about data science by watching data science competitions?
During a data science competition like those hosted by DrivenData and Kaggle, the leaderboard lists the teams that have submitted models and the scores the top models have achieved.
As the competition proceeds, the scores often improve quickly as teams explore a variety of models and then more slowly as they approach the limits of what’s possible.
Using 170,000 scores from more than 50 competitions hosted by DrivenData, we explore the aggregated behavior of the competing teams.
What patterns can we see?
Based on early returns, can we predict the limits?
What factors influence the time and number of submissions it takes to reach the performance plateau?
Do models tend to overfit the data as the contest progresses?
And what guidance can we provide for deciding when to stop searching?
In this talk, we will answer these questions and share additional observations from the other side of the leaderboard.
Bio: Isaac is a co-founder and Principal Data Scientist at DrivenData, Inc., where he leads client engagements and spearheads development of the data science competition platform. He holds a master's in Computational Science and Engineering from Harvard's School of Engineering and Applied Sciences and a BS in Operations Research from the U.S. Coast Guard Academy, and previously spent seven years as a Coast Guard officer serving in a variety of operational and quantitative roles.