
Abstract: We present a novel format of machine learning competition in which a participant submits code that trains a generative model on the provided training samples. The code then runs on Kaggle and produces dog images, and the participant receives a score for the generated content based on quality, diversity, and a memorization penalty. This style of competition targets Generative Adversarial Networks (GANs) but is open to all generative models. Our implementation features production-level neural network inference. The design addresses overfitting by evaluating the same submissions separately with different neural networks and different “ground truth” dog image datasets for the public and private leaderboards. Furthermore, to prevent non-generative methods, participants’ code runs in an enclosed compute environment, “Kaggle Notebooks”, on our cloud infrastructure. In this talk, I will go through both the algorithmic and the system design of our competition and share the lessons we learned from running it in July 2019, with 900+ teams participating and over 37,000 submissions received along with their code. We will, of course, show some impressive, cute puppy images generated by our participants.
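To make the idea of a memorization penalty concrete, here is a minimal sketch of one way such a penalty could be computed. It is an illustration under stated assumptions, not the competition's actual metric: it assumes image features have already been extracted by some neural network, and the function names, the use of cosine distance, and the threshold `eps` are all hypothetical choices made for this example.

```python
import numpy as np


def memorization_distance(gen_feats, train_feats):
    """Mean, over generated samples, of the cosine distance to the
    nearest training sample. Values near 0 suggest copied images.
    (Hypothetical helper; assumes rows are non-zero feature vectors.)"""
    g = gen_feats / np.linalg.norm(gen_feats, axis=1, keepdims=True)
    t = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    cos_dist = 1.0 - g @ t.T  # shape: (n_generated, n_training)
    return float(cos_dist.min(axis=1).mean())


def penalized_score(base_distance, mem_distance, eps=0.1):
    """Inflate a lower-is-better distance score when generated samples
    sit too close to the training data. `eps` is an assumed threshold."""
    if mem_distance < eps:
        # Guard against division by zero for exact copies.
        return base_distance / max(mem_distance, 1e-12)
    return base_distance
```

In this sketch, `base_distance` stands in for any lower-is-better quality and diversity score. A submission that merely replays training images gets a memorization distance near zero, so its score is inflated sharply, while a submission whose images are far from every training image keeps its score unchanged.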
Bio: Wendy is a senior data scientist/engineer at Kaggle (a Google Cloud company). She has worked extensively on innovative machine learning competitions and is currently focusing on in-house data infrastructure. She has a PhD in Biomedical Engineering and Bachelor’s and Master’s degrees in Electrical Engineering.

Wendy Chih-wen Kan, PhD
Title
Senior Data Scientist/Engineer | Kaggle
Category
advanced-w19 | beginner-w19 | intermediate-w19 | machine-learning-w19 | open-source-w19 | talks-w19
