Target Leakage in Machine Learning

Abstract: 

Target leakage is one of the most difficult problems in developing real-world models. It occurs when training data is contaminated with information that will not be available at prediction time. Data collection, feature engineering, partitioning, and model validation are all potential sources of leakage. This talk presents real-life examples of leakage at different stages of data science projects, discusses countermeasures, and lays out best practices for model validation.
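
As an illustration of how partitioning and feature engineering can interact to cause leakage, here is a minimal sketch in plain Python. It is not taken from the talk; the toy data, the function names fit_scaler and transform, and the choice of standardization as the preprocessing step are all assumptions made for this example. It contrasts scaling statistics computed on all rows (leaky) with statistics computed on the training partition only.

```python
from statistics import mean, pstdev


def fit_scaler(values):
    """Return (mean, population std) estimated from the given values only."""
    return mean(values), pstdev(values)


def transform(values, scaler):
    """Standardize values using previously fitted statistics."""
    mu, sigma = scaler
    return [(v - mu) / sigma for v in values]


# Hypothetical toy feature, ordered in time; the last two rows are held out.
feature = [1.0, 2.0, 3.0, 4.0, 100.0, 120.0]
train, holdout = feature[:4], feature[4:]

# Leaky: statistics are fit on all rows, so holdout values shape the
# training data even though they would be unknown at prediction time.
leaky_scaler = fit_scaler(feature)

# Safer: statistics are fit on the training partition only and then
# reused unchanged on the holdout rows.
clean_scaler = fit_scaler(train)

train_leaky = transform(train, leaky_scaler)
train_clean = transform(train, clean_scaler)
holdout_clean = transform(holdout, clean_scaler)
```

In this sketch the two training sets differ only because the leaky scaler has seen the holdout rows. The same pattern applies to any fitted preprocessing step: fit it on the training partition, then apply it to validation and test data.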

Bio: 

Yuriy Guts is a Machine Learning Engineer at DataRobot with over 10 years of industry experience in data science and software architecture. His primary interests are productionalizing data science, automated machine learning, time series forecasting, and processing spoken and written language. He teaches AI and ML at UCU, competes on Kaggle, and has led multiple international data science and engineering teams.

Open Data Science

 

 

 
