Automatic DataFrame Profiling and Visualization for Machine Learning

Abstract: 

When faced with a new dataset for a machine learning task, every data scientist asks the same common questions about the data and performs the same common preprocessing and cleaning operations. Doing this with pandas alone can be laborious and time-consuming, and examining all columns and their interactions can become infeasible for larger datasets. In this talk, we'll see how the dataset on-boarding process for machine learning can be greatly simplified by using the dabl library in Python, which provides interactive suggestions for data cleaning and

Bio: 

Andreas Mueller is a Principal Research SDE at Microsoft (previously Columbia, NYU, Amazon) and author of the O'Reilly book "Introduction to Machine Learning with Python", which describes a practical approach to machine learning with Python and scikit-learn. He is one of the core developers of the scikit-learn machine learning library and has been co-maintaining it for several years. Andreas is also a Software Carpentry instructor.

Open Data Science