Abstract: When faced with a new dataset for a machine learning task, every data scientist asks the same common questions about the data and performs the same common preprocessing and cleaning operations. Doing this with pandas alone can be laborious and time-consuming, and inspecting all columns and their interactions can become infeasible for larger datasets. In this talk, we'll see how the dataset onboarding process for machine learning can be greatly simplified by using the dabl library in Python, which provides interactive suggestions for data cleaning and
Bio: Andreas Mueller is a Principal Research SDE at Microsoft (previously Columbia, NYU, Amazon), and author of the O'Reilly book "Introduction to Machine Learning with Python", which describes a practical approach to machine learning with Python and scikit-learn. He is one of the core developers of the scikit-learn machine learning library and has co-maintained it for several years. Andreas is also a Software Carpentry instructor.