Introduction to scikit-learn: Machine Learning in Python


Build your first Machine Learning models in Python using scikit-learn. Become well-versed in the core scikit-learn workflow: fitting models, scoring models, making predictions, and fine-tuning. We sample several algorithms, including Linear Regression, Logistic Regression, Decision Trees, Random Forests, and XGBoost (a separate library with a scikit-learn-compatible interface). Use train_test_split, cross_val_score, GridSearchCV, RandomizedSearchCV, and feature_importances_ to evaluate, tune, and interpret your models. A brief introduction to pandas is included to load, visualize, and prepare your data.

All code is presented in Python via Jupyter Notebooks on GitHub. Datasets are provided, but attendees are encouraged to bring their own CSV files for real-world practice. The only prerequisite is proficiency in Python. If you have never built a Machine Learning model, this is an excellent place to start. By the end of this workshop you will confidently build, score, fine-tune, and make predictions from Machine Learning models in scikit-learn.

Session Outline:

Module 1: Preparing data for Machine Learning with pandas
It's essential to load data properly to build successful Machine Learning models in scikit-learn. We cover loading data into pandas DataFrames, clearing null values, transforming categorical columns into numerical columns, and choosing target and predictor columns.
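
The preparation steps in Module 1 can be sketched as follows. This is a minimal illustration, not the workshop's own notebook: the CSV text, the column names (age, city, income, purchased), and the choice of median imputation are all assumptions made for the example.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk;
# with a real file you would pass its path to pd.read_csv instead.
csv_text = """age,city,income,purchased
34,Boston,72000,1
,Austin,58000,0
45,Boston,,1
29,Denver,51000,0
"""

# Load the data into a pandas DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# Clear null values: here, fill each numeric gap with the column median
# (dropping rows with df.dropna() is another common choice).
df["age"] = df["age"].fillna(df["age"].median())
df["income"] = df["income"].fillna(df["income"].median())

# Transform the categorical column into numerical indicator columns.
df = pd.get_dummies(df, columns=["city"])

# Choose the target column (y) and the predictor columns (X).
y = df["purchased"]
X = df.drop(columns=["purchased"])
```

After these steps X contains only numerical columns with no missing values, which is the form scikit-learn estimators generally expect.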

Module 2: Supervised learning with scikit-learn
Scikit-learn provides a consistent, user-friendly API for splitting data, training models, scoring models, and making predictions, and it includes a wide range of Machine Learning algorithms to choose from. We sample Linear Regression, Logistic Regression, Decision Trees, Random Forests, and XGBoost (a separate library that follows the scikit-learn API) as you build your first Machine Learning models.
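
As a rough sketch of the split, train, score, and predict workflow described in Module 2, the example below fits two classifiers on a dataset bundled with scikit-learn. The dataset, the 25% test size, and the random_state values are assumptions chosen for illustration rather than the workshop's actual material.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# A small classification dataset that ships with scikit-learn.
X, y = load_breast_cancer(return_X_y=True)

# Hold out 25% of the rows for testing (an arbitrary choice here).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Every scikit-learn estimator exposes the same fit/score/predict methods.
models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                   # train on the training set
    scores[name] = model.score(X_test, y_test)    # accuracy on unseen rows

# Predict labels for the first five held-out rows.
predictions = models["random_forest"].predict(X_test[:5])
```

Because the interface is shared, swapping in another estimator usually means changing only the line that constructs the model. XGBoost is installed separately from scikit-learn, so it is left out of this sketch.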

Module 3: Cross-Validation with scikit-learn
Flexible Machine Learning models often overfit the training data, so a score from a single train/test split can be misleading. Scikit-learn provides excellent cross-validation options to split your data into multiple training and test sets. We cover essential cross-validation practices such as K-Fold cross-validation and stratifying your data.
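
A minimal sketch of stratified K-Fold cross-validation, assuming a bundled dataset and a shallow decision tree as the model. The number of folds (5), the tree depth, and the random_state values are illustrative choices, not recommendations from the workshop.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# StratifiedKFold keeps the class proportions roughly equal in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

model = DecisionTreeClassifier(max_depth=3, random_state=0)

# Fit and score the model once per fold; returns one accuracy per fold.
scores = cross_val_score(model, X, y, cv=cv)

print("Fold scores:", scores.round(3))
print("Mean score:", scores.mean().round(3))
```

The mean of the fold scores is typically a steadier estimate of performance than a single split, and a wide spread across folds suggests the score depends heavily on which rows land in the test set.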

Module 4: Fine-tuning models with scikit-learn
Optimizing Machine Learning models requires understanding the ranges of hyperparameters and finding the best possible combinations. Scikit-learn includes powerful tools for full grid searches (GridSearchCV) and random searches (RandomizedSearchCV) to find the hyperparameter combinations best suited to your data. We focus on fine-tuning tree ensembles with XGBoost.
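
The two search strategies can be sketched as below. Note the substitution: this example tunes scikit-learn's RandomForestClassifier rather than XGBoost, so that it runs with scikit-learn alone. The hyperparameter grid, the 3 folds, and n_iter=3 are assumptions kept deliberately small for speed.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = load_breast_cancer(return_X_y=True)

# Candidate hyperparameter values (illustrative, intentionally tiny).
param_grid = {
    "n_estimators": [25, 50],
    "max_depth": [2, 4, None],
}

# Full grid search: tries all 2 x 3 = 6 combinations with 3-fold CV.
grid = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
grid.fit(X, y)

# Random search: samples only n_iter combinations from the same space.
rand = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_grid,
    n_iter=3,
    cv=3,
    random_state=0,
)
rand.fit(X, y)

print("Grid search best params:", grid.best_params_)
print("Random search best params:", rand.best_params_)
```

A full grid search is exhaustive but grows multiplicatively with each added hyperparameter, while a random search caps the number of fits at n_iter, which is why it is often preferred for larger search spaces.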

Module 5: Finding the Most Influential Columns with scikit-learn
Many businesses want to know the most influential attributes (columns) in predicting certain outcomes. Several tree-based models, including scikit-learn's Random Forests and XGBoost's scikit-learn-compatible estimators, expose an attribute called feature_importances_ that assigns each column a numerical score reflecting its influence on the model's predictions. Sorting these scores ranks the columns from most to least influential.
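
A short sketch of ranking columns with feature_importances_, assuming a Random Forest trained on a bundled dataset. Pairing the scores with column names in a pandas Series is one convenient way to display them, not the only one.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()

# feature_importances_ is only available after the model has been fitted.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(data.data, data.target)

# Pair each importance score with its column name and sort descending.
importances = pd.Series(
    model.feature_importances_, index=data.feature_names
).sort_values(ascending=False)

top_five = importances.head(5)
print(top_five)
```

For scikit-learn's tree ensembles the scores sum to 1, so each value can be read as a share of the model's total importance. The scores describe this particular fitted model and do not by themselves establish a causal effect.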

Background Knowledge:

Python proficiency


Corey Wade, MS Mathematics, MFA Writing & Consciousness, is the director and founder of Berkeley Coding Academy, an online program with live classes where teenagers learn Python Programming, Data Analytics, and Machine Learning. Author of Hands-on Gradient Boosting with XGBoost and scikit-learn, and lead author of The Python Workshop, Corey also teaches Math, Programming, and Data Science at Berkeley Independent Study. Corey has published iPhone apps with students, designed classes to build websites, and run after-school coding programs to support girls and underserved students. A Springboard Data Science graduate and multiple grant award-winner, Corey has also worked in industry developing Data Science curricula for Pathstream and Hello World while contributing articles for Towards Data Science. When not coding or teaching, Corey reads poetry and studies the stars.

Open Data Science




One Broadway
Cambridge, MA 02142
