Machine Learning in R Part I: Penalized Regression and Boosted Trees


In this workshop we go over two fundamental machine learning methods: penalized regression and boosted trees. Learn the theory behind the models and get practical, hands-on experience using {glmnet} and {xgboost} in R. You'll fit the models, assess model fit, tune hyperparameters, and make predictions.

Session Outline
- See the curse of dimensionality
- Understand the importance of variable selection
- Learn about penalized regression
  - L1 (Lasso)
  - L2 (Ridge)
  - L1 + L2 (Elastic Net)
- Perform feature engineering with {recipes}
- Fit a lasso regression using {glmnet}
- Choose the best amount of penalty
- Learn the different ways to define "best"
- Visualize the coefficients/weights with {coefplot}
- Fit a ridge regression using {glmnet}
- Fit an Elastic Net regression using {glmnet}
- Make predictions using these models

- Learn about decision trees
- Perform feature engineering with {parsnip}
- Fit a decision tree using {xgboost}
- Visualize the tree
- Learn about boosting
- Fit a boosted tree using {xgboost}
- Learn about variable importance
- Learn about the different {xgboost} tuning parameters
- Fit a boosted pseudo random forest using {xgboost}
- Make predictions using these models

Background Knowledge
Basic knowledge of R


Jared Lander is the Chief Data Scientist of Lander Analytics, a data science consultancy based in New York City; the Organizer of the New York Open Statistical Programming Meetup and the New York R Conference; and an Adjunct Professor of Statistics at Columbia University. With a master's in statistics from Columbia University and a bachelor's in mathematics from Muhlenberg College, he has experience in both academic research and industry. His work for both large and small organizations ranges from music and fundraising to finance and humanitarian relief efforts.
He specializes in data management, multilevel models, machine learning, generalized linear models, and statistical computing. He is the author of R for Everyone: Advanced Analytics and Graphics, a book about R programming geared toward data scientists and non-statisticians alike, and is creating a course on glmnet with DataCamp.

Open Data Science




