Abstract: This presentation will introduce the machine learning algorithm Multivariate Adaptive Regression Splines (MARS), using the open-source earth package for R. We will demonstrate the power and flexibility of this algorithm by identifying relationships between global temperature change and the concentrations of various greenhouse gases. We use average global temperature data from 1880 through 2016 from NASA’s global climate division, together with global average concentrations of over 40 chemical species. Our modeling considers the “usual suspects,” such as carbon dioxide, methane, and nitrous oxide, as well as ozone-depleting substances such as chlorofluorocarbons, and other chemical species.
We will show how this algorithm can capture non-linear effects and interaction effects. This powerful approach also performs “feature reduction” when many variables are considered. Our climate change example demonstrates how the algorithm selects, from over 40 candidate variables, those to include in the trend models.
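To make the idea of non-linear and interaction effects concrete, here is a minimal sketch of the hinge basis functions that MARS models are built from. It is written in plain Python rather than with the earth package, and the function names, knots, and coefficients are all hypothetical values chosen for illustration; they are not results from the talk.

```python
def hinge(x, knot, direction=1):
    """Hinge basis function used by MARS.

    direction=+1 gives max(0, x - knot); direction=-1 gives max(0, knot - x).
    """
    return max(0.0, direction * (x - knot))


def mars_like_model(x1, x2):
    """Toy MARS-style model: intercept + one hinge term + one interaction term.

    All knots and coefficients below are made up for illustration only.
    """
    intercept = 0.1
    # Non-linear effect: x1 contributes nothing until it passes the knot at 3.0
    main_effect = 0.5 * hinge(x1, 3.0)
    # Interaction effect: a product of two hinges, active only when both are non-zero
    interaction = 0.25 * hinge(x1, 3.0) * hinge(x2, 1.0)
    return intercept + main_effect + interaction


if __name__ == "__main__":
    for point in [(2.0, 0.0), (5.0, 0.0), (5.0, 3.0)]:
        print(point, mars_like_model(*point))
```

Each hinge is zero on one side of its knot and linear on the other, so a sum of hinges gives a piecewise-linear curve, and a product of two hinges gives a term that switches on only in one region of the two-variable space. In a fitted model, the algorithm chooses the knots, the coefficients, and which variables appear at all.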
This approach is a strong competitor to neural network modeling. The spline algorithm provides a clear predictive model by capturing the growth or decline effect of each greenhouse chemical. We will show that this clarity is an advantage over neural networks and other black-box methods.
An additional benefit of this spline approach is its “Partial Dependence” graphs, which show the effect on the target variable of changing one or two predictor variables at a time. In this example, the target is the temperature change. The two-way “Partial Dependence” graphs are especially helpful when interaction effects between predictor variables need to be represented.
Bio: Remy is a master's student in the Department of Atmospheric Sciences at the University of Illinois at Urbana-Champaign. His research focuses on the spatiotemporal variation of particulate matter in the United States.