Abstract: Forecasting is both a fascinating subject to study and an important technique applied in industry, government, and academic settings. Example applications include demand and inventory planning, marketing strategy planning, capital budgeting, pricing, machine predictive maintenance, macroeconomic forecasting, and supply chain forecasting.
Forecasting typically requires time series data, which is now ubiquitous both within and outside the data science field: weekly initial unemployment claims, tick-level stock prices, weekly company sales, daily step counts recorded by wearables, machine performance measurements recorded by sensors, and key performance indicators of business functions, to name just a few.
However, time series data differs from cross-sectional data in that it has temporal dependence, and this dependence can be leveraged to forecast future values of the series. Some of the most important and commonly used data science techniques for analyzing time series data and making forecasts from it were developed in the fields of statistics and machine learning. For this reason, statistical and machine learning time series models belong in every data scientist’s toolkit.
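As a rough illustration of what temporal dependence means, the sketch below simulates a series in which each value depends on the previous one and compares its lag-1 autocorrelation with that of the same values in shuffled order. The function name `lag_autocorr`, the coefficient 0.8, and the series length are illustrative assumptions, not material from the talk.

```python
import random

def lag_autocorr(series, lag=1):
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return cov / var

rng = random.Random(0)  # fixed seed so the run is repeatable

# Simulate y_t = phi * y_{t-1} + noise; phi = 0.8 is an assumed, illustrative value.
phi = 0.8
y = [0.0]
for _ in range(1999):
    y.append(phi * y[-1] + rng.gauss(0, 1))

# Shuffling destroys the time ordering, which mimics cross-sectional data.
shuffled = y[:]
rng.shuffle(shuffled)

r_ts = lag_autocorr(y)
r_shuffled = lag_autocorr(shuffled)
print(f"lag-1 autocorrelation, time-ordered: {r_ts:.2f}")
print(f"lag-1 autocorrelation, shuffled:     {r_shuffled:.2f}")
```

The time-ordered series should show a lag-1 autocorrelation close to the simulated coefficient, while the shuffled copy should be close to zero. That gap is the information a forecasting model exploits.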
Unlike some of my prior presentations and tutorials that covered both statistical and neural network-based models for time series analysis, this talk will be introductory in nature and will focus on the discussion of a couple of workhorse statistical time series models that are frequently applied to solving time series forecasting problems.
Specifically, I will sketch the family of Autoregressive Integrated Moving Average (ARIMA) models (with and without seasonal components) and the class of Vector Autoregressive (VAR) models, including a discussion of the advantages and disadvantages of using each of these models in time series forecasting scenarios. Both real-world and simulated time series will be used to illustrate the application of these techniques in Python. Exploratory time series data analysis will also be included in the presentation.
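To give a flavor of one of the simplest members of the ARIMA family, here is a minimal sketch of an ARIMA(1,1,0) forecast written in plain Python: difference the series once, fit an AR(1) coefficient to the differences by least squares, then forecast the differences and add them back to the last observed level. The function names, the zero-intercept assumption, and the simulated data are all hypothetical choices for illustration; the abstract does not specify the talk's actual code or libraries.

```python
import random

def difference(series):
    """First difference: the 'I' (integrated) step of ARIMA with d = 1."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]

def fit_ar1(series):
    """Least-squares estimate of phi in x_t = phi * x_{t-1} + e_t (no intercept assumed)."""
    num = sum(series[t] * series[t - 1] for t in range(1, len(series)))
    den = sum(series[t - 1] ** 2 for t in range(1, len(series)))
    return num / den

def forecast_arima_110(series, steps):
    """Forecast `steps` values ahead with a hand-rolled ARIMA(1,1,0)."""
    diffs = difference(series)
    phi = fit_ar1(diffs)
    last_level, last_diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        last_diff = phi * last_diff          # AR(1) forecast of the next difference
        last_level = last_level + last_diff  # undo the differencing
        forecasts.append(last_level)
    return phi, forecasts

# Simulated series whose first differences follow an AR(1) with phi = 0.6 (assumed value).
rng = random.Random(42)
true_phi = 0.6
level, diff = 100.0, 0.0
series = [level]
for _ in range(1500):
    diff = true_phi * diff + rng.gauss(0, 1)
    level += diff
    series.append(level)

phi_hat, forecasts = forecast_arima_110(series, steps=5)
print(f"estimated phi: {phi_hat:.2f}")
print("5-step forecast:", [round(f, 2) for f in forecasts])
```

This sketch omits the moving average terms, seasonal components, and intercept that a full ARIMA implementation handles. In practice a library such as statsmodels would usually be preferred over hand-written estimation.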
This presentation is suitable for anyone who is not familiar with statistical time series modeling and wants to learn the basics of statistical time series analysis and modeling. That may include data scientists, data engineers, and data science or engineering VPs, directors, and managers who were not trained in statistics or econometrics and have not had much exposure to statistical time series modeling.
Bio: Jeffrey is a VP of Data Science, Data Engineering, and Platform Engineering at the Store Associate Technology of Walmart Global Technology. His prior roles include Chief Data Scientist at AllianceBernstein, a global asset-management firm that managed nearly $700 billion, Vice President and Head of Data Science at Silicon Valley Data Science, and senior leadership positions at Charles Schwab Corporation and KPMG. He has also taught econometrics, statistics, and machine learning at UC Berkeley, Cornell, NYU, University of Pennsylvania, and Virginia Tech. Jeffrey is active in the data science community and often speaks at data science conferences and local events. He has many years of experience in applying a wide range of econometric and machine learning techniques to create analytic solutions for financial institutions, businesses, and policy institutions. Jeffrey holds a Ph.D. and an M.A. in Economics from the University of Pennsylvania and a B.S. in Mathematics and Economics from UCLA.
Jeffrey Yau, PhD
VP of Data Science, Data Engineering, and Platform Engineering | Store Associate Technology of Walmart Global Technology