
Abstract: In this training, you will learn how to accelerate your data analyses using the Python language and Pandas, a library specifically designed for tabular data analysis. We start by learning the core Pandas data structures, the Series and DataFrame. From these foundations, we will learn to use the split-apply-combine paradigm for grouped computations, manipulate time series, and perform advanced joins between datasets. Specifically, we will cover loading, filtering, grouping, and transforming data. Having completed this workshop, you will understand the fundamentals and advanced features of Pandas, be aware of common pitfalls, and be ready to perform your own analyses.
Session Outline
Our course of study begins with the fundamentals of Pandas and builds up from there to more advanced topics. In this session, we will complete three lessons. We start by learning to work with univariate data in Pandas using the Series data structure. Learning how to create, index, and filter Series will prepare us to perform these operations in multiple dimensions later. Next, we'll learn about the workhorse of Pandas, the DataFrame: in particular, how DataFrames are structured, how to select from them, and how to filter them. In the third lesson, we'll learn how to apply grouped computations on DataFrames and Series using the split-apply-combine paradigm.
The full course curriculum is available at https://github.com/dgerlanc/programming-with-data/
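As a rough preview of the three lessons in the outline above, the sketch below filters a Series, selects from a DataFrame, and runs a grouped computation. The data and the names (prices, df, group, value) are invented for illustration and are not taken from the course materials.

```python
import pandas as pd

# Hypothetical data, invented for illustration only.
prices = pd.Series([10.0, 12.5, 9.0, 14.0], index=["a", "b", "c", "d"])

# Lesson 1: filter a Series with a boolean mask.
expensive = prices[prices > 10]

# Lesson 2: build a DataFrame, then select rows and columns with .loc.
df = pd.DataFrame({
    "group": ["x", "y", "x", "y"],
    "value": [1, 2, 3, 4],
})
selected = df.loc[df["value"] > 1, ["group", "value"]]

# Lesson 3: split-apply-combine. Split rows by "group",
# sum "value" within each group, and combine into one Series.
totals = df.groupby("group")["value"].sum()

print(expensive)
print(selected)
print(totals)
```

Here groupby performs the split, sum is the apply step, and the returned Series indexed by group label is the combined result.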
Background Knowledge
Intermediate programming experience in Python
Bio: Daniel Gerlanc has worked as a data scientist for more than a decade and has been writing software for nearly 20 years. He frequently teaches live trainings on oreilly.com and is the author of the video course Programming with Data: Python and Pandas. He has coauthored several open source R packages, published in peer-reviewed journals, and is a graduate of Williams College.

Daniel Gerlanc
Title
Sr. Director - Data Science & ML Engineering | Ampersand
