Abstract: In many data science ecosystems, the data frame is a pivotal object. It is not only very useful conceptually, but also ensures that data transformation operations can be performed efficiently. This is why packages like data.table in R or pandas in Python are star players.
With the Julia language the situation is different, because the language itself gives you speed out of the box. The DataFrames.jl package is therefore designed to be a sidekick that conveniently supports your core data analysis pipeline. It has more focused functionality than, for example, pandas, but at the same time it integrates seamlessly with the whole Julia data science ecosystem.
During this workshop, using hands-on examples, I will discuss the design principles behind DataFrames.jl and walk you through key functionalities provided by this package. All presented materials will be made available before the workshop in a blog post on https://bkamins.github.io/.
After completing this workshop, you should have a basic knowledge of the key functionalities that DataFrames.jl provides and of how it is integrated with the data science ecosystem in Julia.
In the session we are going to cover the following topics:
* basic operations on data frames
* reading and writing CSV and Apache Arrow files
* aggregation using split-apply-combine
* integration with the ecosystem: plotting data, building basic predictive models
I assume knowledge of at least one data manipulation framework, such as pandas or data.table. I also assume an entry-level ability to read Julia code.
Bio: Bogumił Kamiński is Head of the Decision Analysis and Support Unit at Warsaw School of Economics, Poland, and Adjunct Professor at the Data Science Laboratory, Ryerson University, Canada. His research interests are techniques for large-scale mathematical modelling of complex systems that combine simulation, optimization, and machine learning. His particular areas of expertise are agent-based simulation and the modeling and analysis of complex networks.
Bogumił is one of the core developers of the DataFrames.jl package for data wrangling in the Julia language. He is also a top answerer for the [julia] tag on StackOverflow and regularly discusses a wide range of data science topics on his blog https://bkamins.github.io/.