Abstract: Should a data scientist use R or should they use Python? The answer to this rather delicate question is, of course, that they should know a bit of each and use the most appropriate language for the task in hand. In this tutorial, we'll take participants through the R tidyverse, one of the (many!) areas where R shines. The tidyverse is essential for any data scientist who deals with data on a day-to-day basis. By focusing on small key tasks, the tidyverse suite of packages removes the pain of data manipulation. We'll cover some of the core features of the tidyverse, such as dplyr (the workhorse of the tidyverse), string manipulation, graphics and the concept of tidy data.
Goals: by the end of the tutorial, participants will be able to
* install and load R packages;
* import data into R and export data from it;
* construct graphics using the ggplot2 package;
* use the open-source RStudio IDE;
* identify strengths and weaknesses in R compared to Python;
* manipulate data into a tidy format using the tidyverse suite of R packages;
* connect directly with databases using dplyr.
This tutorial assumes no prior knowledge of R, but we do assume prior knowledge of another programming language. The course will highlight the similarities and differences between R and Python, allowing participants to build on their existing Python knowledge while avoiding gotchas.
Participants should pre-install R and RStudio before the course.
Bio: Dr. Colin Gillespie is a Senior Lecturer (Associate Professor) at Newcastle University, UK. His research interests are high-performance statistical computing and Bayesian statistics. He is regularly employed as a consultant by Jumping Rivers and has been teaching R since 2005 at a variety of levels, ranging from beginners to advanced programming.