Abstract: R is one of the most popular programming languages in the world of data science, and is used in almost every industry for comprehensive statistical analysis. R also offers rapid visualization and deployment features, as well as extensive packages tailored to big data analysis.
In this workshop, you will learn techniques for handling data at scale using open source R packages. You will practice generating interactive plots with dynamic data filtering and explore how to take advantage of R’s native parallelization features to deliver highly scalable solutions. You will also learn how to design, run, and deploy R-based web applications in minutes.
Lesson 1: Foundational Analysis Tools
(Re)familiarize yourself with R syntax, data structures, and coding conventions. You will learn how to use packages such as ‘data.table’ and ‘dplyr’ to create data frames in preparation for visualization or more complex statistical analysis.
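As a taste of what this lesson covers, here is a minimal sketch of the same aggregation written in both ‘data.table’ and ‘dplyr’ syntax. The packages are assumed to be installed; the in-memory table stands in for a real dataset you would normally read from disk.

```r
library(data.table)
library(dplyr)

# Small in-memory example table (a stand-in for fread("your_file.csv"))
dt <- data.table(
  species   = c("setosa", "setosa", "versicolor", "versicolor"),
  petal_len = c(1.4, 1.5, 4.5, 4.7)
)

# data.table syntax: mean petal length per species
summary_dt <- dt[, .(mean_len = mean(petal_len)), by = species]

# Equivalent dplyr pipeline on the same data
summary_df <- dt %>%
  group_by(species) %>%
  summarise(mean_len = mean(petal_len))
```

Both forms produce one row per species; the workshop discusses when each package's style and performance characteristics are the better fit.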
Lesson 2: ETL for Big Data Analysis
Efficiently extract, transform, and load high volumes of data using R. At the end of this lesson, you will be able to parallelize R code for faster data ingestion.
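A minimal sketch of the kind of parallel ingestion covered here, using the ‘parallel’ package that ships with base R. The `load_chunk` function is a hypothetical stand-in for reading and parsing one file or partition.

```r
library(parallel)

# Hypothetical per-chunk ingestion step; Sys.sleep() simulates I/O work
load_chunk <- function(i) {
  Sys.sleep(0.01)                       # stand-in for reading one file
  data.frame(chunk = i, rows = i * 100)
}

n_cores <- max(1, detectCores() - 1)    # leave one core free
cl <- makeCluster(n_cores)
results <- parLapply(cl, 1:8, load_chunk)  # ingest 8 chunks in parallel
stopCluster(cl)

combined <- do.call(rbind, results)     # single data frame of all chunks
```

On Unix-like systems, `mclapply()` offers a fork-based alternative with less setup; the cluster approach above also works on Windows.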
Lesson 3: Scalable Data Visualizations
You will practice generating heat maps, box plots, and graphs for trace data using open source packages. In this lesson, you will learn how to quickly render interactive plots with thousands of data points.
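One common open source choice for this kind of interactive rendering is ‘plotly’ (my example here; the workshop may use other packages). This sketch builds an interactive box plot over a thousand points, with hover tooltips and zoom for free:

```r
library(plotly)  # assumed installed

set.seed(42)
df <- data.frame(
  group = rep(c("A", "B"), each = 500),
  value = c(rnorm(500), rnorm(500, mean = 1))
)

# Interactive box plot: hovering shows quartiles, click-drag zooms
p <- plot_ly(df, x = ~group, y = ~value, type = "box")

# htmlwidgets::saveWidget(p, "boxplot.html")  # render to a standalone page
```

Because the widget is client-side HTML/JavaScript, the same object can later be embedded directly in the R Markdown and Shiny apps built in Lesson 4.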
Lesson 4: Web Apps
At the end of this lesson, you will be able to create web applications in minutes using R Markdown, Shiny, and flexdashboard. You will learn how to render interactive and downloadable plots and data tables, as well as the benefits and limitations of Shiny deployment.
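To give a sense of scale, here is a complete (if minimal) Shiny app with dynamic filtering and a download button; the ‘shiny’ package is assumed to be installed:

```r
library(shiny)

# UI: a slider controls how many random points are plotted
ui <- fluidPage(
  sliderInput("n", "Points to show:", min = 10, max = 500, value = 100),
  plotOutput("scatter"),
  downloadButton("dl", "Download data")
)

server <- function(input, output) {
  # Reactive data: regenerated whenever the slider moves
  pts <- reactive(data.frame(x = rnorm(input$n), y = rnorm(input$n)))
  output$scatter <- renderPlot(plot(pts()$x, pts()$y))
  output$dl <- downloadHandler(
    filename = "points.csv",
    content  = function(file) write.csv(pts(), file, row.names = FALSE)
  )
}

# shinyApp(ui, server)  # launches a local web server when run interactively
```

flexdashboard wraps the same reactive building blocks in an R Markdown layout, which is often the fastest path from analysis notebook to shareable app.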
Lesson 5: Modularization
In this lesson, you will learn when it is most appropriate to use R for big data, and how to modularize R code in order to leverage features in other tools or languages. By the end of this lesson, you will be able to containerize and call your R code.
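One way to make R code callable from other tools, sketched below, is to expose it over HTTP; I use the ‘plumber’ package here as an illustrative choice (the workshop may use a different mechanism). Keeping the core logic in a plain function makes it easy to test independently of the web layer:

```r
# plumber.R (the 'plumber' package is assumed installed; not loaded here,
# since plumber sources this file itself when serving)

# Core logic kept as a plain, testable R function
summarise_values <- function(values = "") {
  x <- as.numeric(strsplit(values, ",")[[1]])
  list(n = length(x), mean = mean(x), sd = sd(x))
}

#* Summary statistics for a comma-separated list of numbers
#* @param values e.g. "1,2,3"
#* @get /summary
function(values = "") summarise_values(values)

# Serve locally:  plumber::pr("plumber.R") |> plumber::pr_run(port = 8000)
# In a container, a Dockerfile based on an R image would run the same command.
```

Any language that speaks HTTP can then call `GET /summary?values=1,2,3`, which is the essence of modularizing R behind a stable interface.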
Prerequisites: Basic knowledge of R
Bio: Ysis Tarter is a senior data engineer at Absci, where deep learning AI and synthetic biology are harnessed to translate ideas into drugs. She leads the development of data platforms and pipelines for high-throughput biological data, as well as scientific tools for data analysis. Ysis is also the co-tech lead of the Bay Area chapter of Black Girls Code, where she teaches data visualization and analytics. She has lectured at several institutions, including Columbia, USC, and UC Berkeley. Tarter holds an MS in Applied Biomedical Engineering from Johns Hopkins University and a BS in Computer Science from Stanford University, where she specialized in biocomputation. She has published peer-reviewed articles in the fields of scalable neuroscience and synthetic biological design.