Abstract: Doing low-latency analytics well requires a full-stack approach. To deliver the best user experience, the front end, data processing, and storage all need to work together, while at the same time the system should remain flexible.
In this workshop, we will examine each part of a real-time analytics pipeline to understand the options available and the trade-offs associated with different technologies and techniques in a modern data pipeline.
Module 1: An introduction to low latency processing pipelines
In this module we will cover some popular open-source options for real-time analytics pipelines, such as Kafka and Redis, along with a brief overview of some competing commercial offerings.
Module 2: Data storage for low latency analytics
When the going gets tough, the tough cheat. Processing high volumes of data very quickly can be a tough problem to solve, especially if you need to operate on a budget. In this second module we'll look at options for low-latency data storage, with a particular emphasis on approximate data structures that deliver good results with high performance.
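To give a flavor of the kind of structure this module covers, here is a minimal sketch of a Bloom filter, a classic approximate data structure that answers set-membership queries in fixed memory at the cost of occasional false positives. This is an illustrative sketch only, not material taken from the workshop; the class name and parameters are our own.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: space-efficient set membership with false positives."""

    def __init__(self, size=1024, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [False] * size

    def _positions(self, item):
        # Derive num_hashes bit positions from salted SHA-1 digests of the item.
        for i in range(self.num_hashes):
            digest = hashlib.sha1(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # True may be a false positive; False is always correct.
        return all(self.bits[pos] for pos in self._positions(item))
```

The trade-off is the point: a few kilobytes of bits can stand in for a set of millions of keys, which is exactly the kind of "cheating" that makes low-latency storage affordable.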
Module 3: Computing and presentation
There's not much point in low-latency processing if it takes five minutes to load a dashboard. In this last module we will focus on delivering visualization and analysis of our low-latency data sources to the user. We'll use modern streaming-data techniques to provide high-performance visualization, and "online" approaches to model fitting to help the user derive insight from data in real time.
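A simple illustration of the "online" style of computation mentioned above is Welford's algorithm, which maintains a running mean and variance one observation at a time instead of re-scanning the data. This is a generic sketch of the idea, not code from the workshop.

```python
class RunningStats:
    """Welford's online algorithm: mean and variance in one pass, O(1) memory."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        # Incorporate one new observation; no history is stored.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        # Sample variance; returns 0.0 for fewer than two observations.
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0
```

Because each update is constant time and constant memory, statistics like these can be refreshed on every incoming event and pushed straight to a streaming dashboard.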
Bio: Byron has developed large-scale data pipelines and processing systems across a variety of industries, including life sciences, advertising, and enterprise software. In particular, he focuses on distributed systems with low-latency requirements for both read and write workloads. Trained as a statistician with a focus on statistical computing, he is also the author of Real-Time Analytics, published by John Wiley and Sons, which describes both the operational and computational aspects of delivering these systems at scale.