Abstract: The talk opens with an introduction to Apache Kafka, covering basic concepts, the data lifecycle, features, and APIs. It then walks through, at a high level, how Kafka is used for data pipelines, stream processing, database replication, and change data capture (CDC); explains how to evaluate whether a Kafka cluster is healthy and how to analyze its performance; and shares LinkedIn's operational tools for Kafka.
LinkedIn has been using Kafka and Samza to build a stream processing platform that processes 1 trillion messages per day. Building on the introduction to Kafka, we will first share in-depth knowledge of how we use Kafka to support real-time data ingestion, database change capture, and data reprocessing. We will also share how we use Samza to process data from Kafka and produce real-time analytical results. Finally, we will talk about how the results are sent to our online serving stores.
Bio: Prateek Maheshwari is an Apache Samza committer and a Staff Software Engineer at LinkedIn. He's passionate about building scalable, performant, and easy-to-use systems that empower developers to create awesome products. He has recently been working on Samza's new High Level programming API, which offers simple and powerful abstractions for creating complex real-time data processing pipelines.
Prateek joined LinkedIn in 2013 with a Master's degree in Computer Science from UT Austin and in Physics from BITS Pilani. Before joining the streams infrastructure team, he worked on data analytics, frontend and backend services for new member sign-ups, and service infrastructure at LinkedIn.