Abstract: The speaker offers an introduction to Apache Kafka, covering basic concepts, the data lifecycle, features, and APIs. He then gives a high-level walkthrough of using Kafka for data pipelines, stream processing, database replication, and change data capture (CDC); explains how to evaluate whether a Kafka cluster is healthy and how to analyze its performance; and shares LinkedIn's operational tools for Kafka.
LinkedIn has been using Kafka and Samza to build a stream processing platform that processes over one trillion messages per day. Building on this introduction to Kafka, we will first share in-depth knowledge of how we use Kafka to support real-time data ingestion, database change capture, and data re-processing. We will also share how we use Samza to process data from Kafka and produce real-time analytical results. Finally, we will describe how these results are sent to our online serving stores.
Bio: Dong Lin is a Sr. Software Engineer at LinkedIn, where he works on all things Kafka, the Apache distributed streaming platform. As part of his work, he helps support Kafka's operation at LinkedIn, develops new features in Apache Kafka to better serve its users, and participates in open source community discussions on the evolution of Kafka's design and implementation. Dong completed his PhD at the University of Pennsylvania in 2015 and has been at LinkedIn since then. He is interested in distributed systems and various aspects of stream processing, and actively contributes to the open source community and forums.