
Abstract: The speed at which intelligent decision making can take place is changing the very fabric of modern industry. From autonomous routing decisions for real-time traffic avoidance to intelligent integrated systems for customer service and assisted analysis or forecasting, the time to a correct decision can be a huge differentiator.
During this workshop, you will learn how to harness the power of Apache Spark to build a data ingestion pipeline that can process and learn from streaming data. These learnings can then be applied to make simple decisions using Spark SQL in a parallel streaming system.
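To give a flavor of the kind of pipeline covered in the workshop, below is a minimal sketch in Scala of a Structured Streaming job that ingests events from Kafka and applies a Spark SQL query as a simple decision step. The topic name, event schema, latency threshold, and checkpoint path are illustrative assumptions, not the workshop's actual code.

```scala
// Minimal Structured Streaming sketch (assumes a local Kafka broker and a
// hypothetical "call-events" topic with JSON payloads). Requires the
// spark-sql-kafka-0-10 connector on the classpath.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object StreamingDecisionApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("streaming-data-ingestion")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical event schema: one record per call event.
    val eventSchema = new StructType()
      .add("callId", StringType)
      .add("region", StringType)
      .add("latencyMs", DoubleType)
      .add("eventTime", TimestampType)

    // Ingest raw events from Kafka as an unbounded DataFrame.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "call-events")
      .load()
      .select(from_json(col("value").cast("string"), eventSchema).as("event"))
      .select("event.*")

    // A simple Spark SQL "decision": flag regions whose average latency
    // over a one-minute window exceeds an (illustrative) threshold.
    events.createOrReplaceTempView("call_events")
    val decisions = spark.sql(
      """SELECT window(eventTime, '1 minute') AS window, region,
        |       avg(latencyMs) AS avgLatency,
        |       avg(latencyMs) > 250 AS degraded
        |FROM call_events
        |GROUP BY window(eventTime, '1 minute'), region""".stripMargin)

    // Emit each updated windowed decision to the console; a real pipeline
    // would write back to Kafka or a serving store instead.
    decisions.writeStream
      .outputMode("update")
      .format("console")
      .option("checkpointLocation", "/tmp/checkpoints/streaming-di")
      .start()
      .awaitTermination()
  }
}
```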
Skills Acquired:
1. How to write Apache Spark Structured Streaming applications
2. How to architect collaborative streaming applications that can run 24/7/365
3. How to model event data and best practices for making your data work for you
The workshop material has been created based on the learnings of building mission-critical real-time analytics systems at Twilio.
● Docker;
● Apache Spark;
● Apache Zeppelin;
● Kafka;
https://bit.ly/odsc-2020-streaming-di
https://drive.google.com/open?id=1Ay8tngAQHqNmCxjQ-CMZ8SliMW6ItrGw
Bio: Scott Haines is a full-stack engineer with a current focus on real-time, highly available, trustworthy analytics systems. He is a Software Architect at Twilio and previously worked as a Principal Software Engineer on the Voice Insights team, where he helped drive Spark adoption, shape streaming pipeline architectures, and build out a massive stream-processing platform.
Prior to Twilio, he wrote the backend Java APIs for Yahoo! Games, as well as the real-time game ranking/rating engine (built on Storm) that provided personalized recommendations and page views for 10 million customers. He finished his tenure at Yahoo! working for Flurry Analytics, where he wrote the alerts/notifications system for mobile.