Abstract: In this workshop we will dive deep into what it takes to build and deliver an always-on "real-ish time" predictive analytics pipeline with Spark Structured Streaming.
The core of the workshop material is a common but complex problem: we have an unbounded time-series dataset with no labels, and we need to understand the substructure of that chaos before we can apply common supervised and statistical modeling techniques to the data in a streaming fashion.
The example problem for the workshop comes from the telecommunications space, but the skills you will leave with can be applied to almost any domain, as long as you sprinkle in a little creativity and inject a bit of domain knowledge.
Takeaways:
1. Gain hands-on Structured Streaming experience with Apache Spark.
2. Understand how to apply supervised modeling techniques to unlabeled data (caveat: this requires some domain knowledge and the good ol' human touch).
3. Have fun for 90 minutes.
Bio: Scott Haines is a full-stack engineer with a current focus on real-time, highly available, trustworthy analytics systems. He is currently working at Twilio (as Principal Engineer / Tech Lead of the Voice Insights team), where he helped drive Spark adoption and streaming pipeline architectures. Prior to Twilio, he wrote the backend Java APIs for Yahoo Games, as well as the real-time game ranking/ratings engine (built on Storm) that provided personalized recommendations and page views for 10 million customers. He finished his tenure at Yahoo working for Flurry Analytics, where he wrote the alerts/notifications system for mobile.