Data Science with Spark: Beyond the Basics


This class is aimed at practitioners who are already familiar with the basics of Apache Spark and have tried the machine learning samples in the Spark docs or some of the ML tutorial examples online. We'll start from there and work to advance our knowledge of Spark ML. After briefly reviewing some fundamentals of Spark, DataFrames, and the Spark ML APIs, the class will explore:
- Performing feature preparation/transformation beyond the Spark built-in tools
- "Borrowing" functionality from scikit-learn to help us pre-process features in Spark
- Converting DataFrame data to access legacy (RDD-based) MLlib features that are not yet exposed in the Spark ML DataFrame API
- Implementing data prep operations as reusable components by writing new Transformers and Estimators
- Adding a reusable parallel machine learning algorithm to Spark by creating our own Estimator and Model classes
- Sharing our reusable components with our Python data science colleagues by creating Python wrappers like those built into Spark


Adam Breindel consults and teaches widely on Apache Spark and other technologies. Adam's experience includes work with banks on neural-net fraud detection, streaming analytics, cluster management code, and web apps, as well as development at a variety of startup and established companies in the travel, productivity, and entertainment industries. He is excited by the way that Spark and other modern big-data tech remove so many old obstacles to system design and make it possible to explore new categories of interesting, fun, hard problems.
