
Abstract: In this training session you will get hands-on experience developing neural networks with Intel BigDL and Analytics Zoo on Apache Spark. You will learn how to use Spark DataFrames and build deep learning pipelines by implementing practical examples.
Target Audience: AI developers and aspiring data scientists who are experienced in Python and Spark, as well as big data and analytics professionals interested in neural networks.
Prerequisites:
• Experience in Python programming
• Entry-level knowledge of Apache Spark
• Basic knowledge of deep learning concepts and techniques
Training outline:
Introduction to Deep Learning on Spark, BigDL and Analytics Zoo - 25 minutes
We will begin with a brief introduction to Apache Spark and the machine learning/deep learning ecosystem around it. Then we will introduce Intel BigDL and Analytics Zoo, two deep learning libraries for Apache Spark, and go into the architectural details of how distributed training happens in BigDL. We will cover the model training process, including how the model, weights and gradients are distributed, calculated, updated and shared across the cluster with Apache Spark.
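To make the execution model concrete, here is a minimal bootstrap sketch of how a BigDL program attaches to Spark (BigDL 0.x Python API; the application name is illustrative):

from pyspark import SparkContext
from bigdl.util.common import create_spark_conf, init_engine

# create_spark_conf() adds the BigDL-specific Spark settings before the
# SparkContext is created.
conf = create_spark_conf().setAppName("bigdl-intro")
sc = SparkContext(conf=conf)

# init_engine() starts BigDL's execution engine on the driver and executors;
# training then runs as ordinary Spark jobs, with gradients aggregated and
# weights synchronized across partitions on each iteration.
init_engine()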
Setting Up Sample Environment - 10 minutes
The instructors will highlight the major components of the demonstration environment, including the dataset, Docker container and example code, along with the public location of these resources and how to set them up.
Exercise 1 - A Quick and Simple Image Recognition Use Case with BigDL - 45 minutes
We will work through a simple image recognition use case that trains a CNN. The goal of this exercise is a simple introduction to using BigDL with image datasets (a condensed sketch follows the list). Participants will get exposure to:
• How to read images into Spark DataFrames
• Building transformation pipelines for images with Spark
• How to train a deep learning model using estimators
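A condensed sketch of what this exercise builds, using the Analytics Zoo NNFrames API; the dataset path, label handling and the tiny CNN are illustrative placeholders, not the exact workshop code:

from zoo.common.nncontext import init_nncontext
from zoo.pipeline.nnframes import NNImageReader, NNClassifier
from zoo.feature.common import ChainedPreprocessing
from zoo.feature.image import (RowToImageFeature, ImageResize, ImageCenterCrop,
                               ImageChannelNormalize, ImageMatToTensor,
                               ImageFeatureToTensor)
from bigdl.nn.layer import (Sequential, SpatialConvolution, ReLU,
                            SpatialMaxPooling, Reshape, Linear, LogSoftMax)
from bigdl.nn.criterion import ClassNLLCriterion

sc = init_nncontext("image-recognition-demo")

# Read images from a (hypothetical) path into a DataFrame with an "image" column.
image_df = NNImageReader.readImages("/data/images", sc)

# Per-row preprocessing chain: decode row, resize, crop, normalize, to tensor.
transformer = ChainedPreprocessing([
    RowToImageFeature(), ImageResize(256, 256), ImageCenterCrop(224, 224),
    ImageChannelNormalize(123.0, 117.0, 104.0),
    ImageMatToTensor(), ImageFeatureToTensor()])

# A deliberately tiny two-class CNN, just to exercise the pipeline.
cnn = (Sequential()
       .add(SpatialConvolution(3, 16, 5, 5)).add(ReLU())
       .add(SpatialMaxPooling(2, 2, 2, 2))
       .add(Reshape([16 * 110 * 110]))
       .add(Linear(16 * 110 * 110, 2)).add(LogSoftMax()))

# NNClassifier wraps model + criterion + preprocessing as a Spark ML estimator;
# fit() assumes a numeric "label" column has been added to image_df.
classifier = (NNClassifier(cnn, ClassNLLCriterion(), transformer)
              .setLearningRate(0.003).setBatchSize(32).setMaxEpoch(2)
              .setFeaturesCol("image"))
nn_model = classifier.fit(image_df)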
Exercise 2 - Transfer Learning for Image Classification Models - 45 minutes
Participants will work through transfer learning on a pretrained image classification model (see the sketch after this list). They will get exposure to:
• How to build a pipeline in Spark to preprocess images
• How to import a trained model from other frameworks such as TensorFlow
• How to implement transfer learning on the imported model with the preprocessed images
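A minimal sketch of the transfer-learning step, assuming a pretrained Inception v1 network already saved in BigDL format (Net also provides load_caffe and load_tf loaders for models from other frameworks); the model path and layer names below come from the Inception v1 example and are illustrative:

from zoo.pipeline.api.net import Net
from zoo.pipeline.api.keras.layers import Input, Flatten, Dense
from zoo.pipeline.api.keras.models import Model

# Load the pretrained network (path hypothetical).
full_model = Net.load_bigdl("/models/inception-v1.model")

# Cut the graph after the last pooling layer and freeze the early layers,
# so only the new classification head is trained.
backbone = full_model.new_graph(["pool5/drop_7x7_s1"])
backbone.freeze_up_to(["pool4/3x3_s2"])

# Attach a new two-class head; the result trains with the same
# NNClassifier/pipeline pattern used in Exercise 1.
inp = Input(name="input", shape=(3, 224, 224))
features = backbone.to_keras()(inp)
logits = Dense(2)(Flatten()(features))
tl_model = Model(inp, logits)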
Quick break: Answer questions or help out anyone who is having trouble - 10 minutes
Exercise 3 - Anomaly Detection or Recommendation Systems with Intel Analytics Zoo - 30 minutes
In this exercise (sketched after the list) we will show participants:
• How to build an initial pipeline for feature transformation
• How to build a recommendation model in BigDL/Analytics Zoo
• How to perform training and inference for this use case
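As a reference, a compact sketch of the recommendation path using the NeuralCF model from Analytics Zoo; the synthetic ratings and hyperparameters are placeholders for the workshop dataset:

import numpy as np
from bigdl.util.common import Sample
from zoo.common.nncontext import init_nncontext
from zoo.models.recommendation import NeuralCF

sc = init_nncontext("ncf-demo")

# Synthetic (user, item, rating) triples standing in for the real dataset.
triples = [(u, i, float(np.random.randint(1, 6)))
           for u in range(1, 201) for i in range(1, 101)]

# Feature transformation: each record becomes a BigDL Sample whose features
# are the (user, item) ids and whose label is the rating class.
sample_rdd = sc.parallelize(triples).map(
    lambda t: Sample.from_ndarray(np.array(t[0:2]), np.array([t[2]])))
train_rdd, val_rdd = sample_rdd.randomSplit([0.8, 0.2])

# Neural collaborative filtering model; training and inference run on Spark.
ncf = NeuralCF(user_count=200, item_count=100, class_num=5,
               hidden_layers=[20, 10], include_mf=False)
ncf.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
            metrics=["accuracy"])
ncf.fit(train_rdd, nb_epoch=10, batch_size=2560, validation_data=val_rdd)

predictions = ncf.predict(val_rdd)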
Exercise 4 - Model Serving - 15 minutes
In this exercise we will show participants how to build an end-to-end pipeline and put their model into production (a Python sketch follows the list). They will get exposure to:
• Model serving using the POJO API
• Integration into web services and streaming services like Kafka for model inference
• Distributed model inference
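The POJO serving API itself is Java-side and will be shown on the slides; for the distributed-inference part, a Python sketch along these lines (model path and dummy inputs are placeholders):

import numpy as np
from pyspark import SparkContext
from bigdl.util.common import create_spark_conf, init_engine, Sample
from bigdl.nn.layer import Model

conf = create_spark_conf().setAppName("inference-demo")
sc = SparkContext(conf=conf)
init_engine()

# Load a model trained and saved in an earlier exercise (path hypothetical).
model = Model.loadModel("/models/demo.model")

# Distributed inference: each partition of the RDD is scored in parallel on
# the executors. The label in each Sample is a dummy value for prediction.
inputs = sc.parallelize([np.random.rand(3, 224, 224) for _ in range(100)])
samples = inputs.map(lambda a: Sample.from_ndarray(a, np.array([1.0])))
predictions = model.predict(samples)
print(predictions.take(2))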
Practical Knowledge - Discussion of real-world experience using Spark and Hadoop for machine learning and deep learning projects - 15 minutes
We will have a discussion on the following topics (a configuration sketch follows the list):
• Spark parameters and how to set them: how to allocate the right number of executors and cores and the right amount of memory
• Performance Monitoring
• TensorBoard with BigDL
• Collaboration and reproducing experiments with a data science workbench tool.
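A short sketch of the two most hands-on items, Spark sizing and TensorBoard wiring; here "optimizer" is assumed to be the BigDL Optimizer built in an earlier exercise, and paths and names are illustrative:

from bigdl.optim.optimizer import TrainSummary, ValidationSummary

# Sizing knobs discussed in this section (values illustrative). Note that in
# BigDL the training batch size must be a multiple of
# num-executors x executor-cores:
#   spark-submit --num-executors 4 --executor-cores 8 --executor-memory 16g ...

# Wire BigDL training/validation metrics into TensorBoard: the optimizer
# writes event files that TensorBoard can read.
log_dir = "/tmp/bigdl_summaries"
optimizer.set_train_summary(TrainSummary(log_dir=log_dir, app_name="demo"))
optimizer.set_val_summary(ValidationSummary(log_dir=log_dir, app_name="demo"))
# Then launch: tensorboard --logdir /tmp/bigdl_summaries/demo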
Wrapping up / Questions - 15 minutes
Bio: Andrew is a data scientist at Dell, where he explores how machine learning and deep learning techniques can be applied on Spark. His experience includes time series analysis and prediction of pharmaceutical drug sales and usage, real estate valuation using machine learning, and medical data classification using deep learning. Andrew's interests involve applying machine learning and deep learning to solve new problems and improve old solutions.