Abstract: Note: A GPU-based cloud instance will be provided to each attendee as part of this event
We will each build an end-to-end, continuous Tensorflow AI model training and deployment pipeline on our own GPU-based cloud instance. At the end, we will combine our cloud instances to create the LARGEST Distributed Tensorflow AI Training and Serving Cluster in the WORLD!
Just a modern browser, internet connection, and a good night's sleep! We'll provide the rest.
Storing and Serving Models with HDFS
Trade-offs of CPU vs. *GPU, Scale Up vs. Scale Out
CUDA + cuDNN GPU Development Overview
TensorFlow Model Checkpointing, Saving, Exporting, and Importing
Distributed TensorFlow AI Model Training (Distributed Tensorflow)
TensorFlow's Accelerated Linear Algebra Framework (XLA)
TensorFlow's Just-in-Time (JIT) Compiler, Ahead of Time (AOT) Compiler
Centralized Logging and Visualizing of Distributed TensorFlow Training (Tensorboard)
Distributed Tensorflow AI Model Serving/Predicting (TensorFlow Serving)
Centralized Logging and Metrics Collection (Prometheus, Grafana)
Continuous TensorFlow AI Model Deployment (TensorFlow, Airflow)
Hybrid Cross-Cloud and On-Premise Deployments (Kubernetes)
High-Performance and Fault-Tolerant Micro-services (NetflixOSS)
More Info including GitHub and Docker Repos
Bio: Chris Fregly is Founder and Research Engineer at PipelineAI, a Streaming Machine Learning and Artificial Intelligence Startup based in San Francisco. He is also an Apache Spark Contributor, a Netflix Open Source Committer, founder of the Global Advanced Spark and TensorFlow Meetup, author of the O’Reilly Training and Video Series titled, "High Performance TensorFlow in Production."
Previously, Chris was a Distributed Systems Engineer at Netflix, a Data Solutions Engineer at Databricks, and a Founding Member and Principal Engineer at the IBM Spark Technology Center in San Francisco.