Simplified Data Preparation for Machine Learning in Hybrid and Multi Clouds


Cloud storage gives data scientists great flexibility in management and cost-efficiency, but it also introduces new challenges around data accessibility and data locality for machine learning applications. For instance, when the input data is stored in remote cloud storage such as AWS S3 or Azure Blob Storage, direct data access is often slow and expensive, while manually moving data to the training clusters can be time-consuming and complicated, and often requires data engineering or ETL pipelines.

This session is designed for data scientists and data engineers who work with remote, and possibly multiple, data sources in hybrid or multi-cloud environments. We will show the audience how to use Alluxio to greatly simplify data preparation in these environments, covering the following topics:
- How to set up the Alluxio service and create a POSIX endpoint that unifies file system access to S3, HDFS and Azure Blob Storage;
- How to run Apache Spark to read input from and write output to remote storage, with Alluxio as the distributed data caching layer;
- How to run TensorFlow to train models on remote input data, accessed as if it were on the local file system.
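
To illustrate the last point, here is a minimal sketch of what "accessing remote data like a local file system" can look like from the training code's side. It assumes the Alluxio POSIX endpoint has already been mounted at some directory; the mount path `/mnt/alluxio` and the helper names `list_training_files` and `count_rows` are hypothetical and not part of Alluxio. The code uses only ordinary Python file I/O, and a temporary directory stands in for the mount point so the sketch runs without a cluster.

```python
import os
import tempfile


def list_training_files(mount_root, suffix=".csv"):
    """Return sorted paths of files under mount_root whose names end with suffix."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(mount_root):
        for name in filenames:
            if name.endswith(suffix):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


def count_rows(paths):
    """Count non-empty lines across the given files using plain file reads."""
    total = 0
    for path in paths:
        with open(path, "r", encoding="utf-8") as f:
            total += sum(1 for line in f if line.strip())
    return total


if __name__ == "__main__":
    # Assumption: in a real deployment mount_root would be the directory where
    # the POSIX endpoint is mounted (for example a hypothetical /mnt/alluxio).
    # A temporary directory stands in for it here.
    with tempfile.TemporaryDirectory() as mount_root:
        train_dir = os.path.join(mount_root, "s3", "train")
        os.makedirs(train_dir)
        with open(os.path.join(train_dir, "part-0.csv"), "w", encoding="utf-8") as f:
            f.write("1.0,2.0,0\n3.0,4.0,1\n")

        files = list_training_files(mount_root)
        print(len(files), "file(s),", count_rows(files), "row(s)")
```

Because the training code only sees file paths, the same functions would work unchanged whether the directory is backed by S3, HDFS or Azure Blob Storage; the list of paths could equally be handed to a framework's own input pipeline.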


Bin Fan is the PMC maintainer of the Alluxio open source project. Prior to joining Alluxio as a founding engineer, he worked at Google, building next-generation storage infrastructure. Bin received his Ph.D. in Computer Science from Carnegie Mellon University, where he worked on the design and implementation of distributed systems.
