Abstract: The PyData ecosystem has grown to millions of data science users, who appreciate its ease of use, consistent syntax, and breadth of features. Traditionally, PyData frameworks ran only on CPUs, making it difficult for users to take advantage of the increasingly powerful GPUs that have already revolutionized deep learning and related fields. In this talk, we'll introduce RAPIDS, an open source framework that brings transparent GPU backends to popular Python APIs, such as those from Pandas, scikit-learn, and NetworkX. We'll show how you can port a wide range of existing workloads to GPU in a matter of minutes and get speedups on the order of 40x or more for common tasks.
The talk will emphasize both data preparation (ETL) and machine learning operations, with a hands-on demonstration of porting a typical workflow from CPU to GPU and measuring the speedup. We’ll go into more detail on real-world applications taking advantage of these speed improvements, including hyperparameter optimization for machine learning models, single cell genomics analysis, and applications in finance. For large-data users, we’ll discuss some of the options for scaling RAPIDS to multiple GPUs or multiple nodes, emphasizing the tight integration with the Dask ecosystem.
Bio: Corey Nolet is a Data Scientist & Senior Engineer on the RAPIDS cuML team at NVIDIA, where he focuses on building and scaling machine learning algorithms to support extreme data loads at light speed. Prior to working at NVIDIA, Corey spent over a decade building massive-scale analytics & data science platforms for HPC environments in the defense industry. Corey holds B.S. and M.S. degrees in Computer Science and is pursuing his Ph.D. in the same discipline, focused on scaling machine learning algorithms in distributed architectures. Corey has a passion for using data to make better sense of the world.