Abstract: Even the best machine learning and advanced analytics projects often stall when it comes time to move into large-scale production, preventing them from ever affecting the business in a meaningful way. Hundreds of hours of work may never be put to use.
Python is rapidly becoming the language of choice for scientists and researchers of many types to build, test, train, and score models. But when data science models need to go into production, performance and scale challenges can become a major roadblock.
By combining a Python application with an underlying massively parallel processing (MPP) database, Python users can achieve a simplified path to production. An MPP database lets you do data preparation and data analysis at far greater speeds, accelerating development and testing as well as production performance. It also supports more concurrent jobs while continuously loading data for IoT or other streaming use cases. You analyze data in the database where it sits, rather than moving it to another framework, analyzing it there, and then moving the results back, with every move and transformation costing both CPU and I/O.
In this talk, you will learn about combination architectures that can get your work into production, shorten development time, and pair the performance and scale advantages of an MPP database with the convenience and power of Python. Real-world use case examples draw on the open-source Vertica-Python project, created by Uber with contributions from Twitter, Palantir, Etsy, Kayak, and GoodData.
Bio: Paige Roberts (@RobertsPaige) has worked as an engineer, trainer, support technician, technical writer, marketer, product manager, and consultant over the last 24 years. She has built data engineering pipelines and architectures, documented and tested open-source analytics implementations, spun up Hadoop clusters, picked the brains of stars in data analytics, worked with different industries, and questioned a lot of assumptions. She has worked for companies including Data Junction, Pervasive, Bloor Group, Hortonworks, Syncsort, and Vertica. She now promotes understanding of Vertica, distributed data processing, open source, high-scale data engineering architecture, and how the analytics revolution is changing the world.