Abstract: Traditional Python data science libraries such as Pandas and Scikit-Learn are limited by memory: they load the whole dataset into working memory before any analysis can begin. In the era of Big Data, it's not practical to be constantly moving and loading large volumes of data. But what if we still want to work with Pandas-like code and DataFrames? This workshop shows you how to create end-to-end Machine Learning pipelines in Python on large volumes of data without the data ever leaving the database. This results in not just faster load times but also faster model training and deployment.
This workshop will use the open source tools VerticaPy and Jupyter notebooks to showcase examples of Data Ingestion, Preparation, Analysis, and Modeling in a Pandas-like way. The data will be stored in a free community edition of Vertica, which will do all the heavy computation, using its MPP (massively parallel processing) architecture for speed. We will cover different modeling and data prep techniques, as well as Vertica's AutoML for model selection. Vertica helps bring the analytics to the data, rather than the other way around.
Bio: Pranjal Singh is a Data Science Solutions Architect at Vertica with a focus on Machine Learning. Pranjal works with customers to understand their business needs and data, then designs and implements solutions using Vertica. He received his Bachelor's degree in Data Science, with a minor in Mathematics, from Northeastern University in Boston, MA. Pranjal has a passion for problem solving using Predictive Analytics and for helping organizations make better decisions with data. He's an avid sports fan, with a special interest in fantasy sports, analytics, and advanced metrics.