Abstract: This talk will tell a story of how changing business objectives are driving interest in production-level code; what software principles data engineers and data scientists should consider applying to their code to make it easier to deploy into the production environment; and, how they can use an open source Python library, called Kedro, to simplify their workflow.
We will deep dive into Kedro internals, focusing on how configuration, nodes, and modular pipelines can simplify and scale your data science workflow. We will discuss how the framework can plug into existing toolkits, such as MLflow, Databricks, and dbt. In terms of new functionality, we will showcase Kedro's experiment tracking component. Finally, we will present an impactful real-world case study and explain how joining the Linux Foundation will benefit the wider community!
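To give a flavor of the node-and-pipeline idea the session covers, here is a minimal pure-Python sketch of the concept. This is illustrative only and deliberately does not use Kedro's actual API: a "node" bundles a function with named inputs and outputs, and a "pipeline" runner resolves nodes in dependency order against a catalog of named datasets.

```python
# Sketch of the node/pipeline concept (illustrative, NOT Kedro's API).
# A node wraps a function with the dataset names it reads and writes;
# the runner executes nodes once their inputs exist in the catalog.

def node(func, inputs, outputs):
    """Bundle a function with the dataset names it reads and writes."""
    return {"func": func, "inputs": inputs, "outputs": outputs}

def run_pipeline(nodes, catalog):
    """Run each node as soon as its inputs are available in the catalog."""
    pending = list(nodes)
    while pending:
        ready = [n for n in pending if all(i in catalog for i in n["inputs"])]
        if not ready:
            raise ValueError("Pipeline is not connected: unsatisfiable inputs")
        for n in ready:
            catalog[n["outputs"]] = n["func"](*(catalog[i] for i in n["inputs"]))
            pending.remove(n)
    return catalog

# Example: two nodes forming a tiny workflow; note the declaration order
# does not matter, because the runner resolves dependencies itself.
clean = node(lambda raw: [x for x in raw if x is not None], ["raw"], "clean")
total = node(lambda rows: sum(rows), ["clean"], "total")

result = run_pipeline([total, clean], {"raw": [1, None, 2, 3]})
print(result["total"])  # → 6
```

Declaring inputs and outputs by name, rather than wiring functions together directly, is what lets a framework like Kedro swap datasets via configuration and recombine nodes into modular pipelines.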
We hope the audience of data engineers and data scientists will not only walk out of the session understanding the importance of developing maintainable, modular data science code, but also have the fundamental understanding needed to start working with Kedro immediately.
Bio: Sanjay Hariharan is a Principal Data Scientist with QuantumBlack Labs, AI by McKinsey, where he serves as a technical Data Science leader. His expertise lies in the development, assetization, and deployment of analytics solutions for Life Sciences and Healthcare problems, including measuring drug efficacy, improving patient access, segmenting physician populations, and expanding indications. These problems combine domain expertise with broad applications of Machine Learning and Optimization. In his previous role, Sanjay managed the data science workstream for many client-facing engagements, working in industries including manufacturing, retail, public sector, and healthcare. Sanjay holds a B.A. in Mathematics from the University of Pennsylvania and an M.S. in Statistical Science from Duke University. In his free time, he enjoys playing tennis, traveling, and exploring New York City!