Dynamicio (a pandas I/O wrapper); Why you Should Start your ML-Ops Journey with Wrapping your I/O

Abstract: 

Python's versatility, efficiency, reliability, and speed have rightfully established it as the default language for big data processing, exploratory data analysis, machine learning, and cloud computing. An inevitable consequence is the wide variety of open-source, general-purpose packages that ML researchers can draw on to build complex systems. Before ML-Ops was even a thing, data scientists could quickly bundle code together in simple notebooks, leveraging many different packages to produce ML models or sophisticated ETL pipelines, and even deploy them to production. One repercussion is thousands of lines of glue code whose only job is to get data into and out of these general-purpose packages. As organisations mature, anti-patterns like this need to be dealt with effectively, or they can lock systems into their own peculiarities.

This talk is the story of how we dealt with this problem, in particular the way pandas was used across many of our systems. We actively combated glue code by wrapping pandas I/O operations (which indirectly rely on libraries such as `pyarrow`, `fastparquet`, `s3fs`, `boto3` and `aws-cli`) in a common API under the name dynamic(i/o). Packaging up these libraries allowed us to promote good practices, write reusable code that is easy to read, structure our repos better around a consistent template of work, define data expectations that our wrapper validates, generate structured metrics that our monitoring systems pick up, and even define interfaces between our developer teams, enabling smoother communication of both data and knowledge.
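To make the idea of wrapping pandas I/O concrete, here is a minimal sketch of what such a wrapper could look like. It is not the dynamic(i/o) API: `SourceConfig`, `READERS` and `read_validated` are hypothetical names invented for this illustration. It assumes each data source is described by a small config holding a file type and an expected schema, and that metrics are emitted as JSON log lines.

```python
import io
import json
import logging
from dataclasses import dataclass
from typing import Callable, Dict

import pandas as pd

logger = logging.getLogger("io_wrapper")


@dataclass
class SourceConfig:
    """Hypothetical description of one data source (not dynamicio's real config)."""
    path_or_buffer: object   # file path, URL, or file-like object
    file_type: str           # "csv" or "parquet"
    schema: Dict[str, str]   # expected column name -> pandas dtype string


# Map each supported file type to the pandas reader that handles it.
READERS: Dict[str, Callable[..., pd.DataFrame]] = {
    "csv": pd.read_csv,
    "parquet": pd.read_parquet,
}


def read_validated(config: SourceConfig) -> pd.DataFrame:
    """Read a source, enforce its declared schema, and log simple metrics."""
    if config.file_type not in READERS:
        raise ValueError(f"Unsupported file type: {config.file_type}")
    df = READERS[config.file_type](config.path_or_buffer)

    # Data expectation: every declared column must be present.
    missing = set(config.schema) - set(df.columns)
    if missing:
        raise ValueError(f"Missing expected columns: {sorted(missing)}")

    # Keep only the declared columns and cast them to the declared dtypes.
    df = df[list(config.schema)].astype(config.schema)

    # Emit a structured (JSON) log line that a monitoring system could parse.
    logger.info(json.dumps({"rows": len(df), "columns": list(df.columns)}))
    return df


# Usage: an in-memory CSV stands in for a file on disk or in S3.
raw = io.StringIO("id,price,extra\n1,9.5,x\n2,3.0,y\n")
config = SourceConfig(raw, "csv", {"id": "int64", "price": "float64"})
df = read_validated(config)
```

With this shape, calling code never touches `pd.read_csv` or `pd.read_parquet` directly, so format handling, validation and metrics live in one place instead of being repeated as glue code in every pipeline.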

(We intend to open-source this library on the day of the talk).

Bio: 

Christos has a PhD in Computing and has worked for many years as an ML consultant for companies across different domains (telecom, finance, gaming). For the last 3 years, he has been focussing on ML-Ops, defining and curating the ML development lifecycle for the companies that hire him. He has recently embarked on a new adventure with Vortexa Ltd, working as a Lead ML Engineer and helping the company scale technically as it grows.

Open Data Science

 

 

 

