Abstract: Python's versatility, efficiency, reliability, and speed have rightfully established it as the default language for big data processing, exploratory data analysis, machine learning and cloud computing. An inevitable consequence is the wide range of open-source, general-purpose packages available to ML researchers for building their complex systems. Before ML-Ops was even a thing, data scientists could quickly bundle code together in simple notebooks, leveraging many different packages, to generate their ML models or sophisticated ETL pipelines, and even deploy them in production. A repercussion of this is thousands of lines of glue-code whose only job is to get data into and out of these general-purpose packages. However, as organisations mature, anti-patterns like these need to be dealt with effectively, or they lock systems into the peculiarities of the packages they depend on.
This talk is a story about how we dealt with this problem, particularly in how we used pandas across many of our systems. We actively combated glue-code by wrapping pandas I/O operations (which indirectly use libraries like `pyarrow`, `fastparquet`, `s3fs`, `boto3` and `aws-cli`) into a common API under the name dynamic(i/o). By packaging up these libraries we were able to promote good practices, write reusable code that was easy to read, structure our repos better and promote a consistent template of work. We could also define data expectations that our wrapper would validate, generate structured metrics picked up by our monitoring systems, and even define interfaces between our developer teams to enable smoother communication of both data and knowledge.
(We intend to open-source this library on the day of the talk.)
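To illustrate the wrapping idea described above, here is a minimal sketch. It is not the dynamic(i/o) API: `read_validated`, `READERS` and the schema format are hypothetical names chosen for this example, and it assumes the structured metrics are emitted as a single log line. It shows one entry point replacing per-format glue-code, with data expectations checked on every load.

```python
import logging
from pathlib import Path

import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype

logger = logging.getLogger("io_wrapper")

# Hypothetical mapping from file suffix to the pandas reader that handles it.
READERS = {
    ".csv": pd.read_csv,
    ".json": pd.read_json,
    ".parquet": pd.read_parquet,
}


def read_validated(path, schema):
    """Load a file with pandas and check it against the expected schema.

    `schema` maps each expected column name to a dtype-checking function,
    for example `is_integer_dtype` from `pandas.api.types`.
    """
    suffix = Path(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"Unsupported file type: {suffix!r}")
    df = READERS[suffix](path)

    # Expectation 1: every declared column must be present.
    missing = sorted(set(schema) - set(df.columns))
    if missing:
        raise ValueError(f"Missing columns: {missing}")

    # Keep only the declared columns so callers see a stable interface.
    df = df[list(schema)]

    # Expectation 2: every declared column must have the expected dtype.
    for column, check in schema.items():
        if not check(df[column]):
            raise TypeError(
                f"Column {column!r} has unexpected dtype {df[column].dtype}"
            )

    # Assumed metric format: one log line that a monitoring system could parse.
    logger.info("loaded path=%s rows=%d columns=%d", path, len(df), len(df.columns))
    return df
```

A caller would then declare its expectations once, for example `read_validated("prices.csv", {"id": is_integer_dtype, "price": is_float_dtype})`, instead of repeating format-specific loading and checking code in every notebook or pipeline.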
Bio: Christos has a PhD in Computing and has worked for many years as an ML consultant for companies across different domains (telecom, finance, gaming). For the last 3 years, he has been focussing on ML-Ops, defining and curating the ML-Development Lifecycle for the companies that hire him. He has recently embarked on a new adventure with Vortexa Ltd, working as a Lead ML Engineer and helping the company scale technically as it grows.