From Chaos to Control: Mastering Machine Learning Reproducibility at Scale


Machine learning workflows are not linear: experimentation is an iterative, back-and-forth process between different components. It often involves trying different data labeling techniques, data cleaning, preprocessing, and feature selection methods during model training, just to arrive at an accurate model.

Quality ML at scale is only possible when we can reproduce a specific iteration of an ML experiment, and this is where data is key. This means that capturing the version of the training data, ML code, and model artifacts at each iteration is mandatory. However, to version ML experiments efficiently without duplicating code, data, and models, data versioning tools are critical. Open-source tools like lakeFS make it possible to version all components of ML experiments without keeping multiple copies and, as an added benefit, save on storage costs as well.
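
To make the idea of versioning every iteration without keeping multiple copies concrete, here is a minimal sketch in plain Python. It is not the lakeFS API and does not describe how lakeFS is implemented. The ExperimentStore class, its method names, and the sample byte strings are all hypothetical, chosen only to illustrate the idea. Each run records a manifest of content hashes for its data, code, and model, and identical content shared between runs is stored only once.

```python
import hashlib


def content_id(payload: bytes) -> str:
    """Return an identifier derived only from the bytes themselves."""
    return hashlib.sha256(payload).hexdigest()


class ExperimentStore:
    """Toy content-addressed store (hypothetical, for illustration only)."""

    def __init__(self):
        self.objects = {}  # content id -> bytes, each distinct payload kept once
        self.runs = {}     # run name -> manifest mapping part name to content id

    def put(self, payload: bytes) -> str:
        cid = content_id(payload)
        # setdefault leaves an existing entry untouched, so repeated
        # content is never stored a second time.
        self.objects.setdefault(cid, payload)
        return cid

    def record_run(self, name: str, data: bytes, code: bytes, model: bytes) -> dict:
        """Capture the exact data, code, and model used by one iteration."""
        manifest = {
            "data": self.put(data),
            "code": self.put(code),
            "model": self.put(model),
        }
        self.runs[name] = manifest
        return manifest

    def checkout(self, name: str) -> dict:
        """Return the exact bytes recorded for a past run."""
        return {part: self.objects[cid] for part, cid in self.runs[name].items()}


store = ExperimentStore()
data_v1 = b"id,label\n1,cat\n2,dog\n"  # made-up training data
code_v1 = b"train(lr=0.1)"             # made-up training code
code_v2 = b"train(lr=0.01)"

# Two iterations share the same training data but differ in code and model.
run1 = store.record_run("run-1", data_v1, code_v1, b"weights-a")
run2 = store.record_run("run-2", data_v1, code_v2, b"weights-b")
```

Both manifests point at the same data identifier, so the store holds five objects rather than six, and store.checkout("run-1") returns the exact inputs and outputs of the first iteration.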

In this one-hour workshop, you'll achieve the following:

Master ML Reproducibility: Gain practical experience to achieve full reproducibility for your ML experiments. Learn how to track changes to data, code, and models, allowing you to easily revisit and refine past experiments. This ensures you can recreate past successes and identify potential issues.
Enhance Your Existing Stack: Discover how to integrate ML reproducibility seamlessly into your current ML experimentation tools, such as MLflow. Leverage open-source software to create a holistic version control experience, streamlining your workflow and ensuring the reliability of your ML pipelines.


Amit heads the solution architecture group at Treeverse, the company behind lakeFS, an open-source platform that delivers a git-like experience to object-storage based data lakes. Amit has 30+ years of experience as a technologist working with Fortune 100 companies as well as start-ups, designing and implementing technical solutions for complicated business problems. As an entrepreneur, he launched a cloud offering to provide Data Warehouse as a Service. Amit holds a Master’s certificate in Project Management from George Washington University and a bachelor’s degree in Computer Science and Technology from the Indian Institute of Technology (IIT), India. He is the inventor of the patent: System and Method for Managing and Controlling Data.

Open Data Science




One Broadway
Cambridge, MA 02142
