Abstract: Project Thoth is focused on helping developers (including data scientists) in their daily tasks. One important requirement when creating code or experiments is reproducibility. This matters especially in ML, where others need to be able to re-run experiments in the same environment the creator used. One of the first tasks is selecting the dependencies (e.g. pandas for data exploration and manipulation, or TensorFlow for training a model).
In this talk, you will see ways of managing dependencies that do not guarantee reproducibility, and why they fall short.
To guarantee reproducibility, you must pin every dependency, direct and transitive, to a specific version, and include the hashes used to verify the provenance of the packages for security reasons (check these docs to learn more about security in software stacks). To be even more precise, the Python version, operating system, and hardware all influence the code’s behavior. You should share all of this information so other users can experience the same behavior and obtain similar results.
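As a rough illustration of the information listed above, the following minimal sketch records the Python version, operating system, hardware architecture, and installed package versions using only the standard library. The function name `capture_environment` and the layout of the output are assumptions made for this example; they are not part of Project Thoth or jupyterlab-requirements. The sketch does not record package hashes, which would normally come from a lock file produced by a dependency resolver.

```python
import json
import platform
from importlib import metadata


def capture_environment() -> dict:
    """Collect basic facts about the current runtime (hypothetical helper)."""
    packages = {}
    for dist in metadata.distributions():
        meta = dist.metadata
        # Skip distributions whose metadata is missing or incomplete.
        name = meta.get("Name") if meta is not None else None
        version = meta.get("Version") if meta is not None else None
        if name and version:
            packages[name] = version
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        # Sort by name so the output is stable and easy to compare.
        "packages": dict(sorted(packages.items(), key=lambda kv: kv[0].lower())),
    }


if __name__ == "__main__":
    # Print the snapshot as JSON so it can be shared next to an experiment.
    print(json.dumps(capture_environment(), indent=2))
```

Sharing a snapshot like this alongside a notebook lets others see how their environment differs from the creator's, although on its own it does not reinstall or verify anything.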
Project Thoth wants to help data scientists handle this task in the easiest and smartest way possible, so that they can focus on more important and challenging work. In this talk, you will learn about the jupyterlab-requirements library for dependency management.
Bio: Francesco is a senior data scientist/software engineer at Red Hat, where he is part of the AI Centre of Excellence (AICoE) and the Office of the CTO. He works with the Thoth team, which created a recommender system that helps developers (including data scientists) focus on important problems by offloading many tasks to automated pipelines and bots. He has a passion for AI, software, and space. He previously worked on a research project with the European Space Agency (ESA) and industrial partners, combining AI and space to create a recommender system for the design of satellites. He loves to read, travel, and learn languages.