Abstract: As an industry, we have about forty years of experience developing best practices and tools for storing, versioning, collaborating on, securing, testing, and building software source code – but only about four years of doing the same for AI models. This talk will catch you up on current best practices and freely available tools so that your team can move beyond experimentation and successfully deploy models.
The software industry has gone through multiple generations of tools and methodologies for software code governance: configuration control, collaboration, test processes, build processes, code repositories, and metadata management. In contrast, we’re just starting to explore these issues for machine learning models. That gap is already hurting the productivity of data science teams and preventing the safe deployment and operation of models in production.
This talk summarizes the current best practices for model governance, along with freely available tools you can use today to apply them. Covered topics include:
- Storing modeling assets in a searchable catalog – including notebooks, datasets, resulting measurements, hyperparameters, and other metadata.
- Enabling reproducibility and sharing of experiments across data science team members.
- Versioning models that have advanced beyond an experiment to a release candidate.
- Testing models that are candidates for production use for accuracy, bias, and stability.
- Validating models before launching in new geographies or populations.
- Building and versioning the inference services of a model, including all underlying libraries and dependencies, as part of a standard CI/CD pipeline.
- Releasing a model: rolling out, rolling back, or running multiple live versions.
- Applying security, role-based access control, and an approval workflow to model releases.
- Storing and providing all metadata needed for a full audit trail.
This talk is intended for software and data science leaders who need to safely build, deploy, and operate AI models today.
Bio: David Talby is chief technology officer at Pacific AI, helping fast-growing companies apply big data and data science techniques to solve real-world problems in healthcare, life science, and related fields. David has extensive experience in building and operating web-scale data science and business platforms, as well as building world-class, Agile, distributed teams. Previously, he was with Microsoft’s Bing Group, where he led business operations for Bing Shopping in the US and Europe, and with Amazon in both Seattle and the UK, where he built and ran distributed teams that helped scale Amazon’s financial systems. David holds a Ph.D. in computer science and master’s degrees in both computer science and business administration.