Putting AI models into production is notoriously difficult, and research reflects that. A NewVantage Partners survey found that 64.8% of firms are now investing more than $50 million in Big Data and AI initiatives, and 98.8% are investing in them overall. Despite this, only 14.6% report that they have deployed AI capabilities into widespread production. So what's actually holding enterprises back from realizing the full value of their AI and machine learning investments?

For starters, this is relatively new territory. As an industry, we've had roughly 40 years of experience creating best practices and tools for storing, versioning, collaborating on, securing, testing, and building software source code. By contrast, we've had only about four years of doing so for AI models. This is a significant gap, and one we need to bridge quickly if AI innovation is to keep accelerating as it has over the last several years.

This isn't anything we haven't done before. The software industry has gone through multiple generations of tools and methodologies for governing source code, from configuration control, collaboration, and test and build processes to code repositories and metadata management. Now we're just starting to explore these issues for machine learning models, and, as with any new technology, we're already experiencing growing pains. The lack of AI governance is already hurting the productivity of data science teams and preventing the safe deployment and operation of models in production.


One of the main challenges in productionizing AI and ML models is accommodating the unique and sometimes complex environments in which they operate. Organizations also need to be cognizant of the legal and compliance challenges that come with implementing AI technologies. Just this week, the European Commission proposed measures that would ban certain high-risk AI applications in the EU and subject others to stricter constraints, with hefty fines for companies that don't comply (Bloomberg). As AI continues to proliferate, we'll see more legislation on governance, and it will need to be part of the equation in productionizing AI models.

Speaking from direct experience in the healthcare and life sciences space, I can say that adhering to strict industry standards and requirements becomes even more important when human lives and health are at stake. Fortunately, there are several widely accepted best practices for model governance, along with freely available tools for applying them. These tools can help teams move beyond experimentation and successfully deploy models, and I'll cover what they are in my upcoming session, "Model Governance: A Checklist for Getting AI Safely to Production," at ODSC Europe in June.

One important component I'll discuss is the need for better searchability and collaboration across the AI community. One example is storing modeling assets in a searchable catalog that includes notebooks, datasets, resulting measurements, hyperparameters, and other metadata. Enabling team members to reproduce and share each other's experiments is another practice that helps data science teams bring their projects to production grade.
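To make the catalog idea a little more concrete, here is a minimal Python sketch, assuming a simple in-memory store. The `ExperimentRecord` and `ExperimentCatalog` names, their fields, and the sample entries are hypothetical illustrations rather than the API of any particular tool; in practice, an experiment tracker or model registry would play this role.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class ExperimentRecord:
    """One catalog entry; the fields are illustrative, not a standard schema."""
    name: str
    notebook: str                    # path or URL of the notebook that produced the run
    dataset: str                     # dataset name plus version tag
    hyperparameters: dict[str, Any]
    metrics: dict[str, float]        # resulting measurements, e.g. {"f1": 0.88}
    tags: list[str] = field(default_factory=list)


class ExperimentCatalog:
    """Minimal in-memory catalog; a real one would be backed by a database or registry."""

    def __init__(self) -> None:
        self._records: list[ExperimentRecord] = []

    def add(self, record: ExperimentRecord) -> None:
        self._records.append(record)

    def search(
        self, text: str = "", min_metric: tuple[str, float] | None = None
    ) -> list[ExperimentRecord]:
        """Return records whose name, dataset, or tags contain `text`
        (case-insensitive) and, if `min_metric` is given, whose named
        metric is at least the threshold."""
        needle = text.lower()
        results = []
        for record in self._records:
            haystack = " ".join([record.name, record.dataset, *record.tags]).lower()
            if needle not in haystack:
                continue
            if min_metric is not None:
                metric, threshold = min_metric
                # Records that never logged the metric are treated as failing the filter.
                if record.metrics.get(metric, float("-inf")) < threshold:
                    continue
            results.append(record)
        return results


# Usage with made-up entries:
catalog = ExperimentCatalog()
catalog.add(ExperimentRecord("ner-baseline", "notebooks/ner.ipynb", "clinical-notes-v1",
                             {"lr": 1e-3}, {"f1": 0.81}, ["ner"]))
catalog.add(ExperimentRecord("ner-transformer", "notebooks/ner_transformer.ipynb",
                             "clinical-notes-v2", {"lr": 2e-5}, {"f1": 0.88},
                             ["ner", "transformer"]))
print([r.name for r in catalog.search("ner", min_metric=("f1", 0.85))])  # ['ner-transformer']
```

Because each record keeps the notebook, dataset version, and hyperparameters next to the measurements, a teammate who finds a promising run also has what they need to try reproducing it.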

Another touchstone of AI governance is rigorous testing and retesting to ensure that models behave the same in production as they do in research. Versioning models that have advanced from experiment to release candidate, testing those candidates for accuracy, bias, and stability, and validating models before launching them in new geographies or populations are best practices that every organization productionizing AI should be thinking about.
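As a rough illustration of a release gate, the Python sketch below checks one release candidate for accuracy, bias, and stability. Two definitions here are assumptions chosen for simplicity: bias is measured as the accuracy gap between subgroups (such as sites or populations), and stability as the spread of accuracy across repeated evaluation runs. The function names, threshold values, and toy data are hypothetical; real criteria depend on the use case and its risk.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import pstdev


@dataclass(frozen=True)
class ReleaseThresholds:
    """Hypothetical release criteria; real values depend on the use case and risk."""
    min_accuracy: float = 0.90
    max_subgroup_gap: float = 0.05  # largest allowed accuracy gap between subgroups
    max_run_stddev: float = 0.01    # largest allowed spread across repeated evaluations


def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def check_release_candidate(
    y_true: list[int],
    y_pred: list[int],
    groups: list[str],
    repeated_accuracies: list[float],
    thresholds: ReleaseThresholds = ReleaseThresholds(),
) -> dict[str, bool]:
    """Run accuracy, bias, and stability checks on one release candidate."""
    overall = accuracy(y_true, y_pred)

    # Bias: compare accuracy across subgroups such as sites, regions, or populations.
    per_group = {}
    for group in sorted(set(groups)):
        idx = [i for i, g in enumerate(groups) if g == group]
        per_group[group] = accuracy([y_true[i] for i in idx], [y_pred[i] for i in idx])
    gap = max(per_group.values()) - min(per_group.values())

    # Stability: population standard deviation of accuracy over repeated runs.
    spread = pstdev(repeated_accuracies)

    return {
        "accuracy": overall >= thresholds.min_accuracy,
        "bias": gap <= thresholds.max_subgroup_gap,
        "stability": spread <= thresholds.max_run_stddev,
    }


def is_releasable(checks: dict[str, bool]) -> bool:
    """A candidate is promoted only if every check passes."""
    return all(checks.values())


# Usage with toy data: site "b" is served noticeably worse than site "a".
labels = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
preds = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
sites = ["a"] * 5 + ["b"] * 5
checks = check_release_candidate(labels, preds, sites, [0.80, 0.81, 0.80])
print(checks, is_releasable(checks))
# {'accuracy': False, 'bias': False, 'stability': True} False
```

Running the same per-group check on held-out data from a new geography or population before launch is one way to apply the validation step mentioned above.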

Security and compliance should be baked into a successful AI strategy from the beginning, and this is another important area I’ll discuss in my upcoming presentation. 

Role-based access control, an approval workflow for model releases, and storing all the metadata needed for a full audit trail are just a few of the security measures that should be in place before a model is considered production-ready. This is especially important in highly regulated industries such as healthcare and finance.
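Here is a minimal sketch, in Python, of how role-based access control, an approval workflow, and an audit trail can fit together. The roles, statuses, and transition table are hypothetical assumptions. A production system would typically persist the log durably and verify identities and roles through an identity provider rather than trusting a caller-supplied `role` argument.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical role model: which roles may perform which release actions.
PERMISSIONS: dict[str, set[str]] = {
    "data_scientist": {"submit"},
    "reviewer": {"approve", "reject"},
    "release_manager": {"approve", "reject", "release"},
}

# Allowed workflow transitions: action -> (required current status, resulting status).
TRANSITIONS: dict[str, tuple[str, str]] = {
    "submit": ("draft", "pending_approval"),
    "approve": ("pending_approval", "approved"),
    "reject": ("pending_approval", "draft"),
    "release": ("approved", "released"),
}


class PermissionDenied(Exception):
    """Raised when a user's role does not allow the requested action."""


@dataclass
class ModelRelease:
    model_name: str
    version: str
    status: str = "draft"
    audit_log: list[dict[str, str]] = field(default_factory=list)

    def perform(self, user: str, role: str, action: str, note: str = "") -> None:
        """Apply a workflow action, logging every attempt, including refused ones."""
        if action not in PERMISSIONS.get(role, set()):
            self._log(user, role, action, "denied", note)
            raise PermissionDenied(f"role {role!r} may not {action!r}")
        required, new_status = TRANSITIONS[action]
        if self.status != required:
            self._log(user, role, action, "invalid_transition", note)
            raise ValueError(f"cannot {action!r} a release in status {self.status!r}")
        self._log(user, role, action, "ok", note, new_status)
        self.status = new_status

    def _log(self, user: str, role: str, action: str, outcome: str,
             note: str, new_status: str | None = None) -> None:
        """Append one audit entry; the status only changes on successful actions."""
        self.audit_log.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "model": f"{self.model_name}:{self.version}",
            "user": user,
            "role": role,
            "action": action,
            "outcome": outcome,
            "from_status": self.status,
            "to_status": new_status or self.status,
            "note": note,
        })


# Usage: a submitter cannot release their own model; a reviewer and manager must act.
release = ModelRelease("clinical-ner", "1.2.0")
release.perform("alice", "data_scientist", "submit", note="passed release checks")
try:
    release.perform("alice", "data_scientist", "release")
except PermissionDenied as err:
    print("refused:", err)
release.perform("bob", "reviewer", "approve")
release.perform("carol", "release_manager", "release")
print(release.status)  # released
```

Logging refused and invalid attempts, not only successful ones, is a deliberate choice in this sketch: an audit trail is usually more useful when it shows who tried what, not just what went through.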

The pressure to get AI models into production is real (financially, competitively, and operationally), and as the AI arms race continues, it's only increasing. Even so, enterprises must keep responsible AI practices in place to mitigate harmful and potentially dangerous biases and inaccuracies. To learn more about best practices and tools for governing AI and getting models into production safely, register for ODSC Europe and be sure to check out my session!

David Talby, PhD, is the founder and CTO of John Snow Labs, the AI and NLP for healthcare company and developer of the Spark NLP library. He has dedicated his career to helping companies build real-world AI systems, turning recent scientific advances into products and services. He specializes in applying machine learning, deep learning, and natural language processing in healthcare.