AI is no longer exclusive to digital native companies like Amazon, Netflix or Uber. Dow Chemical Company recently used machine learning to accelerate the R&D process for polyurethane formulations by 200,000x – from 2-3 months to just 30 seconds. And Dow is not alone. A recent index from Deloitte shows how companies across industries are operationalizing AI to increase business value. Unsurprisingly, Gartner predicts that more than 75% of organizations will transition from testing AI technologies to operationalizing them by the end of 2024 — which is where the real challenges begin.
AI is most valuable when operationalized at scale. For business leaders looking to maximize business value using AI, scale refers to how deeply and broadly AI is integrated into an organization’s core product or service and business processes.
Unfortunately, scaling AI in this sense is not easy. Putting one or two AI models into production is very different from running an entire company or product on AI. And as AI scales, its problems can (and often do) scale with it. For example, one financial company lost $20,000 in 10 minutes because one of its machine learning models began to misbehave. With no visibility into the root cause, and no way even to identify which of its models was malfunctioning, the company had no choice but to pull the plug. All models were rolled back to much earlier iterations, which severely degraded performance and wiped out weeks of effort.
Organizations that take AI seriously have begun to adopt a new discipline, loosely defined as “MLOps” or Machine Learning Operations. MLOps seeks to establish best practices and tools that enable rapid, secure and efficient development and operationalization of AI. When implemented properly, MLOps can significantly shorten time to market. Implementing MLOps requires investing time and resources in three key areas: processes, people and tools.
Processes: Standardize how you build and operationalize models.
Building the models and algorithms that power AI is a creative process that requires constant iteration and refinement. Data scientists prepare the data, engineer features, train the model, tune its parameters and validate that it works. When the model is ready to be deployed, software engineers and IT operationalize it, continuously monitoring output and performance to ensure the model works robustly in production. Finally, a governance team must oversee the entire process to ensure that the AI model being built is sound from an ethical and compliance standpoint.
Given the complexity involved, standardization is the first step toward making AI scale: a repeatable way to build models and a well-defined process to operationalize them. In this respect, creating AI is very similar to manufacturing. The first widget a company makes is always custom-built; a repeatable development and manufacturing process becomes essential when the company scales up to produce many widgets and then continuously optimizes their design. But with AI, many companies struggle with this step.
It’s easy to see why. Custom processes are (by nature) fraught with inefficiency. Yet many organizations fall into the trap of reinventing the wheel every time they operationalize a model. In the case of the financial company discussed above, the lack of a repeatable way to monitor model performance caused expensive and slow-to-fix errors. One-off processes like these can cause major problems once research models are put into production.
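As a rough illustration of what a repeatable monitoring step might look like, the sketch below keeps a rolling error rate for every model, so that a misbehaving model can be singled out rather than rolling everything back. The class name, the window size, the error threshold and the model names are all hypothetical choices made for this example; they do not come from the company described above or from any particular MLOps product.

```python
from collections import defaultdict, deque


class ModelMonitor:
    """Tracks a rolling error rate per model (illustrative sketch only)."""

    def __init__(self, window=100, max_error_rate=0.2):
        # Both defaults are assumed values, not recommendations.
        self.window = window
        self.max_error_rate = max_error_rate
        # One bounded history of True/False "was this prediction wrong?" per model.
        self._outcomes = defaultdict(lambda: deque(maxlen=window))

    def record(self, model_id, prediction, actual):
        """Store whether the latest prediction of a model was wrong."""
        self._outcomes[model_id].append(prediction != actual)

    def error_rate(self, model_id):
        """Share of wrong predictions within the rolling window."""
        outcomes = self._outcomes[model_id]
        return sum(outcomes) / len(outcomes) if outcomes else 0.0

    def misbehaving(self):
        """Names of the models whose error rate exceeds the threshold."""
        return sorted(
            m for m in self._outcomes if self.error_rate(m) > self.max_error_rate
        )


# Hypothetical model names, used only to show the idea.
monitor = ModelMonitor(window=50, max_error_rate=0.2)
for _ in range(50):
    monitor.record("credit_risk_v3", prediction=1, actual=1)  # always right
    monitor.record("pricing_v7", prediction=1, actual=0)      # always wrong
print(monitor.misbehaving())  # ['pricing_v7']
```

Because every model reports through the same interface, the question "which model is failing?" has an answer that does not depend on how each team happened to build its own monitoring.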
The process standardization piece of MLOps helps streamline model development, implementation and refinement, enabling teams to build AI capabilities quickly but responsibly.
To standardize, organizations must collectively define a “recommended” process for AI development and operationalization, and provide tools to support the adoption of that process. For example, the organization can develop a standard set of libraries to validate AI models, encouraging consistent testing and validation. Standardization at transition points in the AI lifecycle (e.g. from data science to IT) is particularly important, as it allows different teams to work independently and focus on their core competencies without worrying about unexpected, disruptive changes.
MLOps tools such as Model Catalogs and Feature Stores can support this standardization.
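To make the idea of a shared validation library more concrete, here is a minimal sketch in which every model is tested against the same organization-wide release bar. The metric names, the thresholds and the `validate` function are assumptions made for illustration; a real organization would define its own checks and would likely rely on an established metrics library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    metric: str
    minimum: float


# Hypothetical organization-wide release bar; the numbers are placeholders.
STANDARD_CHECKS = (Check("accuracy", 0.90), Check("recall", 0.80))


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    """Fraction of true positives (label 1) that the model predicted as 1."""
    positives = [(t, p) for t, p in zip(y_true, y_pred) if t == 1]
    if not positives:
        return 0.0
    return sum(p == 1 for _, p in positives) / len(positives)


METRICS = {"accuracy": accuracy, "recall": recall}


def validate(y_true, y_pred, checks=STANDARD_CHECKS):
    """Return a list of failure messages; an empty list means the model passes."""
    failures = []
    for check in checks:
        value = METRICS[check.metric](y_true, y_pred)
        if value < check.minimum:
            failures.append(
                f"{check.metric}={value:.2f} is below the required {check.minimum:.2f}"
            )
    return failures


# A model that predicts every label correctly passes all checks.
print(validate([1, 0, 1, 1], [1, 0, 1, 1]))  # []
```

The point of the sketch is the shared entry point: data science teams stay free to build models however they like, while the handoff to IT always runs through the same checks.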
People: Let teams focus on what they do best.
AI development used to be the responsibility of a single AI “data science” team, but AI at scale cannot be built by one team alone. It requires a variety of distinct skills, and very few individuals possess them all. For example, a data scientist creates algorithmic models that can accurately and consistently predict behavior, while an ML engineer optimizes, packages and integrates research models into products and continuously monitors their quality. One person will rarely perform both roles well. Compliance, governance and risk require an even more distinct set of skills. As AI scales, more and more expertise is needed.
To successfully scale AI, business leaders must build and strengthen specialized, dedicated teams that can focus on high-value strategic priorities that only their team can achieve. Let data scientists do data science; let engineers do the engineering; let IT focus on infrastructure.
Two team structures have emerged as organizations scale their AI footprint. The first is the “pod model”, in which AI product development is carried out by a small team made up of a data scientist, a data engineer, and an ML or software engineer. The second is the “Center of Excellence” or COE model, in which the organization “pools” all of its data science experts, who are then assigned to different product teams depending on requirements and resource availability. Both approaches have been implemented successfully, and each has its own advantages and disadvantages. The pod model is best suited for fast execution but can lead to knowledge silos, while the COE model has the opposite trade-off. Unlike data science and IT, governance teams are most effective when they sit outside the pods and COEs.
Tools: Choose tools that support creativity, speed and security.
Finally, we come to tools. Because standardizing the production of AI and ML is a relatively new endeavor, the ecosystem of data science and machine learning tools is highly fragmented: to build a single model, a data scientist works with about a dozen different, highly specialized tools and stitches them together. IT and governance, on the other hand, use a very different set of tools, and these separate toolchains don’t talk to each other easily. As a result, it’s easy to do one-off work, but building a robust, repeatable workflow is difficult.
Ultimately, this limits the speed at which AI can scale in an organization. A scattered set of tools can lead to long times to market and to AI products being built without adequate oversight.
But as AI scales within an organization, collaboration becomes more fundamental to success. Faster iteration requires ongoing contributions from stakeholders throughout the model lifecycle, and finding the right tool or platform is an essential step. Tools and platforms that support AI at scale should support creativity, speed and security. Without the right tools, a company will struggle to uphold all three at once.
When choosing MLOps tools for their organization, a leader should consider:
Whether it is interoperable with the existing ecosystem.
More often than not, some AI infrastructure will already be in place. To reduce the friction of adopting a new tool, choose one that works with the existing ecosystem. On the production side, model services should work with DevOps tools already approved by IT (e.g. logging, monitoring, governance tools). Ensure that new tools work with the existing IT ecosystem or can easily be extended to provide this support. Organizations moving from on-premise infrastructure to the cloud should look for tools that work in a hybrid environment, as cloud migration often takes several years.
Whether it is friendly to both data science and IT.
Tools to scale AI have three primary user groups: the data scientists who build models, the IT teams who maintain the AI infrastructure and run AI models in production, and the governance teams who oversee the use of models in regulated scenarios.
Of these, data science and IT often have opposing needs. In order for data scientists to do their best work, a platform has to get out of the way – giving them flexibility to use libraries of their choice and work independently without constant IT or technical support. On the other hand, IT needs a platform that enforces constraints and ensures that production deployments follow predefined and IT-approved paths. An ideal MLOps platform can do both. Often this challenge is solved by choosing one platform to build models and another platform to operationalize them.
Whether it supports collaboration.
As described above, AI is a multi-stakeholder initiative. As a result, an MLOps tool should make it easy for data scientists to collaborate with engineers and vice versa, and for both personas to work with governance and compliance. In the year of the Great Resignation, knowledge sharing and ensuring business continuity in the face of employee turnover are crucial. In AI product development, the speed of collaboration between data science and IT determines the speed to market, while collaboration with governance ensures that the product being built is one that should be built at all.
Whether it supports governance.
With AI and ML, governance becomes much more critical than with other applications. AI governance is not limited to security or access control in an application. It is responsible for ensuring that an application aligns with the organization’s code of ethics, that the application is not biased against a protected group, and that decisions made by the AI application can be trusted. As a result, it becomes essential for any MLOps tool to enable responsible and ethical AI practices, including capabilities such as “pre-launch” checklists for responsible AI use, model documentation, and governance workflows.
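A “pre-launch” checklist of the kind mentioned above could be as simple as a gate that refuses to mark a model ready until every governance item has been signed off. The sketch below is only an illustration: the class, the three checklist items and the model name are hypothetical, and a real governance workflow would also record who signed off and when.

```python
from dataclasses import dataclass, field


def _default_items():
    # Hypothetical checklist items; each organization would define its own.
    return {
        "model_documentation_complete": False,
        "bias_review_signed_off": False,
        "ethics_review_signed_off": False,
    }


@dataclass
class PreLaunchChecklist:
    model_id: str
    items: dict = field(default_factory=_default_items)

    def sign_off(self, item):
        """Mark one checklist item as done; unknown items are rejected."""
        if item not in self.items:
            raise KeyError(f"Unknown checklist item: {item}")
        self.items[item] = True

    def outstanding(self):
        """Items that still block the launch, in checklist order."""
        return [name for name, done in self.items.items() if not done]

    def ready_for_launch(self):
        return not self.outstanding()


checklist = PreLaunchChecklist("pricing_v7")  # hypothetical model name
checklist.sign_off("model_documentation_complete")
print(checklist.ready_for_launch())  # False
print(checklist.outstanding())
```

Keeping the gate in code rather than in a document means a deployment pipeline can check it automatically before a model reaches production.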
† † †
In the race to scale AI and realize more business value through predictive technology, leaders are always looking for ways to get ahead of the pack. AI shortcuts such as pre-trained models and licensed APIs can be valuable in their own right, but scaling AI for maximum ROI requires organizations to focus on how they operationalize AI. The companies with the best models or the smartest data scientists aren’t necessarily the ones that come out on top; success goes to the companies that can deploy and scale smartly to unlock the full potential of AI.
This post, “How to Scale AI in Your Organization,” was originally published at https://hbr.org/2022/03/how-to-scale-ai-in-your-organization