You are probably familiar with the DevOps methodology, which is very commonly used in the software industry: deploy your infrastructure programmatically, do continuous integration, test your code automatically, and deploy the software continuously to production. Overall, the software is constantly being updated with new features, and you have the ability to make changes in an incremental way.



Data systems are becoming more and more important as organizations use their data to make decisions. DataOps (data operations) inherits from the history of DevOps and applies it to data systems, so that you are constantly making improvements.

MLOps further inherits this evolution of being able to rapidly change things. MLOps is essentially a combination of DevOps, DataOps, model improvement, and framing business requirements.

MLOps Workflows

There are a couple of different types of MLOps workflows. The first type is light and static: because the model is small and easy to load, you can build the workflow entirely with a cloud-native build system, say Google's.

DevOps
  ↑
  | GitHub
  |
  | Google Cloud Build
  |
  | Google App Engine
  |
  | Prediction services or API
  ↓
 MLOps
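
To make this concrete, here is a minimal sketch (in Python, with Flask) of what the prediction service at the end of this workflow might look like. The model file name model.joblib and the payload shape are assumptions for illustration only, but a small, static model loaded once at startup is exactly what makes this lightweight workflow possible:

# A minimal sketch of the "prediction services or API" stage, assuming a
# small, static scikit-learn model saved as model.joblib (hypothetical name).
from flask import Flask, jsonify, request
import joblib

app = Flask(__name__)
model = joblib.load("model.joblib")  # small, static model loaded once at startup

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON payload like {"features": [[5.1, 3.5, 1.4, 0.2]]}
    payload = request.get_json()
    prediction = model.predict(payload["features"])
    return jsonify({"prediction": prediction.tolist()})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)

The build system would lint and test this file on every push, and App Engine (or any managed container platform) could then serve it.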

In more sophisticated workflows, there is really a data-centric approach (where you may not even do any modeling yourself) vs. a model-centric approach:

Data-centric
  ↑
  | AutoML, Data labeling
  | Data management, Pipeline, Containers
  | Data drift detection
  | Feature stores
  | Model registry, Explainable AI
  | Model monitoring
  | Prediction, Experiment tracking
  ↓
Model-centric

Data drift refers to the fact that in the real world, data is constantly changing, so you may not actually have the information you think you have. This is why, in many scenarios, constantly retraining the model is the best way to address the ever-changing underlying data.
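
As a sketch of what a simple drift check might look like in practice, the snippet below compares a feature's training-time distribution against fresh production data with a two-sample Kolmogorov-Smirnov test. The distributions, sample sizes, and the 0.01 threshold are all illustrative assumptions:

# A sketch of data drift detection for one numeric feature: compare the
# training-time distribution against fresh production data with a
# two-sample Kolmogorov-Smirnov test. Data here is synthetic.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
train_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)  # what the model saw
live_feature = rng.normal(loc=0.4, scale=1.2, size=5_000)   # what arrives today

statistic, p_value = ks_2samp(train_feature, live_feature)
if p_value < 0.01:  # threshold is an assumption; tune per feature
    print(f"Drift detected (KS={statistic:.3f}, p={p_value:.2e}); consider retraining")
else:
    print("No significant drift detected")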

Data Warehouse vs. Feature Store

Data warehouses and feature stores have different characteristics.

Low quality data
  ↑
  | Streaming data           Batch data
  |               \         /
  |                Data lake
  |               /         \
  |             ETL         Featurization
  |            /                \
  |  Data Warehouse           Feature Store
  ↓
High quality data

A data lake could be a combination of streaming data, batch data, unstructured data, and structured data. You really have to do a lot of work to clean this up:

  • Featurization produces high-quality inputs for machine learning, saved in the Feature Store, where the data is used for training, prediction, auditing, automatic correlations, etc.
  • ETL produces high-quality inputs for business intelligence, saved in the Data Warehouse, where the data is used for reporting, dashboards, business intelligence, etc.
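
To make the featurization path concrete, here is a toy sketch that turns raw data-lake events into per-entity features of the kind you might register in a feature store. The column names and aggregations are illustrative assumptions:

# A toy sketch of the featurization path: aggregate raw data-lake events
# into per-user features that could be written to a feature store.
import pandas as pd

raw_events = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 2],
    "amount": [9.99, 25.00, 5.00, 7.50, 100.00],
    "timestamp": pd.to_datetime(
        ["2024-01-01", "2024-01-03", "2024-01-02", "2024-01-04", "2024-01-05"]
    ),
})

# Turn raw events into model-ready features, keyed by the entity (user_id).
features = raw_events.groupby("user_id").agg(
    purchase_count=("amount", "count"),
    total_spend=("amount", "sum"),
    last_seen=("timestamp", "max"),
).reset_index()

print(features)  # each row here would be registered in the feature store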


Automation Hierarchy

The needs in MLOps can also be presented in a hierarchy, like a pyramid with DevOps at the bottom. From the bottom up, you go through step-by-step, automating the stacks: from the software stack, to the data stack, to the platform stack, to ultimately the MLOps stack.

MLOps
  Business ROI, Problem framing, Forecasting, Predictions, Pattern discovery
Automation (platform stack)
  Feature stores, Model serving, Experiment tracking, Data drift detection
DataOps (data stack)
  Data management platforms, Data jobs and tasks, Serverless query and visualization
DevOps (software stack)
  Infrastructure-as-Code, Continuous delivery, Build system

When you are doing MLOps, even if you don’t use a cloud MLOps platform, you are most likely going to use something in the cloud:

  • Cloud development environments: CloudShell, Cloud IDE, storage query tools and dashboards, Jupyter Notebook, etc.
  • Elastic compute and storage systems: near-infinite disk I/O, storage, CPU, GPU, etc.
  • Serverless and containerized managed services
  • Cloud integrated tools and SDKs
  • 3rd party vendor integration

Microservices

You may think of a microservice as a piece of logic that you push live: small, reusable code. It goes through the steps of linting, testing, compiling, containerizing, and finally deploying through Infrastructure-as-Code. You can create any number of environments: for testing, for staging, for production.
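
As a sketch of the testing step in that pipeline, here is a tiny, reusable piece of logic together with pytest-style unit tests that a build system could run automatically before containerizing and deploying. The function is hypothetical, chosen only to show the pattern:

# A sketch of the "testing" step: a small, reusable piece of microservice
# logic plus pytest unit tests the build system runs before deploying.

def normalize_features(values: list[float]) -> list[float]:
    """Scale a feature vector to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:  # guard against division by zero on constant input
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def test_normalize_features_bounds():
    result = normalize_features([2.0, 4.0, 6.0])
    assert min(result) == 0.0 and max(result) == 1.0

def test_normalize_features_constant_input():
    assert normalize_features([3.0, 3.0]) == [0.0, 0.0]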

MLOps Maturity Models

All major vendors have the concept of an MLOps maturity model, which means there are several phases of going from a place where you can barely get things into production, all the way to very sophisticated systems that are ready for end-to-end automation.

First, let us see how the AWS platform thinks about MLOps maturity:

  • Initial: establish the experimentation environment
  • Repeatable: standardize code repositories and ML solution deployment
  • Reliable: introduce testing, monitoring, and multi-account deployment
  • Scalable: templatize and productionize multiple ML solutions

Microsoft also has its own maturity model, which is quite straightforward:

  • No MLOps
  • DevOps but no MLOps
  • Automated training
  • Automated model deployment
  • Full MLOps automated operations

Google’s view of the world is like this:

  • MLOps level 0: manual process
  • MLOps level 1: ML pipeline automation
  • MLOps level 2: CI/CD pipeline automation

The concept of continuous integration and continuous delivery through automation is really the key for all three of these vendors. You will need to have a production-first mindset.

One of the core concepts of DevOps is that you have the ability to automatically test your code. Continuous integration is the ability to know whether your code will work in the production environment:

First, you need to replicate the development environment inside of the production environment. Second, you are going to need some form of testing. Finally, in terms of deployment, you put all of these together and deploy them into the production environment.
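
For illustration, here is a sketch of a post-deployment smoke test: after continuous delivery pushes the prediction service live, CI calls the endpoint and checks that the contract still holds. The URL and payload shape are assumptions matching the Flask sketch earlier:

# A sketch of a post-deployment smoke test: call the live prediction
# endpoint and check the response contract. URL and payload are assumptions.
import requests

SERVICE_URL = "http://localhost:8080/predict"  # staging/production URL in real CI

def smoke_test_prediction_service():
    response = requests.post(
        SERVICE_URL,
        json={"features": [[5.1, 3.5, 1.4, 0.2]]},
        timeout=5,
    )
    assert response.status_code == 200, response.text
    assert "prediction" in response.json()

if __name__ == "__main__":
    smoke_test_prediction_service()
    print("Smoke test passed")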

Simulations vs MLOps Experiment Tracking

Classic simulation and MLOps experiment tracking have a lot in common:

You don’t always need to use machine learning to solve problems. In fact, there are many other useful techniques, such as simulations. The most classic optimization problems that can use simulations include: the traveling salesman problem (TSP), roulette wheels, poker, and VC portfolios. The idea behind simulation is that you run many trials through heuristics or some greedy algorithm, and solve for either maximum revenue or minimum cost.
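
Here is a minimal sketch of that idea applied to TSP: run many random tours (a Monte Carlo heuristic) and keep the one with the minimum total distance. The city coordinates are made up for illustration:

# A minimal simulation for TSP: sample many random tours (a Monte Carlo
# heuristic) and keep the shortest. City coordinates are made up.
import math
import random

cities = {"A": (0, 0), "B": (3, 4), "C": (6, 1), "D": (2, 7)}

def tour_length(order):
    # Sum the legs of the closed loop, returning to the starting city.
    legs = zip(order, order[1:] + order[:1])
    return sum(math.dist(cities[a], cities[b]) for a, b in legs)

random.seed(0)
best_order, best_cost = None, float("inf")
for _ in range(10_000):  # each iteration is one simulated "experiment"
    order = random.sample(list(cities), k=len(cities))
    cost = tour_length(order)
    if cost < best_cost:
        best_order, best_cost = order, cost

print(f"Best tour found: {best_order}, length {best_cost:.2f}")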

On the other hand, machine learning experiment tracking is a very similar process. You look through experiments: you want to minimize error across a bunch of different experiments, or optimize for some metric, and you potentially want to do a bunch of hyperparameter tuning and see which tuning job is the best.
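
A bare-bones sketch of that process: grid-search one hyperparameter, record each run's settings and metric, and pick the best experiment. Dedicated experiment-tracking tools do this at scale; this only shows the core idea:

# A bare-bones experiment tracker: try several hyperparameter values,
# record each run's settings and metric, then select the best run.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
experiments = []

for C in [0.01, 0.1, 1.0, 10.0]:  # hyperparameter values to try
    model = LogisticRegression(C=C, max_iter=1000)
    score = cross_val_score(model, X, y, cv=5).mean()
    experiments.append({"C": C, "mean_accuracy": score})

best = max(experiments, key=lambda run: run["mean_accuracy"])
print(f"Tracked {len(experiments)} runs; best: {best}")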



My Certificate

For more on Introduction to MLOps, please refer to the wonderful course here https://www.coursera.org/learn/devops-dataops-mlops-duke


I am Kesler Zhu, thank you for visiting my website. Check out more course reviews at https://KZHU.ai

Don't forget to sign up for the newsletter, so you don't miss any chance to learn.

Or share what you've learned with friends!