#1 Machine Learning Engineering for Production (MLOps) Specialization [Course 1, Week 1, Lesson 1]

DeepLearningAI
20 Apr 2022 · 09:43

Summary

TLDR: This video introduces the practical challenges of deploying machine learning models in production environments. Andrew Ng discusses the importance of moving beyond just training models to understanding the full lifecycle of machine learning projects. He explains the significance of production deployments, using real-world examples like automated visual defect inspection in manufacturing. Ng highlights common challenges like data drift and the 'proof of concept to production gap,' emphasizing that deploying machine learning systems requires much more than just model training. The course focuses on practical skills for building production-ready systems that create real value.

Takeaways

  • 🚀 Deploying machine learning models into production maximizes their value and is essential for many roles in machine learning.
  • 👨‍🏫 The course is led by Andrew Ng and Robert Crowe, providing practical, hands-on skills for deploying machine learning models.
  • 📱 Example: A computer vision model inspecting smartphones for defects shows how machine learning models are applied in real-world manufacturing.
  • 🏭 Deployment of machine learning in factories often involves edge devices to avoid downtime when internet access is unavailable.
  • 🌩️ Prediction servers can be deployed in the cloud or at the edge, depending on the application needs and infrastructure limitations.
  • ⚠️ Real-world challenges like concept drift or data drift, where the production environment differs from training data, can impact model performance.
  • 💻 Machine learning engineers need to handle various problems, including data shifts, to ensure models perform well in production.
  • 🔧 Moving from a proof of concept to production involves more than just the machine learning code—much of the work is in other system components.
  • 📊 Beyond model training, data collection, verification, feature extraction, and system monitoring are critical for a full production deployment.
  • 📆 A systematic approach to managing the life cycle of machine learning projects is crucial for successful deployment in real-world environments.

Q & A

  • What is the main focus of the course 'Machine Learning Engineering for Production'?

    -The course focuses on teaching learners how to not only train machine learning models but also deploy them in production environments. It covers practical, hands-on skills and techniques for managing the full life cycle of machine learning projects.

  • Why is it important to deploy machine learning models in production?

    -Deploying machine learning models in production is crucial for creating real-world value. While training models is important, the full potential of machine learning is realized when these models are put into production and can be used to make automated decisions.

  • What is an edge device in the context of machine learning deployment?

    -An edge device is a device that operates at the site where data is generated, such as a factory in the example of smartphone manufacturing. It can run machine learning models to make real-time decisions, even if the internet connection to the cloud is unavailable.

  • What is the role of the prediction server in a machine learning production environment?

    -The prediction server receives API calls with input data (like images of smartphones) and processes the data using a machine learning model to return a decision, such as whether a phone is defective or not. This enables real-time decision-making in production.

  • What is concept drift or data drift in machine learning deployment?

    -Concept drift or data drift refers to changes in the data distribution over time, such as differences in lighting conditions that affect the quality of images in production. These changes can cause machine learning models to perform poorly if not addressed.

  • Why can it take months to go from proof of concept (PoC) to production deployment of a machine learning model?

    -Moving from PoC to production can take months because there is a significant amount of work involved beyond just training the model. This includes setting up data pipelines, API interfaces, and monitoring systems, among other components, to ensure a reliable production environment.

  • What is the proof of concept (PoC) to production gap in machine learning?

    -The PoC to production gap refers to the difference between having a working machine learning model in a development environment (like a Jupyter notebook) and deploying it in a real-world production system. This gap exists due to the complexity of integrating the model into a larger system.

  • How much of the overall code in a machine learning project is typically dedicated to the machine learning model itself?

    -In many machine learning projects, only 5-10% (or even less) of the overall code is dedicated to the machine learning model itself. The majority of the code focuses on other aspects of the system, such as data management, monitoring, and deployment.

  • What are some components beyond the machine learning model that are necessary for a successful deployment?

    -Components beyond the machine learning model include data collection, data verification, feature extraction, monitoring, and analysis tools to ensure the model works effectively in production and can adapt to real-world changes.

  • Why is it important to consider the full life cycle of a machine learning project?

    -Considering the full life cycle of a machine learning project is essential because the model's performance in production depends on more than just training. It includes deployment, monitoring, and adapting to changes over time, ensuring long-term success and value.

Outlines

00:00

👋 Introduction to Machine Learning Engineering for Production

In this opening, Andrew Ng introduces the specialization 'Machine Learning Engineering for Production.' He highlights how learning to deploy machine learning models into production is crucial for creating real-world value. This skill set is essential not only for maximizing the usefulness of models but also for passing job interviews, as many interviewers ask about deployment experience. Andrew also introduces Robert Crowe, a Google expert who will teach parts of the specialization. By the end of the course, learners will understand the full lifecycle of a machine learning project, from training to deployment.

05:00

📱 Example: Defect Detection in Smartphones Using Computer Vision

Andrew presents a practical example where computer vision is used in a factory to detect defects in smartphones, such as scratches. He explains the concept of edge devices and their role in processing images locally within a factory for quality control purposes. The example demonstrates the need for deploying machine learning models in real-world environments, where images are captured, analyzed, and decisions are made using a prediction server. Andrew emphasizes that building the model is only one part of the process—deploying it with API interfaces and other software is equally critical to ensuring its operational success.
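
The inspection flow described above (take a picture, call the prediction server, act on the answer) can be sketched as a small control loop. This is a minimal illustration, not the system from the video: `inspect_phone`, `Prediction`, `fake_camera`, and `fake_prediction_server` are hypothetical names, and the stand-in "model" is only there so the sketch runs without a camera or a server.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Prediction:
    defective: bool
    score: float  # hypothetical confidence in [0, 1]


def inspect_phone(
    capture_image: Callable[[], bytes],
    predict: Callable[[bytes], Prediction],
) -> str:
    """Run one inspection cycle and return the control decision."""
    image = capture_image()      # the camera takes a picture of the phone
    prediction = predict(image)  # stands in for the API call to the prediction server
    return "reject" if prediction.defective else "accept"


# Stand-ins (assumptions) so the sketch runs on its own.
def fake_camera() -> bytes:
    return b"\x00" * 16


def fake_prediction_server(image: bytes) -> Prediction:
    # Pretend the model flags any image that contains a 0xFF byte.
    has_scratch = b"\xff" in image
    return Prediction(defective=has_scratch, score=1.0 if has_scratch else 0.0)


decision = inspect_phone(fake_camera, fake_prediction_server)
print(decision)
```

Passing the camera and the predictor in as functions keeps the control logic the same whether the prediction server runs in the cloud or at the edge.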

⚙️ Practical Deployment Challenges: Concept Drift and Real-World Issues

Andrew discusses the challenges of deploying machine learning models in production, using concept drift and data drift as an example. In a factory setting, changes in lighting or other conditions can make production images look different from the training set, and the model’s performance can degrade as a result. These challenges are often unexpected and require machine learning engineers to adapt their models to new conditions. Andrew underscores the importance of not only developing a good model in a controlled environment, such as a Jupyter notebook, but also ensuring that the model performs well in unpredictable production settings. This problem-solving mindset is key to making machine learning systems valuable in practice.

🛠️ Beyond Machine Learning Code: The Complexity of Deployment

This paragraph addresses the complexity of deploying machine learning systems in production. Andrew points out that the machine learning code itself often represents only a small fraction—about 5 to 10 percent—of the entire system. The rest includes components like data management, monitoring, and feature extraction, which are essential for running a successful production deployment. Andrew introduces a diagram adapted from a research paper to illustrate the various components beyond the model. He notes that transitioning from a proof-of-concept model to full production often involves significant additional work, which is why this specialization will teach learners how to manage the entire lifecycle of a machine learning project effectively.

Keywords

💡Production Deployment

Production deployment refers to the process of taking a machine learning model from the development phase and implementing it in a real-world environment where it can generate value. In the video, it's emphasized that building a machine learning model is just the first step, and putting it into production is critical for practical use, like defect detection in manufacturing.

💡Edge Device

An edge device is hardware that operates at the 'edge' of a network, such as a device within a factory that performs localized computing without relying on cloud services. The video explains that in manufacturing, edge devices are often used for tasks like inspecting smartphones for defects, as they can continue working even if the internet connection goes down.

💡Prediction Server

A prediction server is a system that receives input data, runs it through a trained machine learning model, and returns predictions. In the video, the prediction server is a key component in the defect detection system, receiving images of phones and deciding whether they are defective. It can operate in the cloud or at the edge.
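
As a rough sketch of what the prediction server does with one API call, the function below decodes a request, runs a model, and returns a prediction. Everything here is an assumption made for illustration: the video does not specify a request format, so the JSON field `image_b64`, the function `handle_predict`, and the `toy_model` stand-in are hypothetical.

```python
import base64
import json


def toy_model(image: bytes) -> float:
    """Hypothetical stand-in for a trained model: returns a defect score in [0, 1]."""
    if not image:
        return 0.0
    # Fraction of very bright bytes, used here only as a placeholder signal.
    return sum(1 for b in image if b > 200) / len(image)


def handle_predict(request_body: str, threshold: float = 0.5) -> str:
    """Decode a JSON request, run the model, and return a JSON response."""
    payload = json.loads(request_body)
    image = base64.b64decode(payload["image_b64"])
    score = toy_model(image)
    return json.dumps({"defective": score >= threshold, "score": score})


# Usage: the inspection software would send something like this request.
request = json.dumps(
    {"image_b64": base64.b64encode(bytes([255, 255, 255, 10])).decode("ascii")}
)
response = json.loads(handle_predict(request))
print(response)
```

In a real deployment this function would sit behind a web framework or RPC layer; that wiring is part of the "rest of the software" the video refers to.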

💡API

API (Application Programming Interface) is a set of protocols for building and interacting with software applications. In this context, the video describes how an API is used to send images of phones from an edge device to a prediction server for defect detection, showing how APIs facilitate communication between components of a machine learning system.

💡Data Drift

Data drift, which the video mentions alongside concept drift, occurs when the data in a production environment differs from the data used during the training of the model. The video highlights that this is a common challenge in machine learning deployment, such as when lighting conditions change in a factory, which can affect the model's performance in detecting defects.
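
One simple way to notice the lighting change described here is to compare the average brightness of production images against the training set. The sketch below is an assumption-laden illustration, not a method from the video: images are plain lists of grayscale values, and `brightness_drifted` and its `z_threshold` parameter are hypothetical.

```python
from statistics import mean, stdev


def mean_brightness(image: list[int]) -> float:
    """Average grayscale value of one image (0 = black, 255 = white)."""
    return mean(image)


def brightness_drifted(
    train_images: list[list[int]],
    prod_images: list[list[int]],
    z_threshold: float = 3.0,
) -> bool:
    """Flag drift when production brightness is far from the training baseline."""
    train_means = [mean_brightness(img) for img in train_images]
    prod_mean = mean(mean_brightness(img) for img in prod_images)
    baseline = mean(train_means)
    spread = stdev(train_means)  # needs at least two training images
    if spread == 0:
        return prod_mean != baseline
    return abs(prod_mean - baseline) / spread > z_threshold


# Toy data: bright training images, much darker production images.
train = [[180, 190, 200], [170, 185, 195], [175, 180, 205]]
dark_production = [[60, 70, 80], [55, 65, 75]]
similar_production = [[178, 188, 198]]

print(brightness_drifted(train, dark_production))
print(brightness_drifted(train, similar_production))
```

A check like this only catches shifts in the one statistic it measures; it is meant to show the idea of monitoring production inputs, not to be a complete drift detector.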

💡Proof of Concept (POC)

A proof of concept (POC) is an early model of a machine learning project that demonstrates the feasibility of the approach. The video describes how even when a POC works in a controlled environment (like Jupyter Notebook), much more effort is required to translate it into a reliable production system.

💡Automated Visual Defect Inspection

Automated visual defect inspection is the process of using computer vision algorithms to detect defects in products, such as smartphones on a manufacturing line. The video provides an example of this system, where images are analyzed to identify scratches or cracks, illustrating a practical application of machine learning.

💡Model Lifecycle

The model lifecycle refers to the entire process of developing, training, deploying, and maintaining a machine learning model. In the video, Andrew Ng stresses the importance of understanding the full lifecycle, from model training to production deployment, and managing changes in data or production environments.

💡Data Verification

Data verification is the process of ensuring that the data used in machine learning models is accurate and reliable. The video mentions data verification as part of the broader set of tasks required for a production deployment, such as verifying data integrity before sending it to a model for predictions.
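
A data verification step can be as small as a few checks run on each image before it is sent to the model. The sketch below is hypothetical: the video does not describe specific checks, so the function `verify_image`, the expected shape, and the "blank frame" rule are assumptions chosen for illustration.

```python
def verify_image(
    pixels: list[list[int]],
    expected_shape: tuple[int, int] = (2, 3),
) -> list[str]:
    """Return a list of problems found in one grayscale image (empty if none)."""
    rows, cols = expected_shape
    if len(pixels) != rows or any(len(row) != cols for row in pixels):
        return ["unexpected shape"]

    problems = []
    flat = [value for row in pixels for value in row]
    if any(not 0 <= value <= 255 for value in flat):
        problems.append("pixel out of range")
    if len(set(flat)) == 1:
        # Every pixel identical: likely a covered lens or a failed capture.
        problems.append("blank frame")
    return problems


print(verify_image([[10, 20, 30], [40, 50, 60]]))
print(verify_image([[0, 0, 0], [0, 0, 0]]))
```

Returning a list of problems rather than raising on the first one makes it easier to log every issue for later monitoring and analysis.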

💡Feature Extraction

Feature extraction involves selecting and transforming raw data into a set of features that can be used by a machine learning model. The video references feature extraction as one of the essential components that need to be built and managed in addition to the machine learning code when deploying models in production.

Highlights

Training a machine learning model is only the beginning; putting it into production is key to maximizing value.

Many interviewers will ask if you have experience deploying machine learning algorithms.

The specialization teaches hands-on skills to deploy machine learning models, including managing the entire life cycle.

An example of machine learning in production: inspecting smartphones on a manufacturing line for defects using computer vision.

Edge devices in factories can use inspection software to detect scratches on phones, making real-time decisions.

Edge deployments are common in manufacturing to ensure operations continue even if internet access is lost.

Concept drift or data drift occurs when the data collected during training differs from real-world deployment data.

Machine learning engineers should take responsibility for adjusting models to real-world data conditions.

Success in a development environment, like a Jupyter notebook, is only the beginning; practical deployment requires more work.

A significant challenge in production deployment is dealing with data distribution changes, requiring engineers to adapt models.

Only 5-10% of the code in a machine learning system is related to the model; the majority supports other functions like data collection and monitoring.

A gap exists between proof-of-concept models and real-world production deployment.

Transcripts

play00:00

hi and welcome to machine learning

play00:02

engineering for production a lot of

play00:04

learners have asked me hey andrew i've

play00:06

learned to train a machine learning

play00:08

model now what do i do

play00:11

machine learning models are great but

play00:13

unless you know how to put them into

play00:15

production it's hard to get them to

play00:17

create the maximum amount of possible

play00:19

value

play00:20

or for those of you that may be looking

play00:22

for a position in machine learning many

play00:24

interviewers will ask have you ever

play00:25

deployed a machine learning algorithm

play00:28

in this four-course specialization the

play00:31

first course taught by me the second

play00:33

third and fourth courses taught by

play00:34

robert crowe who's an expert at this

play00:36

from google

play00:37

we hope to share with you the practical

play00:39

hands-on skills and techniques you need

play00:42

to not just build a machine learning

play00:43

model but also to put them into

play00:46

production and so by the end of this

play00:48

first course and by the end of this

play00:51

specialization i hope you have a good

play00:52

sense of the entire life cycle of

play00:54

a machine learning project from training

play00:56

a model to putting it into production and really

play00:58

how to manage the entire machine

play01:00

learning project let's jump in

play01:03

let's start with an example

play01:05

let's say you're using computer vision

play01:07

to inspect phones coming off the

play01:09

manufacturing line to see if there are

play01:11

defects on them so this phone shown on

play01:14

the left doesn't have any scratches on

play01:16

it but if there was a scratch or crack

play01:18

or something

play01:19

a computer vision algorithm

play01:21

would hopefully be able to find

play01:24

this type of scratch or defect

play01:27

and maybe put the bounding box around it

play01:29

as part of quality control

play01:32

if you get a data set of scratched phones

play01:35

you can train a computer vision

play01:37

algorithm maybe a neural network to

play01:39

detect these types of defects but what

play01:41

do you now need to do in order to put

play01:43

this into production deployment

play01:46

this would be an example of how you

play01:48

could deploy a system like this you

play01:51

might have an edge device by edge device

play01:54

i mean a device that is living inside

play01:57

the factory that is manufacturing these

play02:00

smartphones and that edge device would

play02:02

have a piece of inspection software

play02:04

whose job it is to take a picture of the

play02:07

phone see if there's a scratch and then

play02:08

make a decision on whether this phone is

play02:11

acceptable or not this is actually

play02:14

commonly done in factories this is

play02:16

called automated

play02:17

visual defect inspection what the

play02:19

inspection software does is it will

play02:21

control a camera that will take a picture

play02:24

of the smartphone as it rolls off the

play02:25

manufacturing line

play02:27

and it then has to make an api call to

play02:30

pass this picture

play02:31

to a prediction server

play02:34

and the job of the prediction server is

play02:37

to accept these api calls you know

play02:39

receive an image make a decision as to

play02:42

whether or not this phone is defective

play02:45

and return this prediction and then the

play02:47

inspection software can make the

play02:49

appropriate control decision whether to

play02:51

let the phone move on in the

play02:53

manufacturing line or whether to shove

play02:54

it to the side because you know it was

play02:56

defective and not acceptable

play02:59

so after you have trained a learning

play03:01

algorithm maybe trained a neural network

play03:04

to take as input x pictures of phones

play03:07

and map them to y

play03:09

predictions about whether the phone is

play03:12

defective or not

play03:13

you still have to take this

play03:16

machine learning model

play03:18

put it in a prediction server set up

play03:20

api interfaces and really write all of

play03:22

the rest of the software in order to

play03:25

deploy this learning algorithm into

play03:28

production

play03:29

this prediction server

play03:31

is sometimes in the cloud

play03:33

and sometimes the prediction server is

play03:35

actually at the edge as well in fact in

play03:38

manufacturing we use edge deployments a

play03:41

lot because you can't have your factory

play03:43

go down every time your internet access

play03:45

goes down

play03:46

but

play03:47

cloud deployments where the prediction server

play03:49

is a server in the cloud are also used

play03:51

for many applications

play03:53

let's say you write all the software

play03:55

what could possibly go wrong

play03:58

it turns out that just because you've

play04:00

trained a learning algorithm that does

play04:03

well on your test set which is to be

play04:05

celebrated it's great when you do well

play04:07

on your hold out test set

play04:09

unfortunately reaching that milestone

play04:11

doesn't mean you're done there can still

play04:14

be quite a lot of work and challenges

play04:16

ahead

play04:18

to get a valuable production deployment

play04:20

running

play04:22

for example

play04:24

let's say your training set has images

play04:26

that look like this

play04:28

there's a

play04:29

good phone on the left the one in the

play04:31

middle has a big scratch across it and

play04:34

you've trained your learning algorithm

play04:36

to recognize that phones like this on

play04:37

the left are okay

play04:40

meaning that there are no defects and

play04:43

maybe draw bounding boxes around

play04:45

scratches or other defects it finds in

play04:47

phones

play04:49

when you deploy it in the factory you

play04:51

may find that

play04:52

the real-life production deployment

play04:54

gives you back images like this

play04:57

much darker ones because the lighting

play04:59

factory

play05:00

because the lighting conditions in the

play05:01

factory have changed for some reason

play05:03

compared to the time when the training

play05:05

set was collected

play05:06

this problem is sometimes called concept

play05:09

drift or data drift you'll learn more about

play05:12

these terms later in this week

play05:15

but this is one example of the many

play05:18

practical problems that we as machine

play05:21

learning engineers should step up to

play05:23

solve

play05:24

if we want to make sure that we don't

play05:26

just do well on the hold out test set

play05:28

but that our systems actually create

play05:31

value

play05:32

in a practical

play05:34

production deployment environment i've

play05:37

worked on quite a few projects where my

play05:40

machine learning team and i would

play05:42

successfully do a proof of concept and

play05:44

by that i mean we train a model in

play05:47

a jupyter notebook and it will work great

play05:49

and we will celebrate that you know you

play05:51

should celebrate it when you have a

play05:52

learning algorithm work well in a

play05:54

jupyter notebook or in a development

play05:56

environment

play05:58

but it turns out that sometimes i'll see

play06:01

many projects where that success which

play06:02

is a great success

play06:04

to the practical deployment is still

play06:06

maybe another six months of work

play06:09

and this is just one of many of the

play06:12

practical things that a machine learning

play06:15

team has to

play06:16

watch out for and handle in order to

play06:19

actually deploy these systems

play06:21

some machine learning engineers will say

play06:23

it's not a machine learning problem to

play06:24

address these problems you know the data

play06:26

set changes some machine learning engineers think

play06:29

well is that a machine learning problem

play06:32

my point of view is that our job is to

play06:34

make these things work um and so if the

play06:37

data set has changed i think of it as

play06:40

my responsibility when i work on a

play06:41

project to step in and do what i can to

play06:44

address the data distribution as it is

play06:46

rather than as i wish it is

play06:49

so this specialization will teach you

play06:51

about a lot of these important practical

play06:53

things for building machine learning

play06:55

systems that work not just in the lab

play06:57

not just in the jupyter notebook but in

play06:59

a production deployment environment

play07:01

a second challenge of deploying machine

play07:03

learning models in production is that it

play07:06

takes a lot more than machine learning

play07:08

code

play07:09

over the last decade there's been a lot

play07:11

of attention on machine learning models

play07:14

so your neural network or

play07:16

other algorithm

play07:17

that learns a function

play07:19

mapping from some input to some output

play07:22

and there's been

play07:23

amazing progress in machine learning

play07:26

models

play07:28

but it turns out that if you look at a

play07:30

machine learning system in production

play07:32

if this little orange rectangle

play07:34

represents the machine learning code the

play07:37

machine learning model code

play07:38

then this is all the code you need for

play07:41

the entire machine learning project

play07:44

i feel like for many machine learning

play07:46

projects

play07:48

maybe only five to ten percent maybe

play07:50

even less of the code is machine

play07:52

learning code and and i think this is

play07:54

one of the reasons why when you have a

play07:58

proof of concept model working

play08:00

maybe in a jupyter notebook

play08:02

it can still be a lot of work to go from

play08:04

that initial proof of concept to the

play08:07

production deployment so sometimes

play08:10

people refer to the poc

play08:13

or the proof of concept to production

play08:15

gap

play08:16

and a lot of that gap is sometimes just

play08:19

the sheer amount of work it is to also

play08:22

write all of this code out here

play08:25

beyond the

play08:26

initial

play08:28

machine learning model code so what is

play08:31

all this other stuff this is a diagram

play08:34

that i've adapted from a paper by d

play08:38

sculley and others beyond the machine

play08:41

learning code there are also many

play08:43

components especially components for

play08:46

managing the data such as data

play08:49

collection data verification feature

play08:52

extraction

play08:53

and after you are serving it how to

play08:55

monitor the system well monitor the data

play08:58

that comes back help you analyze it but there

play09:00

are often many other components that

play09:02

need to be built

play09:03

to enable a working production

play09:05

deployment so in this course you'll learn

play09:08

what are all of these other pieces of

play09:11

software needed for a valuable

play09:13

production deployment

play09:15

but rather than looking at all of these

play09:17

complex pieces one of the most useful

play09:19

frameworks i found for organizing the

play09:21

workflow of a machine learning project

play09:23

is to systematically plan out the life

play09:26

cycle of a machine learning project

play09:28

let's go to the next video to dive in to

play09:31

what is the full life cycle of a machine

play09:33

learning project and i hope this

play09:35

framework will be very useful for all of

play09:38

your machine learning projects that you

play09:39

plan to deploy in the future let's go to

play09:41

the next video
