Model Monitoring with SageMaker

Metal Toad
2 Oct 2023 · 25:49

Summary

TL;DR: This webinar discusses model monitoring in production environments using AWS SageMaker. It covers the importance of monitoring model performance, the concept of model drift, and how SageMaker helps detect data quality and model accuracy issues. The session also includes a case study on implementing custom model monitoring for a security application.

Takeaways

  • 😀 Model monitoring is crucial for tracking the performance of machine learning models in production environments and identifying when they are not performing as expected.
  • 🔍 Model drift refers to the decline in a model's performance over time due to changes in data or the environment, which can be categorized into data drift, bias drift, or feature attribution drift.
  • 🛠 AWS SageMaker offers tools for model monitoring that can help detect issues like data quality, model quality, bias, and explainability, with SageMaker Clarify being a key tool for bias and explainability.
  • 📈 Establishing a baseline for model performance metrics is essential for model monitoring, allowing for the comparison of current performance against expected standards.
  • 🔒 Data capture is a critical component of model monitoring, involving the collection of both the data sent to the model and the ground truth labels for model quality monitoring.
  • 📊 Model monitoring jobs can be scheduled to detect data quality drift and model accuracy, with results and alerts made available in S3, SageMaker Studio, and CloudWatch for action.
  • 👷‍♂️ Best practices for model monitoring include keeping instance utilization below 70% to ensure effective data capture and maintaining data in the same region as the model monitoring.
  • 📝 The architecture of data capture involves a training job, a model endpoint, a baseline processing job, and a monitoring job that captures and analyzes data quality and model predictions.
  • 📉 Monitoring for data quality involves checking for violations such as missing or extra columns, unexpected data types, or a high number of null values, which can trigger CloudWatch alerts.
  • 🎯 Model quality monitoring requires ground truth labels to compare predictions with reality, using metrics like RMSE or F1 score to evaluate accuracy and prediction quality.
  • 🔧 Custom model monitor scripts can be created for specific needs, deployed as Docker containers in AWS ECR, and scheduled to run at intervals to capture and analyze custom metrics.

Q & A

  • What is model monitoring?

    -Model monitoring is the process of monitoring your model's performance in a production-level environment. It involves capturing key performance indicators that can indicate when a model is performing well or not, helping to detect issues like model drift.

  • Why is model monitoring necessary?

    -Model monitoring is necessary because when machine learning models are deployed to production, factors can change over time, causing the model's performance to drift from the expected levels. Monitoring helps detect these changes and ensures the model continues to perform as intended.

  • What is model drift?

    -Model drift refers to the decay of a model's performance over time after it has been deployed to production. It can be caused by changes in the data distribution (data drift), changes in the data that the model sees compared to the training data (bias drift), or changes in the features and their attribution scores (feature attribution drift).

  • What are the different types of model monitoring?

    -The different types of model monitoring include data quality monitoring, model quality monitoring, bias monitoring, and explainability monitoring. These aspects help ensure the model's predictions are accurate and unbiased, and that the model's decisions can be explained.

  • How does SageMaker help with model monitoring?

    -SageMaker provides tools and services for model monitoring, including the ability to capture data, establish baselines, and schedule model monitoring jobs. It can detect data quality drift, model accuracy, and other issues, with findings made available in S3 and visualized in SageMaker Studio.

  • What is the lifecycle of model monitoring in SageMaker?

    -The lifecycle of model monitoring in SageMaker involves deploying a model, enabling data capture, collecting ground truth data, generating a baseline, scheduling model monitoring jobs, and taking action based on the findings, such as retraining the model if necessary.

  • What are some best practices for model monitoring in production?

    -Best practices include keeping instance utilization below 70% to avoid reduced data capture, ensuring data captured for monitoring is in the same region as the model monitoring, and using lowercase variables with underscores to ease parsing in JSON and Spark jobs.

  • How can custom metrics be used in SageMaker model monitoring?

    -Custom metrics can be used by developing a script, packaging it in a Docker container, and deploying it as a custom model monitor metric in AWS. This allows for monitoring specific aspects of model performance that are not covered by standard metrics.

  • What is the role of CloudWatch in SageMaker model monitoring?

    -CloudWatch plays a crucial role in SageMaker model monitoring by receiving alerts and metrics from the monitoring jobs. These alerts can trigger actions, such as retraining the model or adjusting the model's parameters, based on the detected issues.

  • How can AWS Proof of Concept (PoC) funding support machine learning initiatives?

    -AWS offers PoC funding for machine learning projects in partnership with aligned partners. This funding can cover up to 10% of one year's annual recurring revenue, with a one-time cap of $25,000, to support the evaluation and development of machine learning solutions.

Outlines

00:00

🔍 Introduction to Model Monitoring with SageMaker

This paragraph introduces the webinar's focus on model monitoring within AWS SageMaker. It discusses the common challenges faced post-deployment of machine learning models, emphasizing the importance of model monitoring to track performance in a production environment. Model drift, which can occur due to changes in data distribution or model bias, is identified as a key issue. The speaker transitions to a deeper exploration of model drift and its solutions using AWS SageMaker.

05:01

📈 Understanding Model Monitoring and Data Quality

The second paragraph delves into the specifics of model monitoring, highlighting the importance of monitoring both data and model quality. It explains the significance of establishing a baseline for statistical properties and the role of business rules in this process. The paragraph also touches on the best practices for data capture and monitoring, including the importance of keeping instance utilization below 70% and ensuring data is captured in the same region as the model monitoring for consistency.
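
For reference, here is a minimal sketch of generating such a baseline with the SageMaker Python SDK's DefaultModelMonitor; the S3 paths, training CSV, and instance settings are placeholders, and argument names may differ slightly between SDK versions.

```python
# Minimal sketch: profile the training data and suggest a data-quality baseline.
# Bucket paths and the training CSV are placeholders.
from sagemaker import get_execution_role
from sagemaker.model_monitor import DefaultModelMonitor
from sagemaker.model_monitor.dataset_format import DatasetFormat

role = get_execution_role()

monitor = DefaultModelMonitor(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    volume_size_in_gb=20,
    max_runtime_in_seconds=3600,
)

# One-off processing job that writes statistics.json and constraints.json
# (the "business rules" you can later review and edit) to the output S3 URI.
monitor.suggest_baseline(
    baseline_dataset="s3://my-bucket/train/training-data.csv",   # placeholder
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri="s3://my-bucket/monitoring/baseline",          # placeholder
    wait=True,
)
```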

10:02

🛠️ Model Quality Monitoring and Violation Checks

This section discusses the model quality monitoring process, which requires ground truth labels to assess the accuracy of predictions against actual outcomes. It describes the architecture of data capture and the workflow for monitoring, including the establishment of baselines, violation checks for data integrity, and the use of CloudWatch alerts for taking action on detected issues. The paragraph also outlines the types of violations that can occur and how they are managed.
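
As a rough illustration of how a model-quality schedule is wired up against ground truth labels, the sketch below uses the SageMaker Python SDK's ModelQualityMonitor; the endpoint name, S3 URIs, inference attribute, and problem type are assumptions for a binary classifier, and arguments may differ between SDK versions.

```python
# Sketch: hourly model-quality monitoring against user-supplied ground truth labels.
# Endpoint name, S3 URIs, and attribute names are placeholders.
from sagemaker import get_execution_role
from sagemaker.model_monitor import (
    CronExpressionGenerator,
    EndpointInput,
    ModelQualityMonitor,
)

role = get_execution_role()

mq_monitor = ModelQualityMonitor(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    max_runtime_in_seconds=1800,
)

mq_monitor.create_monitoring_schedule(
    monitor_schedule_name="my-endpoint-model-quality",
    endpoint_input=EndpointInput(
        endpoint_name="my-endpoint",                    # placeholder endpoint
        destination="/opt/ml/processing/input_data",
        inference_attribute="prediction",               # field holding the model's prediction
    ),
    ground_truth_input="s3://my-bucket/ground-truth/",  # labels provided by the user
    problem_type="BinaryClassification",
    output_s3_uri="s3://my-bucket/monitoring/model-quality",
    schedule_cron_expression=CronExpressionGenerator.hourly(),
    enable_cloudwatch_metrics=True,                     # violations surface as CloudWatch metrics
)
```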

15:03

📊 Model Accuracy and Custom Metrics in Monitoring

The focus shifts to model accuracy and the use of custom metrics for monitoring. It describes a scenario where a client, SecurToad, requires constant monitoring of a machine learning model with custom metrics. The solution involves using the Jensen Shannon Divergence metric to measure distribution changes in model scores, which can indicate the need for retraining. The paragraph outlines the process of creating a Python script for this metric and deploying it as a custom model monitor in AWS.
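
The divergence itself is simple to compute: JSD(P, Q) = 0.5 * KL(P||M) + 0.5 * KL(Q||M) with M = 0.5 * (P + Q), which stays between 0 and 1 when logarithms are taken in base 2 (SciPy's scipy.spatial.distance.jensenshannon returns the square root of this quantity, the Jensen-Shannon distance). Below is a self-contained sketch of the kind of check the script performs; the sample data, bin edges, and the 0.2 alert threshold are illustrative only.

```python
# Sketch: Jensen-Shannon divergence (log base 2, so the value lies in [0, 1])
# between a baseline score distribution and the most recent window of scores.
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """JSD(P, Q) = 0.5*KL(P||M) + 0.5*KL(Q||M), with M = 0.5*(P + Q)."""
    p = np.asarray(p, dtype=float) / np.sum(p)
    q = np.asarray(q, dtype=float) / np.sum(q)
    m = 0.5 * (p + q)

    def kl(a, b):
        a, b = np.clip(a, eps, None), np.clip(b, eps, None)
        return float(np.sum(a * np.log2(a / b)))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Stand-in score samples; in practice these come from training data and captured inference data.
rng = np.random.default_rng(0)
baseline_scores = rng.beta(2, 5, size=10_000)
current_scores = rng.beta(2, 5, size=10_000)

bins = np.linspace(0.0, 1.0, 21)                       # shared histogram bins for both windows
p, _ = np.histogram(baseline_scores, bins=bins)
q, _ = np.histogram(current_scores, bins=bins)

jsd = js_divergence(p, q)
if jsd > 0.2:                                          # illustrative alert threshold
    print(f"Score distribution drift detected (JSD={jsd:.3f}); consider retraining")
```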

20:04

🔧 Automating Model Monitoring with AWS and GitHub Actions

This paragraph explains the automation of the model monitoring process using AWS services and GitHub Actions. It details the creation of a Docker container image for the custom model monitor script and the deployment pipeline to AWS ECR. The paragraph also describes the integration of the model monitor into the ML training pipeline, including steps for training jobs, endpoint configurations, and the setup of model monitoring jobs that run hourly.
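
A hedged sketch of the final step, pointing a SageMaker monitoring schedule at a custom container image already pushed to ECR, is shown below; the image URI, endpoint name, S3 paths, and the THRESHOLD environment variable are placeholders, and the generic ModelMonitor arguments may vary between SDK versions.

```python
# Sketch: schedule a bring-your-own-container monitor that runs the custom metric hourly.
# The ECR image URI, endpoint name, and S3 paths are placeholders.
from sagemaker import get_execution_role
from sagemaker.model_monitor import (
    CronExpressionGenerator,
    EndpointInput,
    ModelMonitor,
    MonitoringOutput,
)

role = get_execution_role()

custom_monitor = ModelMonitor(
    role=role,
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/custom-model-monitor:latest",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    env={"THRESHOLD": "0.2"},   # surfaced to the script as an environment variable
)

custom_monitor.create_monitoring_schedule(
    monitor_schedule_name="securtoad-jsd-hourly",
    endpoint_input=EndpointInput(
        endpoint_name="securtoad-endpoint",
        destination="/opt/ml/processing/input/endpoint",
    ),
    output=MonitoringOutput(
        source="/opt/ml/processing/output",
        destination="s3://my-bucket/monitoring/reports",
    ),
    schedule_cron_expression=CronExpressionGenerator.hourly(),
)
```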

25:05

📚 Conclusion and PoC Funding Opportunities

The final paragraph wraps up the webinar by summarizing the solution created for SecurToad and introducing the concept of AWS Proof of Concept (PoC) funding. It explains how AWS and partners like Metal Toad can support the evaluation of machine learning initiatives with funding up to $25,000. The paragraph concludes with an invitation for further questions and a brief mention of best practices for using SageMaker Model Monitor effectively.

❓ Q&A Session on SageMaker Model Monitoring

The last part of the script is a Q&A session where participants ask about best practices for using SageMaker Model Monitor and how it captures and stores data required for monitoring and analysis. The responses provide insights into the technical aspects of model monitoring scripts, their similarity to Lambda functions, and the use of environment variables within the scripts.

Keywords

💡Model Monitoring

Model monitoring refers to the process of tracking a machine learning model's performance in a production environment. It involves capturing key performance indicators to determine when a model is functioning well or when it requires intervention. In the video, model monitoring is crucial for detecting issues such as model drift, ensuring that the deployed ML solutions maintain their expected performance over time.

💡Production Environment

A production environment is the setting where software solutions, including machine learning models, are deployed for real-world use after development and testing phases. The script discusses the importance of model monitoring in a production environment to ensure ongoing performance and to detect any changes that may affect the model's accuracy.

💡Model Drift

Model drift is a concept that describes the decline in a model's performance over time after deployment. It can be caused by changes in data distribution (data drift) or changes in the relationship between features and target variables (concept drift). The video script mentions model drift as a central problem that model monitoring aims to detect and address.

💡Data Quality

Data quality in the context of the video refers to the assessment of the statistical properties and integrity of the data being fed into the model. Ensuring data quality is vital for model monitoring as it helps in identifying data drift, which might lead to model performance degradation. The script discusses monitoring data quality through statistics like mean, sum, and standard deviation.
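
As a toy illustration of this idea (outside SageMaker itself), the snippet below compares per-feature means of newly captured data against baseline statistics; the file names and the three-standard-deviation rule are arbitrary choices for the sketch, not what Model Monitor does internally.

```python
# Toy drift check over per-feature statistics; file paths are placeholders.
import pandas as pd

baseline = pd.read_csv("training-data.csv")        # data the model was trained on
current = pd.read_csv("captured-last-hour.csv")    # data captured from the live endpoint

stats = baseline.describe().loc[["mean", "std"]]   # the sort of statistics a baseline records

# Flag numeric features whose current mean moved more than 3 baseline standard deviations.
drifted = [
    col for col in stats.columns
    if abs(current[col].mean() - stats.at["mean", col]) > 3 * stats.at["std", col]
]
print("Features drifting:", drifted)
```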

💡Model Quality

Model quality pertains to the accuracy and reliability of a model's predictions compared to actual outcomes. The script highlights the importance of monitoring model quality through metrics such as RMSE or F1 score, which can indicate when a model's predictive performance deviates from the established baseline.
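
A minimal sketch of computing these metrics once captured predictions have been joined with ground truth labels; the label arrays and the 0.90 baseline F1 threshold are made-up placeholders.

```python
# Sketch: compare captured predictions with ground truth labels; values are placeholders.
import numpy as np
from sklearn.metrics import f1_score, mean_squared_error

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])   # ground truth labels supplied later
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])   # predictions captured at the endpoint

f1 = f1_score(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))

BASELINE_F1 = 0.90                             # illustrative threshold from the baselining step
if f1 < BASELINE_F1:
    print(f"Model quality violation: F1 {f1:.2f} is below the baseline {BASELINE_F1:.2f}")
print(f"RMSE over the same window: {rmse:.2f}")
```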

💡Bias Drift

Bias drift is a type of model drift that occurs when the model starts to make predictions that are consistently biased towards a certain class or outcome. The video script mentions bias drift as one of the factors that can change over time in a production environment, affecting the fairness and accuracy of the model's predictions.

💡Feature Attribution Drift

Feature attribution drift refers to changes in the importance or relevance of features used in a machine learning model over time. This can impact the model's decision-making process and overall performance. The script discusses this concept as part of the model drift issues that need to be monitored in a production environment.

💡SageMaker Clarify

SageMaker Clarify is a tool within the AWS SageMaker suite used for model explainability and bias detection. The script mentions SageMaker Clarify as an integral tool for addressing model bias and explainability, which are important aspects of model monitoring to ensure transparency and fairness in model predictions.

💡Data Capture

Data capture is the process of collecting and storing the data that is sent to a model for inference, as well as the ground truth labels for model quality monitoring. The script explains that data capture is a critical step in model monitoring, allowing for the analysis of data quality and model predictions against actual outcomes.
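
A minimal sketch of enabling capture at deploy time with the SageMaker Python SDK is shown below; the container image, model artifact, endpoint name, and bucket paths are placeholders.

```python
# Sketch: enable data capture when deploying an endpoint; all URIs and names are placeholders.
from sagemaker import get_execution_role
from sagemaker.model import Model
from sagemaker.model_monitor import DataCaptureConfig

role = get_execution_role()

model = Model(
    image_uri="<inference-image-uri>",                    # placeholder container image
    model_data="s3://my-bucket/model/model.tar.gz",       # placeholder model artifact
    role=role,
)

capture_config = DataCaptureConfig(
    enable_capture=True,
    sampling_percentage=50,                               # capture half of the requests
    destination_s3_uri="s3://my-bucket/monitoring/data-capture",
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
    endpoint_name="my-endpoint",
    data_capture_config=capture_config,
)
```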

💡Jensen Shannon Divergence

Jensen Shannon Divergence is a statistical measure used to quantify the difference between two probability distributions. In the context of the video, it is used as a custom metric to monitor the distribution of model scores across multiple clients, helping to detect when the model's predictive performance may be drifting from the expected distribution.

💡Proof of Concept (POC) Funding

Proof of Concept (POC) Funding is a financial support mechanism offered by AWS in partnership with aligned partners like Metal Toad. It provides funding to evaluate and demonstrate the viability of a project before full-scale implementation. The script mentions POC funding as an opportunity for stakeholders to explore machine learning initiatives with potential financial support from AWS.

Highlights

Webinar discusses model monitoring with SageMaker, focusing on maintaining ML solution performance in production environments.

Model monitoring involves capturing key performance indicators to assess model performance and detect issues.

Model drift is identified as a significant challenge, causing model performance to decay over time in production.

Data drift and bias drift are explained as types of model drift, affecting model accuracy due to changing data distributions.

Feature attribution drift is introduced as a change in the importance of features over time, impacting model performance.

Model monitoring is crucial for detecting when a model needs retraining, optimizing resource use and maintaining performance.

SageMaker Clarify is highlighted as a tool for model explainability and bias detection, integrating with model monitoring.

Data quality monitoring is discussed, emphasizing the importance of capturing data sent to the model and ground truth labels.

Establishing a baseline for model quality and data quality is crucial for detecting drift and maintaining model accuracy.

Model monitoring jobs are scheduled to detect data quality drift and model accuracy, with results available in S3 and visualized in SageMaker Studio.

Best practices for model monitoring include keeping instance utilization below 70% to ensure effective data capture.

The architecture of data capture is detailed, explaining the flow from training data to model deployment and monitoring.

Violation checks for data quality are described, including checks for missing columns, extra columns, and unexpected null values.
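
Per the transcript, these violations surface as CloudWatch alerts and metrics that can trigger follow-up action, such as retraining once drift passes the 30% figure mentioned in the talk. Below is a hedged boto3 sketch of wiring such an alarm; the metric namespace, metric name, dimensions, and SNS topic are assumptions to verify against what your monitoring schedule actually publishes.

```python
# Sketch: alarm on a drift metric emitted by a monitoring schedule.
# Namespace, metric name, dimensions, and the SNS topic ARN are assumptions.
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="my-endpoint-data-drift",
    Namespace="aws/sagemaker/Endpoints/data-metrics",      # assumed namespace
    MetricName="feature_baseline_drift_total_amount",      # assumed per-feature drift metric
    Dimensions=[
        {"Name": "Endpoint", "Value": "my-endpoint"},
        {"Name": "MonitoringSchedule", "Value": "my-endpoint-data-quality"},
    ],
    Statistic="Maximum",
    Period=3600,
    EvaluationPeriods=1,
    Threshold=0.3,                                         # e.g. act once drift exceeds 30%
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:model-drift-alerts"],  # placeholder topic
)
```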

Model quality monitoring requires target variables and involves capturing labeling data to assess model predictions.

Jensen Shannon Divergence is introduced as a metric for measuring distribution differences, used for custom model monitoring.

A custom model monitor script is developed using Python and deployed as a Docker container image to AWS ECR for model monitoring.

A step function is used in the ML training pipeline to automate model training, deployment, and monitoring for continuous improvement.
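
The sketch below shows, under stated assumptions, how such a state machine could be registered with boto3; it is heavily trimmed (only a training state and the monitor-attaching Lambda state are shown), the SageMaker task parameters are elided, and the Lambda name, state machine name, and IAM role ARN are placeholders.

```python
# Heavily trimmed sketch of the pipeline's state machine; names, ARNs, and most
# task parameters are placeholders or omitted.
import json
import boto3

definition = {
    "StartAt": "TrainModel",
    "States": {
        "TrainModel": {
            "Type": "Task",
            "Resource": "arn:aws:states:::sagemaker:createTrainingJob.sync",
            "Parameters": {"TrainingJobName.$": "$.TrainingJobName"},  # other required fields omitted
            "Next": "AttachModelMonitor",
        },
        "AttachModelMonitor": {
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke",
            "Parameters": {"FunctionName": "attach-model-monitor"},    # placeholder Lambda
            "End": True,
        },
    },
}

boto3.client("stepfunctions").create_state_machine(
    name="ml-training-and-monitoring",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/StepFunctionsSageMakerRole",  # placeholder role
)
```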

AWS offers proof of concept funding for machine learning initiatives, supporting up to 10% of one year's annual recurring revenue.

Best practices for using SageMaker Model Monitor include using lowercase variables and VPC endpoints for ease of integration.

Data capture configuration in SageMaker allows defining an S3 upload path and setting a sampling percentage for data analysis.

Model monitor scripts are similar to Lambda functions, with environmental variables set up for script execution.
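
For illustration, a skeleton of such a script is sketched below. The environment variable names (dataset_source, output_path) and the captured-file layout follow the bring-your-own-container convention described in the SageMaker documentation, but treat them as assumptions to verify; THRESHOLD stands in for a custom variable passed via the schedule.

```python
# Skeleton of a custom model-monitor entrypoint; environment variable names and
# container paths are assumptions to verify against your setup.
import json
import os
import pathlib

captured_dir = os.environ.get("dataset_source", "/opt/ml/processing/input/endpoint")
output_dir = os.environ.get("output_path", "/opt/ml/processing/output")
threshold = float(os.environ.get("THRESHOLD", "0.2"))   # custom variable set on the schedule

# Captured inference data is written as JSON Lines files under the input directory.
records = []
for path in pathlib.Path(captured_dir).rglob("*.jsonl"):
    with open(path) as fh:
        records.extend(json.loads(line) for line in fh if line.strip())

# ... compute the custom metric (e.g. Jensen-Shannon divergence) over `records` here ...
report = {"record_count": len(records), "threshold": threshold, "violations": []}

pathlib.Path(output_dir).mkdir(parents=True, exist_ok=True)
with open(os.path.join(output_dir, "model_report.json"), "w") as fh:
    json.dump(report, fh)
```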

Transcripts

[00:05] So today's webinar will be about model monitoring with SageMaker. Clients often face the following problem: once my ML solution is deployed to a production-level environment, how do I know if my ML solution is working? The solution: model monitoring. What is model monitoring? Model monitoring is the process of monitoring your model's performance in the production-level environment, and this means capturing key performance indicators that can tell you when a model is doing well and when it's not.

[00:43] So why do I need model monitoring? When you deploy machine learning models to production, there are several factors that can change over time. The model may start to drift from the performance you were expecting when you deployed it. The data being fed into your model may change over time depending on your use case. The second a model is deployed to production, you already need to get ready to retrain and redeploy it.

[01:06] So what exactly is model drift? Model drift is the decay of a model's performance over time once it's deployed to production. Model drift may be called data drift, where the data distribution that is being fed into your ML model changes over time, or bias drift, which is a type of model drift that occurs if the data that you used to train your model is different compared to the data your model sees in production. And feature attribution drift is the drift of the features being used in your ML model and their attribution scores, which can change over time depending on the data that's being fed in. But at the core, model drift is caused by changes in your data. And now I'll hand us over to Praty to give us a deep dive into solving these specific model drift problems using AWS SageMaker.

[02:00] Thanks, Ray. So let's go to the next slide. Okay, now Ray has already covered why we need model monitoring. Basically, as he mentioned, model monitoring is monitoring the model in production. If you don't want to monitor the model, another way of doing it is to keep retraining the model, but continuous retraining does not solve issues, and it needs a lot of resources. So retraining only when necessary is what you should aim for, and that's where the model monitor is going to help you. If you can go to the next slide.

[02:46] As mentioned already, there are these types of model monitoring: data quality, model quality, bias, and explainability. Data quality involves a lot of statistics (mean, sum, standard deviation, etc.), and I'm going to go into a little more detail about that, but those are the criteria on which the data will be examined. The other one is model quality: your model is predicting something, and how does that prediction compare with the actual reality? That's where we are looking at model quality, and RMSE or the F1 score, etc. are the metrics used for accuracy and prediction. The other one is bias, which Ray already mentioned, and then model explainability. I should mention that for model bias and model explainability, SageMaker Clarify is the tool we typically use, because Clarify is used for model explainability and it integrates wonderfully with the model monitoring service. So you can use the SageMaker Model Monitor and further use Clarify for bias and explainability. In today's presentation I'm mainly going to focus on data quality and model quality, so let's do that. If you can go to the next slide.

[04:12] Okay, so this is typically the life cycle of model monitoring. We need a model that is deployed in production, and we need to enable data capture. Now this data capture works in two ways. One is capturing the data which is sent to the model; there is a specific way you enable that in the SageMaker API. The other is what we call ground truth data. I will be referring to it as ground truth; it does not mean SageMaker Ground Truth, it's just the ground truth data. So the other data that you're going to capture is the ground truth labels, and this data is needed for model quality monitoring; it is data that the customer, the user, will have to provide, and I'll explain a little more about that. Data quality monitoring does not need the ground truth labels; model quality monitoring will need them. But basically, you have a model which is deployed, then you collect the ground truth depending on which quality you are monitoring, then you generate a baseline. Now for data quality as well as model quality, establishing a baseline of statistics and specifying the constraints is an important step in model monitoring: you establish something against which the quality of the model will be compared, so that the data drift or the accuracy drift can be calculated. Then you schedule the model monitoring jobs, which will detect the data quality drift, the model accuracy, etc., and those findings are made available in S3 as well as visualized in SageMaker Studio. They're also sent to CloudWatch, where further action can be taken depending on the CloudWatch alerts. If you can go to the next slide.

[06:23] Okay, so let's look at monitoring data quality. Next slide, please. We looked at a couple of statistical properties for the independent variables in the data that were mentioned in the earlier slide; they can be mean, mode, etc. What you're going to do is, looking at the ground truth data when you are establishing a baseline, you are going to have some business rules. The business rules could be automatically suggested, and they can also be edited and updated by you. Once those business rules are set, a baseline will be established by looking at your initial ground truth data. Again, I'm not looking at the predicted variable here, I'm just looking at the data. Now, the best practices for this type of baseline generation and further data quality monitoring: the S3 bucket in which the data is captured (the initial data as well as the data captured later during model monitoring) should of course be in the same region as the model monitoring. And for both the model quality monitor and the data quality monitor, the instance utilization is something you should watch. You should not have the instance utilization above 70%, because many times SageMaker tends to reduce the data capture if the instance utilization is above 70%, and then the model monitoring effectiveness goes down. So these are some of the best practices.

[08:09] If you can go to the next slide. Yeah, this is the diagram that depicts the architecture of the data capture. You have the training data; using the training data and a SageMaker training job, a model is generated and trained, and it is enabled as a SageMaker endpoint. You also have a baseline processing job which, looking at your business rules, has created a baseline and suggested various statistics and constraints. There is a small group of users mentioned here, which shows that you can look at the constraints and update them (the business rules, so to say). Once the endpoint is deployed, there are requests with production data, predictions are made, and there is a monitoring job for the model monitor, which is scheduled and is going to save all the results, statistics, and violations in S3. Now, when I'm talking about violations, those violations are of course going to generate CloudWatch alerts and metrics, and depending on the CloudWatch alert you can decide to take action. Maybe if it is a CloudWatch alert which says, for example, that the data quality monitor shows the data drift is more than 30%, maybe that's the time you decide to retrain the model, and that's where, looking at the CloudWatch alert, you can trigger a job that is going to retrain the model. So this is the data quality monitoring workflow. If you can go to the next slide.

[09:53] Some of the violation check types are mentioned here. For example: is the type of the data that is being sent to the model the same as the data the model was originally trained on? Are there any missing columns? Are there any extra columns? Are the categories the same? Are there more than the expected number of nulls in the data? All of this can happen when the data that is sent in production suddenly changes. A typical example: during the pandemic we suddenly shifted from in-person interactions to remote ones, so the data that was sent to the models shifted suddenly and radically. All these violations are going to document the data quality that is coming in for you, and as I said about the CloudWatch alerts, these violations will show up in the alerts and you can act on them once they show up in CloudWatch. If you can go to the next slide.

[10:52] Next is the model quality monitor, which is going to need your target variables, and I'll tell you how. If you can go to the next slide. This is about model accuracy: it's not just the data that is going to shift, it is the accuracy as well. Sometimes the data does not shift very drastically, but the business rules are such that the accuracy of the model in predicting the target variable keeps going below the established thresholds. For this, you want to capture the labeling data as well. Now, the data capture that is enabled by SageMaker is going to capture the data that the model predicted, but there also needs to be a way for the model monitor to get access to the actual target variables. Typically this is done with human-in-the-loop, or there are multiple ways of doing it, where on a sample of the production data you are providing the target variables. SageMaker is going to combine those human-provided target variables over a period of time with the target variables predicted by the SageMaker model, execute a merge job, and then, depending on the results of the merge, compare the feature statistics. This is elaborated more in the workflow on the next slide, if you can go there.

[12:29] Oh sorry, there's one more. These are typically the quality metrics it is compared against, and of course, depending on whether it is multiclass classification, binary classification, or a regression model, those are the quality metrics that SageMaker model monitoring is going to work off of. If you can go to the next slide.

[12:50] Okay, so here, as depicted in the diagram, the left-hand side of the diagram looks the same for data quality and model quality: you are basically establishing a baseline. Then, once the labels are provided, the ground truth interface on the rightmost side supplies the inference variables shared by the human in the loop, and the inference inputs are shared also by SageMaker. Now, in this case it is a batch transform, but it could be a real-time endpoint; it could be a batch transform as well. Those two are merged, the merged data is also made available to you, and a monitoring job is going to monitor the results of the merge job. Then the workflow goes a similar way to the data quality workflow: you can look at the thresholds and the statistics, a violation report is generated in CloudWatch, and you can take action on that. Essentially, this is looking at the overall model quality and not just the data quality. If you can go to the next slide; I think that's the last one. Okay, yeah. So feel free to ask questions on anything related to SageMaker model monitoring and I can answer those.

[14:15] Great, thank you, Praty. I'll now be showing how we used some of the tools that Praty described to solve a particular client's ML problems. SecurToad came to us with a deployed machine learning model in production that needed to be constantly monitored for performance, to ensure that their product was working consistently for all their clients. The metrics that they needed to monitor were custom and needed to be monitored every hour. SecurToad's application takes in as input CloudFront logs from its clients and uses an ML algorithm to determine whether activity on one of their client sites is malicious or not. And SecurToad's system is adjustable: it allows their clients to adjust the sensitivity of the system by setting thresholds to auto-blacklist malicious IP addresses. So our client needed a way to monitor their model in production, and to monitor it every hour, with a custom metric for this particular algorithm.

[15:16] This is the solution that we created for them. SecurToad's model needs to perform well across a wide range of clients, so we'll need to monitor the performance of the same algorithm across multiple clients. The model they have essentially produces scores, which can be used to generate a distribution of scores on site activity, similar to the curve that you see on the right. If this distribution of scores drifts and takes a different form over time, then the model may need to be retrained. Then the question became: what metric do we need to use to capture this specific kind of drift, and was this something we needed a custom script for? We chose to use a metric called Jensen-Shannon divergence, which measures the difference between two distributions on a scale from zero to one. The closer the number is to one, the more dissimilar the distributions are; the closer the number is to zero, the more similar they are. If the score is zero they are completely the same, and if it's one they're completely different. This gave us a single metric that we could set up a monitor for. By having a single value like this that describes the model's performance across multiple clients and their flagged activity, we could easily set a threshold to determine when to alert SecurToad's development team that they need to retrain their model.

[16:46] So we built a Python script to capture this metric and prepared it to be deployed as a custom model monitor metric. To deploy a custom model monitor metric to AWS, we developed a GitHub Actions workflow and pipeline to create a Docker container image with that script on it and upload it to AWS ECR (Elastic Container Registry), to then be referenced by a model monitor job when we need to make one. This pipeline not only builds and deploys a new image if the Dockerfile or the custom metric script in Python is modified, it also automates what would otherwise be a very intensive manual process. So we can now easily build and deploy custom model monitor scripts, giving us flexibility with the metrics we can return from the monitor.

[17:46] So now that we have the model monitor script ready, we need to factor it into our deployment pipeline. We built out, as part of the ML training pipeline, a step for SageMaker model monitor creation and automated cleanup of old endpoints and resources. I'll guide us through each step of the pipeline, starting from the left, and explain their features and what they add to the SecurToad product. We need continuous monitoring and training of SecurToad's algorithms to have them perform up to the standard for their clients, so in order to facilitate that more seamlessly we have a step function with all the necessary components for training the algorithm, creating the model, deploying endpoints, and monitoring them. The first step here generates a unique training job name based on their requirements. The next step creates a SageMaker training job based on that name and SecurToad's custom algorithm; we send in hyperparameters at that step as well. Once the training job has been completed, a model version is created and stored. Then the endpoint configuration is made, which takes into consideration the instance size we're deploying on, the name of the endpoint, and the data capture configuration, which is used by the model monitor. Once we have an endpoint configuration, the endpoint is deployed and can now be invoked by their production-level site.

[19:20] Once the endpoint has been deployed, a SageMaker model monitor job will be attached to it via a Lambda function. This takes the custom model monitor script that was deployed to ECR via our Docker pipeline and schedules the script to run every hour. The model monitor takes data from the training dataset's S3 bucket and some sample inference data from production and generates metrics. The monitoring job is set to execute only if data has been sent through the deployed endpoint. When the model monitor generates a model report JSON file, it is put into the model reports S3 bucket that we made for them, and an S3 event notification is sent to an SQS queue for evaluation by a report-processing Lambda. What this Lambda does is look at the results of the report, any significant violations and the results of the metrics, and then determine what action to take. For example, we can have a configuration to send an alert to an SNS topic, to send a message to the development team to look at deploying another model or running more experiments. The final step of the step function cleans up any old endpoints and monitors: it looks at all currently deployed endpoints and models and deletes any that are older than the last five that were deployed. So now, with a model monitor, SecurToad can better ensure the quality of its application services to their clients. And that's the solution we created for SecurToad. With that, I'll hand us over to Nathan or Stacy to talk about our PoC funding opportunity.

[21:13] Happy to do that. Hi everybody, and thank you guys so much for walking us through that. Hopefully everybody had some great takeaways, and if you did, and you're thinking, "How do I get started? I may need to prove out this concept to my executive stakeholders before I get full funding approval," then AWS and Metal Toad have some insights for you here. AWS offers proof-of-concept funding. If you partner with an aligned partner like Metal Toad, we can evaluate what you are trying to prove out, then work with Amazon to understand how this workload may impact your monthly Amazon Web Services costs. Based on that, you may have the opportunity to get up to 10% of one year's annual recurring revenue paid by Amazon to support this proof of concept. It's a one-time, capped opportunity of up to $25,000, and it would be paid to partners such as Metal Toad at the completion of the project. So if this is something that's interesting to you, or you'd like to learn more about how Amazon PoC funding could potentially help you with a machine learning initiative, please reach out; we'd love to see how we can get you some free money.

[22:48] Thank you, Stacy. We are running very low on time, but let's see how many questions we can get answered real quick. The first one we have here is: are there any best practices or tips for effectively using SageMaker Model Monitor in real-world machine learning projects? Praty, can you answer that one for us?

[23:10] Yes. I already talked about the instance utilization and the S3 bucket being in the same region, etc. Other than that, this is not strictly necessary, but as a best practice, do some simple things like keeping the variables in lowercase and using underscores, because all of this is going to go through JSON and parsing, and sometimes Spark, because there are Spark jobs run behind the scenes in model monitor; it helps ease out issues caused by special characters, etc. Then, typically, model monitor is hosted in a VPC, so having all the access ready, the VPC endpoints available, etc., all of that needs to be taken care of as part of the best practices and tips.

[24:13] Thank you very much. And we are at time; if you want, you can stick around, as we have a couple more we're going to try to get to right now, but if you can't, thank you for joining us. Next one: how does SageMaker Model Monitor capture and store the data required for monitoring and analysis? Ray, can you grab that one?

[24:31] Yeah, I can grab that one. When you're setting up a data capture configuration, you can define an S3 upload path, a particular bucket or URI, basically a destination you want the data to go to when you're creating it, and you can also set a sampling percentage, so you can sample a specific percentage of the data that is being fed through that specific endpoint. That data then sits in S3 for you to use for evaluation.

[25:05] Okay, thank you. And the last one here: what do model monitor scripts look like? Is it similar to writing a Lambda function? Are there standard parameters they expect? Ray, can you answer that one as well?

[25:18] Yeah, I can answer that one. It's very similar to running a normal Python script, like a Jupyter notebook, but essentially when you create the model monitor you can set up environment variables to be passed in that are accessible when the script runs. But it's very similar to what you'd expect in a Lambda function.

[25:42] Great, thank you very much. Those are all of our questions. I hope to see you all here next time when we do our next webinar.


Related Tags

Model Monitoring, AWS SageMaker, Machine Learning, Production Environment, Data Drift, Model Drift, Bias Detection, Model Explainability, ML Deployment, SageMaker Clarify, Custom Metrics