SageMaker Model Monitor - Best Practices and Gotchas
Summary
TL;DR: In this episode of 'Tech with Nila', viewers are introduced to advanced practices for utilizing Amazon SageMaker Model Monitor, a tool for tracking data and model quality. The session assumes a pre-existing understanding of Model Monitor and discusses its integration with SageMaker Clarify for bias and explainability reports. The video outlines the types of monitoring available, the process of setting up baselines, and the importance of real-time monitoring with ground truth labels. It also touches on customization options, limitations, and the benefits of SageMaker's managed services for ML engineers and data scientists.
Takeaways
- 🛠️ SageMaker Model Monitor is an advanced tool for monitoring data quality, model quality, and bias in machine learning models, requiring at least a level 200 understanding of the service.
- 🔗 SageMaker Clarify and Model Monitor can be used together, with Clarify providing bias and explainability reports, and Model Monitor performing ongoing monitoring tasks.
- 📈 Model Monitor categorizes monitoring into two types: one requiring ground truth labels for model quality and bias drift, and one not requiring ground truth for data quality and feature attribution drift.
- 📊 To set up Model Monitor, you first create a baseline from the training dataset, which computes metrics and constraints, followed by scheduling monitoring jobs for ongoing data capture and analysis.
- 📝 For monitoring jobs that require ground truth, an additional ETL process is needed to merge production data with true labels before Model Monitor can compare and report discrepancies.
- 👥 Different personas prefer different access methods; machine learning engineers or data scientists might prefer AWS SDKs and CLI, while business analysts might prefer the SageMaker UI.
- 🚫 Limitations exist with the UI, such as the inability to delete endpoints with Model Monitor enabled directly from the console, requiring CLI or API use instead.
- 📊 Model Monitor computes metrics and statistics for tabular data only, meaning it can work with image classification models but will focus on output analysis rather than input.
- ⚙️ Custom VPC configurations may be needed for SageMaker Studio launched in a custom Amazon VPC to ensure Model Monitor can communicate with necessary AWS services.
- 💻 Pre-processing and post-processing scripts can be used with Model Monitor to transform data or extend monitoring capabilities, but are limited to data and model quality jobs.
- 🔄 Real-time monitoring can be achieved by quickly generating ground truth labels, potentially using Amazon's Augmented AI service to add human-in-the-loop capabilities.
- 🔄 Automating actions based on Model Monitor's alerts and data is crucial for effective deployment and ensures continuous model performance monitoring and improvement.
Q & A
What is the prerequisite knowledge required for this session on SageMaker Model Monitor?
-The prerequisite knowledge for this session is at least a level 200 understanding of Model Monitor.
What does SageMaker Clarify provide in the context of model monitoring?
-SageMaker Clarify is a container that produces bias and explainability reports, which can be used independently or in conjunction with SageMaker Model Monitor for recurring monitoring tasks.
What are the two types of monitoring jobs in SageMaker Model Monitor that require the help of SageMaker Clarify?
-The two types of monitoring jobs that require SageMaker Clarify are model quality drift and bias drift, as they both need to compute final metrics based on ground truth labels.
How does SageMaker Model Monitor categorize the monitoring process?
-SageMaker Model Monitor categorizes the monitoring process into two types: one that requires ground truth and one that does not require ground truth.
What is the first step in the monitoring process when using SageMaker Model Monitor?
-The first step is to create a baseline from the dataset used to train the model, which computes the metrics and suggested constraints for those metrics.
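A minimal sketch of this first step with the SageMaker Python SDK, assuming a training CSV already sits in S3; the role ARN and bucket paths below are hypothetical placeholders, and running this launches a billed processing job:

```python
# Sketch only: needs AWS credentials, a SageMaker execution role, and the
# `sagemaker` SDK installed; role/bucket values are placeholders.
from sagemaker.model_monitor import DefaultModelMonitor
from sagemaker.model_monitor.dataset_format import DatasetFormat

monitor = DefaultModelMonitor(
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
    instance_count=1,
    instance_type="ml.m5.xlarge",
    volume_size_in_gb=20,
    max_runtime_in_seconds=3600,
)

# Runs a processing job that writes statistics.json and constraints.json
# to the output S3 prefix for later monitoring runs to compare against.
monitor.suggest_baseline(
    baseline_dataset="s3://my-bucket/train/train.csv",       # placeholder
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri="s3://my-bucket/model-monitor/baseline",   # placeholder
    wait=True,
)
```

The same pattern applies to the other baseline types via the ModelQualityMonitor and Clarify-backed monitor classes.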
What are the limitations of using the SageMaker UI for model monitoring?
-Using the SageMaker UI, certain actions like deleting an inference endpoint with model monitoring enabled cannot be done directly in the console; the monitoring schedule must first be deleted via the CLI or API.
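The workaround can be scripted with boto3. The helper below takes the SageMaker client as an argument so it can be exercised without an AWS account; note that schedule deletion is asynchronous, so a production version would poll the schedule status before deleting the endpoint:

```python
def delete_endpoint_with_monitoring(sm_client, endpoint_name):
    """Delete all monitoring schedules attached to an endpoint, then the
    endpoint itself. `sm_client` is a boto3 SageMaker client (or a stub).

    Sketch only: delete_monitoring_schedule is asynchronous, so in practice
    you may need to wait (e.g. poll describe_monitoring_schedule) before
    delete_endpoint succeeds.
    """
    resp = sm_client.list_monitoring_schedules(EndpointName=endpoint_name)
    deleted = []
    for summary in resp.get("MonitoringScheduleSummaries", []):
        name = summary["MonitoringScheduleName"]
        sm_client.delete_monitoring_schedule(MonitoringScheduleName=name)
        deleted.append(name)
    sm_client.delete_endpoint(EndpointName=endpoint_name)
    return deleted
```

With a real client this would be `delete_endpoint_with_monitoring(boto3.client("sagemaker"), "my-endpoint")`.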
How does SageMaker Model Monitor handle data capture for non-tabular data like images?
-For non-tabular data like images, SageMaker Model Monitor calculates metrics and statistics for the output (the label based on the image), not the input.
What is the recommended disk utilization percentage to ensure continuous data capture by SageMaker Model Monitor?
-It is recommended to keep disk utilization below 75% to ensure continuous data capture by SageMaker Model Monitor.
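On the configuration side, data capture is enabled on the endpoint config. This helper builds the DataCaptureConfig request shape used by boto3's create_endpoint_config; the S3 URI is a placeholder, and as noted above it must be in the same region as the monitoring schedule:

```python
def data_capture_config(destination_s3_uri, sampling_percentage=100):
    """Request shape for the DataCaptureConfig argument of the boto3
    sagemaker create_endpoint_config call (a sketch of the documented
    shape; the destination bucket must be in the SAME region as the
    monitoring schedule)."""
    if not 0 < sampling_percentage <= 100:
        raise ValueError("sampling_percentage must be in (0, 100]")
    return {
        "EnableCapture": True,
        "InitialSamplingPercentage": sampling_percentage,
        "DestinationS3Uri": destination_s3_uri,
        # capture both the request and the response payloads
        "CaptureOptions": [
            {"CaptureMode": "Input"},
            {"CaptureMode": "Output"},
        ],
    }
```

The returned dict is passed as `DataCaptureConfig=...` when creating the endpoint config.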
How can real-time monitoring be achieved with SageMaker Model Monitor?
-Real-time monitoring can be achieved by quickly generating ground truth labels, which can be facilitated by leveraging Amazon's Augmented AI service to add human-in-the-loop capabilities.
What customization options does SageMaker Model Monitor offer for pre-processing and post-processing?
-SageMaker Model Monitor allows the use of plain Python scripts for pre-processing to transform input data or for post-processing to extend the code after a successful monitoring run. This can be used for data transformation, feature exclusion, or applying custom sampling strategies.
What are the requirements for creating a data quality baseline in SageMaker Model Monitor?
-For a data quality baseline, the schema of the training dataset and inference dataset should be the same, with the same number and order of features. The first column should refer to the prediction or output, and column names should use lowercase letters and underscores only.
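The naming rule can be enforced up front with a tiny helper along these lines (a sketch, not part of the SageMaker SDK), applied to the training dataframe's columns before baselining:

```python
import re


def normalize_column_name(name: str) -> str:
    """Rewrite a column name to lowercase letters, digits, and underscores
    only, with underscore as the sole separator, for maximum compatibility
    across Spark, CSV, JSON, and Parquet."""
    name = name.strip().lower()
    # collapse whitespace and hyphens into a single underscore separator
    name = re.sub(r"[\s\-]+", "_", name)
    # drop any remaining special characters
    return re.sub(r"[^a-z0-9_]", "", name)
```

For example `normalize_column_name("Petal-Length (cm)")` yields a name safe for both the baseline and inference datasets.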
How can the generated constraints from the baseline job be reviewed and modified?
-The generated constraints can be reviewed and modified based on domain and business understanding to make the constraints more aggressive or relaxed, controlling the number and nature of violations.
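Since constraints.json is plain JSON, the review-and-modify step can be scripted. The sketch below assumes the data-quality constraints schema with a top-level 'features' list and a per-feature 'completeness' field; verify the field names against a baseline file from your own account:

```python
import json


def relax_completeness(constraints_path, feature_name, completeness):
    """Load a baseline constraints.json, relax one feature's completeness
    requirement (e.g. 0.9 tolerates 10% missing values), and write the
    edited file back for the monitoring schedule to use.

    Sketch only: assumes the data-quality constraints layout with a
    top-level "features" list of {"name", ..., "completeness"} entries.
    """
    with open(constraints_path) as f:
        constraints = json.load(f)
    for feature in constraints["features"]:
        if feature["name"] == feature_name:
            feature["completeness"] = completeness
    with open(constraints_path, "w") as f:
        json.dump(constraints, f, indent=2)
    return constraints
```

The same pattern works for making a constraint more aggressive, or for adding a string constraint based on domain knowledge.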
What is the significance of setting start and end time offsets in monitoring jobs that require ground truth?
-Start and end time offsets are important to select the relevant data for monitoring jobs that require ground truth, ensuring that only data for which the ground truth is available is used.
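The offsets are plain ISO 8601 duration strings. A small helper (hypothetical, not an AWS API) makes the two examples from the video concrete; the resulting strings are what you would pass as the start/end time offsets of the monitoring schedule's endpoint input:

```python
def ground_truth_offsets(start_lag, end_lag, unit="days"):
    """Build (start_time_offset, end_time_offset) ISO 8601 duration strings
    that select captured data old enough for its ground truth to exist.

    ground_truth_offsets(3, 1)               -> ("-P3D", "-P1D")
        ground truth arrives 3 days late; analyze data 3 to 1 days old.
    ground_truth_offsets(6, 1, unit="hours") -> ("-PT6H", "-PT1H")
        6-hour lag on an hourly schedule.
    """
    fmt = {"days": "-P{}D", "hours": "-PT{}H"}[unit]
    return fmt.format(start_lag), fmt.format(end_lag)
```

Choosing the window this way ensures the monitoring job never scores predictions whose true labels have not landed in S3 yet.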
How can SageMaker Pipeline be used for on-demand model monitoring?
-SageMaker Pipelines can be used to set up on-demand monitoring jobs by leveraging the 'CheckJobConfig' configuration for quality checks and 'ClarifyCheckConfig' for bias and explainability checks; the Pipelines service itself is free, so you only pay for the underlying compute, making it serverless and cost-effective.
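As a sketch with the SageMaker Python SDK (role and S3 URIs are placeholders, and the step launches a billed processing job when the pipeline runs), an on-demand data-quality check wrapped in a pipeline might look like:

```python
# Sketch only: requires AWS credentials and the `sagemaker` SDK.
from sagemaker.model_monitor.dataset_format import DatasetFormat
from sagemaker.workflow.check_job_config import CheckJobConfig
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.quality_check_step import (
    DataQualityCheckConfig,
    QualityCheckStep,
)

check_job_config = CheckJobConfig(
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

data_quality_check = QualityCheckStep(
    name="OnDemandDataQualityCheck",
    check_job_config=check_job_config,
    quality_check_config=DataQualityCheckConfig(
        baseline_dataset="s3://my-bucket/inference/data.csv",  # placeholder
        dataset_format=DatasetFormat.csv(header=True),
        output_s3_uri="s3://my-bucket/monitoring/on-demand",   # placeholder
    ),
    skip_check=False,             # fail the step when violations are found
    register_new_baseline=False,  # compare against the existing baseline
)

pipeline = Pipeline(name="on-demand-monitoring", steps=[data_quality_check])
# pipeline.upsert(role_arn=...); pipeline.start()  # run whenever needed
```

For the bias/explainability variants, ClarifyCheckStep with a ClarifyCheckConfig plays the role of QualityCheckStep here.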
What are the extension points provided by SageMaker Model Monitor for bringing your own container?
-SageMaker Model Monitor provides extension points that allow you to adhere to the contract input and output to leverage the model monitor. The container can analyze data in the 'dataset_source_path' and write the report to the 'output_path', with the ability to write any report that suits your needs.
How can the data from SageMaker Model Monitor be used to automate actions like model retraining?
-The data from SageMaker Model Monitor can trigger CloudWatch alarms and EventBridge rules when drifts are detected and metric constraints are violated, which can in turn start a model retraining pipeline, automating the retraining process based on monitoring violations.
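One way to wire this up is a CloudWatch alarm on the drift metrics the monitor emits, with the alarm state change feeding an EventBridge rule that starts the retraining pipeline. The helper below only builds the kwargs for cloudwatch.put_metric_alarm; the namespace and metric naming follow the data-quality monitor's published metrics but should be verified against what actually appears in your account:

```python
def drift_alarm_kwargs(endpoint_name, schedule_name, feature, threshold):
    """kwargs for boto3 cloudwatch.put_metric_alarm(**kwargs) on a Model
    Monitor feature-drift metric.

    Sketch only: the namespace and metric name below are assumptions based
    on the data-quality monitor's emitted metrics; confirm them in the
    CloudWatch console before relying on the alarm.
    """
    return {
        "AlarmName": f"{endpoint_name}-{feature}-drift",
        "Namespace": "aws/sagemaker/Endpoints/data-metrics",   # assumed
        "MetricName": f"feature_baseline_drift_{feature}",     # assumed
        "Dimensions": [
            {"Name": "Endpoint", "Value": endpoint_name},
            {"Name": "MonitoringSchedule", "Value": schedule_name},
        ],
        "Statistic": "Average",
        "Period": 3600,           # one-hour evaluation window
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
    }
```

The alarm would then be created with `boto3.client("cloudwatch").put_metric_alarm(**drift_alarm_kwargs(...))`, and an EventBridge rule on the alarm's state change can start the retraining pipeline.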
Outlines
📚 Introduction to SageMaker Model Monitor Best Practices
This paragraph introduces the video's focus on best practices for using Amazon SageMaker's Model Monitor. It sets the prerequisite of having at least a 200-level knowledge of Model Monitor and clarifies that the session is advanced. The speaker invites viewers to comment for a basic overview if needed. The paragraph outlines the types of monitoring available, such as data quality, model quality metrics, and feature attribution drift. It also introduces SageMaker Clarify, a tool for generating bias and explainability reports, and explains its relationship with Model Monitor. The paragraph concludes by detailing the working process of Model Monitor, distinguishing between monitoring types that require ground truth and those that do not.
🛠 Model Monitor Limitations and Data Capture Considerations
The second paragraph discusses limitations of the SageMaker Model Monitor, particularly when using the UI for deleting endpoints with enabled monitoring schedules, which must be done via CLI or API due to UI restrictions. It also covers the model monitor's capabilities, such as computing model metrics and statistics on tabular data only, and its current lack of support for multi-model endpoints. Additionally, the paragraph addresses the importance of setting up a VPC endpoint for SageMaker Studio in a custom Amazon VPC and provides guidelines for data capture, including region consistency and disk utilization thresholds to ensure continuous data capture for monitoring.
🔍 Deep Dive into Model Monitor's Baseline and Customization
This paragraph provides an in-depth look at creating a baseline for Model Monitor, which is crucial for monitoring data and model quality, bias, and explainability. It outlines the requirements and best practices for different types of baselines, emphasizing the importance of schema consistency between training and inference datasets. The paragraph also discusses the use of the kernel SHAP algorithm for feature attribution drift and the trade-offs involved. It explains how to review and adapt baseline constraints generated by Model Monitor to suit specific use cases, and how to schedule monitoring jobs for real-time endpoints and batch transforms, including on-demand monitoring through SageMaker pipelines.
🚀 Automating Model Retraining with SageMaker Model Monitor
The final paragraph highlights the importance of effective deployment and model monitoring for continuous oversight of model performance. It describes how to automate actions based on the data collected by Model Monitor, using an example architecture that triggers a retraining pipeline when data drift is detected and model metrics exceed predefined thresholds. The paragraph emphasizes the power of SageMaker Model Monitor as a fully managed tool and encourages viewers to leverage it, while also inviting feedback on the discussed best practices.
Keywords
💡SageMaker
💡Model Monitor
💡Data Quality
💡Model Quality Metrics
💡Data Drift
💡Feature Attribution Drift
💡SageMaker Clarify
💡Ground Truth
💡Pre-processing
💡Post-processing
💡Baseline
💡Bias Drift
💡CloudWatch
💡S3
Highlights
Introduction to best practices for using SageMaker Model Monitor.
Prerequisite of having at least level 200 knowledge of Model Monitor.
Explanation of SageMaker Model Monitor's capabilities for data and model quality monitoring.
Clarification on the relationship between SageMaker Clarify and Model Monitor.
Use of SageMaker Clarify for producing bias and explainability reports.
Model Monitor's role in recurring monitoring of models and data.
Different types of monitoring categorized into those needing ground truth and those not needing it.
Process summary of creating a baseline, scheduling monitoring jobs, and comparing metrics.
Importance of ground truth in model quality drift and bias drift monitoring.
ETL process setup for merging production data with ground truth labels.
Different access mechanisms for machine learning engineers, data scientists, business analysts, and data analysts.
Limitations of using SageMaker UI for certain operations like deleting endpoints.
Model Monitor's support for only tabular data and not for image classification models' input.
Current limitations on monitoring multi-model endpoints and support for batch transform.
Requirements for setting up a VPC endpoint for SageMaker Studio in a custom Amazon VPC.
Guidelines on managing disk utilization for continuous data capture.
Use of Amazon's Augmented AI service for real-time monitoring by generating ground truth labels.
Customization options in Model Monitor for pre-processing and post-processing scripts.
Best practices for setting up a data quality baseline with considerations for schema and feature order.
Model quality monitoring setup requiring access to datasets with ground truth labels.
Feature attribution drift measurement using the Kernel SHAP algorithm and considerations for surrogate model complexity.
Bias drift baseline setup with configuration options for label value, threshold, facet name, and facet value.
Automated processing job for computing statistics and constraints for the baseline.
Importance of reviewing and adapting baseline constraints based on domain understanding.
Scheduling monitoring for real-time endpoints and considerations for ground truth availability.
Use of SageMaker Pipeline for setting up on-demand monitoring jobs.
Extension points for bringing your own container to leverage Model Monitor for custom use cases.
Automating actions based on Model Monitor alerts for continuous model improvement.
Transcripts
hello everybody and welcome to the new episode of Tech with Nila. so in today's session we are going to talk about best practices of using SageMaker model monitoring. now the prerequisite of this is you need to have at least level 200 knowledge of Model Monitor, and this is a little bit of an advanced session for those who are already aware of Model Monitor. but if you feel that you need a level 200 session or just a basic overview of what SageMaker Model Monitor is, and maybe a demo etc, please comment on my video, thank you. so let's get started.
Model Monitor provides the following types of monitoring. it'll help you to monitor drift in the data quality, drift in the model quality metrics such as accuracy, monitor the bias in your model predictions, or monitor drift in the feature attribution.
now if you take a look at SageMaker Model Monitor and you go through the documentation, there's a mention of SageMaker Clarify in conjunction with Model Monitor, and you'll be thinking, what is happening here? so let's talk a little bit more about it, right. SageMaker Clarify is essentially a container that produces the bias and explainability reports, and SageMaker Model Monitor is actually a service which performs these recurring monitoring jobs on the models and the data which is captured from the endpoint or the batch transform job. now two out of the four, as shown in this picture here, require the help of the SageMaker Clarify container. so in conclusion, you can use SageMaker Clarify independently to do a one-off job, but you can also use it in conjunction with SageMaker Model Monitor to run those recurring Clarify jobs.
now in terms of the working process of Model Monitor, we can categorize the four kinds of monitoring into two different types, right: one that needs ground truth and one that doesn't need ground truth. when I talk about ground truth I'm not talking about the Ground Truth service that AWS offers, but more in terms of the actual true labels which are recorded from the business decision that was taken. now both data quality drift and feature attribution drift don't really need the ground truth label. the monitoring job is scheduled on an interval defined during the scheduling setup, and the schedule wakes the model monitor to run the checks.
so the process summary is as follows. step number one is you would create a baseline from the data set that was used to train the model. now the baseline would compute the metrics and the suggested constraints for those metrics. step number two is you would schedule a monitoring job which captures the data in production and computes the production metrics. and step number three, the model monitor in the background will compare the metrics and send the results or violation report to S3 or CloudWatch.
now if you look at the other two types which require the ground truth, right, it is very similar. the only difference is that both model quality drift and bias drift require the ground truth in order to compute the final metrics. now there is an additional step as well, right: you will need to set up another ETL process to bring all the ground truth labels to S3, adhering to a specific format as I have highlighted here in this picture. and the process follows the same flow. so you will first create a baseline from the data set that was used to train the model, then you would merge the data captured from the production endpoint with the ground truth labels, and then the model monitor will actually compare the metrics and push the results or the violation report to S3 or CloudWatch.
now in terms of the way it is accessed, there are different mechanisms. let's talk about it in terms of the personas. so the AWS Python SDK and CLI and SageMaker Pipelines are the preferred methods for machine learning engineers or data scientists, whereas business or data analysts prefer to use the SageMaker UI. now there are some limitations while using the UI. for example, if you want to delete an inference endpoint hosted in SageMaker which already has the model monitor enabled, you will have to first delete the model monitoring schedule via CLI or API, and you can't do it on the console. so it kind of gets a little bit tricky and you would get an error as shown in this particular slide. you would have to use the CLI or API to delete the schedule, and then from the UI or through the API or CLI you can delete the endpoint. I know it's kind of hard, but it is the way it is right now.
now the model monitor computes model metrics and statistics on tabular data only. for example, if you have an image classification model that takes an image as an input, it's not tabular data but an image, and the output is a label based on that image, you can still use model monitor here, but the key thing to keep in mind is the model monitor will calculate metrics and statistics for the output and not really the input. model monitor currently supports only endpoints that host a single model and does not support monitoring multi-model endpoints, not yet, right. it does also support batch transform.
so if you have a sequence of different inference containers leveraging the inference pipeline, like a single endpoint but different models in the background, you can still use model monitor, but it would capture and analyze data for the entire pipeline and not for the individual containers in the pipeline. and if you launch SageMaker Studio in a custom Amazon VPC, you will need to create a VPC endpoint to enable model monitor to communicate with Amazon S3 and CloudWatch.
now in terms of the data capture, the first thing that you have to keep in mind is the AWS region. you want to make sure that the S3 data capture is in the same region as the model monitor schedule. and you want to keep an eye on the inference instance, right: you want to make sure that the disk utilization is below 75 percent. the reason is to prevent impact to the inference requests, data capture stops capturing requests at a high level of disk utilization, and that's why it is recommended that you keep your disk utilization below 75 percent in order to ensure data capture keeps capturing requests continuously. and then if you want that real-time monitoring effect, the way you can do it is to generate ground truth labels quickly, and you can leverage Amazon's Augmented AI service to add the human-in-the-loop capability who will generate the ground truth labels, and hence you will achieve the real-time monitoring effect.
now you can customize some parts of the model monitor, and you can leverage pre-processing and post-processing. these are plain Python scripts to transform the input to your model monitor, or extend the code after a successful monitoring run. you can upload these files to S3 and reference them when you're creating your model monitor, using the create_monitoring_schedule method. note that it will only work with data and model quality jobs. so pre-processing can be used for example for data transformation, as shown in this particular slide: suppose the output of your model is an array, but the SageMaker model monitor container really requires a tabular or flat JSON structure. so you can use the pre-processing script to transform the array into the correct JSON structure. you can also use it for feature exclusion. so suppose your model has an optional feature and you use -1 to denote that the optional feature has a missing value. if you have a data quality monitor, you may want to remove that so that it's not included and doesn't raise any kind of violation, right, and you can use the pre-processing script to do just that: you can remove those values and exclude certain features. you can also use it to apply a custom sampling strategy in your pre-processing script, for example in this use case it is 10 percent. if your pre-processing script returns an error, you can check the exception message logged in CloudWatch to debug, but in the pre-processing script you can also add additional logging.
and then for the post-processing script, I don't really have an example, but the way I would want you to think about it is when you want to extend the code following a monitoring run. so you want to call a business application API, or trigger an ETL process, or anything that needs to succeed once the monitoring job is completed.
now let's take a look at the baseline, right. so after you have configured your application to capture data during inference, the first step to monitor data and model quality or bias or explainability is to create a baseline. all four types have this step in the process, right. now let's take a look at some of the requirements and best practices for the different baselines. so if you're using it for data quality, the data quality baseline, let's take a look at that first scenario. the schema of the training data set, baseline data set, and inference data set should be the same, both the number and the order of the features, and the first column should refer to the prediction or output. an important point to note here for avoiding errors is to ensure that the column names of the baseline data set have only lowercase letters and underscore as the only separator. now this is to maintain the maximum compatibility between Spark, CSV, JSON, and Parquet. special characters can cause issues too, so be careful there as well. now in model quality monitoring, the predictions are compared to the ground truth labels, hence a model quality baseline needs access to a data set that contains the ground truth labels and predictions from the model being monitored.
now for feature attribution drift, the way it is measured is by using the Kernel SHAP algorithm. one needs to make a trade-off between the time taken and the complexity of the surrogate model by using the number of samples parameter. above all, the main thing is to provide a good baseline for your use case, typically training data, validation data, or a golden batch data set.
okay now let's take a look at the bias drift baseline, right. you specify it via the bias config configuration, and you have the label value or threshold to indicate the positive outcome, the facet name to specify the sensitive attribute or feature column, and the facet value or threshold to specify the value in that particular attribute. now let's take a look at the second example here, right. so the facet name is petal length, the petal length of the flower, and the facet threshold that you have is set to five.
now the suggest_baseline method of the ModelMonitor or ModelQualityMonitor classes triggers a processing job that computes the metrics and constraints for the baseline. now the results of the baseline job are two files, like we mentioned in the previous slide: statistics.json and constraints.json. now you can review the generated constraints and modify them before using them for monitoring. based on your understanding of the domain and the business problem, you can actually make a constraint more aggressive or relax it to control the number and the nature of the violations, or, like in the given example, you can add a field to define a string constraint based on the understanding of your data. as a best practice, always review the baseline constraints generated by model monitor and adapt them as per your use case.
now to schedule monitoring for real-time endpoints, you would use the create_monitoring_schedule method of the respective model monitor class. now when the ground truth is required for monitoring jobs like model quality, you will need to ensure that a monitoring job only uses data for which the ground truth is available. you also want to keep an eye on the start time offset and end time offset to select the data that you want to use. so as a quick example, the first example here, right: if your ground truth comes in three days after the prediction has been made, you would want to set the start time offset to -P3D and the end time offset to -P1D, which is three days and one day respectively. now if your ground truth arrives six hours after the prediction and you have an hourly schedule, then you would use six hours and one hour in those offsets. now you can schedule a model monitoring job for both a real-time endpoint and a batch transform, like we talked about in the previous slide.
you also have the option to run these jobs on demand, they don't have to be scheduled all the time, and the way you can do it is by leveraging SageMaker Pipelines, which is a fully managed MLOps service that SageMaker offers. it is completely free of charge to use, you will only end up paying for the underlying compute, so it's serverless in that manner. and the way you can set up your on-demand monitoring job is by leveraging these two configurations: so CheckJobConfig, where you would specify the quality check config, and then for the Clarify one, for checking the bias and explainability, you can leverage the ClarifyCheckConfig.
now SageMaker does offer a lot of pre-built algorithms and containers, but if you have a use case wherein you have to bring your own container, then that should not stop you from leveraging model monitor. SageMaker model monitor does provide a pre-built container with the ability to analyze the data captured from endpoints or batch transform jobs for tabular data sets, but if you are bringing in your own container, it provides you extension points which you can leverage. basically you have to adhere to the contract input and contract output to leverage the model monitor: the container can analyze the data in the dataset_source path and write the report to the path in output_path. the container code can write any report that suits your needs. if you use the following structure and contract, certain output files are treated specially by SageMaker in the visualization and API. some scenarios where this can be useful are when you want to design monitoring for computer vision or natural language use cases, or design any kind of custom metrics; you can go ahead and leverage this mechanism.
now you have all this data, and you know you're getting alerted as well, but the data is good only when you take that data to do certain actions, and if you automate the actions, then that's the most beautiful part about it, right. so having an effective deployment and model monitoring phase is important, right, and you want to continuously keep an eye on what's really happening. this example is just one sample architecture, and I will link the blog post in the description of this particular video. what it does is it will detect data drift for the model which was deployed, right, with respect to the training baseline, and once that drift is detected and the metrics exceed the model-specific threshold, then a CloudWatch alarm is triggered and an EventBridge rule will start the model build pipeline, that means it will trigger the retraining. and hence in this way you can have the model retraining be automated based on the violations raised by the model monitoring. so that's it from my side. I mean the idea here is for us to leverage the best practices while you're using SageMaker model monitor. it's really a powerful tool and it's fully managed, so you know, go ahead and use it, and let me know what you think.