Build, Train, and Deploy a Model in SageMaker | SageMaker Tutorial | SageMaker Pipeline
Summary
TLDR: This video walks through the step-by-step process of building, training, and deploying a machine learning model with Amazon SageMaker. It is a technical session focused on what is different about machine learning in the AWS environment, covering the AWS SDK boto3, backing up data to S3, tuning model hyperparameters, and how simple deployment is. It closes with a link to an AWS hands-on guide so viewers can practice on their own.
Takeaways
- 📚 Start by learning how to use AWS SageMaker Studio, including how to create a domain, to get the AWS fundamentals in place.
- 🔍 In SageMaker, you use boto3, the AWS SDK library, to communicate with AWS services from a Python environment.
- 📈 For model training, you can directly use specific AWS-provided machine learning algorithms via SageMaker image URIs.
- 💾 Data handling centers on S3 buckets, which are used for data backup and for model training and validation.
- 🛠️ Before training, perform EDA (exploratory data analysis) and preprocessing, and apply feature engineering to prepare the data for the best possible model.
- 🔧 Model hyperparameters can be tuned easily within SageMaker to optimize the training process.
- 🚀 Once training is complete, the model can be deployed easily with SageMaker's `deploy` method, which creates an endpoint.
- 🔗 Creating a model endpoint makes it possible to connect to APIs and third-party systems, broadening the model's range of applications.
- 🗑️ After the project, clean up endpoints and S3 buckets to avoid ongoing charges.
- 🔬 SageMaker lets you run data science and machine learning projects efficiently in the cloud, with better scalability and manageability.
- 👍 If you enjoyed the video, leaving likes and positive comments increases the chance of more AWS video content.
Q & A
What process do the steps described in the video represent?
-The video explains the step-by-step process of building, training, and deploying a machine learning model with Amazon SageMaker.
What is the first thing to do in Amazon SageMaker Studio?
-The first thing to do is open a SageMaker session, which creates the session used to run activities in SageMaker.
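A minimal sketch of this first step with the SageMaker Python SDK (the printed bucket name is account- and region-specific):

```python
import sagemaker

# Open a SageMaker session; it wraps the underlying boto3 session and
# provides helpers such as upload_data() and default_bucket().
sess = sagemaker.Session()
print(sess.default_bucket())  # the account/region default S3 bucket
```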
What do data import and environment setup involve?
-Data import and environment setup refer to fetching the data needed to build the model from AWS, inspecting it with pandas, and uploading it to S3.
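As a sketch, assuming the public sample dataset mentioned in the video and the usual bank-additional file layout (both the exact URL and the file names are assumptions, not confirmed by the transcript):

```python
import pandas as pd
import sagemaker

# In a notebook, fetch and unzip the sample data first, e.g.:
#   !wget <sample-data-url>/bank-additional.zip
#   !unzip -o bank-additional.zip

# Inspect the data with pandas (file name assumed).
df = pd.read_csv("bank-additional/bank-additional-full.csv")
print(df.head())

# Back the raw data up to S3 through the SageMaker session.
sess = sagemaker.Session()
s3_path = sess.upload_data(path="bank-additional/bank-additional-full.csv",
                           key_prefix="demo-xgboost/raw")
print(s3_path)
```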
What is boto3, and why is it important in the AWS environment?
-boto3 is the AWS SDK library for Python, used to interact with AWS services from a Python environment. When building machine learning models on AWS, you communicate with AWS services through boto3.
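A tiny illustration of what boto3 provides, a Python handle on AWS services (here, listing the S3 buckets the caller can see):

```python
import boto3

# boto3 exposes every AWS service as a client (low-level) or resource (object-style).
s3 = boto3.client("s3")
for b in s3.list_buckets()["Buckets"]:
    print(b["Name"])
```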
How do you obtain the XGBoost image URI used for model training?
-The XGBoost image URI can be retrieved from SageMaker's image_uris module.
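With version 2 of the SageMaker Python SDK, the retrieval looks roughly like this (the version string is illustrative):

```python
import boto3
from sagemaker import image_uris

# Look up the AWS-managed XGBoost training container for the current region.
region = boto3.Session().region_name
container = image_uris.retrieve("xgboost", region, version="1.5-1")
print(container)
```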
How are the training and validation data uploaded to S3?
-The training and validation data are uploaded with boto3, using the S3 resource returned by boto3's Session to write the files into an S3 bucket.
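A sketch of that upload using boto3's resource API (the prefix and local file names are illustrative):

```python
import os
import boto3
import sagemaker

bucket = sagemaker.Session().default_bucket()  # or any bucket the role can write to
prefix = "demo-xgboost"                        # illustrative key prefix

# Upload the prepared CSVs into train/ and validation/ folders in the bucket.
s3 = boto3.Session().resource("s3")
s3.Bucket(bucket).Object(os.path.join(prefix, "train/train.csv")).upload_file("train.csv")
s3.Bucket(bucket).Object(os.path.join(prefix, "validation/validation.csv")).upload_file("validation.csv")
```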
What is needed to define a SageMaker training model?
-Defining a SageMaker training model requires the container, the role, the instance count, the instance type, the output path, and the SageMaker session.
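Put together, the definition might look like this sketch (instance type, counts, paths, and hyperparameter values are illustrative; `container` is the image URI retrieved above):

```python
import sagemaker
from sagemaker import get_execution_role
from sagemaker.estimator import Estimator

sess = sagemaker.Session()
role = get_execution_role()
bucket = sess.default_bucket()

# Define the model: container image, role, instances, output path, session.
xgb = Estimator(
    image_uri=container,              # XGBoost container from image_uris.retrieve
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",     # illustrative training instance
    output_path=f"s3://{bucket}/demo-xgboost/output",
    sagemaker_session=sess,
)

# Initial hyperparameters (illustrative values for binary classification).
xgb.set_hyperparameters(objective="binary:logistic", num_round=100,
                        max_depth=5, eta=0.2)
```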
What is the deploy method used when deploying the model?
-The deploy method used at deployment time belongs to the SageMaker estimator object. It creates a model endpoint with the specified instance type and count.
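Deployment is then a single call on the trained estimator (the instance type matches the one mentioned in the video; the count is illustrative):

```python
# Behind the scenes this runs CreateModel -> CreateEndpointConfig -> CreateEndpoint
# and blocks until the endpoint is in service.
xgb_predictor = xgb.deploy(
    initial_instance_count=1,
    instance_type="ml.m4.xlarge",
)
print(xgb_predictor.endpoint_name)
```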
What is a serializer, and how does it help with model deployment?
-A serializer implements the methods for serializing data sent to an inference endpoint. At deployment, a CSV serializer is used so that data sent to the endpoint is converted into CSV format.
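In SDK v2 the serializer is attached to the predictor; a sketch of sending rows for inference (the feature array is a made-up example):

```python
import numpy as np
from sagemaker.serializers import CSVSerializer

# Serialize request payloads as CSV before they hit the endpoint.
xgb_predictor.serializer = CSVSerializer()

sample = np.array([[35, 1, 0, 2, 180]])   # illustrative feature row
result = xgb_predictor.predict(sample)    # raw bytes returned by the endpoint
print(result)
```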
Why delete the endpoint after the training job is complete?
-Because you are billed for as long as the endpoint is running. Deleting the endpoint avoids unnecessary costs.
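Cleanup is one call on the predictor:

```python
# Stop paying for the endpoint once you are done with it.
xgb_predictor.delete_endpoint()
```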
Outlines
😀 Building, Training, and Deploying an ML Model in SageMaker
This section explains the step-by-step process of building, training, and deploying a machine learning model with Amazon SageMaker. It is a technical session focused on what is specific to machine learning in the AWS environment: working in SageMaker Studio, fetching and unzipping data from AWS, viewing it with pandas, opening a SageMaker session, and uploading the data to S3.
📈 EDA, Preprocessing, and Training Preparation
The second part covers data visualization, preprocessing, feature engineering, and data splitting. EDA (exploratory data analysis) produces frequency tables for the categorical data, histograms for the numerical data, and a correlation analysis; feature engineering creates dummy variables and drops highly correlated columns. It then shows how the training, test, and validation data are uploaded to S3 with boto3. A sketch of the splitting step appears below.
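A minimal sketch of the dummy-variable and split steps, assuming a pandas DataFrame `df` and a 70/20/10 split (proportions and seed are illustrative):

```python
import numpy as np
import pandas as pd

# One-hot encode categorical features (the dummy-variable step).
model_data = pd.get_dummies(df)

# Shuffle, then cut at 70% and 90% to get train / validation / test.
train_data, validation_data, test_data = np.split(
    model_data.sample(frac=1, random_state=1729),
    [int(0.7 * len(model_data)), int(0.9 * len(model_data))],
)
```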
🛠 Model Training and Deployment
This section details the training and deployment process in SageMaker. Using XGBoost as the example, the model container is obtained through an AWS-provided image URI, the training and validation data are pulled from S3, and the model is defined with a SageMaker estimator. Setting hyperparameters, starting training, and creating an endpoint for deployment are all covered. A sketch of wiring the S3 inputs into training appears below.
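A sketch of pointing the estimator at the S3 data and launching training (`xgb` and `bucket` come from the earlier sketches; paths are illustrative):

```python
from sagemaker.inputs import TrainingInput

# Reference the CSVs uploaded to S3 earlier as named input channels.
s3_input_train = TrainingInput(
    s3_data=f"s3://{bucket}/demo-xgboost/train", content_type="csv")
s3_input_validation = TrainingInput(
    s3_data=f"s3://{bucket}/demo-xgboost/validation", content_type="csv")

# Launch the managed training job; logs stream back into the notebook.
xgb.fit({"train": s3_input_train, "validation": s3_input_validation})
```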
🔧 Serialization and Endpoint Cleanup
The final section covers model serialization and endpoint cleanup. Serialization is needed so the inference endpoint handles incoming data correctly, and a CSV serializer is suggested. After deployment, deleting the endpoint matters so that no further charges accrue. An AWS hands-on link is provided where you can build, train, and deploy a model step by step yourself. A cleanup sketch appears below.
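A cleanup sketch covering both the endpoint and the demo objects in S3 (bucket and prefix are the illustrative names used above):

```python
import boto3

# Delete the endpoint so it stops accruing charges.
xgb_predictor.delete_endpoint()

# Optionally empty the S3 prefix used for this walkthrough.
s3 = boto3.Session().resource("s3")
s3.Bucket(bucket).objects.filter(Prefix="demo-xgboost/").delete()
```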
Keywords
💡Amazon SageMaker
💡AWS
💡Machine Learning Model
💡Training
💡Deployment
💡S3 Bucket
💡boto3
💡Hyperparameters
💡Data Serialization
💡Prediction
Highlights
The video walks through the step-by-step process of building, training, and deploying a machine learning model with Amazon SageMaker.
Reminds viewers that the author has a playlist on AWS and data science, six videos in total, for review.
Shows how to open SageMaker in SageMaker Studio and create a domain.
Emphasizes how building a model in the AWS environment differs from a local environment.
Demonstrates fetching data from an AWS server with the wget command.
Introduces viewing and handling the data with the pandas library.
Explains how to import data and set up the environment in SageMaker.
Discusses the importance of the boto3 library for interacting with AWS services.
Shows how to upload data to an S3 bucket using the SageMaker session.
Demonstrates EDA (exploratory data analysis) and data preparation.
Covers feature engineering, including creating dummy variables and dropping columns based on the analysis.
Explains splitting the training, validation, and test datasets with NumPy's split function.
Discusses using boto3 to communicate with S3 and upload the training datasets.
Introduces using image URIs to obtain machine learning models in the AWS environment.
Explains defining and training the model, including setting the instance count and type.
Shows creating a model endpoint with SageMaker's deploy method.
Discusses data serialization after deployment and implementing the prediction function.
Emphasizes deleting the endpoint after deployment to avoid extra charges.
Provides an AWS hands-on link so viewers can practice building, training, and deploying a model themselves.
Summarizes the five or six main things to remember when working in the SageMaker environment.
Encourages viewers to support more related content with likes and comments.
Transcripts
In this video I am going to explain the stepwise process of building, training, and deploying a machine learning model with Amazon SageMaker. This is going to be a technical session, guys, so be ready to pay a lot of attention, and you will learn a lot of things in this video. Before going into the video, I want to remind you that I have an Amazon AWS and full data science playlist; there are six videos in case you want to revise everything we have covered so far. In the last video I showed the important components in AWS from a SageMaker, or machine learning, point of view. Today we are going to see how to build, train, and deploy our first model. So I am in SageMaker Studio, and I told you in the last video how to open SageMaker Studio, how to create a domain, and so on. The link is right here, and I'll paste it in the description as well, so you can revise if you want.
Now, I am not going to explain everything that is common between AWS and ordinary machine learning. I am going to explain the things that are very specific to AWS, because you more or less know how to build, train, and deploy a model in general; how things are different in an AWS environment is what the focus of this explanation is going to be. So let's go ahead and see what I'm doing in each step. This notebook is divided into four steps, basically. The first is environment setting and data import, which you can see here. So what am I doing here? The first thing we will need is data.
Now, when you build a model in your local environment, on your local computer, or on any other machine, you may have a CSV file, you may connect to a database, you may get the data from some API, and so on. Here, I am getting the data from AWS itself. AWS has kept some data at a location; you can see the URL here, an S3 location in us-west on Amazon AWS, with the autopilot direct-marketing bank-additional data. This is the URL for data they have provided for free for people to use for practice and learning. So I need this data; how will I get it? I'm just running a simple command here: wget. I hit that URL, get the file, and then I unzip the data. So, first things first: we are getting the data kept on an AWS server through wget. What is wget? Just a normal Linux command, you can say, to fetch data from a URL. Once I run this, I get the data and unzip it. You can see I have already run this notebook, which is why the output is showing like this.
Now I am importing pandas and just showing you what the data looks like. As you can see, there are multiple columns: age, marital status, education, default, housing, contact, and so on, and towards the end you will see the target column for this data. That last column, the y column, is your target column, and all the others are your independent columns. As I told you, we will focus only on what is different between the normal way of doing things and doing them in SageMaker. In SageMaker, you have to import sagemaker first, and then it is the usual practice to open a SageMaker session: you write something like sess = sagemaker.Session(), which means you are opening a SageMaker session because you want to perform some activities. Using this SageMaker session, I am uploading the data I just downloaded to an S3 location. I am doing this to keep a backup of my data, so that if tomorrow I want to check which data I built my model on, I can refer back to it. So in the first step I extracted the data from an AWS location, and in this step I am pushing it into an S3 bucket; this is the bucket location where I am pushing it. And how am I doing that? Using this upload_data call. So what have we done until now? We have taken the data and pushed it to S3.
Now let's move ahead, guys. One important thing you need to know here is called boto3. boto3 is something you don't use outside AWS, because it is basically an AWS SDK kind of library; you can think of it as what enables you to work in a Python environment and interact with different AWS services. So if you want to use Python in the AWS environment, you need boto3 to interact with the various services. That is the usability of boto3. Next, you import something known as the execution role and write role = get_execution_role(). As you know, AWS is a cloud platform, so roles and permissions are quite important here. Just pay attention to the new things we are doing; that's all. You don't need to go deep into the model-building steps, just the new things, and the new things are boto3 and the role. Let's move ahead. All the rest are normal package imports, apart from sagemaker, which I had already imported before, but let's leave it here.
Then I'm doing some EDA and data preparation: frequency tables for each categorical feature and a histogram for each numerical feature. I will not go into much detail on the ordinary Python code; these cells are just looking at the category percentages in my categorical features, some describe-style summaries, and some scatter plots. Then I am doing some correlation analysis, very simple Python code, nothing new, so I will skip this part as well. Next comes some feature engineering; this is normal machine learning work. In line one I am just replacing a few values with other values; line two is the same kind of thing; in line three I am creating dummy variables. Then I am dropping some columns based on my correlation analysis and my reading of the scatter plots. Now, this line is important: I am breaking the data into train, validation, and test sets using np.split, in certain proportions as you can see, so we now have train, test, and validation sets. One line down, we are preparing our train data, validation data, and test data, separating the predictors and the response.
In the next block of lines that you see here, what's happening? Just pay attention here, guys. As I told you, boto3 is the package that lets a Python developer work in the AWS environment and enables Python code to communicate with various AWS services. So here, using boto3, we are talking to S3: see boto3.Session().resource(...). Which resource are we talking to? S3. Then we specify which bucket we want to talk to: the SageMaker session's default bucket. I have taken one bucket, but you can give any other bucket as well, provided that this role, or whichever user is running this, has permission on that bucket; the permissions of whatever role the user has assumed matter, guys, otherwise this will fail. So I have taken the default bucket, and in this default bucket, if you look at this one line, I am uploading the train.csv that I prepared in the earlier block. In other words, with the help of boto3, I am uploading my train data, test data, validation data, and everything else I want to upload into the default bucket. Now two parts of this problem are done: the data import is done, and the EDA and data preparation are done.
Let's move ahead to the model-training part. In model training, what is important and different from normal day-to-day work? Normally, when you want to use, say, XGBoost in Python, you import the xgboost package, right? But in the AWS environment, AWS gives you many packages in the form of image URIs. So look at these two lines carefully: from sagemaker import image_uris, and container = image_uris.retrieve(...) for XGBoost, from this region, with whatever version you want. This line is critical and important. Suppose you want to use a random forest, a decision tree, or some other module: one way would be to get it from Python, or do an independent installation and bring it here; the second way is to take it directly from the AWS repository. And how do you get it from the AWS repositories? You just import image_uris and retrieve whatever model you want to use, like this. This is important. Now I want to take the train data and the validation data. Remember where we kept them? In the S3 default bucket, so we are importing them from there now. See the difference, guys: this is the real-world process of how you work on a machine learning project. It's not that you always work in a notebook with the file sitting next to the notebook; you keep it at a location and then you pull it from there. That's what I am doing here: pulling the data I had kept in the bucket. And what is the way to pull the data? sagemaker.inputs: you just say this is my input training data, and this is my validation data. Once you do that, the next line is one I had run before as well, so it isn't really needed again.
Then comes sagemaker.estimator.Estimator. Here you have to say what your container is (the container is nothing but the XGBoost image we just retrieved), what your role is (remember, at the beginning we called get_execution_role), the instance count (how many instances you want to run), the instance type (an instance, to keep it very simple, you can think of as a server, an EC2 machine), the output path (where you want to store your output), and the SageMaker session (the session from the one line I ran at the beginning). This is defining the model; until now you are only defining your model. Then you go here, say these are my initial hyperparameters, and train the model with xgb.fit. The moment you call xgb.fit, it starts fitting your model on the S3 train input and validating on the S3 validation input. As you can see here, I had run this before: starting the training job, preparing the instances for training, downloading the input data, downloading the training image. The input data is your data; the training image is your XGBoost container. Then it starts running your model. It took some time when I ran it, which is why I'm not running it live in this session. Then it says it is uploading the generated training model; where will it upload it? To the output path you gave. Then: training job completed. How much time? 98 seconds, and billable seconds is 98. I have to pay for those 98 seconds, not a problem.
Now, this line is very important, guys: xgb_predictor = xgb.deploy(...) with an initial instance count and an instance type of ml.m4.xlarge. Pay attention here to how a cloud platform makes your life easy. In the real world, in your day-to-day job, if you want to host a model and create an endpoint for it, it's a big task: you have to host it in some API-style environment, or some kind of Python web UI environment. AWS makes it very, very simple for you. You just call this deploy method, saying on how many instances you want to deploy and what instance type you want to deploy on. If you give these two things, it will simply create a model endpoint for you: create a model with this name, create an endpoint configuration with this name, create an endpoint with this name. It will just create the endpoint for you, and now this model is ready for consumption.
But before that, there are a few more things we have to do: serialization for our model. What is a serializer? It's a very simple concept; I have opened the documentation here as well: it implements the methods for serializing data for an inference endpoint. Whenever you send your data for prediction, it needs to be serialized, and that is what happens here. The CSV serializer works here because our data is in CSV form, if you remember. Now I am defining a predict function and simply calling it with our test data. The test data was created earlier, if you remember, guys, when we were splitting into test and train; and if you look, we had written it out as a CSV to the S3 bucket. What we are doing here is simply running this predict function on that test data, and then we take the result and evaluate it: pd.crosstab, the confusion matrix, all the things you already know. Once everything is done, then, just to ensure we do not keep that endpoint running, because money will keep draining while the endpoint is running, deleting the endpoint is good practice.
So in this entire process, guys, there are five or six main things you have to remember; let me summarize them. As I keep telling you, don't bother about what this model is and how the model works; I knowingly did not explain those things. Remember: when you are working in a SageMaker kind of environment, it's always good to keep your data in S3 or one of the other AWS storage services; there are various types of AWS storage, and I have a video on that in the playlist you can watch. The second thing to keep in mind is that you will need a SageMaker session, which you need to open. Keep in mind that you will use the boto3 package whenever you are interacting with any AWS service from Python. Setting the machine learning aside, keep in mind to always take a backup of your data in S3; S3 is cheap storage. Then, you don't need to install or import packages from elsewhere: if you want to use a package, you can get it within AWS itself by using an image URI. You can tune your models very simply here, you can deploy your model by just calling the .deploy() method, and it is very easy to create an endpoint here. And once your model is live, say I want to create an API for it or connect it to a third-party system, all of these things are doable in just a few lines of code.
Now you will also want to practice this, so let me give you a very good link where you can go and do everything I have done here. Go to this link, which I will paste in the description: the aws.amazon.com hands-on tutorial for build, train, and deploy with SageMaker. Come here and you will see all the steps that I just ran, step by step: create a notebook instance, prepare your data; the code is all there too, so you can copy it and run it. And if you go all the way to the end, to the deletion of the endpoint, you will see the cleanup: the last step tells you to clean up the model endpoint, otherwise you may be charged. It also deletes the buckets where we stored our data and intermediate files, just to be on the safer side so that there is no charge. I will give you this link. If you are into data science, if you have to do anything with machine learning, make sure you get your hands dirty in SageMaker. You might get frustrated because things are a little different from how we learn day-to-day machine learning, but remember, guys, this is worth your time. I will come up with more videos on AWS, but first make sure you learn everything up to here. There will now be seven videos in this series: the first is the basics, the second is storage, the third is computing, the fourth is application deployment, the fifth is the CLI, the sixth is the SageMaker introduction, and, including this one, the seventh is building, training, and deploying a machine learning model in SageMaker. So just be clear up to this point; I am coming up with more videos very soon. Please give me a thumbs up, guys, if you like this content; I will create more content if you give me good comments and likes. See you all in the next video. Wherever you are, stay safe and take care.