YOLOv9 Tutorial: Train Model on Custom Dataset | How to Deploy YOLOv9
Summary
TLDR: A video script introducing YOLOv9 and walking through its practical use. YOLOv9, the latest real-time object detection model, is reported to outperform the competition in both speed and accuracy. The video shows how to run inference with a pre-trained model, how to train and evaluate the model on a custom dataset, and how to deploy a YOLOv9 model using the inference package. It also walks through downloading a dataset, fine-tuning, evaluating, and deploying the model with Roboflow.
Takeaways
- 😀 YOLOv9 is announced as beating the competition in both speed and accuracy.
- 🎯 YOLOv9 is said to set a new state of the art in real-time object detection.
- 📚 The video shows how to run inference with a pre-trained model, train and evaluate the model on a custom dataset, and deploy a YOLOv9 model using the inference package.
- 🔗 A link to the notebook used in the video is in the description, and it is also available in the Roboflow Notebooks repository.
- 💻 When working in Google Colab, confirming access to a GPU is recommended.
- 📥 YOLOv9 is not distributed as a pip package; you need to clone the repository and install all required libraries manually.
- 🐶 An image of the author's dog is used to test inference, but you can test with your own data.
- 🔧 YOLOv9 has no dedicated CLI or SDK, so inference is run with the detect.py script.
- 🏆 YOLOv9-E can detect more objects with the same parameters, consistent with the performance reported in the YOLOv9 paper.
- 🤖 Fine-tuning the model requires downloading a dataset with the roboflow package in a format compatible with YOLOv9.
- 📈 After training, model evaluation is essential and helps verify object detection performance on new images and videos.
- 🛠️ YOLOv9 has no SDK; using Roboflow is suggested as a more robust way to deploy the model.
- 📊 YOLOv9 is a young project; although it beats the competition in speed and accuracy, it is still poorly documented.
- 🏆 YOLOv9 works well for detecting players on a football field, but it is applicable to many other scenarios.
Q & A
What are the key characteristics of YOLOv9?
-YOLOv9 is a new model that outperforms the competition in both speed and accuracy, representing the state of the art in real-time object detection.
What environment is required to run YOLOv9?
-A GPU environment is required; fine-tuning the model in particular tends to be slow on CPU-only machines.
How do you clone the YOLOv9 repository?
-YOLOv9 is not distributed as a pip package; you need to clone the repository manually and install all dependencies.
How should you prepare the dataset needed to fine-tune YOLOv9?
-Download a dataset from Roboflow Universe and export it in a YOLOv9-compatible format.
Which parameters do you need to specify to train a YOLOv9 model?
-Training requires specifying the batch size, image resolution, number of epochs, data path, weights, and configuration file.
What should you check when evaluating a YOLOv9 model?
-When evaluating the model, it is important to review the graphs showing how the training session evolved, the confusion matrix, and the label distribution visualization.
How do you deploy a YOLOv9 model?
-You can use Roboflow to manage the model weights and deploy the model locally or through an API.
Which dataset is recommended when training a new model with YOLOv9?
-The video uses a football player detection dataset, but other datasets can be chosen as well.
Which command do you use to test a YOLOv9 model?
-Testing uses the detect.py script, specifying the weights and the source image path; a confidence threshold and device can also be set.
How is the fine-tuned YOLOv9 model benchmarked?
-The fine-tuned model is benchmarked with the val.py script, which measures precision, recall, and mAP.
Outlines
🚀 The arrival of YOLOv9 and progress in real-time object detection
YOLOv9 is the latest model, outperforming the competition in both speed and accuracy, released by the developers of previous versions such as YOLOv4, YOLOv7, and YOLOX. The video shows how to run inference with a pre-trained model, train and evaluate the model on a custom dataset, and deploy a YOLOv9 model using the inference package. A link to a notebook for running everything in Google Colab is provided, and a GPU environment is recommended. YOLOv9 is not distributed as a pip package, so the repository has to be cloned and its dependencies installed manually.
🔍 Running inference with YOLOv9 and fine-tuning the model
This section explains how to run inference with YOLOv9 and how to fine-tune the model on a custom dataset. Results from YOLOv9-E are also presented for comparison. It also covers the community session discussion and how to prepare the dataset needed for training. Training uses the train.py script, with batch size, image resolution, and the number of epochs among the key parameters.
📈 YOLOv9 training results and model evaluation
After YOLOv9 training completes, this section explains how to evaluate the model and check its object detection performance on new images and videos. The training artifacts are analyzed, and various visualizations help in understanding how the training session progressed. The confusion matrix and label distribution are examined to dig deeper into the model's characteristics. Finally, the model is benchmarked with the val.py script to evaluate detection accuracy and mAP.
🛠 Deploying the YOLOv9 model and leveraging Roboflow
This section shows how to deploy a YOLOv9 model, using Roboflow to manage the model weights and deploy the model. The inference and supervision packages are installed, and the model is deployed with Roboflow's deploy method. The model is then run locally, and inference results are visualized with supervision. YOLOv9 is also compared with other object detection models in terms of community size, ease of use, and licensing, and its applicability across use cases is discussed.
👋 Closing the video and a look ahead
At the end of the video, the author thanks viewers and builds anticipation for the next video, encouraging them to subscribe to the channel, like the video, and look forward to more computer vision content.
Keywords
💡YOLO v9
💡Real-time object detection
💡Inference
💡Custom dataset
💡Fine-tuning
💡GPU
💡Model deployment
💡Accuracy and speed
💡Dataset imbalance
💡Roboflow
Highlights
The YOLOv9 model is released, beating the competition in speed and accuracy.
YOLOv9 comes from the creators of YOLOv4, YOLOX, and YOLOv7.
YOLOv9 sets a new state of the art in real-time object detection.
The tutorial shows how to run inference with pre-trained models, train and evaluate on a custom dataset, and deploy a YOLOv9 model.
A link to the notebook for running YOLOv9 is provided.
Make sure you have access to a GPU to keep model training and inference fast.
YOLOv9 is not distributed via pip; the repository must be cloned and dependencies installed manually.
The roboflow pip package is installed to simplify dataset downloads.
YOLOv9 does not support automatic download of model weights, but GitHub download links and commands to automate the download are provided.
The detect.py script is used for model inference.
The YOLOv9-E model detects more objects with the same parameters.
The YOLOv9 community session discussed a real-time zero-shot object detector.
Training YOLOv9 requires preparing a custom dataset.
YOLOv9 models are trained with the train.py script, which requires several parameters.
Model evaluation is a necessary step after fine-tuning, to understand how the model performs on new images.
YOLOv9 provides useful visualizations that help in understanding the training process and model characteristics.
A YOLOv9 model can be deployed by using Roboflow to manage the weights and serve the model.
YOLOv9 can be applied in many scenarios, such as smart self-service checkout.
YOLOv9 is a young project; although fast and accurate, it lacks an SDK, a CLI, and documentation.
Transcripts
YOLOv9 is out, and it's beating the competition in both speed and accuracy. The creators of YOLOv4, YOLOX, and YOLOv7 have released a new model, and according to the paper it is the new state of the art in real-time object detection. In this video I'll show you how to run inference using pre-trained COCO weights, train and evaluate the model on a custom dataset, and deploy the YOLOv9 model using the inference package. So without further ado, let's dive in.
The link to the notebook I'll be using is in the description below, but you can also find it in the Roboflow Notebooks repository. I strongly encourage you to open it in a separate tab and follow along. We navigate into the models section and search for the YOLOv9 object detection notebook, then click the "Open in Colab" button, and after a few seconds we should get redirected to the Google Colab page.
Before we start, we need to make sure that we have access to a GPU. It is especially important if we plan not only to run inference but also to fine-tune the model on a custom dataset; this process can be unbearably slow in CPU-only environments. To do this, we scroll slightly down to the "Before you start" section and execute the nvidia-smi command. This command will only execute successfully in GPU-accelerated environments with NVIDIA GPUs. If your result is similar to mine, then you're probably good to go. However, if you see a message saying that the nvidia-smi command is not recognized, it probably means that you do not have access to a GPU. In this case, click Runtime, and from the expanded dropdown select "Change runtime type", then choose the version with an NVIDIA T4 GPU.
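For reference, the GPU check is a single Colab cell; a minimal sketch (the extra PyTorch line is my addition and not necessarily in the notebook):

    # Verify that the runtime has an NVIDIA GPU attached.
    !nvidia-smi

    # Optional sanity check; PyTorch comes preinstalled on Colab.
    import torch
    print("CUDA available:", torch.cuda.is_available())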
The next step is to clone the YOLOv9 repository and install all necessary libraries. Unlike YOLOv8 and YOLO-NAS, YOLOv9 is not distributed through a pip package, at least at the time of recording. This means we need to clone the repository and manually install all dependencies. After installation, instead of a CLI or SDK, we'll have a set of scripts to detect, train, evaluate, and export the model. The project structure is quite similar to the one known from older models like YOLOv5 or YOLOv7, so if you have any experience with them, you should feel quite at home. If you don't, don't worry; I'll still guide you through the whole process.
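The clone-and-install step boils down to a few Colab cells along these lines (the repository URL is the official YOLOv9 GitHub project; the exact cells in the notebook may differ slightly):

    # Clone the YOLOv9 repository and install its dependencies.
    !git clone https://github.com/WongKinYiu/yolov9.git
    %cd yolov9
    !pip install -r requirements.txt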
Additionally, to make our life easier, we'll also install the roboflow pip package. It will allow us to download the dataset in a format compatible with YOLOv9. I will use the football player detection dataset, but you can download whichever you like; there are already more than 500,000 datasets on Roboflow Universe. The links to both Universe and my dataset are in the description below. Unfortunately, YOLOv9 lacks support for automatic model weights download. You can download them manually from GitHub, but inside the notebook you'll find a set of commands allowing you to do that automatically, so let's run them now to finalize the setup process.
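A sketch of that automated download, assuming the checkpoint names and v0.1 release URL used by the YOLOv9 repository:

    # Download pre-trained COCO checkpoints (GELAN-C and YOLOv9-E).
    !mkdir -p weights
    !wget -q -P weights https://github.com/WongKinYiu/yolov9/releases/download/v0.1/gelan-c.pt
    !wget -q -P weights https://github.com/WongKinYiu/yolov9/releases/download/v0.1/yolov9-e.pt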
Our next step is to run inference with a pre-trained model. I think it's a great way to get familiar with a new model, and at the same time it will allow us to confirm that the installation was successful and everything works as expected. To test the model I will use an image of my dog, but if you want to run it with your own data, simply drag and drop your image into the left panel of Google Colab and replace the source image path value with a path leading to your image.
As I said before, YOLOv9 does not have a dedicated CLI or SDK, so to perform inference we will need to use the detect.py script. The most important arguments that we need to provide are weights and source. The first one is simply the path to the weights file that we already downloaded from GitHub. The source can be a path leading to an individual image or video, but also to a whole directory with multimedia files or, if you run locally, to a webcam stream. On top of that, we will set values for two extra parameters: confidence threshold and device. I set the confidence threshold at 0.1 because I want to capture as many detections as possible, even those that the model is not entirely sure about. The device specifies the hardware we want to use during inference. We could pass the CPU here, but given that we have a GPU, we'll pass the CUDA device index, in our case zero. Now let's run inference using two architectures and compare the results. The commands we will use are almost identical; the only difference lies in the path to the model weights. Let's start with GELAN-C. By default, YOLOv9 saves inference results in the runs/detect/exp directory; we can override this behavior by passing custom values for the project and name arguments.
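Put together, the GELAN-C call looks roughly like this (the image path is a placeholder for your own file, and flag spellings can be confirmed with python detect.py --help):

    # Run inference with the GELAN-C checkpoint on a single image.
    # --conf-thres 0.1 keeps low-confidence detections; --device 0 is the first GPU.
    !python detect.py \
        --weights weights/gelan-c.pt \
        --source data/dog.jpeg \
        --conf-thres 0.1 \
        --device 0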
Now let's run inference for YOLOv9-E and compare the results. This time the results of the inference were saved in the exp2 directory. We can see that, using the same parameters, YOLOv9-E is capable of detecting more objects in the same image, and that's consistent with the performance reported in the YOLOv9 paper.
Last week we had our first community session, where we discussed YOLO-World, an almost real-time zero-shot object detector, and I would like to use this opportunity to thank everyone who joined the stream. I really had an awesome time meeting you all and answering your questions. We decided to continue this initiative, so if you have any questions about the code or demos that I will show today, or about YOLOv9 in general, make sure to leave them in the comments, and I will make sure to answer all of them during the upcoming stream. And of course, it would make me even more happy if you could join it live; you can find more details, along with the exact day and time, in the description below. Once again, thank you.
Okay, we know how to run YOLOv9 using pre-trained weights; now it's time to learn how to fine-tune the model. But before we can do that, we need to prepare our dataset. As I mentioned earlier, I will train my model on the football player detection dataset. If you'd like to try a different one, feel free to browse Roboflow Universe and pick one that seems interesting. Then, from the left panel, select Dataset and click the "Download Dataset" button. When the popup window appears, pick the export format, in our case YOLOv9, make sure that the "show download code" option is checked, and click Continue. After a few seconds a code snippet will be generated; you just need to copy and paste it into Google Colab and you're good to go. I, however, will stick with my original choice. I just press Shift+Enter, and once it's done, I see a prompt asking me to provide a Roboflow API key. I follow the link that takes me to the Roboflow website, click "generate token", copy it, and then return to Colab, paste, and hit Enter.
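The generated snippet follows the standard Roboflow download pattern; a hypothetical version looks like this (the workspace and project slugs, version number, and API key are placeholders — use the snippet Universe generates for you):

    # Download a dataset from Roboflow Universe in YOLOv9 format.
    from roboflow import Roboflow

    rf = Roboflow(api_key="YOUR_ROBOFLOW_API_KEY")  # placeholder key
    project = rf.workspace("your-workspace").project("football-players-detection")
    dataset = project.version(1).download("yolov9")  # dataset.location holds the local path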
The downloaded dataset is divided into three subsets: train, validation, and test. The train and validation subsets are going to be used during the training, while the test subset will be used for evaluation. Remember that the test set should not contain any images that were used during the training. Each part consists of two directories, one containing images and the other containing labels. Each label file is essentially a TXT file in the standard YOLO format known from earlier versions of the model. Each line describes a single bounding box and consists of five numbers separated by spaces. The first one is the index associated with the label; for example, for the COCO dataset, index zero is the class person. The other four are relative coordinates describing the position and dimensions of the bounding box in the image.
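For illustration, with invented values, a single annotation line reads:

    0 0.481 0.530 0.062 0.114

Here 0 is the class index, followed by the x-center, y-center, width, and height of the box, each normalized to the 0-1 range.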
Now that our dataset is ready, we can finally start the training, and to do it we'll use the train.py script. This time, however, we need to specify a lot more parameters, so let me briefly introduce you to the most important ones. Let's start with arguments typical for any computer vision model training: batch size, image resolution, and epochs. These affect the amount of GPU memory required during the training as well as the total time needed to complete it. Batch size regulates how many images pass through the network simultaneously; it should be as high as possible, naturally limited by the amount of available memory. Before entering the network, all images must be scaled to a common size, which is defined by the image resolution. The higher the value, the smaller the objects the model will be able to detect; unfortunately, this also means increased memory usage and longer training time. Epochs define the number of training iterations. It's usually a matter of intuition: it needs to be large enough so the model has time to learn, but not too long, so that overfitting doesn't occur. Data defines the path to the YAML file specifying the structure of our training set. Finally, weights and config indicate the checkpoint and architecture we would like to use during the training. In this tutorial we will train the smallest architecture available in the YOLOv9 repository, GELAN-C. It's time to press Shift+Enter and get it going.
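As a sketch, the training command looks something like this (paths are placeholders, dataset.location comes from the Roboflow download above, and exact flag spellings can be confirmed with python train.py --help):

    # Fine-tune GELAN-C on the downloaded dataset.
    !python train.py \
        --batch-size 16 \
        --img 640 \
        --epochs 25 \
        --data {dataset.location}/data.yaml \
        --weights weights/gelan-c.pt \
        --cfg models/detect/gelan-c.yaml \
        --device 0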
The training process can take time, so let's use the magic of cinema to speed it up.
[Music]
After several minutes, our training was completed. Now it's time to evaluate the model and check how it performs at object detection on new images and videos. Model evaluation is a must after fine-tuning a model on a custom dataset; after all, we need to understand the strengths and weaknesses of our model and check if there are any differences in accuracy across the different categories in our dataset. Let's start by going through the directory storing the training artifacts.
Inside, besides weights, we can find several visualizations that will help us understand the progress of the training session. Let's start with the graph showing the change in key metrics over time. The six charts on the left display various types of loss functions calculated for both the train and validation sets. These charts are an excellent tool for detecting overfitting: all of them are expected to decrease as training progresses, but when a model is trained for too long, the validation loss often tends to increase, signaling that the model is too closely fitted to images from the train set. In our case, it's clear that the model could have been trained for much longer without any issues; the charts are still steeply decreasing, and there is plenty of room for further optimization. The remaining four charts show various metrics, such as precision, recall, and mAP, and their values should increase as the training progresses.
The next chart we can analyze is the confusion matrix. It shows how often objects of different categories are confused with each other. I like this chart because it allows us to delve deeper into the model's characteristics. In our case, we can clearly see that the model excels at detecting players, referees, and goalkeepers, but performs significantly worse at detecting balls. YOLOv9 provides another useful visualization, this time focusing on label distribution, and we can draw deeper conclusions by analyzing both of these charts simultaneously. We see that our dataset is unbalanced, with the player class being significantly overrepresented; hence, probably, the model's best performance is in detecting players. The ball class performs the worst, not only because it appears the least frequently, but also because the bounding box sizes for this class are the smallest. Since we decided to train our model using 640 input resolution, such small bounding boxes might simply be scaled down to a tiny group of pixels, possibly not carrying enough information to reliably detect the ball class.
Now let's benchmark our model using the val.py script. As expected, both recall and mAP for this class are significantly lower than average; this is likely due to the high number of missed detections. A solution to this problem could be training the model using 1280 input resolution, but as mentioned, the training session would then be slower and require more GPU memory. Okay, it's time to put our newly trained model to the test and see how it performs with previously unseen images and videos.
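The benchmark call is in the same spirit (a sketch; runs/train/exp is assumed to be the output directory of the training run, so adjust if yours differs):

    # Evaluate the fine-tuned checkpoint: reports precision, recall, and mAP.
    !python val.py \
        --img 640 \
        --data {dataset.location}/data.yaml \
        --weights runs/train/exp/weights/best.pt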
The command does not differ much from the one we executed at the beginning of the video. We switch the path from the model pre-trained on the COCO dataset to the one we just fine-tuned, and as the source we pass the entire directory containing the test dataset. Let's take a look at some of the results. As expected, the model performs relatively well. The biggest issue is reliable ball detection, where we encounter both false negatives and false positives. Double detections also occur frequently, when a referee is simultaneously detected as both a player and a referee. Now let's have some fun and see how our model handles a short video clip.
[Music]
So how do we deploy a model that has no SDK? We can certainly try to hack a solution based on the detect.py script available in the repository; however, it seems that a much better and certainly more robust solution would be to use Roboflow to manage your weights and deploy the model anywhere you want. It's super easy; you can do it in a few lines of code. Let me show you how. We start where we left off, with our model already trained in Colab. First, we need to install two additional packages: inference and supervision. We will use inference to deploy our YOLOv9 model locally, but you can use it to run all sorts of different computer vision models. Supervision is a computer vision Swiss Army knife that this time we will use for annotating our inference results.
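Installing them is a single cell:

    # Install the deployment and visualization packages.
    !pip install inference supervision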
Now we use the deploy method, specify the model type, in our case YOLOv9, and the directory containing the training results. This will send the weights to Roboflow and enable us to use them both when running the model locally and through the API. Now let's try to load the model back into inference. We specify the model ID, which is the end part of the address displayed above, the project name slash dataset version, and we pass our Roboflow API key.
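A sketch of those two steps, reusing the project object from the dataset download above and assuming the deploy signature Roboflow documents for its other YOLO integrations (the model ID and paths below are placeholders):

    # Upload the trained weights to Roboflow.
    version = project.version(1)  # the same version we downloaded earlier
    version.deploy(model_type="yolov9", model_path="runs/train/exp")

    # Load the model back through the inference package.
    from inference import get_model

    model = get_model(
        model_id="football-players-detection/1",  # project-name/dataset-version
        api_key="YOUR_ROBOFLOW_API_KEY",
    )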
To get your Roboflow API key, you need to log into your Roboflow account and then, by expanding the dropdown in the upper right corner, go to Settings. From the panel on the left side, select Workspace and go to the Roboflow API section. Now you can simply copy your private key, return to Colab, paste it, and press Enter to initiate the model download. Once the model is loaded, we can choose a random image from our test set, load it using OpenCV, and run it through the model. Finally, using supervision, we visualize the result.
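The annotation step might look like the following (a sketch; exact annotator class names depend on your supervision version, and the image path is a placeholder):

    import cv2
    import supervision as sv

    image = cv2.imread("test/images/example.jpg")  # placeholder path
    result = model.infer(image)[0]
    detections = sv.Detections.from_inference(result)

    # Draw boxes; swap in sv.EllipseAnnotator() for the ellipse style mentioned next.
    annotator = sv.BoundingBoxAnnotator()
    annotated = annotator.annotate(scene=image.copy(), detections=detections)
    sv.plot_image(annotated)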
variety of different annotators we can
choose something more suitable for our
use case for examp example the ellipse
annotator funny how such a small change
can make the results seem much more
interesting at the end of last year we
released a video covering my favorite
object detection
models the goal was to evaluate popular
detectors not only in terms of speed and
accuracy but also take into account less
obvious criteria such as Community size
ease of use and
Licensing I mentioned that the higher
accurate accy of the based model on KOCO
data set does not always translate into
higher accuracy of the fine-tuned model
and that the accuracy of detections is
usually more affected by the quality of
the data set that was used for training
than the choice of specific architecture
assessing Yolo v9 from similar
perspective it's worth to mention that
it is still a very young project
although according to the paper it has
managed to beat the competition in terms
of of speed and accuracy on the other
hand it has no SDK no CLI and no
documentation beyond the GitHub read me
so to sum it up I encourage you to try y
v9 train your model deploy it with
inference however keep in mind that
there are still plenty of other popular
object detectors that might be just a
little bit slower and less accurate but
still be a valuable alternative because
they have large community great
documentation and a lot of examples
online that you can use to build your
own project of course a powerful
realtime object detector can be applied
in many scenarios in this tutorial we
trained a model for detecting players on
a football field but YOLO v9 can be just
as well used to Power Smart self-service
checkout the customer simply needs to
move the product in front of the camera
and it is automatically added to the
bill I also encourage you to check our
hagging face space where you can upload
your image and compare YOLO V8 v9 and
yolon Nas side by side we are using
models pre-trained on Coco data set so
detection is limited to only 80
classes if you want to go beyond that I
encourage you to check our YOLO World
space where you can detect any class
without any training and that's all for
today if you like the video make sure to
like And subscribe and stay tuned for
more computer vision content coming to
this channel soon my name is Peter and I
see you next time
[Music]
bye