YOLO World Training Workflow with LVIS Dataset and Guide Walkthrough | Episode 46

Ultralytics
2 May 2024 · 15:13

Summary

TLDR: This video tutorial guides viewers through training a custom YOLO World model on the large-scale, fine-grained LVIS dataset. It covers setting up the training pipeline with the Ultralytics platform, selecting the YOLO World v2 model, and choosing between training from scratch or fine-tuning with custom data. The video also highlights the extensive LVIS dataset, which contains over 1,200 object classes, and demonstrates how to train the model locally with the help of provided code snippets.

Takeaways

  • 📚 The video tutorial focuses on training a custom YOLO World model for object detection using a large-scale dataset called LVIS.
  • 🔍 LVIS is a large-scale, fine-grained vocabulary dataset with over 1,200 object categories, far more extensive than the standard COCO dataset.
  • 💻 The video demonstrates how to set up the training pipeline using the Ultralytics framework, which simplifies the process without needing to write extensive code (see the sketch after this list).
  • 🚀 The tutorial covers both training a YOLO World model from scratch and fine-tuning it on a custom dataset for specific tasks.
  • 🌟 YOLO World models come in different sizes: small, medium, large, and extra large, each suiting different computational budgets and accuracy needs.
  • 📈 The video provides a step-by-step guide on using the dataset YAML file to specify the classes and the training and validation splits.
  • 💾 It mentions the importance of having a powerful GPU for training large datasets like LVIS, as it can take several hours or even days.
  • 📊 The tutorial shows how to monitor the training process by tracking metrics such as loss and mean average precision (mAP) over epochs.
  • 🔧 The video suggests that for practical purposes, one might prefer to fine-tune a pre-trained model rather than training from scratch due to the significant time investment.
  • 🔗 The script provides insights into using the trained model for predictions and mentions that the Ultralytics framework provides tools for further analysis like confusion matrices.
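
As a reference for the pipeline summarized above, here is a minimal sketch using the Ultralytics Python API; the model size, epoch count, and image size are illustrative choices rather than values mandated by the video:

```python
from ultralytics import YOLOWorld

# Load a pre-trained YOLO World v2 checkpoint (small variant); swap in the
# medium/large/extra-large weights depending on available compute.
model = YOLOWorld("yolov8s-worldv2.pt")

# Fine-tune on LVIS. If "lvis.yaml" is not found locally, Ultralytics
# downloads and extracts the dataset automatically (roughly 20 GB).
results = model.train(data="lvis.yaml", epochs=30, imgsz=640)
```

As the video notes, the same call starting from randomly initialized weights (training from scratch) can take multiple days on a single GPU, which is why fine-tuning is usually the practical choice.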

Q & A

  • What is the purpose of the video?

    -The purpose of the video is to demonstrate how to train a custom YOLO World model, including using a large-scale dataset called LVIS and setting up the training pipeline.

  • What dataset is being used to train the YOLO World model?

    -The dataset being used is LVIS, a large-scale, fine-grained vocabulary dataset with over 160,000 images and 1,200 object categories, released by Facebook AI Research.

  • What are the main differences between the LVIS dataset and the COCO dataset?

    -The main difference is that the LVIS dataset contains over 1,200 object categories, while the COCO dataset has only 80. LVIS is more comprehensive and provides a larger and more diverse set of objects for training models.

  • What are the supported tasks for the YOLO World model?

    -The YOLO World model supports inference, validation, training, and export tasks. However, export is only available with the YOLO World v2 models.

  • How can you train a YOLO World model using your own custom dataset?

    -You can train a YOLO World model using your own custom dataset by creating a dataset in the required format and specifying it in the model training command, using the LVIS dataset structure as a reference.
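
For illustration, a custom dataset YAML in the standard Ultralytics detection format might look like the following; the paths and class names below are hypothetical:

```yaml
# custom_data.yaml -- hypothetical dataset definition (Ultralytics format)
path: datasets/my_dataset   # dataset root directory
train: images/train         # training images, relative to 'path'
val: images/val             # validation images, relative to 'path'

names:
  0: forklift
  1: pallet
  2: safety_vest
```

Passing data="custom_data.yaml" to the same train command then fine-tunes on your own classes.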

  • Why is it suggested to use a local environment for training instead of Google Colab?

    -It is suggested to use a local environment because the dataset is very large, and extracting and training it in Google Colab would take a long time. Training on a local GPU is more efficient for large-scale datasets like LVIS.

  • What hardware specifications are required for training the YOLO World model locally?

    -Training the YOLO World model locally requires a powerful GPU, such as an RTX 4090, as it involves processing over 100,000 images, which takes significant computational resources.
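
Before launching a multi-hour run, it is worth confirming that a CUDA GPU is visible, mirroring the nvidia-smi check in the video; a small sketch:

```python
import torch

# Confirm a CUDA device is available before starting a long training run.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, VRAM: {props.total_memory / 1e9:.1f} GB")
else:
    print("No CUDA GPU detected; LVIS-scale training on CPU is impractical.")
```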

  • What are the key metrics used to evaluate the YOLO World model during training?

    -The key metrics used during training are the Box loss, Class loss, DFL loss, and the mean Average Precision (mAP) at different IoU thresholds (0.5 and 0.5-0.95).
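
Ultralytics logs these per-epoch values to a results.csv file in the run directory; a hedged sketch for plotting them (the run path and column names assume the default Ultralytics layout):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Path assumes the default Ultralytics run directory; adjust to your run.
df = pd.read_csv("runs/detect/train/results.csv")
df.columns = df.columns.str.strip()  # headers may include padding spaces

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
for col, label in [("train/box_loss", "box loss"),
                   ("train/cls_loss", "class loss"),
                   ("train/dfl_loss", "DFL loss")]:
    ax1.plot(df["epoch"], df[col], label=label)
ax1.set_xlabel("epoch")
ax1.set_title("training losses")
ax1.legend()

ax2.plot(df["epoch"], df["metrics/mAP50(B)"], label="mAP@0.5")
ax2.plot(df["epoch"], df["metrics/mAP50-95(B)"], label="mAP@0.5:0.95")
ax2.set_xlabel("epoch")
ax2.set_title("validation mAP")
ax2.legend()
plt.show()
```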

  • How long does it typically take to train the YOLO World model on the LVIS dataset?

    -Training the YOLO World model on the LVIS dataset for 30 epochs may take several hours to days, depending on the hardware used. In the video, 10 epochs took around 3 hours using an RTX 4090 GPU.

  • What are the advantages of using open vocabulary models like YOLO World?

    -Open vocabulary models like YOLO World can detect an arbitrary number of object classes beyond those available in datasets like COCO. This flexibility makes them suitable for a wider range of applications without requiring specific training for each possible class.
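
A short sketch of what this flexibility looks like with the Ultralytics API: class prompts can be set at inference time without retraining (the class names and image path here are made up for illustration):

```python
from ultralytics import YOLOWorld

model = YOLOWorld("yolov8s-worldv2.pt")

# Prompt with arbitrary class names -- not limited to COCO's 80 classes.
model.set_classes(["forklift", "safety helmet", "wooden pallet"])

results = model.predict("warehouse.jpg")  # hypothetical image
results[0].show()
```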

Outlines

00:00

🚀 Introduction to Training Custom YOLO World Models

The video begins with an introduction to training a custom YOLO World model. The presenter discusses the process of training models using a dataset called LVIS, a large-scale, fine-grained vocabulary dataset. This dataset differs from the standard COCO dataset in that it contains a far larger number of images and classes, allowing for the pre-training of YOLO World models. The video aims to demonstrate how to set up a pipeline for training these models, and also mentions the possibility of using one's own custom data for fine-tuning.

05:00

📚 Exploring YOLO World Models and Datasets

The presenter dives into the Ultralytics documentation to explore the available YOLO World models. They discuss the features of the models, such as their ability to detect an arbitrary number of objects due to their open-vocabulary nature. The video then guides viewers on how to select a model and prepare for training by choosing between different model sizes and versions. The presenter also touches on the process of using the LVIS dataset, a large-scale dataset released by Facebook AI Research, for further training and fine-tuning of the models.

10:02

💻 Setting Up Training with Large-Scale Datasets

The video script describes the process of setting up training on a local machine using a large-scale dataset like LVIS, which contains over 100,000 images. The presenter explains the steps involved in unzipping and preparing the dataset for training, emphasizing the importance of having a GPU for such tasks. They also mention the use of Google Colab for smaller datasets and provide insights into the time and resources required for training on such a large dataset.
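
The command-line equivalent described in this section looks roughly like the following; the epoch count and image size match the values used later in the video:

```bash
# Fine-tune YOLO World v2 on LVIS from the terminal. If lvis.yaml is not
# found locally, Ultralytics pulls and extracts the dataset automatically.
yolo detect train data=lvis.yaml model=yolov8s-worldv2.pt epochs=30 imgsz=640
```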

15:02

🔍 Analyzing Training Results and Model Performance

The final paragraph discusses the process of analyzing the training results and model performance. The presenter shares insights on tracking metrics over time, including losses and mean average precision. They mention the expected decrease in loss and increase in mean average precision as the model trains over epochs. The script also hints at the presenter's intention to cover more about the results in future videos, encouraging viewers to test out the training process themselves.

🎉 Conclusion and Invitation to Future Videos

The video concludes with a summary of the training process for YOLO World models and an invitation to viewers to explore the training of these models with their own custom data. The presenter expresses gratitude for watching and looks forward to engaging with the audience in upcoming videos.

Keywords

💡YOLO World model

The YOLO World model refers to a type of object detection model that is capable of detecting objects in images or videos. In the context of the video, it's a pre-trained model that can be fine-tuned for specific tasks. The video discusses how to train this model using the large-scale LVIS dataset to improve its detection capabilities beyond the standard 80 classes of the COCO dataset.

💡Dataset

A dataset in the video refers to a collection of data used for training machine learning models. Specifically, the LVIS dataset is mentioned, a large-scale, fine-grained vocabulary dataset used for pre-training YOLO World models. The dataset contains roughly 100,000 training images (about 160,000 in total) and over 1,200 classes, making it suitable for training models to recognize a wide range of objects.

💡Fine-tuning

Fine-tuning in machine learning involves adjusting a pre-trained model to better suit a specific task or dataset. In the video, the presenter discusses how one can fine-tune the YOLO World models on smaller custom datasets for particular tasks, leveraging the model's pre-training to achieve better performance.

💡Pre-trained models

Pre-trained models are machine learning models that have already been trained on large datasets and can be adapted to new tasks with less data. The video explains that YOLO World models are pre-trained on open-vocabulary datasets, which allows them to detect a wide variety of objects, and can be further customized through fine-tuning.

💡Open vocabulary

An open vocabulary dataset contains a large and diverse set of classes or labels, unlike fixed vocabulary datasets which have a limited number of classes. The video mentions that pre-trained models are trained on open-vocabulary datasets, which means they can detect an arbitrary number of objects, not just a predefined set.

💡Training

Training in the context of the video refers to the process of teaching a machine learning model to make predictions or perform tasks by feeding it a dataset and adjusting its parameters based on the outcomes. The video provides a step-by-step guide on how to train a YOLO World model using the LVIS dataset.

💡Validation

Validation in machine learning is the process of assessing the performance of a model on a separate dataset to ensure that it generalizes well to new, unseen data. The video script mentions a validation dataset, which is used to evaluate the YOLO World model's performance during training.

💡Epoch

An epoch in machine learning refers to one complete pass through the entire training dataset. The video discusses training the model for a certain number of epochs, which is a measure of how many times the model has seen the entire dataset during training.

💡Loss

Loss in machine learning is a measure of how far the model's predictions are from the actual outcomes. The video mentions tracking loss during training, which is a common practice to monitor model performance and adjust the training process accordingly.

💡Mean Average Precision (mAP)

Mean Average Precision is a metric used to evaluate the performance of object detection models. It averages precision over recall levels for each class and then across all classes, typically reported at an IoU threshold of 0.5 (mAP50) and averaged over thresholds from 0.5 to 0.95 (mAP50-95). The video refers to mAP as one of the key metrics to track during the training of the YOLO World model.
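
After training, these numbers can also be read back programmatically; a minimal sketch, assuming the default run directory for the best checkpoint:

```python
from ultralytics import YOLOWorld

# 'best.pt' path assumes the default Ultralytics run layout.
model = YOLOWorld("runs/detect/train/weights/best.pt")
metrics = model.val(data="lvis.yaml")
print(f"mAP@0.5:      {metrics.box.map50:.3f}")
print(f"mAP@0.5:0.95: {metrics.box.map:.3f}")
```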

💡Google Colab

Google Colab is a cloud-based interactive computing platform that allows users to write and execute code through a web browser. The video mentions Google Colab as a convenient option for fine-tuning on smaller datasets, while recommending a local GPU for large-scale training runs like LVIS.

Highlights

Introduction to training a custom YOLO World model

Using the large-scale, fine-grained vocabulary dataset called LVIS

YOLO World models are pre-trained on open vocabulary datasets

Demonstration of setting up the training pipeline

Option to use custom data for fine-tuning YOLO World models

Accessing YOLO World models through the Ultralytics platform

Overview of the YOLO World model and its key features

Support for various tasks like inference, validation, and training

Choice between different model sizes: small, medium, large, and extra large

Explanation of zero-shot transfer performance on the COCO dataset

Details on the LVIS dataset: scale, annotations, and categories

How to use the LVIS dataset for training YOLO World models

Code snippets provided for easy setup and training

Instructions for local training with large datasets

Importance of using a GPU for efficient training

Process of extracting and preparing the dataset for training

Training the model locally with specified epochs and image sizes

Monitoring the training process and tracking metrics

Results after 10 epochs of training and the need for longer training

Analysis of the model's performance on the validation set

Final thoughts on training custom YOLO World models and future applications

Transcripts

00:00

Hey guys, welcome to the video. In this video we're going to see how we can train a custom YOLO World model. In one of the previous videos we already went over how we can run the model, but now we're going to take a look at how we can train our own models. We're going to use a dataset called LVIS, which is basically just a large-scale, fine-grained vocabulary dataset. Normally the YOLO World models, including the pre-trained ones from Ultralytics, are pre-trained on open vocabulary datasets: huge datasets with up to 100,000 images and a large number of classes, not just the 80 classes from the COCO dataset. These datasets can be used for pre-training YOLO World models, so we're going to show you how to set up that pipeline. You can also use your own custom data if you have smaller datasets that you want to fine-tune the YOLO World models on.

00:47

Let's jump straight into the Ultralytics documentation. If we go up inside the Models tab, we'll be able to see all the models available with Ultralytics, so right now we're just going to go down to YOLO World. First of all, you can read a short description, get an overview, and see the key features. This is an open vocabulary model, so it's able to detect an arbitrary number of objects; you can even prompt it. In this video I'm going to show you how we can either train a YOLO World model from scratch, basically with randomly initialized weights, or use your own custom dataset to fine-tune these models for your specific task.

01:23

If you scroll a bit further down, we can see the available models, the supported tasks, and also the operating modes. Let's go in and use the YOLO World v2 model. We now have both a version one and a version two, and we can see the different tasks which are supported: inference, validation, training, and also export, but we can only do export with the YOLO World v2 model, so definitely go with that one. We also have all the variations, the small, medium, large, and extra large models, so we can choose which of those we want. If we scroll a bit further down, we can see the zero-shot transfer on the COCO dataset; that's also a large-scale dataset which we normally pre-train the standard YOLO models on, but now we operate on an open vocabulary, large-scale dataset with a bunch of different classes, so we're going to have hundreds if not thousands of classes in these models. I'm going to show you how we can take another dataset and fine-tune on it, or even train from scratch; it will just require a lot of training time.

02:22

We can see some usage examples here, and this is everything that we have to do. I'm going to do this locally because it's going to take a long time: the dataset that we're going to use, which I'm going to show you in just a second, has 100,000 images in the training set. You can see here how you can train, predict, and also do validation; you can also export and so on. We have code snippets for all of it: take them, copy-paste them either into a Google Colab notebook or directly into your local environment, and you're good to go. You don't have to write any code at all; you can just use the Ultralytics framework, specify a couple of lines of code or run it directly in the command line, and then you can train the models directly.

03:01

To go more into detail, let's now go inside the Datasets tab and take a look at the dataset that we want to use. If we scroll a bit further down, under the object detection datasets we can see this LVIS dataset. If you press on it, we can see that this is a large-scale, fine-grained, vocabulary-level annotation dataset released by Facebook AI Research. It's basically a research benchmark for object detection and also instance segmentation, but we are only going to work with the object detection dataset. We can see that it is a large vocabulary of categories aiming to drive further advancement in the computer vision field. It contains 160,000 images and 2 million instance annotations for object detection, segmentation, and captioning tasks, with over 1,200 object categories. So instead of just having the standard objects like cars, bicycles, animals, and so on from the COCO dataset, where we only have 80 classes, we can now train these models on 1,200 classes and directly use those for our own applications and projects as pre-trained models.

04:10

Right now we can see the key features; we won't go too much into detail with those. We have the dataset structure, and I'm going to show you how you can just run the YAML file with Ultralytics: it's going to extract and unzip everything, take care of it, and you can train on it directly. We have a training split, a validation split, a mini validation split, and a test set at the end. If we scroll a bit further down, we can see the dataset YAML file with all the different classes that we're going to do detections on, and also the paths to our train, validation, and mini validation splits. This is pretty much everything that we need if we just want to use it directly and train our model. We can see the example usage: yolo train, where we also need to specify detection or segmentation, and we set the dataset path equal to lvis.yaml; then it's just going to pull the dataset from Ultralytics and you can use it directly, and I'm going to do that in just a second. It's around 20 GB and over 100,000 images, so it takes a long time to unzip; I'm just going to let it run so we can use it for training later on.

05:15

If you're using large-scale datasets like these, definitely do it locally if you have a GPU, but you can still do it in a Google Colab notebook; most likely you would just fine-tune in a Google Colab notebook with a few hundred to a few thousand images. You can pretty much see all the classes here: alarm clock, airplane, apple, applesauce, apricot, apron; scroll through all of them: ball, basketball (instead of just "sports ball", which we have in the COCO dataset), beach ball, battery, bed, cow, and so on. We pretty much have any class that you can come up with in this dataset, so if you want to use a model directly out of the box, a pre-trained model, and you don't want to build your own dataset, you can definitely use these open vocabulary datasets. If you go a bit further down, let's just verify: yes, we have 1,203 classes. This is how you can download directly from code, how you can download the labels and also the data; but if you just use it directly with Ultralytics, it's going to extract all the folders and the whole data structure, and it's going to run the training directly, or you can use it for predictions later on. Here we can see some sample images and annotations: basically just a ton of different images, both for instance segmentation, object detection, and so on. Open vocabulary, 1,200 classes; that's pretty much it.

06:34

Let's now go ahead and see how we can train this model inside a Google Colab notebook. First of all, we just need to pip install ultralytics; then we need to create an instance of the YOLO World class, where we just need to specify which of the YOLO models we want, and also YOLO World version two. Now we have the model, and we can specify that we want to train it and which dataset to train it on; right now we just need to specify lvis.yaml, the number of epochs, and the image size, and we also have a bunch of other arguments for the training script that you can set based on the Ultralytics documentation. You can also run it directly in the command line: you just call yolo detect train and specify the data path, so lvis. If it's not able to find that on your local computer, it's going to pull it from the Ultralytics registry, where we have all the datasets. It will take a long time with this specific dataset, but if you use Roboflow or a conversion tool to generate this YAML file for just a few hundred images, you can do it perfectly fine in here, and it will only take a few seconds to extract. Right now I won't do it in the Google Colab notebook, it would just take too long, so I'm going to do it locally on my own computer.

07:44

So let's just go in here and run it, and then I'm going to do the exact same thing on my local computer, where I'm going to run the training, because it will take several hours to do the whole training when we're talking about 100,000 images that we need to process. I have an RTX 4090 in my home computer, and we're going to see the training results epoch for epoch, so we can see how we can train these YOLO World models on a large-scale dataset. It doesn't even have to be large-scale; you can do it on your own images and datasets as well. First of all, we just pip install ultralytics, we create an instance of the model, and we train it; we don't have to run the last line down at the bottom, because it's going to do the exact same thing in here. Besides the epochs, we can also specify the batch size and so on, but we're just going to go with the defaults for now, because we're only interested in seeing the dataset structure, and then we're going to do it locally.

08:36

Let's take a look at the dataset locally while it's running here in Google Colab. Right now I just have "datasets", and inside we have "lvis" with the annotations, images, and labels. If I go inside the images, we have test, train, and validation, and if we go inside the validation set, we can see that we have 5,000 images for the mini validation; these are all the images, and we have all the labels for them, and this is only the validation set with 5K images. If you scroll through it, you can see there's a lot of variation: many different types of images, many different objects, and so on, so this is a really good dataset to pre-train a model on. Normally, when you have such a huge dataset, you would train the model from scratch, but it would probably just take too long to converge, so I'm going to fine-tune it instead, to be able to run it for 10 or 20 epochs so it doesn't take multiple days to train on my own single 4090 GPU.

09:29

Right now we can see that it's unzipping. To start with, it's going to download the model directly if you're running this for the first time. We can see that it's missing the path, so it can't recognize this YAML file locally, or at least in your environment right now, so it's going to fetch all of that from Ultralytics. We can see that it's unzipping the LVIS label segments from the datasets directory into this directory, and to the left we have our datasets folder, our lvis folder, and then our annotations and labels, and we're also going to have our images later on. This is such a huge dataset, and it will take very long to extract in Google Colab; it took me around 20 minutes locally on my own computer.

10:09

Let's now see how we can set it up and run training directly in our own local environment. While it's unzipping the whole dataset in here, let's open up a new terminal; I'm just going to use an Anaconda prompt. We can take this command directly and throw it in here, after we have pip installed ultralytics locally in our own environment. This is not a Google Colab notebook; this is on my own computer. If I just delete all of this and verify that we have a GPU attached, we can call nvidia-smi, and there we go: we get all the information about our GPU, 24 GB of VRAM, an Nvidia RTX 4090, so we're good to go. We can just copy-paste this command in. We don't want to run it for too long, so let's run it for 30 epochs, and I'm just going to let it run and then come back and take a look at the results. Besides the image size, we can also specify the batch size and so on, but let's just go with the default parameters. When we run it locally, we don't need the exclamation mark; that is only for a Google Colab notebook.

11:09

Right now it's just going to extract the whole dataset, starting with the training images; we can see that it's extracting all the images here. It's going pretty fast, but we also need to extract 100,000 images, and we can see the progress bar over here: right now it's at around 20%. It's going to take the training and also the validation set, and after it's done extracting all the images and loading them into the system, it's going to start the training for the epochs that we have specified. Then we can log the metrics over time, take a look at the losses and also the mean average precision, and see how our model converges. We're going to let it run and come back to take a look at the results, because this is going to take a long time to process; it'll probably take several hours to train this model, and this is still just fine-tuning. If we wanted to train from scratch, it would probably take multiple days for the model to converge so we can do meaningful predictions with a new YOLO World model trained from scratch on a large-scale dataset. This is also how you take a model from scratch and create the pre-trained models which we have with YOLOv5, YOLOv8, YOLO World, and so on.

12:15

Right now we can see that our training and validation sets have been extracted. We can also see that we have our optimizer set up, the image size is 640 for both training and validation, we're using eight dataloader workers, and we're logging the results to runs/detect/train. It's starting training for 30 epochs, and now we can track, epoch per epoch, the whole training process. We can see epoch 1 out of 30, the box loss, class loss, DFL loss, and also our instances; we'll also get the mean average precision and so on. Right now we can see that it has processed 500 batches out of 6,000 for a single epoch, so that's a lot of data to process for every single epoch. Let's just let it run for some hours, and we can come back and take a look at the training results after that.

12:57

The model is now done training, so let's go down and take a look at the epochs and the results. Right now we have just trained for 10 epochs; we should definitely have trained it for longer, but it would take several days to either train the model from scratch or fully fine-tune the pre-trained YOLO World model. We're going to take a look at the metrics epoch per epoch: we have all the losses, and we also have the mean average precision at 0.5 and the mean average precision at 0.5 to 0.95, and these are pretty much the values that we should look at; the average precision should be increasing, and the losses should be decreasing over the number of epochs. If we go a bit further down, we can see that the mean average precision is increasing over time: we start out at around 0.028, and we end after 10 epochs at 0.0764. That's pretty good: our mean average precisions are increasing, and we can also see that our losses are decreasing significantly, at least here at the start, which is also expected. We definitely need to train this model for longer; the 10 epochs completed in 3 hours, and if we were to actually train the model fully, we can see that it hasn't even converged yet, it is not near that, so we would have to run it for probably several days. Right now the mean average precision is around 0.07, and we could probably expect it to be up in the 0.40 range for this specific dataset.

14:16

After it's done training, it will also do an evaluation across all the classes, so we can see how the model performs on the individual classes. That is not really too meaningful here, but you can dive into some specific classes if there are ones you want to take a look at, or you can go inside the run folder and take a look at the confusion matrix; we have videos covering the whole run folder and all the results that it generates after we have trained a YOLO World model with Ultralytics.

14:40

So thank you guys for watching this video. I hope you learned something; the point was basically to see how we can train a YOLO World model on a large-scale dataset, but you can also do the exact same thing with your own custom data with a few hundred images. Definitely go test it out; it is really nice to learn how to set up the whole training pipeline and try out these open vocabulary models, where we can do object detection on arbitrary objects instead of only the 80 classes from the COCO dataset. Thanks a lot for watching, and I hope to see you guys in one of the upcoming videos. Until then, happy training.
