Auto Annotation for generating segmentation dataset using YOLOv8 & SAM

Code With Aarohi
23 May 2023 · 14:09

Summary

TL;DR: In this tutorial, Aarohi demonstrates how to use the auto-annotation feature of the ultralytics package, which implements Meta AI's Segment Anything Model (SAM), for efficient image segmentation. The video explains how to segment images and videos with SAM, and how to generate pixel-level annotations with the help of a pre-trained object detection model. It covers the technical requirements, the steps to install ultralytics, and a detailed walkthrough of the auto-annotation function, showcasing its potential to save time and effort in creating accurate segmentation datasets.

Takeaways

  • 📚 The video is a tutorial on how to perform auto-annotation for image segmentation using the ultralytics package.
  • 🕒 It highlights that image segmentation annotation is more time-consuming than object detection due to the need for pixel-level annotation.
  • 🚀 Meta AI released a segmentation model called 'segment anything model' in April 2023, trained on a massive dataset with over 1 billion masks on 11 million images.
  • 🔧 Ultralytics integrated the 'segment anything model' into their package and introduced an auto-annotation feature to automate image segmentation tasks.
  • 💻 The tutorial uses Python 3.9, PyTorch 2.0.1, CUDA 11.7, and ultralytics version 8.0.106, demonstrated on an RTX 3090 GPU (see the environment-check sketch after this list).
  • 🖼️ The video demonstrates how to use the 'segment anything model' to segment images and videos, and even from a webcam.
  • 📹 It shows how to view the output image or video with segmentation masks directly on the screen.
  • 🔍 The auto-annotation feature uses a pre-trained object detection model to generate bounding boxes, which are then used by the segmentation model to create masks.
  • 📁 The process results in the creation of annotation files in a 'labels' folder, which are crucial for training segmentation models.
  • 🛠️ The video emphasizes the efficiency and accuracy gains from using auto-annotation, especially beneficial for large datasets where manual annotation is labor-intensive.
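
A quick way to confirm your environment matches the one in the video is a version check; a minimal sketch, assuming ultralytics is pinned to the version shown:

```python
# Minimal environment check, assuming the setup from the video:
# pip install ultralytics==8.0.106
import torch
import ultralytics

print(ultralytics.__version__)      # expected: 8.0.106
print(torch.__version__)            # expected: 2.0.1 (a +cu117 build for CUDA 11.7)
print(torch.cuda.is_available())    # True on a CUDA-capable GPU such as the RTX 3090
```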

Q & A

  • What is the main focus of the video by Aarohi?

    -The main focus of the video is to demonstrate how to perform auto-annotation on a dataset for image segmentation using the ultralytics package and the SAM model.

  • Why is image segmentation annotation considered more time-consuming than object detection annotation?

    -Image segmentation annotation is more time-consuming because it requires pixel-level annotation where each pixel of an image is assigned a class label, whereas object detection annotation involves providing bounding boxes for objects of interest.

  • What is the significance of the 'segment anything' model released by Meta AI?

    -The 'segment anything' model is significant because it is an instance segmentation model trained on a large dataset with over 1 billion masks on 11 million images, making it the largest image segmentation dataset to date.

  • How does the auto-annotation feature in ultralytics work?

    -The auto-annotation feature in ultralytics uses a pre-trained object detection model to generate bounding boxes and class labels, which are then used by the 'segment anything' model to create segmentation masks for the areas of interest.
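
As a rough illustration, the call looks like the sketch below. The import path follows the ultralytics 8.0.x layout shown in the video (newer releases expose it as ultralytics.data.annotator), and the data folder is a placeholder:

```python
# Auto-annotation call as shown in the video (ultralytics 8.0.106 layout);
# in recent releases the module lives at ultralytics.data.annotator.
from ultralytics.yolo.data.annotator import auto_annotate

auto_annotate(
    data="images",           # folder of images to annotate (placeholder path)
    det_model="yolov8x.pt",  # pre-trained YOLOv8 detection weights: boxes + class IDs
    sam_model="sam_b.pt",    # SAM base checkpoint: turns each box into a mask
)
```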

  • What are the system requirements mentioned in the video for running the ultralytics package?

    -The system requirements mentioned are Python 3.9, torch 2.0.1, CUDA 11.7, ultralytics 8.0.106, and an RTX 3090 GPU.

  • How can one view the segmentation results on the screen using ultralytics?

    -To view the segmentation results on the screen, one can set the 'show' parameter to true when using the ultralytics model to perform segmentation.
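
A minimal sketch of this, assuming the sam_b.pt checkpoint and the images/1.jpg path used in the video (the import follows current ultralytics releases; the video's 8.0.106 used an earlier module layout):

```python
from ultralytics import SAM

model = SAM("sam_b.pt")                    # base checkpoint; "sam_l.pt" is the large model
model.predict("images/1.jpg", show=True)   # show=True displays the masked image on screen
# In the video, the outputs are also stored under a runs/segment/predict* folder.
```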

  • Can the 'segment anything' model be applied to videos or live streams?

    -Yes, the 'segment anything' model can be applied to videos or live streams by providing the video path or setting the source to zero for a webcam, and the model will perform segmentation on each frame.
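
Continuing the same sketch, only the source changes; the video file name here is a placeholder:

```python
from ultralytics import SAM

model = SAM("sam_b.pt")
model.predict("videos/demo.mp4", show=True)  # segments and displays frame by frame
model.predict(source=0, show=True)           # source=0 switches input to the webcam
```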

  • What is the purpose of the object detection model in the auto-annotation process?

    -The purpose of the object detection model in the auto-annotation process is to provide bounding boxes and class labels for the objects of interest, which are then used by the 'segment anything' model to generate segmentation masks.

  • How does the auto-annotate function within the ultralytics package create annotation files?

    -The auto-annotate function in the ultralytics package creates annotation files by performing detection using a pre-trained detection model, fetching bounding boxes and class IDs, and then using the 'segment anything' model to generate segmentation masks, which are written to text files in a labels folder.
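
For reference, each line of such a file uses the YOLO segmentation label format: a class ID followed by normalized polygon coordinates. The line below is illustrative, with made-up coordinates; in the video, class ID 2 is 'car' in the COCO dataset:

```
2 0.4812 0.3351 0.4790 0.3406 0.4755 0.3460 0.4731 0.3522
```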

  • What is the advantage of using the auto-annotation feature for large datasets?

    -The advantage of using the auto-annotation feature for large datasets is that it saves a significant amount of time and effort compared to manual annotation, while also potentially improving accuracy due to the use of pre-trained models.

Outlines

00:00

🖼️ Introduction to Auto Annotation for Image Segmentation

Aarohi introduces a tutorial on auto-annotation for image segmentation using the ultralytics package. She explains the difference between image segmentation and object detection annotation, emphasizing the pixel-level detail required for segmentation. Aarohi highlights Meta AI's 'segment anything' model, trained on a vast dataset, and its recent integration into the ultralytics package, which now features an auto-annotation tool. This tool automates the creation of segmentation masks using a pre-trained object detection model to generate bounding boxes for the segmentation model to work on. The tutorial is aimed at those familiar with Python and deep learning environments, and Aarohi lists the software and hardware versions used for the demonstration.

05:01

📹 Demonstrating Segmentation on Images and Videos

Aarohi demonstrates how to use the 'segment anything' model in ultralytics for image segmentation. She shows the process of importing the model, applying it to an image, and viewing the results. The video also covers how to display the segmented image on the screen. Aarohi extends the demonstration to video files, explaining how segmentation is applied to each frame of a video in real time. She also mentions the capability to use the model with a webcam. The paragraph concludes with a transition to the next part of the tutorial, which focuses on generating annotations for images using the auto-annotate feature of the ultralytics package.

10:03

🔍 Generating Annotations with Auto Annotate

Aarohi begins the auto-annotate task by opening the ultralytics repository and navigating to the 'annotator' module within it. She outlines the process of using the auto-annotate function to create segmentation masks for images. The tutorial explains the necessity of a pre-trained detection model to provide bounding boxes for the segmentation model. Aarohi details the steps involved in the auto-annotate function, including performing detection, fetching bounding boxes and class IDs, and using the segmentation model to create masks. The function writes the annotations to text files within a 'labels' folder. The video concludes with a summary of how auto-annotation can save time and improve accuracy for large datasets, and Aarohi thanks the viewers for watching.
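
A hedged paraphrase of those steps in code; the function name and output location are illustrative, not the verbatim ultralytics source:

```python
# Sketch of the steps the video walks through inside auto_annotate;
# an illustrative paraphrase, not the exact ultralytics implementation.
from pathlib import Path

from ultralytics import SAM, YOLO

def auto_annotate_sketch(data, det_model="yolov8x.pt", sam_model="sam_b.pt"):
    det = YOLO(det_model)   # detection model: boxes + class IDs
    sam = SAM(sam_model)    # segmentation model: masks for the boxed regions
    out_dir = Path(data) / "labels"  # the video shows a 'labels' folder
    out_dir.mkdir(parents=True, exist_ok=True)
    for result in det(data, stream=True):          # one result per image/frame
        class_ids = result.boxes.cls.int().tolist()
        if not class_ids:
            continue
        sam_out = sam(result.orig_img, bboxes=result.boxes.xyxy)
        segments = sam_out[0].masks.xyn            # normalized polygon per mask
        with open(out_dir / f"{Path(result.path).stem}.txt", "w") as f:
            for cid, seg in zip(class_ids, segments):
                coords = " ".join(f"{c:.4f}" for c in seg.reshape(-1))
                f.write(f"{cid} {coords}\n")       # YOLO segmentation label line
```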

Keywords

💡Auto annotation

Auto annotation refers to the process of automatically generating annotations for images or videos, which typically involves identifying and labeling objects within the visual data. In the context of the video, auto annotation is used to create pixel-level annotations for image segmentation tasks, which is a significant time-saver compared to manual annotation. The video demonstrates how to use the auto annotation feature in the ultralytics package, which leverages pre-trained models to perform this task.

💡Image segmentation

Image segmentation is a process in digital image processing that partitions an image into multiple segments or regions. Each segment represents a set of pixels that share certain characteristics, such as color or texture. In the video, image segmentation is the main focus, where the goal is to provide a class label to each pixel of an image, which is more detailed and time-consuming than object detection annotation.

💡Object detection

Object detection is a computer vision technique that deals with identifying and locating multiple objects in an image or video. It typically involves drawing bounding boxes around the objects of interest. The video contrasts object detection with image segmentation, highlighting that object detection is less time-consuming as it only requires bounding boxes rather than pixel-level annotations.

💡Segment Anything Model (SAM)

The Segment Anything Model (SAM) is an instance segmentation model released by Meta AI, trained on a large dataset to perform pixel-level image segmentation. In the video, SAM is used as the core model for auto annotation, demonstrating its capability to generate segmentation masks on images and videos with the help of bounding boxes provided by a detection model.

💡Pre-trained model

A pre-trained model is a machine learning model that has already been trained on a large dataset, allowing it to be used or fine-tuned for specific tasks without starting from scratch. In the video, the presenter uses a pre-trained object detection model to generate bounding boxes, which are then used by the SAM model for auto annotation of segmentation tasks.

💡Bounding boxes

Bounding boxes are rectangular areas drawn around objects of interest in images or videos, used to define the region where the object is located. In the context of the video, bounding boxes are generated by a pre-trained detection model and are crucial for the SAM model to perform segmentation, as they provide the areas on the image where the segmentation masks need to be applied.
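
In current ultralytics releases, SAM can also be prompted with an explicit box directly, which is essentially what auto-annotation does per detection; a sketch with made-up pixel coordinates:

```python
from ultralytics import SAM

model = SAM("sam_b.pt")
# Prompt SAM with one bounding box in pixel xyxy format (illustrative values).
results = model("images/1.jpg", bboxes=[100, 50, 400, 300])
```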

💡Ultralytics

Ultralytics is the company that implemented the SAM model in their software package, which provides tools for computer vision tasks such as object detection and image segmentation. The video demonstrates how to use the auto annotation feature of the ultralytics package to prepare image segmentation datasets automatically.

💡Class label

A class label is a descriptor that identifies the category or class to which an object or pixel belongs in an image. In image segmentation, each pixel is assigned a class label, which is a fundamental part of the annotation process. The video explains how auto annotation can be used to assign these labels efficiently.

💡Pixel-level annotation

Pixel-level annotation is the process of assigning a label to each pixel in an image, which is a detailed and precise form of annotation. It is used in the video to describe the level of detail required for image segmentation, as opposed to the less granular object detection annotation that involves only bounding boxes.

💡Dataset

A dataset is a collection of data, often used in machine learning and computer vision to train models. In the video, the presenter discusses how to create a dataset for image segmentation using auto annotation, which involves preparing a set of images and generating corresponding annotation files.

Highlights

Introduction to the process of auto-annotation for image segmentation datasets.

Comparison of the time consumption between image segmentation and object detection annotation.

Release of Meta AI's segment anything model in April 2023.

Description of the segment anything model's training on a dataset with over 1 billion masks on 11 million images.

Ultralytics' implementation of the segment anything model and the introduction of the auto-annotation feature.

Explanation of how auto-annotation can automate the creation of image segmentation datasets.

Requirement of an object detection pre-trained model for auto-annotation.

Details of the technical environment used for the demonstration, including Python, torch, CUDA, and ultralytics versions.

Instructions on how to install ultralytics and prepare the environment for auto-annotation.

Demonstration of using the segment anything model for image segmentation with ultralytics.

How to view the output image with segmentation mask on the screen.

Process of applying segmentation to a video using the segment anything model.

Using the segment anything model with a webcam for real-time segmentation.

Tutorial on generating annotations for images using the auto-annotate function from ultralytics.

Explanation of the necessity of a detection model for providing bounding boxes to the segment anything model.

Description of the process of creating annotation files using the auto-annotate function.

How the auto-annotate function works by combining detection and segmentation models.

Efficiency and accuracy benefits of using auto-annotation for large datasets.

Conclusion and appreciation for watching the tutorial on auto-annotation with ultralytics.

Transcripts

00:00

Hello everyone, this is Aarohi, and welcome to my channel. In today's video I'll show you how to perform auto-annotation on a dataset for image segmentation. Annotating an image segmentation dataset is more time-consuming than annotating for object detection, because segmentation requires pixel-level annotation, where we assign a class label to each pixel of an image; in object detection annotation, we only provide bounding boxes for the objects we are interested in. Just last month, in April 2023, Meta AI released their Segment Anything Model, an instance segmentation model trained on a very large dataset with more than 1 billion masks on 11 million images — the largest image segmentation dataset to date. Recently, the Ultralytics company implemented that SAM model in their ultralytics package and added a feature called auto-annotation. Using that feature, you can prepare your image segmentation datasets automatically, without manual labeling or manual annotation. So today I'll show you how to use the ultralytics auto-annotation feature to prepare your own image segmentation dataset. The only thing you need is an object detection pre-trained model; using that model, you can create annotation files for segmentation tasks. Let's see how to do that.

02:12

The Python version I'm using is 3.9, the torch version is 2.0.1, CUDA is 11.7, I'm working on an RTX 3090 GPU, and the ultralytics version I'm using is 8.0.106. If you are trying ultralytics for the first time, you just need to run pip install ultralytics and your environment will be ready to execute this code. Once ultralytics is installed, you only need the import. First, I'm showing you how to use the Segment Anything Model with ultralytics: suppose you have an image and you want to put masks on it. You just import the SAM model; there are two kinds of SAM models, sam_l, which is the large model, and sam_b, the base model.

03:25

First I'm using the base model: we call SAM like this and provide the model, which is already trained. Then we provide the image on which we want to perform the segmentation: model.predict with the path of the image. My image is in the images folder; inside it I have an image named 1.jpg, and this is the image I want to segment. Let's run the code. When you run it, your results are stored like this: you get a runs folder, inside it a segment folder, and that is the folder where the segmentation masks are. Open runs/segment/predict4, and there is the image with the segmentation mask. This is how you can use SAM for segmenting your image, and this is how you see the results.

04:41

Our results were stored in the runs folder, but what if you want to see the output image on the screen right away? Then you just set show equals true and run the command; you will see the image with the segmentation mask on the screen. Let's execute it — and you can see the image here. This is how it works.

05:25

So far we have tried an image, but what if you want to try it on a video? Where we earlier provided the image path, now provide your video path. My video is in the videos folder; this is the video I'm testing on. Let's run the code. If you set show equals true, the video will open while the process is going on: segmentation is performed, and side by side you see the video with the segmentations. Let's execute it. The process has started, and you can see the video opening; it works on each frame of the video one by one, so it takes some time, but this is how it works. If you want to stop the process in between, you can; otherwise it runs through all the frames — we have 199 frames here, and it works on all of them one by one. After that, if you want to work with a web camera, just set source equals zero and it will work on the webcam as well. And if you do not want to see the results on the screen, remove show equals true.

07:05

Now you know how to use the SAM model on images, on videos, and on a webcam; next, let's generate the annotations for the images — the auto-annotation task I described at the beginning. In ultralytics yolo data annotator, they have a function auto_annotate. Let's open the ultralytics repo: inside ultralytics, then yolo, then data, they have annotator; when you open this annotator, you see the auto_annotate function. This function is responsible for putting the masks on the images.

08:08

Now let's look here: we are calling that auto_annotate function. For data equals images, you need to provide the path of the folder where your images are. I have these two images, and I want to create annotation files for them. The way YOLO works, you get one annotation file per image, so for our two images you get two corresponding annotation files: one with the annotation details of the first image and one with the details of the second.

08:57

Then you provide the detection model. Here is how this auto-annotation feature works: the pre-trained detection model is a mandatory step — you need a detection model. With the help of the detection model, you get bounding boxes on the objects you are interested in; those bounding boxes go to the Segment Anything Model, which puts a mask on the area where each bounding box is. Why do we need this step? Because the Segment Anything Model can only put masks, and there are no corresponding class labels for them — when Meta AI trained the SAM model, there were no labels attached to the masks. That's why we need an object detection model: it puts bounding boxes on the objects and gives you a class label; each bounding box then becomes the input to the Segment Anything Model, which puts a mask on the area of the bounding box.

10:27

So this is the detection model, this is the SAM model, and this is the folder containing our dataset for which I want the annotation files. When you execute it, it creates a labels folder, and inside that labels folder you see the annotation files. In my case I have two images; let's open the labels folder, and you can see the two annotation files. In the first file, the annotation starts with a 2 — that is the class ID; in the COCO dataset, the class ID for car is 2. After the class ID come the segmentation points, the annotated polygon coordinates. In the same way, the second file also has its annotations. Using this feature you can save a lot of time if you have large datasets — annotation for segmentation tasks is very time-consuming and has to be done carefully, but with the help of the detection model and the segmentation model you can do it with much less time and effort, and you also get good accuracy.

12:07

Now let's look at the auto_annotate function itself. These are the pieces: the detection model is here, the segmentation model is here. This is how you perform detection in YOLOv8 — we call our detection model, provide the data to it, and the results are stored. Then we use a for loop, because if you are working on a video or a stream, you have to get the bounding boxes and the class labels for each frame. Inside that loop, you can see, we use the Segment Anything Model. So in this auto_annotate function: first we perform detection and the detections are stored in det_results; then we fetch the boxes and the class IDs; and then, using this line, set_image — set_image is a function of the Segment Anything Model; whenever you want to give an image to SAM, you use set_image. That's what we are doing here: we provide the image, run the SAM model, update the results, and these few lines are responsible for writing the annotations to the text files in the labels folder. So this is how you can use the auto-annotation feature of ultralytics, which uses the SAM model developed by Meta AI. I hope this video is helpful — thank you for watching.


Related tags
Auto Annotation, Image Segmentation, Ultralytics, SAM Model, Machine Learning, Deep Learning, Data Annotation, AI Tutorial, Python Coding, Video Processing