YOLO-World: Real-Time, Zero-Shot Object Detection Explained

Roboflow
21 Feb 2024 · 17:48

Summary

TL;DR: This video introduces YOLO World, a zero-shot object detection model that's 20 times faster than its predecessors. It requires no training and can detect a variety of objects in real-time, even on budget GPUs. The video discusses its architecture, speed advantages, and demonstrates how to run it on Google Colab. It also covers its limitations and potential applications, like detecting objects in controlled environments or combining it with segmentation models for faster processing.

Takeaways

  • 🚀 YOLO World is a zero-shot object detection model that can detect objects without any training.
  • 💻 It is designed to be 20 times faster than its predecessors, making real-time detection feasible.
  • 🔍 The model can be run on Google Colab, allowing for easy access and use without extensive hardware requirements.
  • 📈 YOLO World's architecture consists of a YOLO detector, a text encoder, and a custom network for cross-modality fusion.
  • 📊 It uses a lighter and faster CNN network as its backbone, contributing to its speed.
  • 🔑 The 'Prompt then Detect' paradigm allows for efficient processing by caching text embeddings, reducing the need for real-time text encoding.
  • 👥 The model can detect objects from a user-specified list of classes without needing to be trained on those specific classes.
  • 📉 Lowering the confidence threshold can help detect more objects, but may also result in duplicated detections.
  • 🎥 YOLO World excels in processing videos, achieving high FPS on powerful GPUs and decent FPS on more budget-friendly options like the Nvidia T4.
  • 🛠️ Non-max suppression is used to reduce duplicated detections by discarding overlapping bounding boxes with high Intersection over Union (IoU) values.
  • 🌟 While YOLO World is a significant advancement, it may not replace models trained on custom data sets in all scenarios, especially where high accuracy and reliability are critical.

Q & A

  • What is the main advantage of YOLO World over traditional object detection models?

    -YOLO World is a zero-shot object detector that is significantly faster than its predecessors, allowing for real-time processing without the need for training on a predefined set of categories.

  • How does YOLO World achieve its speed?

    -YOLO World achieves its speed through a lighter, faster CNN backbone and a 'prompt then detect' paradigm that caches text embeddings, bypassing the need for real-time text encoding during inference.

  • What are the three key parts of YOLO World's architecture?

    -The three key parts of YOLO World's architecture are the YOLO detector for multiscale feature extraction, the text encoder that encodes text into embeddings, and a custom network for multi-level cross-modality fusion between image features and text embeddings.

  • How does YOLO World handle detecting objects outside of the COCO dataset?

    -YOLO World uses a zero-shot detection approach: you specify the list of classes you are looking for, and the model detects them without having been trained on those specific classes.

  • What is the 'prompt then detect' paradigm mentioned in the script?

    -The 'prompt then detect' paradigm refers to the process where the model is given a prompt (list of classes) once, and then that information is used for subsequent detections without needing to re-encode the prompt for each inference.

  • How can YOLO World be used in Google Colab?

    -YOLO World can be run in Google Colab by installing necessary libraries, ensuring GPU acceleration, loading the model, setting the classes of interest, and then inferring on images or videos.
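
A minimal sketch of that workflow, based on the steps described in the video, might look like the following; the `yolo_world/l` model ID, the file paths, and the exact `inference`/`supervision` import paths and class names are assumptions and may differ between package versions.

```python
# Hedged sketch of the Colab workflow described above (not the author's exact
# cookbook code): install `inference` and `supervision`, load YOLO-World,
# set the classes once, run inference, and annotate the result.
import cv2
import supervision as sv
from inference.models.yolo_world import YOLOWorld  # import path may vary by version

model = YOLOWorld(model_id="yolo_world/l")           # S/M/L variants are mentioned in the video
model.set_classes(["person", "backpack", "dog"])     # prompt is encoded once ("prompt then detect")

image = cv2.imread("example.jpg")                    # hypothetical input image
results = model.infer(image, confidence=0.03)        # low threshold helps non-COCO classes
detections = sv.Detections.from_inference(results)

annotated = sv.BoundingBoxAnnotator().annotate(image.copy(), detections)
annotated = sv.LabelAnnotator().annotate(annotated, detections)
cv2.imwrite("annotated.jpg", annotated)
```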

  • What is the significance of the Nvidia T4 GPU in the context of YOLO World?

    -The Nvidia T4 GPU is significant because it allows for decent FPS (frames per second) with YOLO World, making it a budget-friendly option for real-time object detection.

  • How does non-max suppression help in refining YOLO World's detections?

    -Non-max suppression is an algorithm that eliminates overlapping bounding boxes by keeping the one with the highest confidence score and discarding others, thus refining the detections and preventing duplicates.
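
As a rough illustration of that step, the snippet below applies supervision's NMS helper to a set of YOLO-World detections; the 0.1 IoU threshold mirrors the aggressive value used in the video and is not a general recommendation.

```python
# Sketch: deduplicate overlapping YOLO-World detections with supervision's NMS.
# `detections` is assumed to be an sv.Detections built via Detections.from_inference.
import supervision as sv

def deduplicate(detections: sv.Detections, iou_threshold: float = 0.1) -> sv.Detections:
    # Boxes whose IoU exceeds the threshold are collapsed to the
    # highest-confidence one; the lower-confidence duplicates are dropped.
    return detections.with_nms(threshold=iou_threshold)
```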

  • What is the role of the 'relative area' filter in processing videos with YOLO World?

    -The 'relative area' filter is used to discard detections that occupy a large percentage of the frame, which helps in filtering out large, high-level bounding boxes that are not the desired objects.
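
A small sketch of that filter, assuming supervision's VideoInfo and Detections APIs (the 0.10 threshold and the helper name are illustrative):

```python
# Sketch: drop detections whose bounding box covers too much of the frame.
import supervision as sv

def drop_oversized(detections: sv.Detections, video_path: str,
                   max_relative_area: float = 0.10) -> sv.Detections:
    info = sv.VideoInfo.from_video_path(video_path)       # frame width/height
    frame_area = info.width * info.height
    relative_area = detections.area / frame_area          # vectorized over all boxes
    return detections[relative_area < max_relative_area]  # keep only the small boxes
```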

  • What are some limitations of YOLO World compared to models trained on custom datasets?

    -YOLO World may be slower and less accurate than models trained on custom datasets. It also detects classes outside of the COCO dataset with noticeably lower confidence and may misclassify objects in complex scenes.

  • How does the author suggest combining YOLO World with other models for improved performance?

    -The author suggests combining YOLO World with fast segmentation models like FastSAM or EfficientSAM to build a zero-shot segmentation pipeline that runs dozens of times faster than the GroundingDINO plus SAM combination.
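
The video does not show the segmentation code itself, so the sketch below only illustrates the shape of such a pipeline: YOLO-World supplies boxes, and a box-promptable segmentation model (represented here by a hypothetical `segment_box` placeholder standing in for something like EfficientSAM) turns each box into a mask.

```python
# Illustrative shape of a zero-shot detection -> segmentation pipeline.
# `segment_box` is a placeholder, not a real EfficientSAM API.
import numpy as np
import supervision as sv

def segment_box(image: np.ndarray, xyxy: np.ndarray) -> np.ndarray:
    """Hypothetical helper: return a boolean (H, W) mask for one box prompt."""
    raise NotImplementedError

def add_masks(image: np.ndarray, detections: sv.Detections) -> sv.Detections:
    # Use each YOLO-World box as a prompt and attach the resulting masks
    # so supervision annotators can draw them alongside the boxes.
    detections.mask = np.array([segment_box(image, box) for box in detections.xyxy])
    return detections
```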

Outlines

00:00

🚀 Introduction to YOLO World

The script introduces YOLO World, a zero-shot object detection model that can detect a wide range of objects without any training. It is highlighted as being 20 times faster than its predecessors and capable of real-time processing with a powerful GPU, or even decent FPS on a budget-friendly Nvidia T4. The presenter discusses the architecture of YOLO World, explaining its three key parts: the YOLO detector for feature extraction, the text encoder for embedding, and a custom network for cross-modality fusion. The model's speed is attributed to its lighter CNN backbone and the 'prompt then detect' paradigm, which caches text embeddings to avoid real-time encoding. The script also mentions a community session where questions about YOLO World can be addressed.

05:01

🛠️ Setting Up and Using YOLO World

The script outlines the process of setting up YOLO World in Google Colab, including ensuring GPU acceleration and installing necessary libraries. It details the steps to load the model, set classes, and perform inference. The presenter demonstrates how to use the model to detect objects like a person and a dog, and discusses the challenges of detecting classes outside the standard COCO dataset. To improve detection, the script introduces techniques like lowering the confidence threshold and using non-max suppression to eliminate duplicate detections. The presenter also shares their experience of achieving 15 FPS on Nvidia T4 and suggests that YOLO World is particularly effective for video processing.

10:02

🔍 Advanced Detection with YOLO World

The script explores advanced detection scenarios with YOLO World, such as detecting objects with specific characteristics like 'holes filled with yellow substance'. It discusses the challenges of defining prompts for such objects and how using color references can improve detection accuracy. The presenter also addresses issues like large high-level bounding boxes that do not precisely match the target objects, especially with low-resolution images. To refine results, the script introduces a method to filter detections based on the relative area of bounding boxes to the frame. The presenter demonstrates how to process an entire video, save results, and discusses the limitations of YOLO World compared to custom-trained models.

15:03

🌟 Conclusion and Future of YOLO World

The script concludes by emphasizing YOLO World's significance in making open vocabulary detection faster and more accessible. It mentions the potential of combining YOLO World with segmentation models to create powerful video processing pipelines. The presenter encourages viewers to experiment with YOLO World and provides links to resources like a cookbook and a Hugging Face Space for further exploration. The script ends with a call to action for viewers to like, subscribe, and join community sessions for more computer vision content.

Keywords

💡YOLO World

YOLO World is a zero-shot object detection model that can identify objects without any training, making it a powerful tool for real-time object detection. It is highlighted in the video as being 20 times faster than its predecessors, which is a significant advancement in the field of computer vision. The script mentions using YOLO World to process images and videos, emphasizing its speed and efficiency.

💡Zero-shot object detection

Zero-shot object detection refers to a model's ability to detect and recognize objects without being explicitly trained on examples of those objects. This is a key feature of YOLO World, as it can detect objects from a user-specified list of classes without additional training. The video discusses how this capability allows for more flexible and immediate deployment in various scenarios.

💡Real-time processing

Real-time processing is the ability of a system to process information quickly enough to handle live input, such as video streams. The video script emphasizes YOLO World's capability to run in real-time, which is crucial for applications like live video analysis or surveillance systems.

💡GPU

GPU, or Graphics Processing Unit, is a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. The script mentions the need for a powerful GPU to run YOLO World effectively, indicating the computational intensity of real-time object detection tasks.

💡Architecture

In the context of the video, architecture refers to the structural design of the YOLO World model, which consists of a YOLO detector, a text encoder, and a custom network for cross-modality fusion. This architecture is what enables the model to be both fast and accurate, as explained in the video.

💡Inference

Inference in machine learning is the process of running a trained model on new data to produce predictions. In the script, inference is used to describe the process by which YOLO World analyzes input data to detect objects. The video discusses how YOLO World can perform inference without needing to encode prompts each time, which contributes to its speed.

💡Prompt then detect paradigm

The 'prompt then detect' paradigm is a method where the model is given a prompt (in this case, a list of classes to look for) and then performs detection based on that prompt. The video explains how YOLO World uses this paradigm to avoid real-time text encoding, thus improving its speed.
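
A short sketch of what that looks like in practice, assuming the Roboflow `inference` and `supervision` packages (model ID, prompt, and video path are illustrative): the class list is encoded once, and every later call reuses the cached embeddings.

```python
# "Prompt then detect" in code form: set_classes encodes the prompt once,
# infer reuses the cached text embeddings on every frame.
import supervision as sv
from inference.models.yolo_world import YOLOWorld

model = YOLOWorld(model_id="yolo_world/l")
model.set_classes(["suitcase"])                               # CLIP runs once, embeddings are cached

for frame in sv.get_video_frames_generator("luggage.mp4"):    # hypothetical video
    results = model.infer(frame, confidence=0.03)             # no text encoding happens here
    detections = sv.Detections.from_inference(results)
```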

💡Non-max suppression

Non-max suppression is an algorithm used to reduce overlapping bounding boxes to a single box for each object, which improves detection accuracy. The video mentions how this algorithm is used by YOLO World to refine object detection results.

Highlights

Introduction to YOLO World, a zero shot object detector that is 20 times faster than its predecessors.

YOLO World can run in real time with access to a powerful GPU, offering decent FPS even on the cheaper Nvidia T4.

Traditional object detection models are limited to predefined categories, requiring new training for new objects.

Zero-shot detectors like Grounding DINO and YOLO World eliminate the need for training: you simply prompt the model with a list of classes.

YOLO World's architecture consists of a YOLO detector, text encoder, and a custom network for cross-modality fusion.

The use of a lighter CNN network and prompt-then-detect paradigm contributes to YOLO World's speed.

Instructions on how to run YOLO World in Google Colab to process images and videos.

Demonstration of the model's ability to detect objects from a list of specified classes without training.

Classes outside of the COCO dataset are detected with significantly lower confidence.

Techniques to improve detection by lowering the confidence threshold and using non-max suppression.

YOLO World shines brightest when processing videos, not just individual images.

Practical example of using color and position as references in prompts for better detection.

How to filter detections based on relative area to avoid detecting the entire object when only a part is desired.

The process of saving the results of video processing using the supervision package.

Discussion of the limitations of zero-shot detectors, such as higher latency and lower accuracy compared to custom-trained models.

Potential use cases for YOLO World, such as in controlled environments like a factory.

Encouragement to prototype with YOLO World and refine detections using the techniques shown.

The future potential of combining YOLO World with fast segmentation models for a zero shot segmentation pipeline.

Introduction of a Hugging Face Space where users can process images and videos using YOLO World and EfficientSAM.

Final thoughts on YOLO World as an important step towards making open vocabulary detection faster and more accessible.

Transcripts

00:01
What if I told you that there is a model that you can use to detect all of these objects, and more, without any training, and that it can run in real time? Well, at least if you have access to a powerful GPU, but you can still get decent FPS on a cheap NVIDIA T4. Today we are going to talk about YOLO-World, a zero-shot object detector that is 20 times faster than its predecessors. We'll talk about the architecture, discuss the main reasons why it's so fast, and above all I will show you how to run it in Google Colab to process images and videos.

00:42
Traditional object detection models such as Faster R-CNN, SSD, or YOLO are designed to detect objects within a predefined set of categories. For instance, models trained on the COCO dataset are limited to 80 categories. If we want a model to detect new objects, we need to create a new dataset with images depicting the objects we want to detect, annotate them, and train our detector. This, of course, is time-consuming and expensive. In response to this limitation, researchers began to develop open-vocabulary models. Not even a year ago I showed you Grounding DINO, a zero-shot object detector that back then blew my mind, and to be honest I'm still impressed by its capabilities. All you have to do is prompt the model, specifying the list of classes that you are looking for, and that's it; no training is required. The downside of Grounding DINO was its speed: it took around one second to process a single image. Good enough if you don't care about latency, but pretty slow if you are thinking about processing live video streams. That's because zero-shot detectors usually use heavy Transformer-based architectures and require simultaneous processing of text and images during inference. And that brings us to YOLO-World, a zero-shot object detector that, at least according to the paper, is equally accurate and 20 times faster than its predecessors. If you would like to learn more about Grounding DINO and YOLO-World, you can find links to two of my blog posts covering those models in the description below. Now let me give you a brief overview of the model architecture.

02:31
YOLO-World has three key parts: a YOLO detector that extracts multiscale features from the input image, a CLIP text encoder that encodes the text into text embeddings, and a custom network that performs multi-level cross-modality fusion between image features and text embeddings, with a name so complicated that I won't even try to pronounce it: the Re-parameterizable Vision-Language Path Aggregation Network. Using a lighter and faster CNN network as its backbone is one of the reasons for YOLO-World's speed. The second one is the prompt-then-detect paradigm: instead of encoding your prompt each time you run inference, YOLO-World uses CLIP to convert the text into embeddings. Those embeddings are then cached and reused, bypassing the need for real-time text encoding. Okay, enough of the talk, let's take a look at some code. But first, a short announcement.

03:31
This week we will have our first community session, so if you have any questions about the YOLO-World model or about the code that I will show today, leave them in the comments. I will try to answer all of them during the live stream that we will host this week; you can find more details about it in the description below. You can also join the live and ask your questions in real time. That would be awesome, because I don't want to sit alone.

03:58
The link to the cookbook I'll be using is in the description below, and I strongly encourage you to open it in a separate tab and follow along. We click the "Open in Colab" button located at the very top of the cookbook, and after a few seconds we should get redirected to the Google Colab website. Now, the first thing we need to do is ensure that our Colab is GPU accelerated. As usual, we can do it by executing the nvidia-smi command. After a brief moment, we should see a table containing information including the installed version of CUDA and the name of the graphics card that we have at our disposal. Now that we've confirmed that the Colab session has GPU support, it's time to install the necessary libraries. The first is Roboflow Inference, a Python package that we will use to run YOLO-World locally, but you can use it to run all sorts of different computer vision models. The second is Supervision, the computer vision Swiss army knife, which we will use, among other things, for filtering and annotating our detections. The installation may take a few moments, so let's use the magic of cinema to skip it. To confirm that everything went smoothly, let's try to import the packages we need. Both OpenCV and tqdm are available in Google Colab out of the box; that's why we didn't include them in the installation section. If you're running the notebook locally on your PC, make sure that those packages are also installed.

05:32
YOLO-World is available in four different sizes, S, M, L, and X, but for now only the first three are accessible via the Inference package. Of course, along with different sizes you should expect different speeds and accuracies. In this tutorial I will use the L version, but you should use the version that is suitable for you based on your speed, accuracy, and hardware requirements.

06:01
To load the model, we simply create an instance of the YOLOWorld class, which we imported a few cells above. This class has two core methods: set_classes and infer. As mentioned in the introduction, to avoid the need for real-time text encoding, YOLO-World utilizes the prompt-then-detect paradigm: by using the set_classes method, our prompt is encoded into an offline vocabulary. Let's choose a list of classes: person, backpack, dog, eye, nose, ear, and tongue. We see that the CLIP model is being downloaded in the background; it will be used to convert our list of classes into embeddings. Now we just need to load our image and pass it as an argument to the second method I mentioned, infer. Then, using the utilities available in the Supervision package, we can visualize our results. Well, unfortunately, from the entire list of classes that we provided, only two were detected: a person and a dog.

07:06
In my experiments, I noticed that classes outside of the COCO dataset are detected with a significantly lower confidence level. Let's try to lower the threshold to include detections the model is less certain about. Our updated code is very similar; essentially only two things have changed: we drastically lowered the confidence threshold from the default 0.5 to 0.03, and we updated the code visualizing detections to display not only the class name but also the confidence level. Such a low confidence threshold is not something that I would usually recommend, but in the case of YOLO-World this strategy works really well. In this visualization we see that this time significantly more of the wanted classes have been detected, but we have a new problem: duplicated detections. Each object is now associated with two or even three bounding boxes.

08:07
To solve this problem, we'll use non-max suppression. Non-max suppression is an algorithm that uses Intersection over Union (IoU) to estimate the degree to which detections overlap with each other; detections with an IoU above the set threshold and a lower confidence are then discarded. Some time ago I wrote a blog post about NMS, so if you want to learn more, the link is in the description. Once again, we introduced only a minor change in the code: this time we use the with_nms function available in Supervision and set an aggressive IoU threshold value of 0.1. The lower the value, within the range between 0 and 1, the smaller the overlap between detections must be for one of them to be discarded. Comparing the results at each stage, we see that ultimately we could detect a lot more objects from the wanted list while maintaining the high quality of the obtained results. I know that confidence levels at around 1% appear quite unusual, but as I said, in the case of YOLO-World it just works.

09:22
YOLO-World shines brightest when processing videos, not just individual images. According to the paper, it can achieve up to 50 FPS on an NVIDIA V100; during my experiments I got 15 FPS on an NVIDIA T4, a much more budget-friendly alternative. The transition from processing a single image to processing entire videos is fairly straightforward: we just loop over the frames of the video and run inference for each of them. Because we look for the same objects on each frame, we only need to encode our class list once.

09:59
In our next experiment, we will attempt something more ambitious than detecting the dog class. In this video we see an object with holes that are being filled with a yellow substance; let's check whether YOLO-World will be able to locate those filled holes. Because the objects that we are looking for are hard to define, choosing the right prompt was a challenge. I tried several variants, but ultimately using a color reference proved to be the most effective. One thing that I didn't mention in the intro, but I think is quite important: the authors of the paper show examples of using color and position as references, so don't hesitate to use them in your prompts.

10:40
Let's load the first frame of our video and run the set_classes method, setting our prompt to "yellow filling". As before, we run inference with a low confidence threshold of 0.02. Now our model successfully detects the individual holes, but along with them it accidentally detects the entire object with holes. We observed the same effect a few months ago while testing Grounding DINO: both models tend to return large, high-level bounding boxes that in some sense meet our criteria but are not the objects that we are looking for. This especially happens when processing images or video with low resolution.

11:25
To solve this problem, we'll filter our detections based on their relative area: if a given bounding box occupies a larger percentage of the frame than a set threshold, it will be dropped. It sounds quite complicated, but implementing it with Supervision is a piece of cake. First, using the VideoInfo class, we'll obtain information about the resolution of the video, the width and the height of the entire frame. Knowing these values, calculating the total area in pixels is quite straightforward. On the other hand, Supervision also provides easy access to the area of individual bounding boxes. Now all we need to do is divide the individual areas by the area of the entire frame to obtain the relative area. Thanks to NumPy, the entire operation can be vectorized and performed simultaneously for all bounding boxes. By setting a relative area threshold, in my case 0.1, we can construct a logical condition that allows us to filter out bounding boxes larger than 10% of the entire frame. When we visualize the result, we see that only detections representing the filled holes remain. Awesome.

12:43
The final step is to process the entire video and save the result to a separate file. To do this, we will use two utilities available in Supervision: the frame generator will help us loop over the frames of the source video, while the VideoSink will take care of recording the result. All the code we've written so far goes inside the for loop and will be triggered for each frame of the video. We see that YOLO-World has successfully solved a task that traditionally would have required training a model on a custom dataset.

set so is yellow World a golden solution

play13:19

the model that ends training on custom

play13:21

data sets well no there are still cases

play13:25

where I would choose model trained on

play13:26

custom data set over zero detector like

play13:29

yellow world let's start with obvious

play13:32

issue

play13:34

latency yellow world is much faster than

play13:37

its predecessors but still significantly

play13:39

slower than state-of-the-art real-time

play13:41

object detectors therefore if we need

play13:44

fast processing and have limited

play13:46

computational resources we still need to

play13:48

rely on more traditional Solutions

play13:51

yellow world is also less accurate and

play13:53

reliable than detectors trained on

play13:55

custom data set there are of course

play13:58

cases when yellow world may prove useful

play14:01

especially when we tightly control the

play14:05

environment a perfect example is one of

play14:07

the Snippets I showed you in the intro I

play14:10

think yellow world could be successfully

play14:12

deployed a cound factory for instance to

play14:14

keep count of daily

play14:16

production unexpected objects cannot

play14:19

suddenly come into the view of the

play14:21

camera and qu sounds move one by one on

play14:24

the other hand preparing the suitcase

play14:26

demo took me quite some time and and I

play14:29

still must admit I did some cherry

play14:31

picking yellow World excels at detecting

play14:34

suitcases however other objects often

play14:37

appear in the frame as well and the

play14:39

model May occasionally misclassify those

play14:41

objects as

play14:43

suitcases so to sum it up I encourage

play14:46

you to prototype with yellow World check

play14:48

if it works in your use case use the

play14:50

techniques that I showed you today to

play14:52

refine detections however be prepared

play14:55

that at the end of the day you may still

play14:57

train your model on custom data set

play14:59

especially if you don't control the

play15:01

environment you have poor camera

play15:03

placement or you simply look for objects

play15:06

that for some reason cannot be detected

play15:08

yellow World opens the way to use cases

play15:10

that so far have been impossible like

play15:13

open vocabulary video processing or

play15:16

using zero shut detectors on the edge

play15:18

but it's just the beginning by combining

play15:21

yellow world with fast segmentation

play15:23

models like Fast sum or efficient sum we

play15:26

can build a zero shut segmentation

play15:28

pipeline that run dozens of times faster

play15:31

than grounding dyo plus some combo I

play15:33

showed you a few months ago inspired by

play15:36

this idea I created a hagging face space

play15:38

where you can use yolow world and

play15:40

efficient Sam to process your images and

play15:43

videos you can even take this idea a

play15:45

step further and build video processing

play15:48

pipeline that automatically removes

play15:50

background behind detections use the

play15:52

fusion base models to dynamically

play15:54

replace them or completely remove them

play15:57

from the frame by combin combining

play15:58

yellow World efficient Sam and prop

play16:05

[Music]

play16:18

[Music]

play16:27

painter

play16:29

[Music]

16:37
YOLO-World is an important step in making open-vocabulary detection faster, cheaper, and widely available. While maintaining almost the same accuracy, it is 20 times faster and five times smaller than leading zero-shot object detectors. I highly encourage you to take a look at our materials; the links to the cookbook and the Hugging Face Space are in the description below. If you have any questions, leave them in the comments, and I'll try to answer all of them during the upcoming community session. And that's all for today. If you liked the video, make sure to like and subscribe, and stay tuned for more computer vision content coming to this channel soon. My name is Peter, and I'll see you next time. Bye!


Related Tags
Object Detection, YOLO World, Zero-Shot, Real-Time, AI Model, Computer Vision, GPU Accelerated, Inference Speed, Open Vocabulary, Video Processing