Object Detection using OpenCV Python in 15 Minutes! Coding Tutorial #python #beginners
Summary
TLDR: This tutorial introduces viewers to object detection using OpenCV and Python. The presenter demonstrates how to install the necessary libraries, access a camera feed, and identify various objects in real time. The video also covers how to use the gTTS and playsound libraries to make the computer vocalize the detected objects, creating an interactive and informative experience for the audience.
Takeaways
- 😀 The tutorial focuses on object detection using OpenCV, a popular computer vision library.
- 🛠️ The presenter guides viewers through the installation of the necessary libraries: `opencv-contrib-python`, `cvlib`, `gtts`, and `playsound` (plus `pyobjc`, which helps `playsound` run more efficiently).
- 🔎 `opencv-contrib-python` is preferred over `opencv-python` for its additional libraries that enhance functionality.
- 📱 The script demonstrates real-time object detection using the computer's webcam, identifying various objects like an apple, orange, and cell phone.
- 🎯 The `cvlib` library is utilized for its pre-trained models to recognize common objects within the video frames.
- 🗣️ The tutorial includes a feature to convert detected objects into spoken words using Google Text-to-Speech (gtts).
- 🔊 The `playsound` library is used to play back the synthesized speech.
- 📝 The script maintains a list of unique detected objects to avoid repetition in the output.
- 📑 The tutorial concludes with a function that converts the list of detected objects into a natural-sounding sentence and plays it aloud.
- 🎉 The presenter encourages user interaction through comments, likes, and subscriptions for further tutorials.
Q & A
What is the main focus of the tutorial?
-The main focus of the tutorial is to demonstrate how to use OpenCV for object detection, allowing the computer to identify and announce different objects seen through a camera feed.
Why is OpenCV-contrib-python used instead of OpenCV-python?
-OpenCV-contrib-python is used because it contains additional libraries beyond the basic modules of OpenCV-python, providing more functionality for advanced tasks such as object detection.
What libraries are installed for object detection in the tutorial?
-The tutorial installs the 'opencv-contrib-python', 'cvlib', 'gtts', and 'playsound' libraries, which handle computer vision, object detection, text-to-speech conversion, and audio playback, respectively.
How does the tutorial handle real-time object detection?
-The tutorial accesses the camera using 'cv2.VideoCapture' and processes each frame in a loop to detect objects in real-time, then draws boxes and labels around the detected objects.
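As a sketch, the capture loop described above might look like the following. The helper names (`process_frames`, `run_camera`) are mine, not from the video, and the detection call would live inside the handler callback; the video itself uses camera index 1 for an external webcam.

```python
def process_frames(cap, handler, max_frames=None):
    """Read frames from `cap` until it is exhausted, the handler
    returns False, or max_frames frames have been processed."""
    count = 0
    while max_frames is None or count < max_frames:
        ret, frame = cap.read()  # ret is False when no frame is available
        if not ret:
            break
        if handler(frame) is False:
            break
        count += 1
    return count

def run_camera(index=0):
    """Open a webcam with OpenCV; assumes opencv-contrib-python is
    installed. The presenter uses index 1 for a higher-quality webcam."""
    import cv2
    cap = cv2.VideoCapture(index)
    try:
        process_frames(cap, lambda frame: None)  # detection would go here
    finally:
        cap.release()
```

Separating the loop from the camera object also makes the logic testable with a fake capture source, since `process_frames` only needs something with a `.read()` method.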
What function is used to draw boxes around detected objects?
-The 'draw_bbox' function from 'cvlib.object_detection' is used to draw boxes and labels around the detected objects in the video frames.
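A minimal sketch of the per-frame detection step is below. The imports stay inside the function because `detect_common_objects` downloads a pretrained model on first use; `confident_labels` is a hypothetical extra helper of mine (the video keeps every label regardless of confidence).

```python
def detect_and_draw(frame):
    """Detect common objects in one frame and draw labeled boxes.
    Assumes cvlib and opencv-contrib-python are installed."""
    import cvlib as cv
    from cvlib.object_detection import draw_bbox
    bbox, labels, conf = cv.detect_common_objects(frame)
    output_image = draw_bbox(frame, bbox, labels, conf)
    return output_image, labels, conf

def confident_labels(labels, conf, threshold=0.5):
    """Hypothetical helper: keep only labels whose detection
    confidence meets the threshold."""
    return [label for label, c in zip(labels, conf) if c >= threshold]
```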
How is the list of detected objects managed to avoid duplicates?
-The tutorial uses a for loop to check if an item is already in the 'labels' list before appending it, ensuring each object is only announced once.
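The duplicate check described above can be written as a small helper (the function name is mine; the video does this inline in the main loop):

```python
def collect_unique(detected, labels=None):
    """Append each detected label to `labels` only if it is not
    already there, so every object is announced once."""
    if labels is None:
        labels = []
    for item in detected:
        if item not in labels:
            labels.append(item)
    return labels
```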
What is the purpose of the 'speech' function defined in the tutorial?
-The 'speech' function takes the assembled sentence of detected objects, uses the 'gtts' library to convert the text into speech, saves the result as an audio file, and plays it back.
How does the tutorial ensure a more natural pause in the spoken output?
-The tutorial uses string interpolation to build a sentence with 'and' and commas, then joins the pieces into a single string; the commas produce natural pauses when the computer speaks the detected objects.
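The sentence-building loop might be sketched as follows. This approximates the video's logic (which uses a manual `i` counter rather than `enumerate`), and the exact punctuation here is my choice; the commas are there only to make the text-to-speech output pause naturally.

```python
def build_sentence(labels):
    """Join detected labels into one spoken-style sentence,
    e.g. 'I found a person, and, a tie,'."""
    new_sentence = []
    for i, label in enumerate(labels):
        if i == 0:
            new_sentence.append(f"I found a {label}, and,")
        else:
            new_sentence.append(f"a {label},")
    return " ".join(new_sentence)
```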
What is the significance of creating a 'sounds' directory in the project?
-The 'sounds' directory stores the audio files generated by the 'gtts' library, which are then played back with the 'playsound' library.
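A sketch of the speech function, using the same file layout as the video (`./sounds/output.mp3`). The `output_path` helper is mine; gTTS needs an internet connection, so the third-party imports stay inside the function.

```python
import os

def output_path(out_dir="./sounds", name="output.mp3"):
    """Where the synthesized audio is written."""
    return os.path.join(out_dir, name)

def speech(text, lang="en"):
    """Convert text to speech and play it. Assumes the gtts and
    playsound packages are installed."""
    from gtts import gTTS
    from playsound import playsound
    os.makedirs("./sounds", exist_ok=True)  # create the directory if missing
    path = output_path()
    gTTS(text=text, lang=lang, slow=False).save(path)
    playsound(path)
```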
How does the tutorial handle user interaction to stop the object detection process?
-The tutorial uses a 'cv2.waitKey' function to check if the user presses the 'q' key, which, if pressed, breaks the loop and stops the object detection process.
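The key check can be isolated into a tiny predicate (the function name is mine). `cv2.waitKey(1)` returns -1 when no key was pressed, and masking with `0xFF` keeps only the low byte of the key code, which is a common portability precaution:

```python
def should_quit(key_code, quit_key="q"):
    """True when the key returned by cv2.waitKey matches quit_key."""
    return (key_code & 0xFF) == ord(quit_key)

# Inside the capture loop it would be used as:
#   if should_quit(cv2.waitKey(1)):
#       break
```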
Outlines
😀 Introduction to Object Detection with OpenCV
The video begins with an introduction to object detection using OpenCV. The presenter demonstrates holding up various objects such as an apple, an orange, and a cell phone, and explains the goal of getting the computer to verbally identify these objects within a video frame. The presenter then proceeds to explain the first step, which is to install the necessary dependencies. They guide the viewers through the installation of 'opencv-contrib-python' and 'cvlib' using pip, highlighting the additional libraries provided by 'opencv-contrib-python' over the standard 'opencv-python'. The presenter also addresses potential warnings about updating pip and suggests resolving them promptly.
🛠️ Setting Up Object Detection and Speech Synthesis
In this section, the presenter continues the setup by installing the libraries for speech synthesis and efficient audio playback: 'gtts' for text-to-speech conversion and 'playsound' for playing the synthesized audio. The presenter then imports the necessary modules from OpenCV, cvlib, and the other installed libraries, and explains how to access the computer's camera to capture live video for real-time object detection. The script includes a loop that continuously reads frames from the video capture and uses cvlib to detect common objects, drawing boxes and labels around them.
🔍 Detecting and Storing Object Labels
The presenter explains how to process the detected object labels to avoid duplicates in the list. They create a list called 'labels' and use a for loop to append unique labels to this list. The script checks if an item is already in the 'labels' list before adding it, ensuring each object is only listed once. The presenter then demonstrates testing the list by printing its contents, which shows the successful detection of unique objects like a person and a tie.
🗣️ Implementing Speech Output for Detected Objects
The final part of the video focuses on converting the list of detected objects into spoken words. The presenter creates a function called 'speech' that takes text input, converts it to speech using gTTS, saves the audio to a file, and plays it with the 'playsound' library. They demonstrate how to construct a natural-sounding sentence from the list of labels and then use the 'speech' function to vocalize the findings. The video concludes with a live demonstration of the complete setup, where the computer successfully identifies objects in the video feed and announces them aloud, followed by a call to action for viewers to engage with the content and a wrap-up of the tutorial.
Keywords
💡OpenCV
💡Object Detection
💡cv2.VideoCapture
💡cv2.imread
💡cv2.imshow
💡cv2.waitKey
💡gtts
💡playsound
💡cvlib
💡String Interpolation
💡List Comprehension
Highlights
Introduction to object detection using OpenCV.
Installation of dependencies, including opencv-contrib-python and cvlib.
Explanation of the difference between opencv-python and opencv-contrib-python.
Installation of gtts and playsound libraries for text-to-speech functionality.
Importing necessary libraries for object detection and speech.
Accessing the camera for a live feed of object detection.
Loop to process each frame from the video capture for object detection.
Using cvlib to detect common objects and draw boxes around them.
Displaying the detected objects with labels in real-time.
Creating a list to store unique labels of detected objects.
Explanation of avoiding duplicate entries in the labels list.
Printing the list of detected objects to verify the functionality.
Creating a more natural-sounding sentence from the list of detected objects.
Using string interpolation to format the detected objects into a coherent sentence.
Defining a function to convert text to speech using gtts and playsound.
Saving the generated speech as an MP3 file.
Playing the generated speech to confirm the object detection results.
Final demonstration of the object detection and speech synthesis working together.
Encouragement for viewers to ask questions and subscribe for more tutorials.
Transcripts
i found a person and a tie a chair an apple a
doughnut an orange a cell phone hey guys
welcome to another tutorial with opencv
today we are going to look into object
detection so i can just hold up an apple
or an orange
heck even my cell phone lots of
different objects and i'm going to get
my computer to tell me out loud using
its voice what it saw inside of the
frame that's enough introduction let's
get right into
it so the first thing that we need to do
is install our dependencies so we are
going to import a couple of libraries
the first thing that we're going to
import is opencv-contrib-python so
we're going to say
pip
install open cv dash contrib dash python
and some people ask me why i use this
lately instead of just open cv python
it's because opencv python does contain
the main components that you need for
the basic modules of opencv but with
opencv-contrib-python it's going to
contain some extra libraries so that we
have a little bit extra to work with and
so we're just going to hit enter and
that's going to start installing okay
and if you ever get a warning like i did
where you need to update pip or anything
you can go ahead and do that as well i'm
just going to copy and paste that that
should be just really quick
and there we go perfect after we have
that installed we are now going to go
ahead and install cvlib
so pip install
cv lib okay and we're going to be using
this for our object detection so there's
a library that's already learned what
certain objects are so we're just going
to install that
and depending on your internet
connection that should be rather quick
very good and then we're going to also
allow our computer to
say out loud what it saw so if it sees
me it'll say i saw a person or i saw an
apple an orange so on and so forth so we
we're going to import just a couple more
things we're going to say pip install
gtts
space
play sound whoops
play
sound like that and then finally we are
going to
install
pyobjc which is going to help with
that sound be a little bit more
efficient so i'm going to say
pip
3
install
capital p y capital o b j capital c
okay so that will allow play sound to be
a little bit more efficient so i already
have a couple of those installed
installs it's just gonna say already
satisfied for you it will probably say
successful if you have any errors just
go back rewind make sure that you typed
everything correctly
otherwise let's move on
i'm just gonna slide down my window here
and i'm going to now import
cv2
import
cv lib as cv
and then from
cvlib.object
detection
import
draw box so it's going to be drawing a
box around our objects for us so make
sure you have two b's for box b b o x
and then we're gonna say from g t t s
import
g capital t t s oops i said g t a let me
do g t t s then finally from play
sound
import
play
sound so there's one two three four five
lines of imports but that is everything
that we're going to be using for this
video so if you need a little bit more
time you can go ahead and pause the
video and continue that
three days later so what we want to do
now is now access our camera now
originally when i was first building
this and testing things out i was just
having it bring in a specific image i'm
just going to look at objects in the
image but i wanted this to be live so it
has a live feed and we can detect all
the objects in the live feed instead so
we're going to access our cameras
so i'm going to say video equals
cv2 dot video
capture
and that takes an index now for most of
you it might be index zero but my web
camera that i want to be using instead
which is a lot more higher quality is
that index one
so you can just mess with those indexes
as you please
but we're going to start with that and
now we're going to say while
true
i'm now going to use my video capture
and i'm going to unpack
each frame into a
variable called frame so what we're
going to do is ret
comma
frame
equals our video
dot read
so unpack that so now we're going
through each frame
and now we're going to use that bb box
where it's going to be seeing the
objects it's going to draw a box around
it and we're also going to give it a
label next to the box to tell us what
the object is
so we're going to say bb box
comma label
and then
conf okay and conf is really just
identifying what the object is it's just
going to be returning some decimal
numbers really so i'm going to say cv
dot detect
common objects
and now we have to tell it where to get
those objects from
so we need to say get it from the frame
so that's going to be each frame from my
video feed and finally we're going to
draw that box so we're going to say
output
image
equals draw
box
and now we need to give draw_bbox the
frame
i want it to get the
box that it's going to be drawing around
and we're also going to put the label in
there and we'll stick conf in there very
good so now that we have that let's go
ahead and show the user what the image
looks like so i'm going to say cv2 dot i
am show
and we want to show them
the name of the window so i'll just call
this uh object
detection you can call that whatever you
want comma and now we've got to tell it
output image okay before we hit run
we're going to um
give this a wait key so i'm going to say
if
cv2 dot waitKey
delay of one and
we're going to check to see if the user
is clicking a certain button
you can say whatever button you want but
i'm going to say if the user
clicks on
q some people like the space bar you can
just do a space
do whatever you want but i'm going to
say if the user says hits q i want you
to break out of this loop
and after i hit q it breaks out of that
window so very good so as you could see
that was already detecting me as a
person even detected this as a tie
that's already working so what i want
this to do now is i want my program to
take each of those labels that it finds
in my screen
and i want it to append or add to a list
so that i have that list of data so what
we're going to do now is we're going to
make a list called labels so let's come
up here and we'll call this labels
make sure this is outside of your loop
so it doesn't accidentally rename itself
inside the loop and just erase all the
data
um and what we're going to say is
we'll do a for loop we're going to say
for
item in label
if item in labels
then we're just going to have it pass
so that means if if it already found a
tie it's going to be checking multiple
images it's going to be checking for
objects in each frame
and so it's going to
say maybe like a thousand ties i don't
want it to do that i'm just going to say
if you find a tie in there then go ahead
and put it in the list but if tie is
already in the list don't add it to the
list so it's only going to say tie one
time and you can alter this if you'd
like
but this is the way i'm going to do it
um
if it's not already in the list then i
want you to
labels.append i want you to append
that item
so that item will be added to this list
called labels
and just to test that out
i'm going to
come down here and print
labels
and let's see if that works
okay and as you can see here is my list
called labels that i just printed so i
found a person and it found a tie so
very good uh i know that this is working
because if i didn't it would be the same
person
a thousand times in the tie a thousand
times but it's only gonna do it once
because of this code here so very good
so what i want this to do now is i
wanted to take this
data called labels so what i'm going to
do now is write code
using string interpolation to tell me
what it found
more logically for example i wanted to
say something like i found a orange a
person a book
a tie a cell phone an apple so on and so
forth
and so
how i'm going to do that is i'm going to
create a for loop for label
in labels
i want this to check to see if this is
the first time it's reading out loud a
label so i'm going to create some kind
of iterator so i'm going to say i equals
0 and i'm going to say here
if i
is 0
then i want this to actually append to a
list because i want this to
sound a little bit more natural when it
says it out loud so i'm going to say new
sentence
equals an empty list so i'm going to say
if i zero
then new
sentence dot append
that means to add and i want it to add
i'll use some string interpolation here
i found a
i'll put label there so if it's found a
person will say i found a person
and i'll do a comma and comma
that'll give this speech out loud to
give a little bit of a pause
so we'll see how that goes
and
then i'm gonna say if it is
not equal to zero
then new sentence dot append and i'll
use string interpolation again
and i'm gonna say
uh
label like so
and then once that is done we are going
to increment
i so i'm going to say
i
plus equals one so the first thing it
finds
i is it going to be zero so the first
thing it finds is going to say i found a
hat
and
a person
a book an apple so on and so forth so
very good with that uh and just to make
sure that this is all going to be all in
one string for our speech
to work properly after this let's go
ahead and use the join function so i'm
just going to say print
space dot join
new
sentence so that will turn this list
into a string so let's go ahead and test
that out
perfect so after hitting run and after
it found those things check it out it
added each of those things to the list
so said i found a person and a tie a
orange a chair a donut a apple again
this might look kind of weird with the
commas but it's going to help with the
pauses when the computer says it out
loud so very good if we have that
working just fine then let's go ahead
and add our speech part of this so
towards the top of our project
i'm going to now add
our speech
so let's go ahead and
define a function so def
we'll call this speech
and it's going to receive some text and
if you've seen my
virtual assistant where you build your
own siri or alexa this is the same exact
function that we're going to be using
so we're going to say
that text because we want to be able to
see it as well and i also want to set my
language to whatever i want i'm going to
set mine to
i will call yeah we'll say english so e
n
if you want to do spanish it's e es or
japanese it's ja you can always look
those up on google if you'd like to but
we'll do english for now
and then we'll give this some output so
output
equals now we're going to use gtts so
gtts which takes some text
which is going to be equal to the text
that we're going to send it
comma
now it's looking for the property of
language and so that will be our
language
and then finally it's going to ask how
fast we want to go so i'm going to say
slow equals false
just like that now what we need to do is
save
the output
into a file so what we need to do is we
need to save that audio
somewhere in our project so come into
your project and let's create a new
directory
and we're just going to call it
sounds so here's sounds i forgot to
change the name of the project so don't
worry about that but sounds is just
underneath that directory so
with that in place we can now save so
we're going to say output dot save
and now we're going to tell where to
save
so i'm going to say dot slash
dot slash sounds
and now call your file whatever you want
i'm just going to say output.mp3
says this will be a mp3 file and then
finally we're going to have it play the
sound
so we're going to use the play sound
library that we imported up here and
we're going to say
play sound which is gonna be that same
exact location
output dot mp3 and now we gotta send
whatever text we want over here
so down here we have a print instead of
print i'm going to say speech because
that's the name of our function
so that will send our string to our
function and it's going to take our text
and make it into actual speech it is
then saving
and then we're going to play that sound
let's see if this works
i found a person and a tie a chair an apple a
doughnut an orange a cell phone
so hopefully you could have heard that
but it did say i found a person and a
tie an apple a orange a cell phone so
that is working great congratulations if
you just accomplished that that was
really cool pretty simple and if you
have any questions please let me know
down in the comments if you have any
requests please let me know don't forget
to drop a like and subscribe so that you
are notified of my next tutorial thank
you so much and happy coding