Speech To Text using ESP32
Summary
TLDR: This tutorial video shows how to convert speech to text on an ESP32 board with Google's Speech-to-Text API, as a step toward a standalone voice assistant. It covers setting up a Google Cloud account, obtaining an API key, and adapting Arduino code to convert speech to text. The video also discusses using a MEMS microphone with the ESP32 for audio input, and previews an upcoming project that combines the speech-to-text functionality with the ChatGPT API for a complete voice-controlled assistant.
Takeaways
- 😀 Earlier videos in the ESP32 series demonstrated running ChatGPT on an ESP32 board through the ChatGPT API and listening to the responses through a speaker.
- 🔍 The audience requested a standalone voice assistant based on ChatGPT, which can take questions through a microphone and provide answers via a speaker.
- 🛠️ The presenter's team is working on creating a voice assistant and the first step is learning to convert speech to text using Google Cloud services.
- 📈 The video is sponsored by Altium, promoting their product, Altium 365, an electronics product design platform that facilitates PCB design, data management, and teamwork.
- 🔑 To convert speech to text, one must first obtain an API key from Google Cloud for the Speech-to-Text service, which involves creating a Google Cloud account and enabling the API.
- 💳 The Google Cloud account creation process includes providing business details and credit card information for verification, although no charges are made initially.
- 📝 The video provides a step-by-step guide on how to generate an API key, restrict its usage to the Speech-to-Text API, and integrate it into an Arduino code.
- 🔍 The code for speech-to-text conversion is explained, including modifications needed for WiFi credentials and API key, and the original code is available for reference.
- 🎙️ The hardware setup involves an ESP32 board and a MEMS microphone, with instructions provided for connecting these components.
- 📝 The code logic is divided into microphone and Google Cloud parts, with the microphone capturing audio in a digital format and Google Cloud processing it to return text.
- 🔄 The video mentions a limitation in the code regarding the maximum recording time for speech-to-text conversion, which is around 2.5 to 3 seconds, and seeks solutions from the audience.
- 🔗 The presenter encourages the audience to subscribe for the upcoming video that will demonstrate the creation of a complete voice assistant based on ChatGPT.
Q & A
What is the main topic of the video?
-The main topic of the video is converting speech to text on an ESP32 board using Google Cloud services, as groundwork for a standalone voice assistant based on ChatGPT.
What is the purpose of creating a standalone voice assistant?
-The purpose is to enable users to ask questions directly to the device and listen to the answers through a speaker, without the need for manual text input.
Which platform is used for the speech to text conversion in the video?
-Google Cloud services are used for the speech to text conversion.
What is the initial credit provided for a new Google Cloud account?
-A new Google Cloud account is provided with an initial credit of up to 300 US dollars.
How long is the free trial period for the Google Cloud account?
-The free trial period for the Google Cloud account is 90 days from the day the account is created.
What is the name of the product sponsored in the video?
-The sponsored product is called 'Altium 365', an electronics product design platform.
What does Altium 365 offer for PCB designing and project collaboration?
-Altium 365 offers PCB designing, project sharing for review, centralized cloud storage, component management, real-time supply chain data, and the ability to send designs to manufacturing units.
What is the limitation of the speech to text conversion code presented in the video?
-The limitation is the time constraint, where the speech needs to be completed within approximately 2.5 to 3 seconds for accurate conversion.
How can one access the free trial version of Altium 365 mentioned in the video?
-The free trial version of Altium 365 can be accessed through the link provided in the description of the video.
What is the next step after learning speech to text conversion in the video series?
-The next step is to create a complete voice assistant based on ChatGPT, which will involve filtering the text, sending it to the ChatGPT API, and converting the answers into speech using a TTS service.
Where can the code for the speech to text conversion be found?
-The code can be found on the presenter's GitHub repository, the link to which is provided in the video description.
Outlines
🤖 Building a Standalone Voice Assistant with ESP32
The script introduces a project to create a standalone voice assistant using ESP32, following previous successful integrations of chatbot functionalities. The tutorial guides viewers on converting speech to text with Google Cloud services, which is essential for the voice assistant project. The video is sponsored by Altium, promoting their product, Altium 365, an electronics product design platform. The speaker provides a step-by-step guide on setting up a Google Cloud account, enabling the Speech-to-Text API, and generating an API key, which is crucial for the project's code.
🔧 Coding and Hardware Setup for Speech-to-Text Conversion
This paragraph details the coding process for speech-to-text conversion using the ESP32 and a MEMS microphone. The speaker provides the complete code, modified to fit the project's needs, and references the original code for further reference. The hardware setup involves the ESP32 and the microphone, with a connection diagram available on the speaker's website. The speaker explains the need to restrict the API key usage to the Speech-to-Text API only for security. The code explanation is simplified using a whiteboard, dividing the process into microphone and Google Cloud parts, detailing the audio signal processing and API request/response in JSON format.
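To make the "microphone part" concrete, here is a minimal desktop-side sketch of what storing audio as 16-bit linear PCM means. It is not the Arduino code from the video; the function name `to_linear16` and the floating-point input samples are assumptions made only for illustration.

```python
import struct


def to_linear16(samples):
    """Pack samples in the range [-1.0, 1.0] as little-endian signed 16-bit PCM.

    Hypothetical helper: the video's firmware fills its buffer from the
    microphone instead of from a Python list.
    """
    pcm = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clip out-of-range values
        pcm += struct.pack("<h", int(round(s * 32767)))
    return bytes(pcm)


# One second of silence at 16,000 samples per second takes 32,000 bytes.
one_second = to_linear16([0.0] * 16000)
print(len(one_second))
```

Each sample takes two bytes, so the buffer size grows linearly with the recording time.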
📈 Demonstrating Speech-to-Text Conversion and Future Plans
The speaker demonstrates the speech-to-text conversion process by recording speech and showing the conversion results in the serial monitor. The script discusses the limitation of the current code, specifically the 2.5 to 3 second cap on recording time, and invites viewers with solutions to contribute to the community. The video concludes with an invitation to subscribe for the upcoming project of creating a complete voice assistant based on ChatGPT, which will involve text filtering, interaction with the ChatGPT API, and text-to-speech conversion for final output through speakers.
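The video does not say why longer recordings fail, so the following is only arithmetic, not a diagnosis. Assuming the 16,000 Hz, 16-bit mono format described in the video, this sketch shows how much data a recording of a given length produces, both raw and after base64 encoding for the JSON request. The helper names are hypothetical.

```python
SAMPLE_RATE_HZ = 16000   # sample rate stated in the video
BYTES_PER_SAMPLE = 2     # 16-bit samples


def raw_audio_bytes(seconds):
    """Size of a mono LINEAR16 recording of the given duration."""
    return int(seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)


def base64_length(num_bytes):
    """Length of the padded base64 text for num_bytes of binary data."""
    return 4 * ((num_bytes + 2) // 3)


for seconds in (2.5, 3, 10):
    raw = raw_audio_bytes(seconds)
    print(seconds, "s:", raw, "bytes raw,", base64_length(raw), "base64 characters")
```

Three seconds already comes to 96,000 raw bytes and 128,000 base64 characters, which is a useful figure to keep in mind when experimenting with longer recordings on a microcontroller.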
Keywords
💡ESP32
💡ChatGPT
💡Speech-to-Text Conversion
💡Google Cloud Services
💡API Key
💡Altium 365
💡Microphone
💡Arduino IDE
💡JSON
💡TTS (Text-to-Speech)
💡Language Code
Highlights
Successfully running ChatGPT on an ESP32 board using the ChatGPT APIs.
Listening to answers from ChatGPT through a speaker attached to the ESP32.
Introduction of a project to create a standalone Voice Assistant based on ChatGPT.
Attaching a microphone to the ESP32 for direct voice interaction with the Voice Assistant.
Learning to convert speech to text using Google Cloud services for voice input in various projects.
Sponsorship by Altium and introduction of their product, Altium 365, an electronics product design platform.
Guide on creating a Google Cloud account and enabling the API key for speech to text conversion.
Details on getting a free account with a $300 credit for Google Cloud services.
Instructions on restricting the API key usage to only the speech to text API.
Arduino IDE setup for writing code to convert speech to text.
Hardware requirements for the project: an ESP32 DOIT DevKit V1 board and a MEMS microphone.
Code modification according to personal needs and original code availability.
Explanation of the code's working mechanism using a whiteboard.
Demonstration of the speech to text conversion process with live examples.
Accuracy of the speech recognition and the confidence level of the results.
Limitation of the recording time to 2.5 to 3 seconds for the current code setup.
Invitation for community input on extending the recording time limit for better user experience.
Sharing of the code on GitHub for community access and further development.
Upcoming project preview: creating a complete Voice Assistant based on ChatGPT with text-to-speech capabilities.
Transcripts
so till now in the ESP32 ChatGPT series
we were successfully able to run
ChatGPT on our ESP32 board using the
ChatGPT APIs and then we were also
successfully able to listen to the
answers coming from ChatGPT using
the speaker attached to the ESP32 so
after those two videos many of you
people asked me to make a standalone
Voice Assistant based on ChatGPT in
which we can attach a microphone with
the help of which we can ask the
questions directly to it and we can
listen to the answers with the help of
the speaker attached to it well
definitely this is a very interesting
project to be made hence my team started
working on it and now before I teach you
about how to make that Standalone Voice
Assistant based on ChatGPT you first
need to learn one more thing before
moving on to the last project which is
converting our speech to text so in this
video I'll be guiding you completely
about how to convert your speech to text
using the Google cloud services and this
will not only help you to make that
ChatGPT project but this learning will help
you in multiple of your projects where
you want the voice input the speech
input converted into text for further
processing so this is a very useful
topic to be learned so stick around with
this video as I'll be covering
everything about it let's get started
this video is sponsored by Altium and they
came up with an amazing product called
Altium 365 so Altium 365 is an electronics
product design platform that unites PCB
design MCAD data management and teamwork
so with Altium 365 you can do the PCB
designing task you can share your
projects over web for review purposes it
do cover sharing your PCB file to
Mechanical team so that they can create
the mechanical product package based on
your PCB then it also provides the
centralized cloud storage so you don't
need to rely on one single computer for
your files it also helps you with
managing your components and get
real-time supply chain data for your
components
it also allows multiple people to work
on single project and in the end it also
helps you with sending your design to
final manufacturing units so Altium 365
takes care of all other tasks so you put
more time and effort in making something
creative and useful and the good part is
you can try out its free version as well
I'll leave its free trial version link
down in the description of this video so
do check that out and now let's start
with this video
so now the first step for converting
speech to text is to get the API key for
Google Cloud so now let me guide you how
to make the Google Cloud account and how
to enable the API key for speech to text
conversion so now to get the Google API
for speech to text you first need to go
to cloud.google.com/speech-to-text
I'll be leaving this link
down in the description of this video
and here you need to log in with your
Google account after that you can click
on start free button so initially it
will be a free account in which you'll
be getting a credit up to 300 US dollars
so here on the screen you can see 300
credit for free and it will work for 90
days from uh the day you make the
account like it is from today okay so
I'll select my country so here you need
to select what kind of organization we
have or what kind of uh like what's the
need of this Google API so I'll select
other here click on terms and services
as Okay click on continue it's asking
for business name I'll write this write
as techiesms so here it is asking for
the card number so you need to provide
your card details so it won't be
charging any of the amount initially
okay as we are getting 300 for free but
after that when you use their services
they will start charging uh it according
to the use cases okay so let me type out
team card number it is asking for the
CVV number I'll provide that click on
continue once again so yeah it will be
charging a small amount of 2 rupees
which will be credited back once uh they verify
your account okay so I'll wait for the
OTP so here is the OTP and yeah uh they
verified my account and uh great so what
brought you to the Google Cloud so how
we came to know about this Google Cloud
that is what it is asking about so let's
select uh learn more explore click on
next what are you what you're interested
in doing with Google clouds okay so
you're interested in artificial uh
intelligence machine learning we are
interested in the apis and that's it
click on next what best describes your
role so I am an educator if okay I am an
educator so I'll select this and click
on done with this we have successfully
created an account on Google Cloud
for using the speech to text
services okay but we are not done yet we
need to create an API and for
creating the API you can click on this
convert speech to text API option and
here you can click on enable this API
and now when you go to the credentials
section you'll be landing up on this
page now here you can create your new
API key for that you just need to click
on create credential click on API key it
is creating your unique API key so
here's the API key I'll copy it because
we need to paste it in our Arduino
code okay click on the close button and
first you need to go to this API key
option and here we need to uh provide
that we will be restricting this key to
just be utilized for speech to text API
only okay so you'll be using this just
for speech to text and nothing else
click on the save button
and that's it we successfully generated
the API key and now we are ready to
provide this key inside our code so after
learning about how to generate the API
key now let's jump on to the Arduino IDE
and let's understand uh how to write the
code for converting speech to text so
here is the complete code for converting
your speech to text so I modified the
code according to my need while the
original codes link I'll attach in the
description of this video okay and if I
talk about the hardware part of this
then I'm using the ESP32 DOIT DevKit
V1 and a MEMS microphone which are
connected according to this connection
diagram
well both pieces of hardware are available on
our website whose purchase link is down
in the description of this video so now
if I tell you what changes you need to
make in this code then you just need to
go to the network underscore parameter dot
h header file and you need to provide the
SSID name and password of your WiFi
router after that this will remain as it
is no change in this no change in this
as well then you need to just change
this API key and how to generate the API
key that we already discussed in the
previous part okay so that's the only
change you need to do in this code and the
rest of the code will remain as it is
now explaining this code will be really
very difficult uh for me and it will be
very confusing for you to understand as
well so let me explain the working of
this code uh using the Whiteboard so now
the code is divided into two part one is
the microphone part and another is the
Google Cloud part so in the microphone
part we do have only one microphone one
esp32 in which we are giving the audio
signal to the microphone which is given
to the esp32 which is stored in a linear
6 16 format okay so it's a linear 16.
it's the encoding format just like dot
MP3 that we are using okay so this is a
16-bit data so here basically what we
are doing is we are giving the analog
signal and storing it in a 16-bit format
like a digital format after doing that
the microphone part is cleared now comes
the part of the Google Cloud so in the
Google Cloud we are using the API
whose API key we already generated okay
so the API will be requested onto this
host which is speech.googleapis.com and
we'll be attaching the API key in the
headers and the main part here is the
body of the API so this is the body of
the API which we are sending in the code
itself so the body is in the Json format
of course and we are getting the
response as well in the Json format so
what is inside the body let's just uh
discuss okay so first we have the audio
key value pair in which we are providing
the content which is nothing but the
audio file now this audio file is this
Digital Data that we have stored okay so
we are sending the digital format or Digital
Data which is nothing but our own audio
file into this content key value pair
after that inside the config part what
we are doing is we are providing the
configuration of this audio file like
what is its encoding method so it is
LINEAR16 what is its sample rate in Hertz so it
is 16,000 Hertz so it is sampled at
16,000 Hertz and then the language code so
which language is used in this speech so
in my case I have used the Indian
English language code you can change the
language code in case you are using any
other language for this speech to text
conversion you can find this kind of
language code in the Google itself okay
so this much data along with the API key
we are sending it to this host and once
we send it the Google Cloud will you
know analyze that particular data which
is this and give us back the text format
of the speech and how accurate it is
okay so this kind of data will be
getting in response so this logic is
embedded inside that code I hope now you
understood the logic
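The request body described on the whiteboard can be sketched as follows. This is a desktop-side Python illustration under stated assumptions, not the Arduino code from the video: the function names are hypothetical, `en-IN` is the standard code for Indian English, and the key is passed here as the `key` query parameter of the `v1/speech:recognize` endpoint, which may differ from how the video's code attaches it.

```python
import base64
import json

SPEECH_HOST = "speech.googleapis.com"  # host named in the video


def build_recognize_body(pcm_bytes, language_code="en-IN", sample_rate_hz=16000):
    """Build the JSON body: a config block plus base64-encoded audio content."""
    body = {
        "config": {
            "encoding": "LINEAR16",             # 16-bit linear PCM
            "sampleRateHertz": sample_rate_hz,  # 16,000 Hz in the video
            "languageCode": language_code,      # change this for other languages
        },
        "audio": {
            "content": base64.b64encode(pcm_bytes).decode("ascii"),
        },
    }
    return json.dumps(body)


def recognize_url(api_key):
    """URL for a synchronous recognize call (key as a query parameter)."""
    return "https://" + SPEECH_HOST + "/v1/speech:recognize?key=" + api_key


# Four bytes of placeholder audio and a placeholder key, just to show the shape.
print(recognize_url("YOUR_API_KEY"))
print(build_recognize_body(b"\x00\x00\xff\x7f"))
```

Sending the body with an HTTP POST to that URL should return a JSON response; no request is made in this sketch, so it can be run without an account.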
so that's how the speech to text code
works and in case you want to change
couple of parameters discussed in the
Whiteboard then here is that complete
body of that HTTP request under Cloud
speech client dot C plus plus file okay
so here's a linear 16 encoding file the
sample Hertz and the language so in case
you want to change the language for
speech to text you can change the
language code here and rest of the code
will remain as it is so uh now to upload
this code there is one single change
you need to do which is first you need
to go to your tools then to boards then
into boards manager and here the search
for esp32 as of now the current version
I have installed is 2.0.9 but this code
will work only in esp32 boards package
version
1.0.6 so you need to downgrade this
package so I'll select 1.0.6 and click
on the install button if you don't do
that it will show a couple of compiling
errors so make sure you downgrade it
before compiling and uploading Okay so
successfully install 1.0.6 version I'll
click on the close button and here I'll
select the right board which is the ESP32 DOIT DevKit
V1 the right COM port and I'll straight
away hit the upload button
okay so the code is successfully
uploaded I'll open the serial monitor
and
okay so it says recording completed now
processing no problem so I'll press the
reset button and as soon as I press the
reset button the serial monitor will say
record start and after that we need to
you know speak anything that we want to
convert in text format let me show you a
quick demo so I'll press the reset
button and
hello my name is Sachin
okay says recording completed now
processing let's just wait for the
result
okay so here is the Json formatted data
that we got and here is the text format
of my speech which is hello my name is
Sachin and it recognized the word
because Sachin is an Indian name okay
so as I put the uh language as Indian
English it recognized my name completely
correctly and the confidence level
is 0.92 okay so it is 92.99 percent accurate
okay so it is a very good number and we
got the result very accurate as well so
this is what I spoke let's just try it
once again so I'll press the reset
button and
hello this is Sachin how are you
let's wait
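The JSON result printed in the serial monitor can be picked apart as in the sketch below. It assumes the usual shape of a Speech-to-Text recognize response (`results`, then `alternatives`, each with `transcript` and `confidence`); the sample response string is illustrative, built from the transcript and confidence quoted in the video, and `best_transcript` is a hypothetical helper.

```python
import json


def best_transcript(response_text):
    """Return (transcript, confidence) of the first alternative, or (None, None)."""
    data = json.loads(response_text)
    results = data.get("results", [])
    if not results:
        # An empty response means no speech was recognized.
        return None, None
    top = results[0]["alternatives"][0]
    return top.get("transcript"), top.get("confidence")


# Illustrative response, not captured output from the video.
sample = '''
{"results": [{"alternatives": [
    {"transcript": "hello my name is Sachin", "confidence": 0.92}
]}]}
'''

text, confidence = best_transcript(sample)
print(text, confidence)
```

Checking for an empty `results` list matters in practice, because a recording with no intelligible speech may come back without any alternatives.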
and once again hello this is Sachin how
are you okay so we got the exact text
format uh of our speech now here the
recording time as of now is
around 2.5 to 3 seconds so within three
seconds you need to complete your
statement now I try to increase this
time but uh when I increase it I was
getting a lot of uh like errors in the
code okay it is it was getting uploaded
but uh it was not working okay so
maximum I got was around 2.5 to 3
seconds and in this you need to say
whatever you want to like say or you can
give commands in three second quite easy
like turn on the light turn off the
light any command that you want to give
three second is more than enough time
for that okay so yeah this is how you
can convert any speech into text using
this code now there is only one single
issue in this code which is the time
limitation so if any one of you watching
this video have any experience in like
converting speech to text using the
esp32 and if you know the solution about
how I can increase the time limit
then do reach out to us via the comment
section and that will help me and also
to the community as I will be sharing
that code with all of you and as usual I
am sharing this code as well through my
GitHub repository whose link you can find
in the description of this video and
yeah this was that last thing which you
need to learn before making our own
voice assistant based on chat GPT and
now we are left with the last video of
this whole series in which we'll be uh
filtering out the text coming from this
code will be giving that text data to
chat GPT API and then we'll be getting
the answers in the text format which
will convert into speech using TTS
service and in the end you'll be able to
listen the answer to the speakers so in
the last video we'll be making that
complete Voice Assistant based on
ChatGPT so do hit the Subscribe button
if you don't want to miss out on that
Amazing Project which will be coming soon
on our channel so yeah that was it about
this video I hope you find this speech
to text thing interesting and useful and
if it is so well do hit the like button
which will tell the YouTube algorithm that
this video was worth watching and it
will share it with other viewers as well
and yeah that being said I am just
ending this video here and now just wait
for my next video until then explore learn
share with me techiesms