Speech To Text using ESP32

techiesms

8 Jul 2023 · 13:56

Summary

TLDR: This tutorial video shows how to build the speech-to-text stage of a standalone voice assistant using the ESP32 board and Google's Speech-to-Text API. It covers setting up a Google Cloud account, obtaining an API key, and writing Arduino code to convert speech to text. The video also discusses using a MEMS microphone with the ESP32 for audio input, and hints at an upcoming project that combines the speech-to-text functionality with a chatbot API for a complete voice-controlled assistant.

Takeaways

😀 Earlier videos in the ESP32 series demonstrated running a chatbot on an ESP32 board using the ChatGPT API and listening to its responses through a speaker.
🔍 The audience requested a standalone voice assistant based on ChatGPT that takes questions through a microphone and answers through a speaker.
🛠️ The presenter's team is working on creating a voice assistant and the first step is learning to convert speech to text using Google Cloud services.
📈 The video is sponsored by Altium, promoting Altium 365, an electronics product design platform that supports PCB design, data management, and teamwork.
🔑 To convert speech to text, one must first obtain an API key from Google Cloud for the Speech-to-Text service, which involves creating a Google Cloud account and enabling the API.
💳 The Google Cloud account creation process includes providing business details and credit card information for verification, although no charges are made initially.
📝 The video provides a step-by-step guide on how to generate an API key, restrict its usage to the Speech-to-Text API, and integrate it into an Arduino code.
🔍 The code for speech-to-text conversion is explained, including modifications needed for WiFi credentials and API key, and the original code is available for reference.
🎙️ The hardware setup involves an ESP32 board and a MEMS microphone, with instructions provided for connecting these components.
📝 The code logic is divided into microphone and Google Cloud parts, with the microphone capturing audio in a digital format and Google Cloud processing it to return text.
🔄 The video mentions a limitation in the code regarding the maximum recording time for speech-to-text conversion, which is around 2.5 to 3 seconds, and seeks solutions from the audience.
🔗 The presenter encourages the audience to subscribe for the upcoming video that will demonstrate the creation of a complete voice assistant based on ChatGPT.
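
The summary does not reproduce the video's Arduino code, so the following is only a minimal sketch of the "Google Cloud part" described in the takeaways: turning captured audio bytes into the JSON body that the Speech-to-Text v1 `speech:recognize` REST method accepts. The function names `base64Encode` and `buildRecognizeRequest` are hypothetical, and the 16 kHz sample rate and `en-US` language code used in the usage note are assumptions rather than values confirmed by the video. It is written as plain C++ so it can be compiled on a desktop as well as adapted into a sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Standard base64 encoding (RFC 4648) of raw audio bytes.
// The recognize request carries audio as a base64 string in "audio.content".
std::string base64Encode(const std::vector<uint8_t>& data) {
    static const char kAlphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve(((data.size() + 2) / 3) * 4);

    std::size_t i = 0;
    // Encode every complete 3-byte group as 4 characters.
    while (i + 2 < data.size()) {
        uint32_t n = (static_cast<uint32_t>(data[i]) << 16) |
                     (static_cast<uint32_t>(data[i + 1]) << 8) |
                     static_cast<uint32_t>(data[i + 2]);
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += kAlphabet[(n >> 6) & 63];
        out += kAlphabet[n & 63];
        i += 3;
    }

    // Encode the 1 or 2 leftover bytes and pad with '='.
    std::size_t rem = data.size() - i;
    if (rem == 1) {
        uint32_t n = static_cast<uint32_t>(data[i]) << 16;
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += "==";
    } else if (rem == 2) {
        uint32_t n = (static_cast<uint32_t>(data[i]) << 16) |
                     (static_cast<uint32_t>(data[i + 1]) << 8);
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += kAlphabet[(n >> 6) & 63];
        out += '=';
    }
    return out;
}

// Build the JSON body for POST .../v1/speech:recognize?key=<API key>.
// LINEAR16 means uncompressed 16-bit signed little-endian PCM samples.
// languageCode is inserted as-is, so it must not contain quotes.
std::string buildRecognizeRequest(const std::string& audioBase64,
                                  int sampleRateHz,
                                  const std::string& languageCode) {
    assert(sampleRateHz > 0);
    return std::string("{\"config\":{\"encoding\":\"LINEAR16\",") +
           "\"sampleRateHertz\":" + std::to_string(sampleRateHz) + "," +
           "\"languageCode\":\"" + languageCode + "\"}," +
           "\"audio\":{\"content\":\"" + audioBase64 + "\"}}";
}
```

On the board, the string returned by `buildRecognizeRequest(base64Encode(samples), 16000, "en-US")` would be sent over HTTPS to `https://speech.googleapis.com/v1/speech:recognize?key=` followed by the restricted API key, and the transcript would be read from the JSON response.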

Q & A

What is the main topic of the video?
-The main topic of the video is converting speech to text with Google Cloud services on an ESP32 board, as the first step toward a standalone voice assistant based on ChatGPT.
What is the purpose of creating a standalone voice assistant?
-The purpose is to enable users to ask questions directly to the device and listen to the answers through a speaker, without the need for manual text input.
Which platform is used for the speech to text conversion in the video?
-Google Cloud services are used for the speech to text conversion.
What is the initial credit provided for a new Google Cloud account?
-A new Google Cloud account is provided with an initial credit of up to 300 US dollars.
How long is the free trial period for the Google Cloud account?
-The free trial period for the Google Cloud account is 90 days from the day the account is created.
What is the name of the product sponsored in the video?
-The sponsored product is Altium 365, an electronics product design platform.
What does Altium 365 offer for PCB designing and project collaboration?
-Altium 365 offers PCB designing, project sharing for review, centralized cloud storage, component management, real-time supply chain data, and the ability to send designs to manufacturing units.
What is the limitation of the speech to text conversion code presented in the video?
-The limitation is the time constraint, where the speech needs to be completed within approximately 2.5 to 3 seconds for accurate conversion.
How can one access the free trial version of Altium 365 mentioned in the video?
-The free trial version of Altium 365 can be accessed through the link provided in the description of the video.
What is the next step after learning speech to text conversion in the video series?
-The next step is to create a complete voice assistant based on ChatGPT, which will involve filtering the text, calling the ChatGPT API, and converting the answers into speech using a TTS service.
Where can the code for the speech to text conversion be found?
-The code can be found on the presenter's GitHub repository, the link to which is provided in the video description.
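
The video does not explain where the 2.5 to 3 second recording limit comes from. One plausible contributor, offered here only as an assumption, is memory: the whole clip has to be held on the ESP32 and then base64-encoded for the request. The sketch below works through that arithmetic for 16 kHz, 16-bit mono audio, which are assumed values rather than figures taken from the video; `pcmBytes` and `base64Length` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>

// Bytes of raw mono PCM needed for a recording of the given length.
std::size_t pcmBytes(std::size_t sampleRateHz,
                     std::size_t bitsPerSample,
                     std::size_t durationMs) {
    assert(bitsPerSample % 8 == 0);  // whole bytes per sample only
    return sampleRateHz * (bitsPerSample / 8) * durationMs / 1000;
}

// Length of the base64 text for rawBytes of input:
// every 3 input bytes (rounded up) become 4 output characters.
std::size_t base64Length(std::size_t rawBytes) {
    return ((rawBytes + 2) / 3) * 4;
}
```

Under these assumptions a 3 second clip needs `pcmBytes(16000, 16, 3000)` = 96,000 bytes of raw audio and `base64Length(96000)` = 128,000 characters once encoded, which is a large share of the RAM on an ESP32 without external PSRAM. Encoding and sending the audio in chunks instead of all at once is one direction to explore for longer recordings.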