Create Your Own Speech-To-Text Service Using FasterWhisper
Summary
TLDRThis video script outlines the creation of a near real-time AI sales agent using open-source projects and Python. The project aims to automate outbound sales calls, leveraging the speed of the Gro AI API to minimize latency. The tutorial begins with setting up a speech-to-text server using the faster whisper model and proceeds to integrate this with an AI sales agent. The script details the design, workflow, and coding process, promising further videos on enhancing the agent with calendar services and CRM integration for a complete sales solution.
Takeaways
- 😀 The script introduces a new AI sales agent powered by Gro AI API, designed to simulate a real-time conversation with a potential customer over the phone.
- 🔍 The AI agent is tested in the script with a scenario where a person named Sam is contacted by Sally from an AI automation agency, discussing the implementation of an AI chatbot for a business.
- 💡 The video series aims to teach viewers how to create a near real-time, fast AI sales agent using Python and open-source projects for business sales calls.
- 🛠 The creator, Abijit, discusses his motivation for building a custom AI sales agent due to the lack of customization in existing AI sales agent services.
- 📈 The script mentions the use of 'faster whisper', an open-source project, for transcription, and 'sales GPT' for creating the AI sales agent, addressing the challenge of AI API latencies in real-time interactions.
- 🚀 The video outlines the design of the project, starting with the potential customer filling out a form on a business website, which triggers an N8N workflow to engage the AI sales agent.
- 🔗 The AI sales agent uses Trio for calling, and 'faster whisper' for transcribing audio to text, which is then used to formulate responses through AI API services.
- 📝 The script details the process of creating an STT (Speech to Text) server using the 'faster whisper' model, including code snippets and setup instructions.
- 🔧 The video provides a step-by-step guide on setting up the environment, installing necessary modules, and testing the transcription capabilities of the AI sales agent.
- 🔄 The script highlights the importance of managing the entire implementation process and suggests future integrations with calendar services and CRM systems.
- 📆 The video promises a follow-up part that will focus on integrating the local speech-to-text capability with the AI sales agent using Gro API for real-time communication.
Q & A
What is the purpose of the AI sales agent in the video script?
-The AI sales agent is designed to handle outbound sales calls and interact with potential customers in real-time, providing information about AI automation services and scheduling meetings.
Who is the speaker in the video script, and what is their role?
-The speaker is Abijit, who is presenting a tutorial on creating an AI sales agent using Python and open-source projects, aiming to simplify the latest tech innovations for business.
What is the main challenge Abijit faced while researching AI sales agent services?
-Abijit found that most AI sales agent services were good but lacked the customization he needed. This led him to consider creating his own AI sales agent using open-source libraries.
What is the 'faster whisper' project mentioned in the script, and why is it significant?
-The 'faster whisper' project is a reimplementation of the OpenAI Whisper model, which is a speech-to-text engine. It is significant because it offers faster inference, making it suitable for real-time applications like the AI sales agent.
What is the role of the 'sales GPT' in the creation of the AI sales agent?
-The 'sales GPT' is another project that Abijit discovered, which is used in conjunction with the 'faster whisper' to create an AI sales agent capable of formulating responses and engaging in sales conversations.
How does the AI sales agent handle the transcription of audio input from a customer?
-The AI sales agent uses the 'faster whisper' project to transcribe the audio input from the customer during a phone conversation, converting the spoken words into text for further processing.
What is the significance of the 'Gro AI API' in the AI sales agent's functionality?
-The 'Gro AI API' is significant because it offers fast AI API inference, which helps the AI sales agent to respond quickly to customer inquiries, reflecting near real-time human-like behavior.
How does the AI sales agent integrate with calendar services like Calendly?
-If the AI sales agent agrees with the customer to proceed to the next step of the sales process, such as scheduling a meeting, it can use calendar services like Calendly to find available times and confirm with the user.
What is the purpose of the 'n10 workflow' in the script?
-The 'n10 workflow' is used to trigger the AI sales agent when a potential customer submits a form on the business website. It helps in automating the process of initiating contact with the customer.
What are the technical requirements for running the 'faster whisper' model on a local system?
-The 'faster whisper' model can run on systems with moderate specifications, including CPU-based systems. However, for those with an Nvidia GPU, specific DLL files need to be downloaded and placed in the system32 folder for proper functioning.
How can viewers get access to the code used in the video?
-Viewers can access the code used in the video by joining the Discord and Patreon communities, where all the code will be made available for free.
Outlines
🤖 Introduction to AI Sales Agent and Project Overview
The script introduces an AI sales agent powered by Gro AI API and demonstrates a mock telephone conversation with the agent. The agent, named Sam, is contacted by Sally from an AI automation agency regarding the implementation of an AI chatbot for a customer-facing website. They discuss scheduling a meeting to explore the use case further. The video series aims to teach viewers how to create a near-realtime AI sales agent using Python and open-source projects for outbound sales calls. The presenter, Abijit, explains his motivation for creating a custom AI sales agent due to the lack of desired customization in existing products. He mentions previous work on a voice assistant and the challenges faced with transcription speed. The introduction of 'faster whisper' and 'sales GPT' projects as potential solutions is highlighted, along with the use of Grock API to address latency issues in AI API inference.
🛠️ Designing the AI Sales Agent Workflow and Technical Setup
This paragraph delves into the design of the AI sales agent project, starting with the customer's interaction on a business website where they fill out a contact form. The form submission triggers an N8N workflow, which in turn calls the AI sales agent hosted on a server. The AI sales agent makes a call to the customer using Trio, a phone number calling service, and transcribes the conversation using the 'faster whisper' project. The transcribed text is then used to fetch information from a knowledge base and formulate a response via an AI API service. If a meeting is scheduled, the agent can integrate with calendar services like Calendly to book appointments and send confirmation emails to customers. The paragraph outlines the complexity of the project and the plan to build the AI sales agent step by step, starting with the creation of the speech-to-text (STT) server and the basic structure of the AI sales agent in subsequent videos.
🔧 Developing the Speech-to-Text Server and Transcriber Class
The script describes the process of creating a speech-to-text server using the 'faster whisper' model. It details the creation of a 'transcriber' class with methods to handle recording and transcription. The class initializes with a specified model size and includes functionality to start and stop recording based on keyboard input. The recorded audio is saved temporarily and then transcribed using the Whisper model. The transcription is printed out, demonstrating the accuracy of the 'faster whisper' model in converting spoken English to text. The video concludes with instructions on setting up the development environment, installing necessary modules, and testing the STT server with a sample audio recording. The successful transcription of the test recording confirms the server's functionality, setting the stage for integrating this capability with an AI sales agent in future videos.
Mindmap
Keywords
💡AI Sales Agent
💡Gro AI API
💡Transcription
💡Faster Whisper
💡Sales GPT
💡N8N Workflow
💡Calendly
💡No-Code Implementation
💡Trio
💡Knowledge Base
Highlights
Introduction of an AI sales agent powered by Gro AI API for business automation.
AI sales agent's capability to confirm user's interest in AI automation for their business through a simulated phone call.
The founder's intent to implement an AI chatbot for customer-facing websites.
Proposal for a one-on-one meeting to discuss the implementation of an AI chatbot further.
Overview of creating a near real-time AI sales agent using Python and open-source projects.
Challenges faced with existing AI sales agent services regarding customization.
Introduction of 'faster whisper', a project for faster speech-to-text transcription.
The use of 'sales GPT' to enhance the AI sales agent's capabilities.
Addressing the issue of AI API latencies affecting the agent's real-time performance.
Designing the AI sales agent to handle customer inquiries and sales processes autonomously.
Utilization of N1 workflow to manage customer data and trigger AI sales agent interactions.
Integration of Trio, a phone number calling service, for the AI sales agent to contact customers.
Use of the faster whisper project for real-time transcription of customer audio input.
The AI sales agent's process of formulating responses using AI API services and text-to-speech APIs.
Potential for the AI sales agent to schedule meetings and integrate with calendar services like Calendly.
The plan to send meeting invites and close the sales process by emailing customers through N1 workflow.
Explanation of the technical setup for the speech-to-text server using the faster whisper model.
Demonstration of the transcriber class implementation for real-time audio transcription.
Testing the speech-to-text server with keyboard input to simulate a phone conversation.
Success in transcribing English speech with high accuracy using the faster whisper model.
Upcoming integration of the local speech-to-text capability with an AI sales agent using Gro API.
Promise of future videos detailing improvements to make the AI sales agent production-ready.
Transcripts
and there you go our AI sales agent is
ready with Gro AI API so let's test it
I'll test it as if I'm am talking over
telephone with an AI sales agent hi it's
Sam speaking hello Sam this is Sally
from no to together an AI automation
agency I believe you filled out the
contact form on our website may I
confirm if you are the right person to
speak to regarding AI automation for
your business I'd love to understand
your company's use case for AI yeah I am
the right person the founder of work
together I want to implement an AI
chatboard for our customer facing
website Hi Sam it's great to speak with
the founder of work together I
understand you're interested in
implementing an AI chatbot for your
customer facing website would you be
interested in scheduling a one-on-one
meeting with me to discuss this
further yeah sure Fantastic Sam I'll
send you a calendly meeting invite to
your email address thank you for calling
Sam I hope you have a good day hi
everyone in this video series you are
going to learn how you can create a near
realtime fast AI sales agent using
Python and several other open- source
project that can be used by your
business for making outbound sales call
completely using AI if you're new to
this channel my name is abijit you're
watching not together where I try to
simplify latest take and Innovation for
your future and your business let's get
started in a very recent days I have
been checking a lot lot of the AI SAS
sales agent service and while most of
them are really good in terms of what
they offer but I couldn't find all the
customization need that I have in one
single product and as a result I started
thinking of creating my own AI sales
agent by maybe using some open source
libraries now I have already covered a
tutorial about how to create a voice
assistant in this channel before if you
haven't seen that project yet click the
link on the right top and check out that
project however the problem with that
project was the transcription process
was really really slow especially when
it comes to running this in my local
system so then I started doing my own
research and then I found out this
project called faster whisper and also I
found another cool project called sales
GPT to create your AI sales agent
looking through the code for this
project I had the perfect sketch of the
product that I want to build but then
again there was another challenge one of
the biggest challenge of creating an AI
agent to reflect a near realtime
humanlike behavior is that the agent
could become slow because of AI API
latencies when a lot of token needs to
be processed as the conversation
continues to grow with the potential
customer through the telephonic channel
right fortunately we have grock API
released very recently which has really
amused me with the kind of speed that it
offers with AI API inference and that
could certainly solve this problem so
then I started designing the project
into paper from my mind in this video
let's first discuss the design of the
whole project and then we would start
putting up the code module by module and
in this way you would understand how I
approach such problems and break down
the problems and probably try to create
widescale applications of course if you
are a no code person and you want to
create a similar application using
completely no code approach please write
down in the comment I'll definitely
bring a no code implementation of this
same project with that said let's dive
in okay so let's start with the design
so this is the basically potential
customer which would come to the
Business website for example in this
case in our website and then here he
will fill up the form like a contact us
form he'll put up his phone number his
requirement and everything else and once
he submit that information that will
probably call an n80 workflow using an
wook now the reason I'm using N1 because
I wanted to reuse a lot of the open
source project that I already run as
part of my other workflow if you want to
know more about n10 I have already
covered about n10 previously in this
channel I'll attach the link in the
description now once the workflow is
triggered with all the details that user
have put in the n10 will then make a
call to the AI sales agent which is
supposedly in another hosted server it
could be my local machine it could be
another hosted server now this AI sales
agent is where all the code business
logic will be sitting right so the AI
sales agent what it will do is it will
make a call through Trio which is an
phone number calling service provider
using that it will call the customer
using the phone number that user have
provided in the form and once the
connection is established the AI sales
agent will get the user instruction
let's say as a recorded audio and then
it will use an STD server or pce to text
service and in this case we'll use the
faster whisper open source project to
transcribe the audio input that user
have provided through the telephonic
conversation now once we have once we
get the instruction from the user as a
text and then once the AI Sal agent has
got the transcribed instruction from the
user it can check the knowledge base and
get all the necessary information um and
then call the AI API service to
formulate a response as text and then it
will call the text to speech API to get
the respond in a audio format and then
it will send it back to the user through
the T service so next let's say as part
of this conversation AI sales agent is
able to you know agree with the customer
to go to the next step of the sales
process which is to let's say book a
meeting then it can use some tools to
call let's say some calendar service
like calendly or tidal or similar
service and get the free calendar time
that's available confirm with user and
then it will can call na10 with the
output from the AI sales agent as well
as because na10 already have got the
email address and everything as part of
the form it can send the email to the
customer and able to close down this
part of the sales process right as you
can see this entire implementation is
really going to be long and needs to be
managed properly so in this video what
we are going to focus is by is to create
the code around the STD server the AI
Sal Sal agent maybe the very basic
structure of the AI sales agent so that
we can leverage that and then build on
top of it to add different tools like
calendar service or maybe add another
rule to let's say connect with a CRM
system so we could do all that things so
that we will do later let's first create
the STD server process and the AI sales
agent basic structure in this video and
then in the further video we will try to
integrate with na10 and all the other
services yeah deal let's go ahead and
get into my vs code terminal and let's
start writing the code okay before going
to the vs code terminal let's take a
quick look to the faster whisper project
so faster whisper is a reimplementation
of open AI whisper model which is a
really good spe to text engine now it's
a fast inference engine and somewhere it
says that it is like four time faster
with like less memory which is really
good and that's what we are going to use
in this project as well now they have
different model they have large V2 large
V3 they have some smaller models which
can also be run on CPU so if you have a
really moderate PC or a CPU based PC you
can still run the F office per in your
local system uh and you know follow
along this tutorial now one of the
important other thing that I want to
mention is this two Library which you
need if you have if you especially have
GPU or Nvidia graphics card now because
I'm running Windows the problem is that
the Cub less module is currently not
available for Windows so as a workaround
what you can do is you can click here
and you can go to this single archive
and from here you can download this
Windows v27 zip file or even you can
download the V4 that's not a problem and
you will get a bunch of dll files that
you need to copy into your system 32 so
you go to run and then you go to system
32 folder and basically just paste those
dll files as you have downloaded from
this link again this is only if you have
GPU or Nvidia GPU otherwise you don't
have to worry if you are running on CPU
okay now with that said let's go back to
our vs code terminal so let's create a
folder name it AI sales agent and we
will go inside the AI sales agent and
based on the design we will first focus
on creating this STD server okay the spe
to text server using the fer model and
we create a file let's name it stt
server.py let's import sound device numi
Pi input
um P input will be used for for our
testing through keyboard uh we need a
temp file and then we are also importing
the faster whisper now let's create a
class
transcriber and the first thing that we
are going to do is we are going to
define the init method here you will
specify the model size we are using
large V3 because I am running a 16gb RAM
and 12gb graphics card but if you are
running a a more moderate machine you
can use large V to or bunch of other
small models that I have shown you here
yeah it's just like a Constructor we are
just initializing The Whisper model and
we are setting the is recording as false
at the moment the reason is because we
want to first test this stt server let's
say using keyboard input so what we want
to do is whenever let's say a certain
key is pressed in our keyboard then the
recording will be started and then we
will do the transcription so let's add
that code so we are basically saying if
we press the space key then the
recording will be enabled and then to
disable the recording we are defining
another method on release space of
course this function is not defining
whether key is pressed or key is
released and that's what we are going to
control using the record audio method
basically what we are trying to do here
is we are saying whenever it is pressed
then call on press space and when it is
released right the the keyboard listener
is released then you call then we call
on release space right that mean when
the key is pressed it will start doing
the recording if you release the space
uh the recording will be stopped so when
the recording is enabled that means when
the KE is pressed it will do the
recording then it will use the numai
module to like store the recording that
we are going to use and in the end it
will return the recording so now that we
have created the recording we need to
save the recording in a temporary file
so that's what we are doing here is we
are just creating a temporary file
and just saving that recording in the
temporary file using the right function
now that our recording is completed what
we want to do is we want to do the
transcription so here what we are
basically going to do is we are going to
take the file path of the temporary
recording file and then it's going to
use the model you know the whisper model
and call the transcribe method and
basically do the transcription you can
actually see the code samples in this
GitHub repository to understand how it
is all working basically I have taken
reference of this code and then applied
in here the transcribe audio will return
the full transcription and then we are
finally writing our run method where the
actual control will be there it's like a
controller so basically we're saying is
hold the space bar to start talking and
then it will call the record audio first
which will record the audio and then it
will save it in a file path which is a
temporary one and then it will call the
transcription using the file path and
then in the end I'm just going to you
know print the transcription so now that
is all set we will write our main method
so in our main method we are
initializing the faster whis whisper
transcriber as class and then we are run
we are executing the run method yeah so
this is the entire code so what you need
to do is create a requirement. txt file
and I'm going to update this requirement
txt and then what you need to do is to
create a python environment and use a
command like this to create the python
environment and then you can activate
now I have already done all the module
installations for but for you please
make sure you install all these
requirements now let's test this code so
what we will do is run python stt
server. hi my name is Sam and I am
testing the fostter whisper transcriber
project and I have created this code to
record a audio and transcribe it and it
has detected the language as English
with probability .99 that means it is
almost certain that it's an English and
here is the transcription it has got all
of my word properly so that means I can
use this transcript to further you know
use that instruction and call AI apis
and create you know sales agent to make
a voice assistant or a AI sales
assistant as we are trying to do here
right so let's proceed with the next
steps I am sorry to put an end to this
video here in the next part we are going
to integrate this local speech to text
capability with an AI sales agent using
Gro API and that will have ability to
make realtime communication with a
potential customer we are also going to
see some improvement techniques to make
the AI sales agent more production ready
so stay tuned for the video as I'll be
publishing that video within next 24
hours so if you haven't subscribed the
channel yet please do subscribe and turn
on the Bell notification and also make
sure to join the Discord and patreon
Community because all the code that I
have used in this video will be
available there for free and I really
hope that this video has brought some
value to you if yes don't forget to like
the video so that it can reach to other
people like you and take care I'll see
you in the next one
bye oh
تصفح المزيد من مقاطع الفيديو ذات الصلة
Using ChatGPT to Automate Cold Calling for Sales Leads [with Bland AI]
Build Talking AI ChatBot with Text-to-Speech using Python!
AutoGen Studio Tutorial - NO CODE AI Agent Builder (100% Local)
AutoGen Quickstart 🤖 Build POWERFUL AI Applications in MINUTES
I built 10 AI Tools, and am giving it for free!
Easy Way To Make Money Online If You’re Broke in 2024 ($21,495/Month)
5.0 / 5 (0 votes)