Transcribe Audio to Text for FREE | Whisper AI Step-by-Step Tutorial
Summary
TLDR: In this tutorial, Jennifer Marie introduces viewers to Whisper, a machine learning model developed by OpenAI for speech recognition and transcription. The video demonstrates how to use Google Colaboratory to transcribe audio and video files to text for free, without installing any software on the user's computer. The process is explained step by step, from installing the necessary components (Whisper and FFmpeg) to uploading and transcribing files in various languages. The tutorial also shows how to save transcriptions as .txt or .srt files, which can be used for purposes like YouTube subtitles. Jennifer emphasizes the time-saving benefits of this method compared to manual transcription, and encourages viewers to subscribe for more helpful content.
Takeaways
- 📚 The video is a tutorial on how to use Whisper, a machine learning model by OpenAI, for transcribing audio and video files to text.
- 🌐 Whisper supports 99 languages and is completely free to use.
- 💻 The tutorial demonstrates using Google Colaboratory (Colab) for transcription without installing software on your personal computer.
- 🔍 You can access Google Colab with a Gmail account, and it allows you to run code directly in your browser.
- 🔗 The process involves installing Whisper and FFmpeg within Google Colab to work with audio and video files.
- 📁 It's important to save your transcribed files to your computer as Google Colab will delete the runtime files when the session ends.
- ⏱️ The transcription process is relatively fast; a two-minute audio file was transcribed in 50 seconds.
- 📈 The transcription includes punctuation, capitalization, and timestamps.
- 📑 The output files include a .txt file for text and a .srt file for subtitles, which can be used on platforms like YouTube.
- 🖥️ For video files, the process is similar, and the video demonstrates transcribing a 12-minute video file.
- ⚙️ After the session, you must repeat the installation process for Whisper if you return to transcribe more files later.
- 🚀 Despite the initial installation time, the transcription speed is significantly faster than manual methods, saving potentially hours of work.
Q & A
What is the main topic of Jennifer Marie's video?
-The main topic of Jennifer Marie's video is how to convert audio and video files to text for free using a machine learning model called Whisper, created by OpenAI.
Which platform does Jennifer recommend for transcribing files without installing software on your computer?
-Jennifer recommends using Google Colaboratory within a Google Drive account to transcribe files without the need for installation on a local computer.
What is the Whisper machine learning model capable of?
-Whisper is a speech recognition and transcription machine learning model that supports 99 languages and can convert audio or video files to text.
How many languages does Whisper support for transcription?
-Whisper supports transcription in 99 different languages.
What is the first step to start using Google Colaboratory?
-The first step is to open Google Drive, click on 'New', then 'More', and connect more apps by searching for and installing Colaboratory.
How long did it take to install Whisper and FFmpeg in the video?
-It took approximately three minutes to install Whisper and FFmpeg in the video.
What hardware accelerator does Jennifer recommend changing to in Google Colaboratory for better performance?
-Jennifer recommends changing the hardware accelerator from CPU to T4 GPU for better performance.
What are the file formats provided for the transcribed text?
-The transcribed text is provided in .txt format for a plain text document and .srt format for subtitle files.
How long did it take to transcribe a two-minute audio file in the video?
-It took 50 seconds to transcribe a two-minute audio file in the video.
Can the transcription process transcribe video files as well?
-Yes, the transcription process demonstrated in the video can also transcribe video files in addition to audio files.
What is the process to rename a file in Google Colaboratory?
-To rename a file in Google Colaboratory, hover the mouse over the file, click on the three dots icon, and then click 'Rename file'.
What is the benefit of using Whisper for transcription compared to manual transcription?
-Using Whisper for transcription is significantly faster than manual transcription and can handle punctuation, capitalization, and timestamps automatically.
Outlines
📚 Introduction to Online Freelancing and Whisper AI
Jennifer Marie's channel focuses on online money-making strategies and work-from-home freelancing. The video introduces Whisper, a machine learning model for speech recognition and transcription by OpenAI, which also created ChatGPT. Whisper is free and supports 99 languages. Instead of installing it on a computer, the tutorial uses Google Colaboratory within Google Drive, allowing users to run code in the browser. The process includes accessing Google Drive, installing Colaboratory, and setting up the environment to transcribe audio and video files.
🔍 Transcribing Audio and Video Files with Whisper and FFmpeg
The tutorial demonstrates how to transcribe an audio file using Whisper AI in Google Colab. It guides viewers on how to change the runtime type to utilize a GPU, install necessary tools, and upload files for transcription. The process automatically detects the language and transcribes the file with punctuation, capitalization, and timestamps. The video also shows how to download the transcription as a .txt or .srt file. Additionally, it covers transcribing a longer video file, highlighting the time efficiency and accuracy of the transcription process. The tutorial concludes with instructions on how to repeat the process for new transcriptions and encourages viewers to subscribe for more content.
Keywords
💡Transcription
💡Whisper
💡OpenAI
💡Google Colaboratory
💡FFmpeg
💡Hardware Accelerator
💡.srt File
💡.txt File
💡Machine Learning
💡Freelancer
💡Google Drive
Highlights
Jennifer Marie's channel focuses on teaching online money-making and work-from-home freelancing strategies.
The tutorial introduces Whisper, a machine learning model for speech recognition by OpenAI, creators of ChatGPT.
Whisper can transcribe audio and video files into text in 99 languages at no cost and without limits.
Google Colaboratory is used instead of installing Whisper on a local computer, allowing for browser-based code execution.
Access to Google Drive is required, which is free and accessible with a Gmail account.
Google Colaboratory can be installed as an app within Google Drive for easy access.
The hardware accelerator in Google Colab is changed to T4 GPU for better performance.
Whisper AI and FFmpeg are installed in Google Colab to handle audio and video files.
Files are uploaded to the left section in Google Colab for transcription.
It's important to save transcribed files to your computer as Google Colab runtime files are temporary.
The transcription process automatically detects the language and transcribes with punctuation and timestamps.
Transcripts can be downloaded in .txt or .srt formats for further use, like uploading to YouTube.
A two-minute audio file was transcribed in 50 seconds, showcasing Whisper's efficiency.
The transcription process includes renaming files in Google Colab for easier reference.
A 12-minute video file was transcribed in just two minutes, highlighting the speed of Whisper.
The installation needs to be repeated each time a new transcription session is started after closing Google Drive.
Whisper's installation in Google Colab takes around three minutes but offers significant time savings over manual transcription.
The tutorial encourages viewers to subscribe for more content and to ask questions in the comments section.
Transcripts
Hello, everyone, and welcome back to my channel, Jennifer Marie,
where I teach you different ways to make money online
and how to become a work-from-home freelancer.
So some of my most popular videos talk to you about transcription,
how to transcribe audio to text, and in today's tutorial,
I'm excited to show you how you can convert audio files
or video files to text completely for free without any limit.
We are going to be using something called Whisper,
and Whisper is a machine learning model
for speech recognition and transcription.
And it's created by OpenAI.
OpenAI are also the creators of ChatGPT.
This is completely free, and Whisper supports 99 languages,
so you can convert audio or video files to text
in 99 different languages using this method.
Now there is a way that you can install this on your computer.
But I know a lot of you don't have really fast, powerful computers.
So in this method, we will not be installing it on our computer.
Instead, we're going to use Google Colaboratory within our Google Drive account.
And this method allows you to write and run code directly in your browser.
So that way you could do this if you're on your computer,
your friend's computer, at work,
because you're not installing something on the computer itself.
Okay, so first of all, let's open Google Drive.
All you need is your Gmail account to access Google Drive,
and it's also free.
Then you're going to click here on New.
Then go down and click More.
Then click Connect More Apps.
So now we have to search for the app that we want to install.
So click on Search apps, and type in Colaboratory.
And you're going to click on the first one that pops up.
And now just click Install.
Then click Continue.
They may ask you to sign in with your Google account.
So just click on your Google account, and it will be installed instantly.
So now just click Done and close off the marketplace window.
And now we have to open Google Colaboratory.
So to open it, just click on New once again.
Click on More, and it will show up right here Google Colaboratory.
So just click that.
So I'm going to do a demo of how we can transcribe an audio file and a video file.
So first, we're going to transcribe an audio file.
Double click where it says Untitled to rename the file,
but keep the extension as it is and then press Enter.
So now click on Runtime and click Change runtime type.
So we want to change the hardware accelerator from CPU to T4 GPU,
then click Save.
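[Editor's sketch, not from the video: one way to confirm that the runtime really switched to a GPU. It assumes PyTorch, which Whisper depends on, is importable; the function name `pick_device` is made up for illustration.]

```python
# Sketch: check whether the current runtime exposes a CUDA GPU.
# Assumption: PyTorch may or may not be installed; fall back to "cpu" if not.

def pick_device() -> str:
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the check degrades gracefully
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Whisper would run on: {device}")
```

If this prints "cpu" after you selected T4 GPU, the runtime change probably was not saved.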
So now we need to install Whisper AI and FFmpeg
to be able to work with both audio and video files.
And remember, we are not installing this on our computer
but instead in Google Colab.
And this might seem complicated, but just follow the instructions
and you'll see how easy it is.
So in the description below, I have pasted this code.
So go into the description below and copy and paste this exact code.
And you're going to paste it in this field right here.
Then click Run Cell on this icon to run the code.
And this will go ahead and install Whisper and FFmpeg.
And it should only take a few minutes.
You can see here it took three minutes to install.
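[Editor's sketch: the exact code from the video description is not reproduced in this transcript. The commands below are an assumption based on Whisper's public install instructions, and `install_all` is a hypothetical helper name.]

```python
import subprocess

# Assumed install commands (not taken from the video description).
# Colab runtimes normally run as root, so no sudo is used here.
INSTALL_COMMANDS = [
    ["pip", "install", "git+https://github.com/openai/whisper.git"],
    ["apt-get", "update"],
    ["apt-get", "install", "-y", "ffmpeg"],
]


def install_all(runner=subprocess.run):
    """Run each install command in order, stopping on the first failure."""
    for command in INSTALL_COMMANDS:
        runner(command, check=True)


# In a Colab cell you would then call: install_all()
```

The `runner` parameter exists only so the function can be exercised without actually installing anything.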
So now we're ready to upload our file on the left,
click here on this folder icon.
And what you're going to do is drag and drop
your audio or video file into this section here on the left.
So this warning will pop up basically telling you to save your files
on your computer because the runtime's files
will be deleted when this runtime is terminated.
So once it's finished transcribing,
and you've finished your session on Google Colab,
it will erase this audio or video file.
So now we want to get the text from this file.
So click here on Code, and we're going to insert this code here.
Again, I have pasted this code in the description below.
So paste that in here and then replace your file name
with your exact file name including the spaces and the extension.
So in my case, it was Corporate-Sample.mp3, then click Run Cell.
So that will begin extracting the text from the file.
You can see it's automatically detecting that this file is in English.
And right here it is transcribing it perfectly
with punctuation, capitalization, and even with timestamps.
So in our first demo, this is around a two-minute file.
So we're going to see how long it takes to transcribe a two-minute file.
You can see it took 50 seconds.
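[Editor's sketch: the transcription code from the description is not shown in this transcript. This version assumes Whisper's command-line tool with its `--model` and `--output_dir` options; the model size "medium" and both helper names are assumptions.]

```python
import subprocess


def build_whisper_command(file_name, model="medium", output_dir="."):
    """Build the argument list for Whisper's command-line tool.

    The file name is passed as a single list item, so names with spaces
    need no extra quoting, but the extension must still be included.
    """
    return ["whisper", file_name, "--model", model, "--output_dir", output_dir]


def transcribe(file_name, runner=subprocess.run, **options):
    """Run Whisper on one uploaded file and return the command that was used."""
    command = build_whisper_command(file_name, **options)
    runner(command, check=True)
    return command


print(build_whisper_command("Corporate-Sample.mp3"))
```

Calling `transcribe("Corporate-Sample.mp3")` in Colab would start the transcription, assuming Whisper and FFmpeg are already installed in the runtime.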
So in order to download this transcript, just wait a few seconds,
and you will have a few different options on the side here.
So you can see here, there's a .srt file, which is your typical subtitle file
that you can upload to YouTube, for example, and a .txt file.
If these haven't popped up for you, just click on the Refresh icon here.
So to download any of these files, let's try the .txt one,
just hover over it, and then click on the icon here and click Download.
And let's do the same for the subtitle file.
And I'll show you what they look like.
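[Editor's sketch: the video downloads files by clicking in the file panel. The same thing can be done from a code cell with Colab's `google.colab.files.download`. The output names below are a guess (media file stem plus extension), and naming may differ between Whisper versions; both helper names are hypothetical.]

```python
from pathlib import Path


def transcript_paths(media_file, formats=("txt", "srt")):
    """Guess the transcript file names for a media file, one per format."""
    stem = Path(media_file).stem
    return [f"{stem}.{ext}" for ext in formats]


def download_transcripts(media_file):
    """Trigger browser downloads for transcripts that exist (Colab only)."""
    from google.colab import files  # available only inside a Colab runtime

    for path in transcript_paths(media_file):
        if Path(path).exists():
            files.download(path)


print(transcript_paths("Corporate-Sample.mp3"))
```

Downloading matters here because the runtime's files disappear when the session ends.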
So this is the .txt file, and you can see it's done an amazing job.
This is perfect.
There's punctuation; it's broken up the sentences correctly.
It's even used hyphens correctly.
And if we open up the .srt file, you can see it has done captions for us.
So we could go ahead and upload this to YouTube.
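[Editor's sketch of what an .srt caption file contains: a counter, a time range written as HH:MM:SS,mmm, and the caption text, with a blank line between entries. The segment dictionaries mirror the start/end/text fields that Whisper's Python API returns, but the sample data here is made up.]

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Turn a list of {start, end, text} segments into SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)


# Made-up sample segments, for illustration only.
example = [
    {"start": 0.0, "end": 2.5, "text": " Hello, everyone."},
    {"start": 2.5, "end": 5.0, "text": " Welcome back."},
]
print(segments_to_srt(example))
```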
So now I want to quickly show you what it's like when you upload a video file.
And this video file is around 12 minutes long.
So once again, you're going to drag and drop your file over here on the left,
and the file will start to upload.
And once the file has finished uploading, you'll see it in the list here.
So you can see types of sentences.
And once again, you're going to click on Code.
And we're going to paste what we did before.
Again, you can find this code in the description below.
So we have to replace your file name,
and in this case, it's a really long file name.
And I don't feel like typing it out.
So I can actually rename it by hovering my mouse over the file
and clicking on the three dots icon.
And then click Rename file.
I'm going to rename this to Sentences, so it's easier.
So now replace your file name with Sentences.
And remember to put the extension.
It won't work if you don't put the extension,
so Sentences.mp4 in this case, and then click Run cell.
So once again, it's going to begin transcribing.
And it only took two minutes to transcribe this 12-minute file.
And if you know anything about transcription,
you know it takes a long time to manually type this out,
especially considering it's added punctuation and capitalization and everything else.
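[Editor's sketch: the two timings quoted in the video can be compared as a speed-up over real time, meaning media duration divided by processing time. The function name is made up, and the timings are the ones stated in the video, not new measurements.]

```python
def realtime_factor(media_seconds, processing_seconds):
    """How many seconds of media are transcribed per second of processing."""
    return media_seconds / processing_seconds


# Two-minute audio file transcribed in 50 seconds.
print(realtime_factor(2 * 60, 50))        # 2.4
# 12-minute video file transcribed in two minutes.
print(realtime_factor(12 * 60, 2 * 60))   # 6.0
```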
So again, if you wait a few seconds,
you can see here we can download the .txt file or the .srt file.
And it's done an amazing job so quickly.
So you can go ahead and transcribe as many files as you like using this method.
Now once you're done with your session and you close Google Drive,
when you open it again to transcribe,
you'll have to repeat this process once again.
So it does take around three minutes or so to install Whisper,
but it's definitely worth it, considering how fast it transcribes.
And if you were to do this manually, it could take you hours.
So I hope you guys enjoyed this tutorial.
Make sure to subscribe to my channel for more videos like this one.
If you have any questions, feel free to ask me in the comments section.
I really hope you enjoy this.
Let me know if it works for you.
And I'll see you guys in my next tutorial!