Transcribe Audio to Text for FREE | Whisper AI Step-by-Step Tutorial

Jennifer Marie
5 Nov 2023, 08:29

Summary

TLDR: In this tutorial, Jennifer Marie introduces Whisper, a machine learning model developed by OpenAI for speech recognition and transcription. The video demonstrates how to use Google Colaboratory to transcribe audio and video files to text for free, without installing any software on the user's computer. The process is explained step by step, from installing Whisper and FFmpeg to uploading and transcribing files in various languages. The tutorial also shows how to save transcriptions as .txt or .srt files, which can be used for purposes such as YouTube subtitles. Jennifer emphasizes how much time this method saves compared with manual transcription and encourages viewers to subscribe for more content.

Takeaways

  • 📚 The video is a tutorial on how to use Whisper, a machine learning model by OpenAI, for transcribing audio and video files to text.
  • 🌐 Whisper supports 99 languages and is completely free to use.
  • 💻 The tutorial demonstrates using Google Colaboratory (Colab) for transcription without installing software on your personal computer.
  • 🔍 You can access Google Colab with a Gmail account, and it allows you to run code directly in your browser.
  • 🔗 The process involves installing Whisper and FFmpeg within Google Colab to work with audio and video files.
  • 📁 It's important to save your transcribed files to your computer as Google Colab will delete the runtime files when the session ends.
  • ⏱️ The transcription process is relatively fast; a two-minute audio file was transcribed in 50 seconds.
  • 📈 The transcription includes punctuation, capitalization, and timestamps.
  • 📑 The output files include a .txt file for text and a .srt file for subtitles, which can be used on platforms like YouTube.
  • 🖥️ For video files, the process is similar, and the video demonstrates transcribing a 12-minute video file.
  • ⚙️ After the session, you must repeat the installation process for Whisper if you return to transcribe more files later.
  • 🚀 Despite the initial installation time, the transcription speed is significantly faster than manual methods, saving potentially hours of work.

Q & A

  • What is the main topic of Jennifer Marie's video?

    -The main topic of Jennifer Marie's video is how to convert audio and video files to text for free using a machine learning model called Whisper, created by OpenAI.

  • Which platform does Jennifer recommend for transcribing files without installing software on your computer?

    -Jennifer recommends using Google Colaboratory within a Google Drive account to transcribe files without the need for installation on a local computer.

  • What is the Whisper machine learning model capable of?

    -Whisper is a speech recognition and transcription machine learning model that supports 99 languages and can convert audio or video files to text.

  • How many languages does Whisper support for transcription?

    -Whisper supports transcription in 99 different languages.

  • What is the first step to start using Google Colaboratory?

    -The first step is to open Google Drive, click on 'New', then 'More', and connect more apps by searching for and installing Colaboratory.

  • How long did it take to install Whisper and FFmpeg in the video?

    -It took approximately three minutes to install Whisper and FFmpeg in the video.

  • What hardware accelerator does Jennifer recommend changing to in Google Colaboratory for better performance?

    -Jennifer recommends changing the hardware accelerator from CPU to T4 GPU for better performance.

  • What are the file formats provided for the transcribed text?

    -The transcribed text is provided in .txt format for a plain text document and .srt format for subtitle files.

  • How long did it take to transcribe a two-minute audio file in the video?

    -It took 50 seconds to transcribe a two-minute audio file in the video.

  • Can the transcription process transcribe video files as well?

    -Yes, the transcription process demonstrated in the video can also transcribe video files in addition to audio files.

  • What is the process to rename a file in Google Colaboratory?

    -To rename a file in Google Colaboratory, hover the mouse over the file, click on the three dots icon, and then click 'Rename file'.

  • What is the benefit of using Whisper for transcription compared to manual transcription?

    -Using Whisper for transcription is significantly faster than manual transcription and can handle punctuation, capitalization, and timestamps automatically.

Outlines

00:00

📚 Introduction to Online Freelancing and Whisper AI

Jennifer Marie's channel focuses on online money-making strategies and work-from-home freelancing. The video introduces Whisper, a machine learning model for speech recognition and transcription by OpenAI, which also created ChatGPT. Whisper is free and supports 99 languages. Instead of installing it on a computer, the tutorial uses Google Colaboratory within Google Drive, allowing users to run code in the browser. The process includes accessing Google Drive, installing Colaboratory, and setting up the environment to transcribe audio and video files.

05:01

🔍 Transcribing Audio and Video Files with Whisper and FFmpeg

The tutorial demonstrates how to transcribe an audio file using Whisper AI in Google Colab. It guides viewers on how to change the runtime type to utilize a GPU, install necessary tools, and upload files for transcription. The process automatically detects the language and transcribes the file with punctuation, capitalization, and timestamps. The video also shows how to download the transcription as a .txt or .srt file. Additionally, it covers transcribing a longer video file, highlighting the time efficiency and accuracy of the transcription process. The tutorial concludes with instructions on how to repeat the process for new transcriptions and encourages viewers to subscribe for more content.

Keywords

💡Transcription

Transcription is the process of converting spoken language into written form. In the video, it is the core method taught for making money online by turning audio and video files into text. It is used as a means for work-from-home freelancers to generate income and is demonstrated using the Whisper machine learning model.

💡Whisper

Whisper is a machine learning model developed by OpenAI for speech recognition and transcription. It is the tool featured in the video that allows users to convert audio or video files into text for free and without any limit. It supports 99 languages, making it a versatile tool for multilingual transcription.

💡OpenAI

OpenAI is the organization responsible for creating both Whisper and ChatGPT. It is mentioned in the video as the developer of the technology that enables free and unlimited transcription. In the context of the video, OpenAI represents the cutting edge of AI technology.

💡Google Colaboratory

Google Colaboratory, often referred to as Colab, is a cloud-based platform that allows users to write and run code directly in their web browsers. In the video, it is used as an alternative to installing Whisper on a local computer, making the transcription process accessible to those with limited computational resources.

💡FFmpeg

FFmpeg is a free and open-source software project that can handle multimedia data. In the context of the video, FFmpeg is installed alongside Whisper in Google Colab to facilitate the processing of both audio and video files for transcription.
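
To make FFmpeg's role more concrete, here is a minimal sketch of the kind of command it can run to pull an audio track out of a video file. The tutorial never calls FFmpeg by hand (Whisper is understood to invoke it behind the scenes), so the function name `ffmpeg_extract_audio_cmd`, the 16 kHz default, and the choice of a WAV output are all illustrative assumptions rather than steps from the video.

```python
from pathlib import Path


def ffmpeg_extract_audio_cmd(video_path: str, sample_rate: int = 16000) -> list[str]:
    """Build an ffmpeg command that writes a mono WAV next to the input file."""
    src = Path(video_path)
    dst = src.with_suffix(".wav")  # e.g. Sentences.mp4 -> Sentences.wav
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it already exists
        "-i", str(src),            # input video (or audio) file
        "-vn",                     # drop the video stream
        "-ac", "1",                # downmix to a single audio channel
        "-ar", str(sample_rate),   # resample to the given rate in Hz
        str(dst),
    ]


# The command is only built and printed here, not executed.
print(ffmpeg_extract_audio_cmd("Sentences.mp4"))
```

Passing the returned list to `subprocess.run` would execute it, provided FFmpeg is installed in the environment.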

💡Hardware Accelerator

A hardware accelerator is a device that speeds up certain functions in a computer system. In the video, the runtime type in Google Colab is changed from CPU to T4 GPU to enhance the performance of the transcription process, illustrating the importance of computational power in handling machine learning tasks.
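
One way to confirm that the runtime change took effect is to look for NVIDIA's `nvidia-smi` tool, which is normally present on GPU runtimes. The following is a small sketch under that assumption; the helper name `gpu_visible` is hypothetical and this check does not appear in the video.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if the nvidia-smi tool exists and exits successfully."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False  # no NVIDIA tools on PATH, so most likely a CPU-only runtime
    try:
        # "nvidia-smi -L" lists the GPUs the driver can see.
        result = subprocess.run([exe, "-L"], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


print("GPU runtime detected" if gpu_visible() else "No GPU detected; expect slower transcription")
```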

💡.srt File

An .srt file is a type of subtitle file used to provide captions for video content. In the video, it is mentioned as one of the output formats that can be generated from the transcription process, which can be uploaded to platforms like YouTube for accessibility purposes.
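
For readers unfamiliar with the format, an .srt file is a sequence of numbered cues, each with a `start --> end` time range and the caption text. The sketch below builds that layout from a list of timed segments. The segment dictionaries (with `start`, `end`, and `text` keys) and the sample sentences are made up for illustration and are not taken from the video.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm, the timestamp style used in .srt files."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[dict]) -> str:
    """Render segments (dicts with start, end, text) as numbered SRT cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(cues)  # a blank line separates consecutive cues


# Made-up example segments, for illustration only.
example = [
    {"start": 0.0, "end": 2.5, "text": " Hello, everyone."},
    {"start": 2.5, "end": 5.0, "text": " Welcome back."},
]
print(to_srt(example))
```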

💡.txt File

A .txt file is a plain text file format used for storing written documents. In the video, it is one of the formats in which the transcribed text from audio or video files is saved. It is highlighted as a simple and accessible format for users to download and use.

💡Machine Learning

Machine learning is a type of artificial intelligence that allows software applications to become more accurate in predicting outcomes without being explicitly programmed to do so. Whisper, the focus of the video, is a machine learning model that has learned to transcribe speech accurately, which is central to the video's tutorial.

💡Freelancer

A freelancer is an individual who is self-employed and is not committed to a particular employer long-term. In the video, becoming a work-from-home freelancer is presented as a goal, with transcription services being one of the methods to achieve financial independence in this capacity.

💡Google Drive

Google Drive is a cloud storage service that allows users to store files on remote servers. In the video, it is used as the platform to access Google Colab and manage files for transcription. It represents a convenient and accessible way to handle digital files for the transcription process.

Highlights

Jennifer Marie's channel focuses on teaching online money-making and work-from-home freelancing strategies.

The tutorial introduces Whisper, a machine learning model for speech recognition by OpenAI, the creators of ChatGPT.

Whisper can transcribe audio and video files into text in 99 languages, at no cost and without usage limits.

Google Colaboratory is used instead of installing Whisper on a local computer, allowing for browser-based code execution.

Access to Google Drive is required, which is free and accessible with a Gmail account.

Google Colaboratory can be installed as an app within Google Drive for easy access.

The hardware accelerator in Google Colab is changed to T4 GPU for better performance.

Whisper AI and FFmpeg are installed in Google Colab to handle audio and video files.

Files are uploaded to the left section in Google Colab for transcription.

It's important to save transcribed files to your computer as Google Colab runtime files are temporary.

The transcription process automatically detects the language and transcribes with punctuation and timestamps.

Transcripts can be downloaded in .txt or .srt formats for further use, like uploading to YouTube.

A two-minute audio file was transcribed in 50 seconds, showcasing Whisper's efficiency.

The transcription process includes renaming files in Google Colab for easier reference.

A 12-minute video file was transcribed in just two minutes, highlighting the speed of Whisper.

The installation needs to be repeated each time a new transcription session is started in Google Colab.

Whisper's installation in Google Colab takes around three minutes but offers significant time savings over manual transcription.

The tutorial encourages viewers to subscribe for more content and to ask questions in the comments section.

Transcripts

play00:00

Hello, everyone, and welcome back  to my channel, Jennifer Marie,

play00:03

where I teach you different  ways to make money online

play00:06

and how to become a work-from-home freelancer.

play00:08

So some of my most popular videos  talk to you about transcription,

play00:12

how to transcribe audio to  text, and in today's tutorial,

play00:16

I'm excited to show you how  you can convert audio files

play00:20

or video files to text completely  for free without any limit.

play00:25

We are going to be using something called Whisper,

play00:28

and Whisper is a machine learning model

play00:30

for speech recognition and transcription.

play00:33

And it's created by OpenAI.

play00:35

OpenAI are also the creators of ChatGPT.

play00:38

This is completely free, and  Whisper supports 99 languages,

play00:43

so you can convert audio or video files to text

play00:46

in 99 different languages using this method.

play00:49

Now there is a way that you can  install this on your computer.

play00:52

But I know a lot of you don't have  really fast, powerful computers.

play00:57

So in this method, we will not  be installing it on our computer.

play01:00

Instead, we're going to use Google  Colaboratory within our Google Drive account.

play01:05

And this method allows you to write  and run code directly in your browser.

play01:09

So that way you could do this  if you're on your computer,

play01:12

your friend's computer, at work,

play01:14

because you're not installing  something on the computer itself.

play01:18

Okay, so first of all, let's open Google Drive.

play01:22

All you need is your Gmail  account to access Google Drive,

play01:25

and it's also free.

play01:27

Then you're going to click here on New.

play01:30

Then go down and click More.

play01:33

Then click Connect More Apps.

play01:35

So now we have to search for  the app that we want to install.

play01:39

So click on Search apps, and type in Colaboratory.

play01:45

And you're going to click on  the first one that pops up.

play01:48

And now just click Install.

play01:51

Then click Continue.

play01:53

They may ask you to sign in  with your Google account.

play01:55

So just click on your Google account,  and it will be installed instantly.

play02:01

So now just click Done and close  off the marketplace window.

play02:07

And now we have to open Google Colaboratory.

play02:09

So to open it, just click on New once again.

play02:13

Click on More, and it will show  up right here Google Colaboratory.

play02:18

So just click that.

play02:19

So I'm going to do a demo of how we can  transcribe an audio file and a video file.

play02:25

So first, we're going to transcribe an audio file.

play02:28

Double click where it says  Untitled to rename the file,

play02:32

but keep the extension as  it is and then press Enter.

play02:36

So now click on Runtime and  click Change runtime type.

play02:41

So we want to change the hardware  accelerator from CPU to T4 GPU,

play02:48

then click Save.

play02:50

So now we need to install Whisper AI and FFmpeg

play02:54

to be able to work with  both audio and video files.

play02:58

And remember, we are not  installing this on our computer 

play03:01

but instead in Google Colab.

play03:04

And this might seem complicated,  but just follow the instructions

play03:07

and you'll see how easy it is.

play03:10

So in the description below,  I have pasted this code.

play03:14

So go into the description below  and copy and paste this exact code.

play03:20

And you're going to paste  it in this field right here.

play03:23

Then click Run Cell on this icon to run the code.

play03:28

And this will go ahead and  install Whisper and FFmpeg.

play03:31

And it should only take a few minutes.

play03:34

You can see here it took three minutes to install.
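
The exact code is in the video's description and is not reproduced in this transcript. As a rough sketch of what such an install cell might do, the snippet below assumes the `openai-whisper` package from PyPI and the `ffmpeg` package from apt, and assumes the runtime has root privileges (as Colab runtimes typically do). The helper name `install_dependencies` and its `dry_run` flag are hypothetical.

```python
import subprocess
import sys

# Assumed install commands; the video's own code is not shown in the transcript.
INSTALL_COMMANDS = [
    [sys.executable, "-m", "pip", "install", "-U", "openai-whisper"],
    ["apt-get", "install", "-y", "ffmpeg"],
]


def install_dependencies(dry_run: bool = True) -> list[str]:
    """Run (or, with dry_run=True, only list) the install commands in order."""
    shown = []
    for cmd in INSTALL_COMMANDS:
        shown.append(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return shown


# Dry run: print what would be executed without installing anything.
for line in install_dependencies(dry_run=True):
    print(line)
```

In a notebook cell the same commands are commonly written with a leading `!`, for example `!pip install -U openai-whisper`. Calling `install_dependencies(dry_run=False)` would perform the installation.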

play03:38

So now we're ready to upload our file.

play03:41

On the left, click here on this folder icon.

play03:44

And what you're going to do is drag and drop

play03:47

your audio or video file into  this section here on the left.

play03:52

So this warning will pop up basically  telling you to save your files

play03:56

on your computer because the runtime's files

play03:59

will be deleted when this runtime is terminated.

play04:03

So once it's finished transcribing,

play04:04

and you're finished your session on Google Colab,

play04:08

it will erase this audio or video file.

play04:11

So now we want to get the text from this file.

play04:15

So click here on Code, and we're  going to insert this code here.

play04:21

Again, I have pasted this  code in the description below.

play04:25

So paste that in here and  then replace your file name

play04:29

with your exact file name including  the spaces and the extension.

play04:35

So in my case, it was  Corporate-Sample.mp3, then click Run Cell.
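
The transcription code itself is only in the video's description, so the following is a hedged sketch of what this step could look like with Whisper's command-line tool. It assumes the `whisper` command is available after installation and picks the `medium` model arbitrarily, since the video does not say which model is used. The helper `whisper_command` is hypothetical and mainly illustrates the two points stressed here: the file name must include its extension, and names with spaces need quoting.

```python
import shlex
from pathlib import Path


def whisper_command(file_name: str, model: str = "medium") -> str:
    """Return a shell-safe whisper CLI invocation for one media file."""
    if not Path(file_name).suffix:
        # The tutorial stresses that the extension must be part of the name.
        raise ValueError(f"{file_name!r} has no extension such as .mp3 or .mp4")
    # shlex.quote keeps names containing spaces intact when passed to a shell.
    return f"whisper {shlex.quote(file_name)} --model {shlex.quote(model)}"


print(whisper_command("Corporate-Sample.mp3"))
print(whisper_command("my interview part 1.mp4", model="small"))
```

In a notebook cell, the printed command would be run with a leading `!`.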

play04:44

So that will begin extracting  the text from the file.

play04:48

You can see it's automatically  detecting that this file is in English.

play04:53

And right here it is transcribing it perfectly

play04:56

with punctuation, capitalization,  and even with time stamps.
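
The same result can also be obtained from Python rather than the command line. This sketch assumes the `openai-whisper` package is installed and uses its `load_model` and `transcribe` calls; the `base` model and the helper names `format_segment` and `transcribe_file` are illustrative choices, not details from the video. The timestamp layout only approximates what appears on screen.

```python
def format_segment(segment: dict) -> str:
    """Render one segment as '[MM:SS.mmm --> MM:SS.mmm] text'."""
    def clock(seconds: float) -> str:
        total_ms = int(round(seconds * 1000))
        minutes, rem = divmod(total_ms, 60_000)
        secs, ms = divmod(rem, 1000)
        return f"{minutes:02d}:{secs:02d}.{ms:03d}"

    return f"[{clock(segment['start'])} --> {clock(segment['end'])}] {segment['text'].strip()}"


def transcribe_file(path: str, model_name: str = "base") -> list[str]:
    """Transcribe a file with openai-whisper and return one formatted line per segment."""
    import whisper  # imported lazily so format_segment works without the package

    model = whisper.load_model(model_name)
    result = model.transcribe(path)  # the language is auto-detected when not specified
    print("Detected language:", result["language"])
    return [format_segment(seg) for seg in result["segments"]]


# Made-up segment, for illustration only.
print(format_segment({"start": 0.0, "end": 4.0, "text": " Hello, everyone."}))
```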

play05:01

So in our first demo, this  is around a two minute file.

play05:04

So we're going to see how long it  takes to transcribe a two minute file.

play05:08

You can see it took 50 seconds.

play05:11

So in order to download this  transcript, just wait a few seconds,

play05:15

and you will have a few different  options on the side here.

play05:19

So you can see here, there's a .srt  file, which is your typical subtitle file

play05:23

that you can upload to YouTube,  for example, and a .txt file.

play05:28

If these haven't popped up for you,  just click on the Refresh icon here.

play05:33

So to download any of these files, let's try the .txt one,

play05:38

just hover over it, and then click  on the icon here and click Download.

play05:43

And let's do the same for the subtitle file.
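
The downloads can also be scripted instead of clicked. The sketch below assumes the transcripts are written next to the media file and share its base name (for example `Corporate-Sample.txt`), which may differ between Whisper versions. `transcript_files` and `download_in_colab` are hypothetical helper names, and the second one relies on the `google.colab` module that only exists inside a Colab runtime.

```python
from pathlib import Path


def transcript_files(media_name: str, directory: str = ".") -> list[Path]:
    """Return existing .txt and .srt files that share the media file's base name."""
    stem = Path(media_name).stem  # "Corporate-Sample.mp3" -> "Corporate-Sample"
    candidates = [Path(directory) / f"{stem}{ext}" for ext in (".txt", ".srt")]
    return [p for p in candidates if p.exists()]


def download_in_colab(paths: list[Path]) -> None:
    """Trigger a browser download for each file; only works inside Google Colab."""
    from google.colab import files  # available only in a Colab runtime

    for p in paths:
        files.download(str(p))


# Lists whatever transcripts exist in the current directory (possibly none).
print(transcript_files("Corporate-Sample.mp3"))
```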

play05:46

And I'll show you what they look like.

play05:48

So this is the .txt file, and you can see it's done an amazing job.

play05:52

This is perfect.

play05:54

There's punctuation; it's broken up the sentences correctly.

play05:58

It's even used hyphens correctly.

play06:00

And if we open up the .srt file, you  can see it has done captions for us.

play06:06

So we could go ahead and upload this to YouTube.

play06:09

So now I want to quickly show you what  it's like when you upload a video file.

play06:13

And this video file is around 12 minutes long.

play06:17

So once again, you're going to drag and  drop your file over here on the left,

play06:22

and the file will start to upload.

play06:25

And once the file has finished uploading,  you'll see it in the list here.

play06:28

So you can see types of sentences.

play06:31

And once again, you're going to click on Code.

play06:34

And we're going to paste what we did before.

play06:37

Again, you can find this code  in the description below.

play06:41

So we have to replace your file name,

play06:44

and in this case, it's a really long file name.

play06:46

And I don't feel like typing it out.

play06:48

So I can actually rename it by  hovering my mouse over the file  

play06:52

and clicking on the three dots icon.

play06:55

And then click Rename file.

play06:57

I'm going to rename this to  Sentences, so it's easier.

play07:01

So now replace your file name with Sentences.

play07:04

And remember to put the extension.

play07:06

It won't work if you don't put the extension,

play07:08

so Sentences.mp4 in this  case, and then click Run cell.

play07:15

So once again, it's going to begin transcribing.

play07:20

And it only took two minutes to  transcribe this 12 minute file.
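
The two timings quoted in the video can be turned into a rough speed ratio. This is simple arithmetic on the stated figures (a two-minute file in 50 seconds, a 12-minute file in two minutes), both of which are described only approximately, so the results are indicative at best. The function name `speedup` is an illustrative choice.

```python
def speedup(audio_seconds: float, processing_seconds: float) -> float:
    """Seconds of audio transcribed per second of processing time."""
    if processing_seconds <= 0:
        raise ValueError("processing time must be positive")
    return audio_seconds / processing_seconds


# Timings quoted in the video (both approximate).
print(speedup(2 * 60, 50))        # two-minute audio file in 50 seconds: prints 2.4
print(speedup(12 * 60, 2 * 60))   # 12-minute video file in two minutes: prints 6.0
```

The longer file shows the higher ratio, which may reflect fixed start-up costs such as loading the model being spread over more audio, though the video does not explain the difference.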

play07:25

And if you know anything about transcription,

play07:27

you know it takes a long time  to manually type this out,

play07:31

especially considering it's added punctuation  and capitalization and everything else.

play07:36

So again, if you wait a few seconds,

play07:38

you can see here we can download  the .txt file or the .srt file.

play07:43

And it's done an amazing job so quickly.

play07:47

So you can go ahead and transcribe as  many files as you like using this method.

play07:52

Now once you're done your session  and you close Google Drive,

play07:56

when you open it again to transcribe,

play07:58

you'll have to repeat this process once again.

play08:01

So it does take around three  minutes or so to install Whisper,

play08:05

but it's definitely worth it,  considering how fast it transcribes.

play08:09

And if you were to do this  manually, it could take you hours.

play08:12

So I hope you guys enjoyed this tutorial.

play08:15

Make sure to subscribe to my channel  for more videos like this one.

play08:18

If you have any questions, feel free  to ask me in the comments section.

play08:21

I really hope you enjoy this.

play08:23

Let me know if it works for you.

play08:25

And I'll see you guys in my next tutorial!

Related Tags
Transcription Tutorial, Free Tool, Work-from-Home, Freelancer Tips, Whisper AI, Speech Recognition, Google Colab, Online Money, Subtitle Creation, Text Conversion, Machine Learning