Voice Cloning in ElevenLabs vs. Descript

Excelerator
19 Oct 2023, 07:25

TLDR: The video compares voice cloning technologies between ElevenLabs and Descript, two popular platforms. It explores the process of creating a cloned voice for text-to-speech purposes, highlighting the ease of use, required audio length, and the quality of the synthesized voice. The reviewer tests both platforms by uploading a 7-minute audio clip and a 1-minute script, noting the differences in authorization and training processes. While both platforms have their merits, the video invites viewers to consider their own needs and preferences when choosing a voice cloning service.

Takeaways

  • 🎤 Voice cloning technology allows users to record or upload audio for AI to learn their voice for future text-to-speech use.
  • 📱 ElevenLabs and Descript are two popular platforms offering voice cloning services, each with their own pricing and features.
  • 🚀 ElevenLabs has recently updated its voice cloning AI to be faster, easier, and of better quality.
  • 💰 To use voice cloning on ElevenLabs, a subscription of at least $5 per month is required.
  • 📂 For ElevenLabs, users need to upload an audio file of at least one minute in length to clone their voice.
  • 🗣️ After uploading, ElevenLabs generates a voice model that can be used to synthesize speech from typed text.
  • 🎧 The synthesized speech can be played directly within the ElevenLabs platform.
  • 🌟 Descript's new AI speaker technology claims to significantly reduce the amount of audio needed for voice cloning and promises improved quality.
  • 🔍 Users must record a specific script provided by Descript for authorization and training purposes.
  • 🔄 Descript requires that the uploaded audio for voice cloning matches the authorization script, limiting flexibility in audio selection.
  • 💬 Both ElevenLabs and Descript offer realistic AI voices, but there may be differences in the naturalness and emotive quality of the synthesized speech.

Q & A

  • What is voice cloning and how does it work?

    -Voice cloning is a technology that allows an AI to learn a person's voice from an audio recording. Once the AI has learned the voice, it can be used for text-to-speech, generating audio that sounds as if the person had spoken the text at that time.

  • Which software is mentioned in the transcript for voice cloning?

    -The transcript mentions two software applications for voice cloning: ElevenLabs and Descript.

  • What is the minimum subscription required to use voice cloning in ElevenLabs?

    -To use voice cloning in ElevenLabs, you need at least a $5 per month plan.

  • How long does the audio file need to be for ElevenLabs to clone a voice?

    -ElevenLabs requires an audio file that is at least one minute long to clone a voice. Going over five minutes does not significantly help the process.

  • What is the process for creating a voice clone in Descript?

    -In Descript, you need to record or upload a one-minute audio sample. The system then processes this within a couple of minutes to create your voice clone.

  • What are some limitations encountered when trying to upload an audio file to Descript for voice cloning?

    -Descript limits the audio file to under two minutes for the authorization process. Additionally, the file must be a recording of the specific script provided by Descript for authorization and training.

  • How does the AI voice generated by ElevenLabs compare to the original voice?

    -The AI voice generated by ElevenLabs is very realistic and closely resembles the original voice, although there might be slight differences in pacing and emphasis.

  • What are some additional features that Descript offers?

    -Descript offers features such as editing video by editing text and an eye contact feature that is considered impressive.

  • What is the narrator's opinion on the usability of voice cloning technology as it stands now?

    -The narrator finds the voice cloning technology usable but notes that there are areas for improvement, such as the pacing and emphasis in the generated speech.

  • How does the narrator suggest one can support their content if they find it helpful?

    -The narrator suggests that if the content is found helpful, one can support it by hitting the Subscribe button.

  • What is the narrator's affiliation with ElevenLabs and Descript?

    -The narrator is an affiliate for both ElevenLabs and Descript, which means they may receive a small commission if a purchase is made through their links.

  • What are the narrator's final thoughts on voice cloning with ElevenLabs and Descript?

    -The narrator acknowledges that while neither application may be perfect, both ElevenLabs and Descript offer useful features and the ability to create realistic AI voices, especially considering the low cost with ElevenLabs.

Outlines

00:00

🎤 Voice Cloning Technology and ElevenLabs

This paragraph discusses the concept of voice cloning, where an AI learns your voice from a recording, allowing you to generate text-to-speech audio in your own voice. The focus is on testing the usability of this technology with ElevenLabs, a popular app that recently improved its voice cloning AI for better performance. The user attempts to clone their voice by uploading a 7-minute podcast recording, which satisfies the app's requirement of at least one minute of audio. They explore the text-to-speech feature by typing a line and generating the corresponding audio, noting minor issues with pacing and emphasis but overall deeming it usable. The paragraph also touches on the pricing model of ElevenLabs, which requires a subscription starting at $5 per month.

05:01

💬 Comparing Voice Cloning Technologies: ElevenLabs vs Descript

The second paragraph continues the exploration of voice cloning by comparing ElevenLabs with Descript, another service offering similar technology. The user encounters challenges when trying to use Descript, such as the restriction on recording length and the requirement to record a specific script for authorization. They note that the only acceptable audio file for training the AI is one where the user reads the provided script, which was not immediately apparent. The paragraph also discusses the user's experience with the longer script provided by Descript, highlighting issues with the naturalness of the generated speech, such as overly long gaps and lack of emotional inflection. The user concludes by acknowledging the useful features of both applications, despite the imperfections in voice cloning, and invites the audience to share their thoughts on the technology. Additionally, the user provides affiliate links for both services in the video description.

Keywords

💡Voice Cloning

Voice cloning is a technology that enables the recording or uploading of an individual's voice, which is then learned by an AI system. This allows for the creation of synthetic audio in which the AI mimics the user's voice, making it seem as if the user is speaking words that were typed into the system. In the video, the main theme revolves around testing this technology in two platforms, ElevenLabs and Descript, to evaluate their usability and quality.

💡ElevenLabs

ElevenLabs is one of the early popular apps offering voice cloning technology. It requires a subscription plan starting at $5 per month. The app has recently updated its voice cloning AI to be faster, easier, and better. Users can clone their voice by uploading an audio file of at least one minute in length, and the AI system then generates a voice model that can be used for text-to-speech synthesis.

💡Descript

Descript is another platform mentioned in the video that offers voice cloning technology. It has recently announced improvements to its AI speaker technology, allowing for faster and better quality voice cloning. The process involves recording a script provided by Descript, which simultaneously authorizes and trains the AI system. However, it appears that the only audio that can be used for training is the specific script provided by Descript.

💡Text-to-Speech

Text-to-speech (TTS) is the process by which a computer system converts written text into spoken words. In the context of voice cloning, TTS is used to generate synthetic audio where the AI speaks the typed text in the user's cloned voice. This technology is useful for creating personalized audio content without the need for the user to physically speak the words.

💡AI Learning

AI learning refers to the process by which artificial intelligence systems acquire new knowledge or skills, typically through the analysis of data or user interactions. In voice cloning, AI learning involves the system's ability to analyze and replicate the unique characteristics of an individual's voice, such as tone, pitch, and speaking style.

💡Instant Voice Cloning

Instant voice cloning is a feature that allows for the rapid creation of a voice clone with minimal input, such as a short audio recording. This is in contrast to older methods that required longer recordings and more processing time. The video discusses how ElevenLabs and Descript have improved their voice cloning AI to offer instant or near-instant voice cloning capabilities.

💡Speech Synthesis

Speech synthesis is the process of generating human-like speech from text or other input data. It is a key component of voice cloning technology, as it allows the AI to produce speech that sounds like the cloned voice. The video demonstrates how speech synthesis is used in both ElevenLabs and Descript to create synthetic audio from typed text.

💡Audio File

An audio file is a digital file format that stores audio data as a series of waveforms representing the sound. In the context of voice cloning, audio files are essential as they contain the recorded voice samples that the AI system analyzes and uses to learn the user's voice characteristics.
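
To make the length requirements concrete, here is a minimal Python sketch, assuming the thresholds reported in the video (at least one minute for ElevenLabs, under two minutes for Descript's authorization recording). It does not call either platform's API: the helper names `wav_duration_seconds`, `check_sample`, and `make_silent_wav` are hypothetical, and only Python's standard `wave` module is used to measure a WAV file before uploading it.

```python
import io
import wave

# Thresholds as reported in the video summary; assumptions, not official API limits.
ELEVENLABS_MIN_SECONDS = 60   # "at least one minute"
DESCRIPT_MAX_SECONDS = 120    # "under two minutes"


def wav_duration_seconds(source):
    """Return the duration in seconds of a WAV file (path or file-like object)."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_sample(duration):
    """Hypothetical helper: report which of the reported length rules a sample meets."""
    return {
        "elevenlabs_ok": duration >= ELEVENLABS_MIN_SECONDS,
        "descript_ok": duration < DESCRIPT_MAX_SECONDS,
    }


def make_silent_wav(seconds, rate=8000):
    """Build an in-memory mono 16-bit WAV of silence, for demonstration only."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    duration = wav_duration_seconds(make_silent_wav(90))
    print(duration, check_sample(duration))
```

In practice you would pass the path of your own recording to `wav_duration_seconds` instead of the generated silence. Under these assumed thresholds, a 7-minute clip like the one used in the video would pass the ElevenLabs check but fail the Descript one.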

💡Subscription Plan

A subscription plan is a type of service model where users pay a recurring fee to access a product or service. In the case of ElevenLabs, users must subscribe to a plan, with a minimum cost of $5 per month, to use the voice cloning feature. Subscription plans are common in software and app services, allowing for continuous access to updated features and functionality.

💡Authenticating

Authenticating refers to the process of verifying the identity or legitimacy of something or someone. In the context of the video, authenticating a voice clone involves the user reading a provided script to both train the AI and confirm that the user has the rights to use the voice being cloned.

💡Content Creation

Content creation is the process of producing and sharing various forms of content, such as videos, podcasts, or written articles, for the purpose of communication, education, or entertainment. In the video, content creation is discussed in the context of using voice cloning technology to generate personalized audio content without physically speaking the words.

Highlights

Voice cloning technology allows users to record or upload audio for AI to learn their voice for future text-to-speech purposes.

ElevenLabs is a popular app offering voice cloning technology and has recently improved its AI to be faster, easier, and better.

To use voice cloning on ElevenLabs, a subscription of at least $5 per month is required.

For ElevenLabs, an audio file of at least one minute is needed for voice cloning, with no need for excessively long recordings.

ElevenLabs' voice cloning process involves uploading audio, naming the clone, and waiting for the AI to process the voice.

The speech synthesis feature in ElevenLabs allows users to type text which is then converted into audio in the user's voice.

Descript's new AI speaker technology claims to clone voices faster and with better quality than before, requiring only a minute of recording for setup.

Descript's voice cloning process involves reading a provided script to authorize and train the AI simultaneously.

The audio file used for training in Descript must be the same as the one used for authorization, and it cannot exceed two minutes in length.

Both ElevenLabs and Descript offer realistic AI voices, though ElevenLabs' voice cloning may not be perfect.

Descript is known for features like video editing through text and impressive eye contact adjustments.

The reviewer found that the voice cloning in both ElevenLabs and Descript produced passable results, but noted some issues with emphasis and pacing.

Despite minor drawbacks, both applications provide useful tools for voice cloning and other audio-related tasks.

The reviewer invites viewers to share their thoughts and try out the services, noting that the links provided for both ElevenLabs and Descript are affiliate links.

The reviewer acknowledges that voice cloning technology is not without its imperfections but offers potential for interesting applications.

For those interested in exploring voice cloning, both ElevenLabs and Descript provide accessible entry points with their respective offerings.