Speech to Speech is HERE and it’s EPIC! Latest AI Feature from ElevenLabs Blows My Mind

Mike Russell
15 Nov 2023 · 05:32

TLDR

ElevenLabs' groundbreaking Speech to Speech feature is revolutionizing AI technology. It not only replicates your words but also captures the exact tone and emotion you desire. Users can clone voices or select from a vast library, and the AI will accurately reproduce speech with precision, offering a personalized and immersive experience. This tool is a game-changer for content creators, offering endless possibilities and a level of customization previously unattainable.


Takeaways

  • 🎉 ElevenLabs has introduced a new feature called Speech to Speech that allows users to input speech and have it repeated back in various voices.
  • 🎙️ The feature can replicate not only the words but also the tone and emotion of the original speech, making it highly customizable.
  • 📞 Users can select from ElevenLabs' own voices or use a cloned voice that they have created themselves.
  • 💻 The process is accessible through a laptop or any device with a microphone, making it user-friendly.
  • 🔊 There is an option to record audio within the Speech Synthesis panel, which then generates the speech in the chosen voice.
  • 🔗 A link is provided in the description for viewers to try out the feature themselves.
  • 📣 The text-to-speech feature is already of high quality, but the addition of speech-to-speech enhances the personalization aspect.
  • 🗣️ The accuracy of the speech replication, including the speaker's accent and emotion, is impressive and provides a more natural-sounding output.
  • 📈 ElevenLabs offers a variety of voices, including those with different accents, which can be tested and utilized by users.
  • 🧍‍♂️ The presenter demonstrates the feature with several voices, including a clone of his own voice.
  • 🤖 The presenter notes some digital glitching in the output but expects the model to improve over time.
  • 🌐 The user encourages others to join ElevenLabs and share their creations made possible by the speech-to-speech feature.

Q & A

  • What is the main feature of ElevenLabs' latest AI update?

    -The main feature of ElevenLabs' latest AI update is speech to speech, which allows users to input their voice and have it repeated back in any voice, including cloned voices.

  • How does the speech to speech feature work?

    -The speech to speech feature works by allowing users to select a voice and then record their audio. The AI processes the recording and reproduces the same speech in the chosen voice, keeping the user's tone and emotion.

  • What are the advantages of using a cloned voice in speech to speech?

    -Using a cloned voice in speech to speech provides a personalized experience, allowing the user's voice to be used in various applications, such as radio station liners or announcements, with the desired tone and emotion.

  • Can the speech to speech feature mimic different accents?

    -Yes, the speech to speech feature can mimic different accents. The AI is capable of adjusting the voice to reflect the accent that is fed into it, as demonstrated by the user's attempt to input an American accent.

  • How does the accuracy and emotion of the speech to speech feature compare to traditional text to speech?

    -The speech to speech feature offers a higher level of accuracy and emotion compared to traditional text to speech. It captures the nuances of the user's voice, including tone and emotion, which results in a more natural and engaging output.

  • What is the process for using the speech to speech feature?

    -To use the speech to speech feature, users select 'speech to speech' in the Speech Synthesis panel, choose a voice, and then use the 'record audio' option to input their voice. After recording, they click 'Generate' for the AI to process and output the speech in the selected voice.

  • How does ElevenLabs ensure the quality of their AI-generated voices?

    -ElevenLabs ensures the quality of their AI-generated voices through continuous development and improvement of their models. They also provide options to adjust voice settings and models for better accuracy and performance.

  • What is the role of Mike Russell in the script?

    -Mike Russell is credited as the video's presenter. His name also comes up during the demonstration, where the speaker expresses admiration for him while showing what the speech to speech feature can do.

  • How does the user demonstrate the versatility of the speech to speech feature?

    -The user demonstrates the versatility of the speech to speech feature by using it with different voices, including a cloned voice, and by attempting to mimic different accents, showcasing the feature's ability to adapt to various speaking styles and accents.

  • What is the significance of the 'record audio' option in the Speech Synthesis panel?

    -The 'record audio' option in the Speech Synthesis panel allows users to input their voice directly into the system. Together with choosing a voice, it is the starting point for speech to speech: the recording supplies the words, tone, and style that the AI then reproduces in the selected voice.

  • How can users access and try out the speech to speech feature?

    -Users can access and try out the speech to speech feature by visiting ElevenLabs, either through the link provided in the video description or by joining the platform, which is noted for being easy to use and reasonably priced.



Outlines

🗣️ Introduction to Speech-to-Speech Feature

The paragraph introduces the innovative speech-to-speech feature by ElevenLabs, which allows users to not only generate text-to-speech but also clone voices and replicate the speaker's unique style, tone, and emotion. The speaker expresses excitement about the ability to record a voice input and have it repeated back in any chosen voice, including custom cloned voices. The feature is demonstrated through the Speech Synthesis panel, where the user can select a voice, like Isabella, and record audio to be replicated with desired intonation and style. The user also shares admiration for Mike Russell and showcases the versatility of the feature by trying out different voices, including an Australian accent and a cloned version of their own voice. The paragraph emphasizes the impressive accuracy and emotional depth of the synthesized speech, highlighting the potential for various applications such as radio station liners.


🎉 Excitement and Recommendation for ElevenLabs

In this paragraph, the speaker shares their enthusiasm for the ElevenLabs platform and its capabilities, encouraging viewers to try out the speech-to-speech feature using the link provided in the video description. The speaker praises the ease of use and affordability of ElevenLabs, expressing a desire to hear about the creative ways users employ the technology. The summary emphasizes the empowerment of users to produce audio content with precise control over tone and delivery, showcasing the platform's potential for personalized and engaging voice replication.



Keywords

💡Speech to Speech

In this video, Speech to Speech refers to voice conversion: the AI takes a spoken recording and reproduces the same words in a different voice while keeping the original speaker's tone, pacing, and emotion. It is a feature developed by ElevenLabs that lets users record their voice and have it repeated back in any voice they choose, including a cloned one, with a high level of customization and accuracy.


💡ElevenLabs

ElevenLabs is the company responsible for developing the Speech to Speech feature discussed in the video. They are known for their high-quality text-to-speech and voice cloning services and have now added Speech to Speech, which allows for a more personalized and engaging user experience.

💡Text to Speech

Text to Speech, often abbreviated as TTS, is a technology that converts written text into spoken words using synthetic voices. While this technology has been around for some time, the video highlights the advancement to Speech to Speech, which offers more natural and customizable voice output.

💡Voice Cloning

Voice cloning is the process of replicating a specific person's voice, allowing it to be used in various contexts. In the video, the user demonstrates this by creating a cloned voice and using it in the Speech to Speech feature, which showcases the technology's ability to mimic not only the tone but also the emotional nuances of the original voice.

💡Accent Mimicry

Accent Mimicry refers to the ability of the Speech to Speech technology to not only replicate a voice but also to imitate specific accents. This feature is highlighted in the video when the user tests the system by using an American accent, and the AI adjusts the cloned voice to match this new accent.

💡Speech Synthesis Panel

The Speech Synthesis Panel is the user interface within the ElevenLabs software where users can interact with the Speech to Speech feature. It is where users can select voices, adjust settings, and control the speech synthesis process.

💡Record Audio

Record Audio is a function within the Speech to Speech feature that allows users to input their voice by speaking into a microphone. The recording supplies the words and delivery (tone, pacing, and emotion) that the AI then reproduces in the selected voice.


💡Generate

The Generate function is an action within the Speech to Speech feature that processes the recorded audio and produces the synthesized speech output. Once the user clicks Generate, the AI produces audio in the selected voice that matches the words, tone, and emotion of the user's recording.
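
The same record-then-generate flow can also be scripted. The sketch below is a minimal, hedged illustration and is not shown in the video: it assumes ElevenLabs' public HTTP API exposes a speech-to-speech endpoint at `/v1/speech-to-speech/{voice_id}` that accepts an `audio` file upload, an `xi-api-key` header, and a `model_id` form field. The model name, the helper names `build_sts_request` and `send_sts_request`, and the file names are illustrative assumptions; check the current API reference before relying on them.

```python
from pathlib import Path

# Assumed base URL of the ElevenLabs HTTP API; verify against the current docs.
API_BASE = "https://api.elevenlabs.io/v1"


def build_sts_request(voice_id: str, audio_path: str, api_key: str,
                      model_id: str = "eleven_english_sts_v2") -> dict:
    """Describe a speech-to-speech request without sending it.

    The three inputs mirror the steps in the panel: pick a voice
    (voice_id), supply a recording (audio_path), then generate.
    The default model_id is an assumption, not taken from the video.
    """
    if not voice_id:
        raise ValueError("voice_id must not be empty")
    return {
        "method": "POST",
        "url": f"{API_BASE}/speech-to-speech/{voice_id}",
        "headers": {"xi-api-key": api_key},
        "data": {"model_id": model_id},
        "audio_path": str(Path(audio_path)),
    }


def send_sts_request(request: dict, output_path: str) -> None:
    """Send the request and save the returned audio bytes.

    Needs the third-party `requests` package and a valid API key.
    """
    import requests  # imported lazily so the builder works without it

    with open(request["audio_path"], "rb") as audio_file:
        response = requests.post(
            request["url"],
            headers=request["headers"],
            data=request["data"],
            files={"audio": audio_file},  # assumed upload field name
            timeout=60,
        )
    response.raise_for_status()
    Path(output_path).write_bytes(response.content)
```

A call such as `send_sts_request(build_sts_request("your-voice-id", "recording.wav", "your-api-key"), "output.mp3")` would correspond to choosing a voice, recording audio, and clicking Generate. Keeping the builder separate from the network call makes it possible to inspect the request before any audio is uploaded.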

💡Voice Settings

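Voice Settings refer to the adjustable options offered alongside the Speech to Speech feature. The video mentions adjusting voice settings and switching models to improve accuracy; in ElevenLabs, these settings include sliders such as stability and similarity, which fine-tune how consistent the output sounds and how closely it matches the selected voice.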
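
As a rough illustration of how such settings might be represented in a script, the sketch below models two of them as a small validated data structure. It is an assumption-laden example rather than anything shown in the video: the field names `stability` and `similarity_boost` follow the naming I understand ElevenLabs' API to use, the 0 to 1 range and the default values are illustrative, and the `VoiceSettings` class itself is hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class VoiceSettings:
    """Hypothetical container for two voice settings (values assumed to be 0.0 to 1.0)."""

    stability: float = 0.5          # illustrative default, not an official value
    similarity_boost: float = 0.75  # illustrative default, not an official value

    def __post_init__(self) -> None:
        # Reject values outside the assumed slider range.
        for name in ("stability", "similarity_boost"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")

    def to_json(self) -> str:
        """Serialize the settings as a JSON string, e.g. for a form field."""
        return json.dumps(asdict(self))


settings = VoiceSettings(stability=0.3)
print(settings.to_json())
```

Validating the range up front means a typo such as `stability=3` fails immediately instead of producing an unexpected-sounding result later.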

💡Digital A.I. Glitching

Digital A.I. Glitching refers to minor errors or irregularities that may occur in the output of AI-generated speech, indicating that the technology, while advanced, is not yet perfect. These glitches can manifest as unexpected changes in tone, pitch, or other audio artifacts.

💡Personalized Audio Production

Personalized Audio Production involves creating audio content that is tailored to an individual's preferences, including voice choice and speech characteristics. The Speech to Speech feature enables users to produce personalized audio by replicating voices and adjusting the output to match specific tones and emotions.


Highlights

ElevenLabs has introduced a revolutionary Speech to Speech feature.

The feature can reproduce recorded speech in any available voice, including voices the user has cloned.

Users can speak into a microphone and hear their words repeated back in their desired voice and style.

The Speech Synthesis panel allows for easy selection and customization of voices.

The feature includes the ability to record audio and generate the same speech in the chosen voice.

The accuracy and emotion of the replicated voices are impressive.

Multiple voices are available for use, including a louder voice and one with an Australian accent.

The traditional text to speech method is compared to the new speech to speech feature.

The feature can be used for creative applications like radio station liners.

The AI can mimic different accents, even when cloning a voice.

A cloned voice follows the accent of the input recording, so speaking in a different accent changes the accent of the output.

Despite some minor digital glitches, the model is continuously improving.

ElevenLabs offers an easy and cost-effective way to access these advanced features.

The Speech to Speech feature has a wide range of practical applications beyond just entertainment.

The video provides a link for viewers to try out the Speech to Speech feature themselves.

The feature allows for precise control over the tone and delivery of voice outputs.

The Speech to Speech technology is a significant advancement in voice AI.