Speech to Speech is HERE and it’s EPIC! Latest AI Feature from ElevenLabs Blows My Mind

Mike Russell
15 Nov 2023 · 05:32

TLDR: ElevenLabs' new Speech to Speech feature takes a recorded voice performance and reproduces it in a different voice, keeping not only the words but also the tone and emotion of the delivery. Users can clone a voice or select one from ElevenLabs' library, and the AI reproduces the speech accurately in that voice. For content creators, this offers a level of control over delivery and customization that text-to-speech alone did not provide.


  • 🎉 ElevenLabs has introduced a new feature called Speech to Speech that allows users to input speech and have it repeated back in various voices.
  • 🎙️ The feature can replicate not only the words but also the tone and emotion of the original speech, making it highly customizable.
  • 📞 Users can select from ElevenLabs' own voices or use a cloned voice that they have created themselves.
  • 💻 The process is accessible through a laptop or any device with a microphone, making it user-friendly.
  • 🔊 There is an option to record audio within the Speech Synthesis panel, which then generates the speech in the chosen voice.
  • 🔗 A link is provided in the description for viewers to try out the feature themselves.
  • 📣 The text-to-speech feature is already of high quality, but the addition of speech-to-speech enhances the personalization aspect.
  • 🗣️ The accuracy of the speech replication, including the speaker's accent and emotion, is impressive and provides a more natural-sounding output.
  • 📈 ElevenLabs offers a variety of voices, including those with different accents, which can be tested and utilized by users.
  • 🧍‍♂️ The user demonstrated how the feature works with different voices, including a voice clone of the user himself.
  • 🤖 There was a mention of some digital AI glitching, but it is expected that the model will improve over time.
  • 🌐 The user encourages others to join ElevenLabs and share their creations made possible by the speech-to-speech feature.

💡Speech to Speech

Speech to Speech refers to technology that converts speech in one voice into speech in another voice, keeping the original speaker's words, tone, and emotion. In the context of the video, it is an AI feature developed by ElevenLabs that lets users record their own voice and have the performance rendered in a library voice or a cloned voice of their choice, with a high level of customization and accuracy.


💡ElevenLabs

ElevenLabs is the company that developed the Speech to Speech feature discussed in the video. It is known for high-quality text-to-speech and voice cloning, and Speech to Speech extends that lineup with a more personalized, performance-driven way to generate audio.

💡Text to Speech

Text to Speech, often abbreviated as TTS, is a technology that converts written text into spoken words using synthetic voices. While this technology has been around for some time, the video highlights the advancement to Speech to Speech, which offers more natural and customizable voice output.

💡Voice Cloning

Voice cloning is the process of replicating a specific person's voice, allowing it to be used in various contexts. In the video, the user demonstrates this by creating a cloned voice and using it in the Speech to Speech feature, which showcases the technology's ability to mimic not only the tone but also the emotional nuances of the original voice.

💡Accent Mimicry

Accent Mimicry refers to the ability of the Speech to Speech technology to not only replicate a voice but also to imitate specific accents. This feature is highlighted in the video when the user tests the system by using an American accent, and the AI adjusts the cloned voice to match this new accent.

💡Speech Synthesis Panel

The Speech Synthesis Panel is the user interface within the ElevenLabs software where users can interact with the Speech to Speech feature. It is where users can select voices, adjust settings, and control the speech synthesis process.

💡Record Audio

Record Audio is a function within the Speech to Speech feature that lets users provide input by speaking into a microphone. The recording supplies the words, pacing, and intonation that the AI then reproduces in the chosen voice.


💡Generate

The Generate function processes the recorded audio and produces the synthesized speech output. Once the user clicks Generate, the system renders the recording in the selected voice, matching the input's words, tone, and emotion.
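
The video only shows the web interface, but the same record, choose a voice, generate flow can be sketched programmatically. The snippet below is a minimal sketch, not ElevenLabs' official client: the base URL, the `speech-to-speech/{voice_id}` path, the `xi-api-key` header, the `model_id` form field, and the model name are assumptions to check against the current ElevenLabs API reference, and `StsRequest` and `build_sts_request` are hypothetical helper names. It only describes the request and does not send anything.

```python
from dataclasses import dataclass
from urllib.parse import quote

# Assumed values; verify both against the current ElevenLabs API reference.
API_BASE = "https://api.elevenlabs.io/v1"
DEFAULT_MODEL = "eleven_english_sts_v2"


@dataclass(frozen=True)
class StsRequest:
    """Everything an HTTP client would need to submit one Speech to Speech job."""
    url: str
    headers: dict
    form_fields: dict
    audio_path: str


def build_sts_request(voice_id, audio_path, api_key, model_id=DEFAULT_MODEL):
    """Describe the request without sending it (no network access here)."""
    if not voice_id:
        raise ValueError("voice_id is required: pick a library voice or a clone")
    return StsRequest(
        # The target voice is part of the URL path (assumed endpoint shape).
        url=f"{API_BASE}/speech-to-speech/{quote(voice_id, safe='')}",
        headers={"xi-api-key": api_key},
        form_fields={"model_id": model_id},
        # The recording that carries the words, tone, and emotion.
        audio_path=audio_path,
    )


# Placeholder voice ID, file name, and key for illustration only.
request = build_sts_request("my-cloned-voice", "take1.wav", api_key="YOUR_API_KEY")
print(request.url)
```

With an HTTP client such as the third-party `requests` package, this description could be submitted as a multipart upload (for example `requests.post(request.url, headers=request.headers, data=request.form_fields, files={"audio": open(request.audio_path, "rb")})`), with the generated audio expected in the response body. Keeping the request description separate from the network call makes it easy to inspect before spending any API quota.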

💡Voice Settings

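Voice Settings refer to the customizable options available to users within the Speech to Speech feature for fine-tuning the synthesized voice. In the ElevenLabs interface these are sliders such as stability and similarity rather than direct pitch or speed controls; the accent comes from the chosen voice and from the input recording itself.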

💡Digital A.I. Glitching

Digital A.I. Glitching refers to minor errors or irregularities that may occur in the output of AI-generated speech, indicating that the technology, while advanced, is not yet perfect. These glitches can manifest as unexpected changes in tone, pitch, or other audio artifacts.

💡Personalized Audio Production

Personalized Audio Production involves creating audio content that is tailored to an individual's preferences, including voice choice and speech characteristics. The Speech to Speech feature enables users to produce personalized audio by replicating voices and adjusting the output to match specific tones and emotions.


