How We DRASTICALLY Improved AI Vocals

Benn Jordan
4 Sept 2023 · 17:02

TLDR: The video discusses the advancements in AI technology, specifically focusing on AI vocals and voice cloning. It raises ethical concerns about voice cloning and explores the implications of turning AI voices into human-like voices. The speaker shares his seven-year journey with neural networks and music, highlighting the evolution of AI in music production. He introduces a new AI voice cloning workflow developed by his friend Dan, emphasizing the importance of quality control and fair treatment of artists. The video also addresses the economic aspect of AI-generated content, suggesting a new model where artists can license their AI-regenerated voices directly, bypassing traditional labels and publishers. Practical examples of AI in music production are demonstrated, including creating harmonies and licensing vocals for different markets. The speaker proposes a system where artists are fairly compensated for their voice data sets, allowing them to control the use of their voices in various applications. The video concludes by encouraging music producers to collaborate with artists for high-quality AI music tools and to consider the ethical implications of using AI in music.

Takeaways

  • 🎤 The speaker has been deeply involved with AI and music for seven years, starting with Google's Magenta project.
  • 🤖 AI voice cloning technology has advanced significantly, but still has room for improvement in terms of sounding natural and convincing.
  • 🚀 There are ethical concerns regarding voice cloning, including privacy and consent, which need to be addressed.
  • 📈 The economics of AI voice technology are crucial for its growth and sustainability in the music industry.
  • 💰 Dan, a software developer and music producer, has developed an AI voice cloning workflow that maintains high-quality standards.
  • 📉 Major labels and media conglomerates may have less control over AI-generated content due to recent court decisions.
  • 🎶 Musicians can potentially benefit from AI by licensing their AI-regenerated voices, bypassing the need for traditional labels.
  • 📚 The speaker suggests a new model for fair compensation of artists in the AI music space, including optional royalty pools.
  • 🔊 High-quality voice datasets for AI music tools require the consent and terms of the vocalists or musicians involved.
  • 🚫 Currently, vocalists do not have the right to control the use of their voice data, but this could change with favorable platform terms.
  • 🌟 The speaker is passionate about ensuring AI compensates artists correctly and invites feedback from the audience on the topic.

Q & A

  • What is the main topic discussed in the video?

    -The main topic discussed in the video is the use of AI technology to replicate and manipulate human voices, specifically in the context of music production.

  • What are some ethical concerns raised about voice cloning technology?

    -The ethical concerns raised include the potential misuse of someone's voice without their consent and the implications of creating a digital replica of a person's voice for various purposes.

  • How did the speaker's relationship with neural nets and music begin?

    -The speaker's relationship with neural nets and music dates back about seven years, to when Google's team was preparing the pre-release of a project called Magenta.

  • What is the significance of the court decision mentioned in the video?

    -The court decision mentioned in the video ruled that things generated with AI cannot be copyrighted, which has implications for how artists can control and be compensated for their work in the AI era.

  • What is the proposed solution to ensure fair compensation for artists in the context of AI-generated music?

    -The proposed solution involves creating a system where artists and vocalists are paid fairly for the use of their voice data sets, potentially through equity ownership, optional royalty pools, and direct licensing agreements.

  • How does the video demonstrate the practical use of AI in music production?

    -The video demonstrates the practical use of AI in music production by showing how AI can be used to create harmonies, apply natural-sounding vibrato, and transform vocals into different styles or languages.

  • What is the role of the speaker in the voice swap AI project?

    -The speaker is involved in the voice swap AI project as a member of the voting board and an equity holder. His role includes helping to design a system that ensures fair compensation for artists and vocalists.

  • What is the importance of quality control in the creation of voice data sets for AI?

    -Quality control is important because the quality of the voice data sets directly affects the quality of the AI-generated voice. High-quality data sets require careful curation and collaboration with the artists to produce studio-quality sound.

  • Why is the speaker passionate about ensuring that AI compensates artists correctly?

    -The speaker is passionate about this issue because they believe that artists have historically been exploited by the music and tech industries. They see AI as an opportunity to correct past mistakes and establish a fairer system for artists.

  • What is the potential impact of AI-generated voices on the music industry?

    -The potential impact includes a shift in control towards artists, allowing them to license their AI-replicated voices directly, bypassing traditional labels and publishers, and potentially leading to new business models in music production and distribution.

  • How does the video address the issue of unauthorized use of an artist's voice?

    -The video suggests that by having voice data sets behind favorable terms and conditions on a platform, artists can reserve the right to control the use of their voice, making unauthorized use legally actionable.

Outlines

00:00

🚀 Introduction to AI and Voice Cloning Technology

The speaker begins by mentioning their busy schedule and introduces the topic of using artificial intelligence to manipulate someone's voice. They acknowledge the cool factor of such technology but also the ethical concerns surrounding voice cloning. The video promises to delve into these issues, as well as the process of converting a non-human voice into a human one. The speaker also shares their personal journey with neural networks and music, dating back seven years to Google's Magenta project. They recount their experience with voice cloning and the evolution of the technology, highlighting the current limitations and the potential for improvement.

05:00

🎼 AI in Music Production: Quality and Economics

The speaker discusses the importance of quality control in AI voice cloning for music production. They mention a collaboration with DJ Fresh (Dan), who has developed an AI voice cloning workflow that produces high-quality results due to meticulous oversight. This section addresses the economics of the technology, emphasizing fair treatment for artists and performers. The speaker also touches on copyright law, noting a court decision that AI-generated content cannot be copyrighted, which could empower artists like Taylor Swift to license their AI-cloned voices without interference from labels or publishers. The potential benefits and challenges of AI in music are explored, including the fear-mongering around AI's impact on musicians and the opportunities for a more equitable system.

10:02

🎤 Practical AI Applications in Music and Licensing

The speaker demonstrates the practical use of AI in music production by creating harmonies from a single monotone recording and using AI voice models to enhance the sound. They also discuss the possibility of licensing AI-cloned vocals for different markets, such as television shows in India. This section outlines a proprietary workflow for adapting vocals to different languages. The speaker then addresses the financial side of AI in music, proposing a system where artists are compensated fairly for their voice data sets. They suggest a model in which artists would have more control over licensing their voices and how they are used, which is currently not the case.
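As a rough illustration of the harmony step described above, the pitch arithmetic behind stacking harmonies on a monotone recording can be sketched as follows. This is only a minimal sketch under stated assumptions: it uses standard equal-temperament ratios, the function names `semitone_ratio` and `harmony_frequencies` are hypothetical, and nothing here reflects the actual workflow or tools used in the video.

```python
# Minimal sketch (assumption: 12-tone equal temperament).
# Function names are hypothetical and not taken from the video.

def semitone_ratio(semitones: float) -> float:
    """Frequency ratio for a pitch shift of the given number of semitones."""
    return 2.0 ** (semitones / 12.0)


def harmony_frequencies(root_hz: float, intervals=(0, 4, 7)) -> list:
    """Target frequencies for a chord built on a monotone root note.

    The default intervals (0, 4, 7) describe a major triad:
    root, major third, perfect fifth.
    """
    return [root_hz * semitone_ratio(i) for i in intervals]


if __name__ == "__main__":
    # A monotone take sung at 220 Hz (the A below middle C).
    for hz in harmony_frequencies(220.0):
        print(f"{hz:.2f} Hz")
```

Each shifted copy of the recording could then be passed through a voice model separately, which is one plausible reading of applying voice cloning to individual files.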

15:03

📜 The AI Manifesto: Empowering Musicians in the AI Era

In the final section, the speaker presents their 'AI Manifesto,' emphasizing the utility of AI in music production and advocating for ethical practices. They argue that the best AI tools require the consent and terms of the vocalists or musicians involved. The speaker discloses their involvement in a project that aims to ensure fair compensation for artists in the AI music space. They invite feedback and questions from the audience, acknowledging the complexity of the issues and the ongoing development of solutions. The speaker concludes by encouraging music producers to use AI tools that are developed in collaboration with artists for better quality and ethical considerations.

Keywords

AI Vocals

AI Vocals refers to the use of artificial intelligence to replicate or generate human vocal sounds. In the video, the creator discusses how AI technology has been used to 'steal' someone's voice, a sign of how far the technology has advanced in music production. It is a core theme of the video, which explores the ethical and practical implications of using AI in vocal performance.

Voice Cloning

Voice cloning is the process of creating a synthetic version of someone's voice using AI and machine learning algorithms. The video script mentions ethical issues with voice cloning, indicating the potential misuse of such technology. It is a key concept as it raises concerns about privacy and consent in the context of AI-generated voices.

Neural Nets

Neural nets, more formally known as artificial neural networks, are a class of machine learning models loosely inspired by the way the human brain works. The speaker has been experimenting with neural nets for about seven years, particularly in music production, which highlights how the technology has evolved toward creating AI vocals.

Text-to-Speech Engine

A text-to-speech engine is a technology that converts written text into spoken words. The script describes the speaker's fascination with cloning his voice into a text-to-speech engine, demonstrating the personal and creative applications of AI technology in vocal generation.

AI Regeneration

AI regeneration in the context of the video refers to the process of using AI to recreate or simulate a voice. Taylor Swift is mentioned as an example of how an artist could license her AI regenerated voice, which would allow her to control the use and distribution of her voice in the digital space.

Copyright Law

Copyright law is the body of laws that give creators of literary and artistic works the right to control the ways in which their material may be used. The video discusses how recent court decisions have ruled that AI-generated content cannot be copyrighted, which has significant implications for musicians and the control over their work.

Music Production

Music production involves the process of creating and recording music in a studio. The video script provides insight into how AI can be practically used in music production, not just as a gimmick but as a tool that can enhance the creative process and offer new possibilities for artists.

Vocoder

A vocoder is a device that can analyze and synthesize the human voice, often used in electronic music to create unique vocal effects. In the script, the speaker demonstrates how AI can be used to create a vocoder effect, showcasing the potential for AI to innovate in sound design.

Data Sets

Data sets in the context of AI refer to collections of data that are used to train machine learning algorithms. The video emphasizes the importance of high-quality data sets for training voice models, which is crucial for achieving professional standards in AI-generated vocals.

Economics of AI

The economics of AI pertains to the financial aspects and business models surrounding the development and use of AI technologies. The speaker discusses the need for a fair economic model that compensates artists for their contributions to AI voice models, which is essential for the sustainable growth of AI in the music industry.

Licensing

Licensing in the video refers to the legal permission given to use a resource, such as a voice or a piece of music, under certain conditions. The script explores the concept of artists licensing their AI regenerated voices, which provides them with more control and potential financial benefits compared to traditional models.

Highlights

The video discusses the use of AI to replicate human voices, touching on the ethical issues and potential implications of voice cloning technology.

The speaker has been experimenting with neural networks and music for seven years, dating back to Google's pre-release of Magenta.

AI voice cloning has come a long way since 2016, with better algorithms, training, and computing power.

Despite advancements, many voice cloning attempts still don't sound convincing, highlighting the need for quality control in voice data sets.

The speaker's friend, Dan, a drum and bass legend and software developer, has developed an AI voice cloning workflow that prioritizes sound quality.

The project aims to be fair to artists and performers, addressing the economic aspects of voice cloning in music production.

A court decision ruled that AI-generated content cannot be copyrighted, potentially empowering artists like Taylor Swift to control their AI-regenerated voices.

The video demonstrates a practical use of AI in music production by creating harmonies and applying voice cloning to individual files.

A new workflow is presented for licensing vocals to various media, including television shows in different languages.

The speaker addresses the potential of using AI to clone the voice of a cat, showcasing the technology's versatility.

The economics of voice swap AI involve compensating artists fairly and allowing them to control the use of their voices in advertisements and other media.

The speaker proposes a system where artists can negotiate their own licensing terms and have a say in where their AI-regenerated voices are used.

The video emphasizes the importance of creating high-quality voice data sets through collaboration with artists, rather than using freely uploaded samples.

The speaker is part of the voice swap project, owning equity in the company and being passionate about ensuring fair compensation for artists.

The video concludes with a call to action for musicians and vocalists to be aware of the potential control and benefits they have with AI-regenerated voices in the music industry.

The speaker invites feedback and questions from viewers, emphasizing the ongoing and collaborative nature of the voice swap project.