AI News: Everything You Missed This Week!

Matt Wolfe
5 Jul 2024 · 19:28

TLDR

This week in AI news was quieter than usual because of the 4th of July, but there is still plenty to discuss. Highlights include the public release of Runway's Gen 3 text-to-video generator, ElevenLabs' reader app updates and new voice isolator, an open-source voice model from Kyutai, Meta's 3D Gen for creating 3D images from text, and InternLM 2.5, an open-source large language model available on Hugging Face. The video also covers AI's role in content creation, new lawsuits against OpenAI, and emerging tech such as Open-TeleVision's remote robot operation.

Takeaways

  • Gen 3, a text-to-video generator, was made publicly available, but only to pro users of Runway.
  • Luma AI remains the top choice for image-to-video generation, producing more impressive results than Gen 3.
  • ElevenLabs added famous voices to its reader app, including Judy Garland and James Dean, with permission from their estates.
  • ElevenLabs introduced a voice isolator feature that strips background noise from uploaded audio.
  • Suno released an iOS app that mirrors the functionality of its web app, making music creation more accessible on mobile devices.
  • Meta's research team unveiled 3D Gen, a text-to-3D generator that could speed up game development and 3D asset creation.
  • Kyutai, an open-source AI research lab, released a voice model with real-time responses to compete with GPT-4o's advanced voice assistant.
  • InternLM 2.5, an open-source large language model with a 1-million-token context window, was released on Hugging Face for developers to use.
  • The Brave browser now lets users integrate custom AI models into their browsing experience.
  • Perplexity's Pro Search now supports multi-step reasoning, with improved math and programming capabilities through Wolfram Alpha integration.
  • OpenAI faces another copyright infringement lawsuit, with claims that it used content without permission or compensation.

Q & A

  • What major event in the United States might have contributed to the slower pace of AI announcements during the week?

    -The 4th of July, a national holiday in the United States, likely contributed to a slower pace of AI announcements as many companies tend to slow down on major releases during holiday weeks due to factors like travel.

  • What is Gen 3 and how can it be accessed by users?

    -Gen 3 is a text-to-video generator that was made publicly available. It can be accessed by pro users of Runway, who can find it under 'generate videos' and start creating content by entering their prompts.

  • What is the limitation of Gen 3 in comparison to Luma AI?

    -While Gen 3 is considered the best text-to-video generator currently available, it does not support image-to-video generation, a feature that Luma AI excels at and provides more impressive results.

  • Which famous voices were added to the ElevenLabs reader app and how were they obtained?

    -ElevenLabs added famous voices such as Judy Garland, James Dean, Burt Reynolds, and Sir Laurence Olivier to its reader app. The voices were obtained with permission from the respective estates, so they are used legally and the estates are compensated.

  • What new feature did ElevenLabs release to improve audio quality?

    -ElevenLabs released a voice isolator feature that lets users upload audio with background noise and get back a cleaned-up, clearer recording.

  • What is the difference between the new voice model released by Kyutai and the current GPT-4o advanced voice assistant?

    -The new voice model from Kyutai is open-source, so other companies can build on it. It is not as advanced as GPT-4o in expressiveness or sound quality, but its open-source nature gives it room to improve significantly as it is integrated with other technologies.

  • What is the significance of the open-source large language model InternLM 2.5 and how does it differ from other models?

    -InternLM 2.5 is significant because it is open-source and has a 1-million-token context window, meaning it can take a very large amount of text as input at once. Being open-source, it is available for a wide range of applications and anyone can build on it.

  • How does the Brave browser's update allow users to customize their AI experience?

    -The Brave browser update allows users to integrate their own AI models into the browser, alongside the built-in Leo AI. Users can now add custom models, expanding their AI capabilities beyond what is provided by the browser itself.

  • What is the controversy surrounding OpenAI and its use of content from the Center for Investigative Reporting?

    -The Center for Investigative Reporting is suing OpenAI and Microsoft for copyright infringement, claiming that the companies used its content to improve their products without permission or compensation. The case raises questions about the use of copyrighted material in AI training and the need for proper licensing agreements.

  • What new feature did YouTube introduce to protect the likeness and voice of its content creators?

    -YouTube introduced a feature that allows creators to request the removal of content that simulates their likeness or voice, expanding on the existing policy that covered stolen content or IP infringement.

  • What is the significance of the upcoming release of Grok 2, and how does it address previous concerns about AI training data?

    -Grok 2, set to release in August, is billed as a major improvement in how training data is handled. It aims to address concerns about models training on each other's output, which can lead to a 'human centipede effect' of data contamination.

Outlines

00:00

AI News Roundup: Gen 3 Launch and Text-to-Video Advancements

The script begins with a roundup of AI news, highlighting the public release of Gen 3, a significant update in the text-to-video generation space. Despite a quieter week due to the 4th of July holiday in the United States, noteworthy developments include the availability of Gen 3 for pro users of Runway, offering enhanced video generation capabilities. The script also compares Gen 3 with Luma AI, noting Luma's superior image-to-video generation. Additionally, ElevenLabs updates its reader app with famous voices like Judy Garland and James Dean, used with permission from their estates. ElevenLabs' new voice isolator feature is also noted for its ability to clean up audio with background noise.

05:00

New Developments in AI: 3D Image Generation and Voice Models

This paragraph covers advancements in text-to-3D generation, exemplified by Meta's new research, 3D Gen, which promises to accelerate game development and 3D video asset creation. The script mentions the release of a new voice model by Kyutai, an open-source AI research lab, which aims to compete with GPT-4o's advanced voice capabilities. The model is open-source, allowing other companies to build on it. The paragraph also introduces InternLM 2.5, an open-source large language model with a 1-million-token context window, and discusses updates to the Brave browser that allow users to integrate custom AI models.

10:01

AI and Copyright: Lawsuits, Content Use, and Ethical Considerations

The script addresses ongoing legal issues surrounding AI and copyright, with the Center for Investigative Reporting suing OpenAI and Microsoft for copyright infringement. It discusses the implications of using content from the open web for AI training and the differing views on what constitutes fair use. Cloudflare's new feature to prevent AI scraping is noted, as are Figma's approach to training AI models on user designs and the controversy over an AI-generated weather app design that closely resembled Apple's. YouTube's new policy allowing content creators to request the removal of AI-generated impersonations is also highlighted.

15:03

AI Integrations and Innovations: From Social Media to Robotics

The final paragraph covers a range of AI-related news, including Instagram's adjustment to its AI labeling, Elon Musk's anticipated release of Grok 2, and rumors of Apple partnering with Google Gemini. It mentions WhatsApp's new feature to generate cartoon versions of users' selfies, a new company competing with Meta's Ray-Ban Stories by building GPT-4o into its smart glasses, and Open-TeleVision, which allows remote operation of a robot. The script concludes by encouraging viewers to subscribe for more AI news and tutorials, emphasizing the channel's commitment to exploring emerging technologies.

Keywords

AI News

AI News refers to the latest developments and updates in the field of artificial intelligence. In the context of the video, it is the central theme that ties together various AI-related announcements and innovations discussed throughout the transcript.

Gen 3

Gen 3 is Runway's third-generation text-to-video model. In the video, access to Gen 3 becomes publicly available, a significant update in the AI video space that reflects how far text-to-video generation has advanced.

Runway

Runway is the AI video platform behind Gen 3. It offers different tiers of access, including a 'pro' user level, and Gen 3 is available only to those pro users.

Text-to-Video Generator

A text-to-video generator is an AI tool that creates videos based on text prompts. In the video script, it is used to illustrate the capabilities of Gen 3, comparing it to other similar technologies like Luma AI, to highlight the current state of AI-generated video content.

ElevenLabs

ElevenLabs is a company that has released updates to its reader app, adding famous voices and a voice isolator feature. It illustrates how AI is being used to enhance audio experiences, and the ethical considerations of using deceased celebrities' voices with proper permissions.

Suno

Suno is an AI music creation service that released an iOS app, letting users create music on their phones. It represents the application of AI in the creative process of music production, making it more accessible and user-friendly.

3D Gen

3D Gen is a research project by Meta that converts text prompts into 3D images. It is highlighted as a potential game-changer for industries like game development and 3D video production, indicating the expanding capabilities of AI in visual content creation.

Open Source

Open Source refers to software or a model where the source code is available to the public, allowing anyone to view, use, modify, and distribute it. In the script, it is mentioned in relation to new AI voice models and large language models, emphasizing community collaboration and innovation in AI development.

Brave Browser

The Brave browser is highlighted for its update allowing users to integrate custom AI models into the browser. This feature represents the growing trend of personalization and the integration of AI into everyday digital tools for enhanced user experience.

Perplexity Pro

Perplexity Pro is a service mentioned for its advanced search capabilities, including multi-step reasoning and integration with Wolfram Alpha for improved math and programming queries. It exemplifies the application of AI in enhancing search functionality and providing more nuanced, intelligent answers.

Copyright Infringement

Copyright infringement is a legal issue raised in the script regarding the use of copyrighted material to train AI models without permission or compensation. It is a significant topic in the AI news discussed, reflecting the ongoing debate about the ethical and legal use of data in AI development.

Highlights

Gen 3 access was made publicly available for pro users of Runway, marking a significant update in text-to-video generation technology.

A 4th of July themed video attempt with Gen 3 showcases the current capabilities and limitations of the technology.

Luma AI's image-to-video generation is still considered superior to Gen 3's capabilities.

ElevenLabs updates its reader app with famous voices like Judy Garland and James Dean, obtained with permission from their estates.

ElevenLabs introduces a new voice isolator feature to clean up audio with background noise, demonstrating impressive results.

Suno releases an app for iOS that mirrors the functionality of its web version, making music creation more accessible.

Meta's research on text-to-3D images, named 3D Gen, could potentially accelerate game development and 3D video asset creation.

Kyutai, an open-source AI research lab, releases a new voice model to compete with GPT-4o's advanced voice assistant, with real-time response capabilities.

InternLM 2.5, an open-source large language model with a 1-million-token context window, is released on Hugging Face for broader applications.

Brave browser updates to allow users to integrate custom AI models into their browsing experience.

Perplexity's Pro Search introduces multi-step reasoning and improved math and programming capabilities with Wolfram Alpha integration.

Apple reportedly secures a seat on OpenAI's board, although in an observer role without voting rights.

OpenAI faces another lawsuit over copyright infringement, this time from the Center for Investigative Reporting.

Mustafa Suleyman's comments on the social contract of content on the open web spark debate on fair use and copyright.

Cloudflare rolls out a solution to prevent AI scraping of websites, offering users more control over their content.

Figma's AI features come under scrutiny for potential design similarities with Apple's weather app, leading to a temporary pause of the feature.

YouTube introduces a feature allowing users to request the removal of AI-generated content that simulates their likeness or voice.

Instagram modifies its AI content labeling to provide more accurate information about the use of AI in posted images.

Elon Musk teases the release of Grok 2 in August, promising significant improvements over its predecessor.

Rumors suggest Apple may partner with Google Gemini for AI integration, following its collaboration with OpenAI.

WhatsApp is reportedly developing a feature to generate cartoon versions of users' selfies, similar to Apple's WWDC showcase.

A new company aims to compete with Meta's Ray-Ban Stories by integrating an LLM, specifically GPT-4o, into its smart glasses.

Open-TeleVision demonstrates a remote robotic operation system, allowing control from thousands of miles away, inspired by the movie Avatar.