Elon Musk CHANGES AGI Deadline… Google's Stunning New AI TOOL, Realistic Text To Video, and More

TheAIGRID
19 Jun 2024 · 24:33

Summary

TLDR: The video discusses pivotal developments in AI, highlighting Google DeepMind's video-to-audio technology, which creates realistic soundtracks for silent clips. It also covers Google's shift from research lab to AI product factory and the researcher attrition that has accompanied it. The video then explores AI's role in content creation with TikTok's Symphony, Meta's open-source model releases, and Runway's Gen 3 Alpha for photorealistic video generation, and concludes with Hedra's character generation model and Elon Musk's prediction of AGI by 2026.

Takeaways

  • 🧠 Google DeepMind's video-to-audio generative technology can add sound effects to silent video clips, enhancing the viewing experience with synchronized audio.
  • 🎥 Google showcased the capabilities of their AI model with examples including a wolf howling at the moon and a drummer on stage, demonstrating high-quality audio-video synchronization.
  • 🔄 Google's AI advancements have led to a shift from a research lab to an AI product factory, indicating a strategic move towards commercializing their AI technologies.
  • 💡 Google is facing internal challenges with a 'brain drain' as talented researchers leave for other companies, seeking faster product development and deployment.
  • 📈 TikTok introduces Symphony, an AI suite designed to enhance content creation by blending human creativity with AI efficiency, aiming to streamline the video production process.
  • 🌐 Meta contributes to the open-source community by releasing numerous AI models and datasets, fostering innovation and collaboration in the AI field.
  • 🔊 Meta's AudioSeal is a new technique for watermarking audio segments, making it possible to identify AI-generated speech within longer audio clips.
  • 🎨 Runway introduces Gen 3 Alpha, a text-to-video model that generates highly realistic videos, including photorealistic humans, pushing the boundaries of AI-generated content.
  • 🤖 Hedra Labs' Character One is a foundation model for headshot generation that can create emotionally reactive characters, opening new possibilities for AI in storytelling and acting.
  • 🚗 Elon Musk discusses Tesla's future with AI, including the integration of advanced AI systems into vehicles and the potential for AGI (Artificial General Intelligence) by 2026.
  • 🤖 The development of AI humanoid robots like Tesla's Optimus is expected to bring a level of abundance in goods and services, potentially transforming various industries.

Q & A

  • What is the main update from Google DeepMind regarding video to audio generative technology?

    -Google DeepMind has shared progress on their video to audio generative technology, which can add sound effects to silent video clips, matching the acoustics of the scene and accompanying onscreen action.

  • Can you provide examples of the audio effects generated by Google's new model?

    -Examples include a wolf howling at the moon, a slow mellow harmonica playing as the sun sets on the prairie, jellyfish pulsating underwater, and a drummer on stage at a concert with flashing lights and a cheering crowd.

  • How does Google's generative audio technology work?

    -Google's technology uses video pixels and text prompts to generate rich soundtracks, allowing it to create audio that matches the video content effectively.
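
To make the conditioning idea concrete, here is a minimal toy sketch in Python (the function names, shapes, and signal choices are invented for illustration; this is not Google's actual model or API): the generator consumes per-frame visual features alongside a text prompt, so the audio it emits can track on-screen events rather than relying on the prompt alone.

```python
import numpy as np

SAMPLE_RATE = 16_000     # audio samples per second (assumed)
FPS = 24                 # video frame rate (assumed)

def frame_features(video_frames: np.ndarray) -> np.ndarray:
    """Stand-in for a learned visual encoder: one scalar per frame.
    Here, mean pixel brightness acts as a toy 'on-screen event' signal."""
    return video_frames.mean(axis=(1, 2, 3))

def generate_audio(video_frames: np.ndarray, prompt: str) -> np.ndarray:
    """Toy video-to-audio: the text prompt picks a pitch, while loudness
    tracks the per-frame visual signal, keeping the audio synced to video."""
    visual = frame_features(video_frames)                 # shape: (n_frames,)
    pitch = 220.0 + 10.0 * (sum(map(ord, prompt)) % 20)   # prompt conditioning
    t = np.arange(SAMPLE_RATE // FPS) / SAMPLE_RATE       # one frame of audio
    chunks = [v * np.sin(2 * np.pi * pitch * t) for v in visual]
    return np.concatenate(chunks)                         # one waveform

video = np.random.rand(48, 32, 32, 3)   # 2 s of toy 24 fps, 32x32 RGB video
audio = generate_audio(video, "drummer on stage at a concert")
print(audio.shape)                       # (31968,), ~2 s minus rounding
```

The point of the sketch is the interface: because each audio chunk is driven by the matching video frame, a visual event (a drum hit brightening the frame) lands at the same instant in the waveform, which text-only systems cannot guarantee.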

  • What is the significance of the drummer on stage example in Google's demonstration?

    -The drummer on stage example is significant because it shows the model's ability to sync the audio hits with the actual onscreen actions, indicating a high level of coordination between visual and audio elements.

  • What is Google's recent shift from a research lab to an AI product factory?

    -Google has made a major shift from focusing on research to producing AI products, which reflects a change in strategy from foundational research to commercializing their breakthroughs.

  • Why have some researchers left Google DeepMind?

    -Some researchers have left Google DeepMind due to frustrations with the company's pace of product release and a perceived lack of focus on shipping features to the masses.

  • What is TikTok's Symphony, and how does it help content creators?

    -Symphony is TikTok's new creative AI suite designed to elevate content creation by blending human imagination with AI-powered efficiency, helping users make better videos and analyze trends for effective content ideas.

  • What is Meta's contribution to the open-source AI community?

    -Meta has released a large number of open models and datasets, including a multi-token prediction model, Meta Chameleon for image and text reasoning, Meta AudioSeal for audio watermarking, and PRISM for diverse geographic and cultural features.
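
For the multi-token prediction model in that list, the intuition behind "faster inference" can be sketched in a few lines: if each forward pass proposes k tokens instead of one, the number of expensive passes drops roughly by a factor of k. The toy model below is a deterministic stand-in rule, not Meta's architecture.

```python
def toy_model(context: list[int], k: int) -> list[int]:
    """Stand-in for a network with k output heads: proposes the
    next k tokens in one 'forward pass' (here, a fixed rule)."""
    last = context[-1] if context else 0
    return [(last + i + 1) % 50_000 for i in range(k)]

def decode(prompt: list[int], n_tokens: int, k: int) -> tuple[list[int], int]:
    """Greedy multi-token decoding: each pass appends k tokens at once."""
    out, passes = list(prompt), 0
    while len(out) - len(prompt) < n_tokens:
        out.extend(toy_model(out, k))
        passes += 1
    return out[: len(prompt) + n_tokens], passes

_, single = decode([1, 2, 3], 64, k=1)   # classic next-token decoding
_, multi = decode([1, 2, 3], 64, k=4)    # 4 tokens per forward pass
print(single, multi)                      # 64 passes vs. 16 passes
```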

  • What is Runway's Gen 3 Alpha, and how does it advance video generation?

    -Runway's Gen 3 Alpha is a text-to-video model trained on new infrastructure for large-scale multimodal training, offering highly realistic video generation, especially with photorealistic humans.

  • What is Hedra Labs' Character One, and what does it enable?

    -Hedra Labs' Character One is a foundation model that enables reliable headshot generation and storytelling with emotionally reactive characters, allowing next-level content creation.

  • What are Elon Musk's predictions regarding Tesla's AI advancements?

    -Elon Musk predicts that Tesla will achieve AGI, or Artificial General Intelligence, by 2026 at the latest, enabling humanoid robots to perform a wide range of tasks with a high level of intelligence.

Outlines

00:00

🎥 Google DeepMind's Video-to-Audio Generative Technology

Google DeepMind has made significant progress in video-to-audio generative technology, allowing silent video clips to be paired with appropriate sound effects and theme music. Google showcased four examples: a wolf howling at the moon, a harmonica at sunset, jellyfish underwater, and a drummer at a concert, demonstrating the system's ability to generate high-quality, realistic audio that syncs well with the video content. Google's approach of using both video pixels and text prompts to create soundtracks is seen as more advanced than current systems, which typically rely on text prompts alone.

05:00

🔄 Google's Shift from Research to AI Product Factory

Google has been undergoing a major transition from a research lab to an AI product factory, which has led to internal conflict between the desire to prioritize safety and the push to release products. This shift has resulted in a 'brain drain', with talented researchers leaving Google for companies like Luma and OpenAI in search of more rapid deployment of AI technologies. Google's struggle to balance foundational research with commercialization has been highlighted by recent events, including the departure of key personnel and the underwhelming reception of Google's AI Overviews.

10:01

🎼 TikTok's Symphony: AI Suite for Content Creation

TikTok introduces Symphony, a creative AI suite designed to enhance content creation by blending human imagination with AI efficiency. Symphony is an evolution of TikTok's Creative Assistant, offering AI-powered tools that analyze trends and best practices to produce effective video content quickly. It includes features like AI-generated TikToks, translation and dubbing for global reach in multiple languages, and a selection of AI avatars cleared for commercial use, aiming to break down global barriers and provide cost-effective narration for brands.

15:02

📚 Meta's Contribution to Open Source AI Models

Meta has released a plethora of open models and datasets, contributing significantly to the open-source AI community. The release includes a multi-token prediction model for faster inference, Meta Chameleon for image and text reasoning, Meta AudioSeal for audio watermarking, Meta JASCO for music generation, and PRISM for enhancing geographic and cultural diversity. Meta's commitment to open science and sharing their work publicly is seen as a move to lead the open-source community and foster innovation.

20:03

🎨 Runway's Gen 3 Alpha: Advanced Text-to-Video Model

Runway has introduced Gen 3 Alpha, a groundbreaking text-to-video model that sets a new standard in video generation with its ability to create photorealistic humans. The model is part of Runway's new infrastructure for large-scale multimodal training and has been praised for high-quality output that is difficult to distinguish from real footage. The potential applications of this technology are vast, raising questions about the future of online content and our ability to tell AI-generated media apart from the real thing.

🤖 Hedra Labs' Character One: Emotionally Reactive AI Characters

Hedra Labs has released Character One, a foundation model capable of generating reliable headshots and telling emotionally engaging stories. The technology stands out for its ability to animate realistic human faces, as demonstrated in a touching Father's Day message video. This development signifies a step towards more realistic and emotionally responsive AI-generated content, blurring the lines between fantasy and reality and prompting discussions about the future of digital interactions.

🚗 Tesla's AI and Optimus: Future Automation and Education

Elon Musk discusses Tesla's upcoming advancements in AI, including the integration of the Grok AI system into Tesla vehicles, enabling cars to perform tasks like picking up groceries. He also mentions the potential of Tesla's Optimus humanoid robot, which could be used for a variety of tasks, including education and language support. Musk predicts that AGI, artificial general intelligence, will be achieved by 2026 at the latest, leading to an abundance of goods and services facilitated by AI and robots.

Keywords

💡AI Generative Technology

AI Generative Technology refers to the use of artificial intelligence to create new content, such as images, videos, or audio, that did not exist before. In the video, Google DeepMind's update on video to audio generative technology is highlighted, showcasing how it can add sound effects to silent video clips, enhancing the viewing experience by matching the acoustics of the scene.

💡DeepMind

DeepMind is a UK-based artificial intelligence company that was acquired by Google and is now part of Alphabet Inc. It is known for creating advanced AI systems. In the script, DeepMind's progress on a video to audio generative model is discussed, emphasizing its ability to generate sound for silent video clips, which is a significant development in the field of AI.

💡Multimodal AI

Multimodal AI refers to systems that can process and understand multiple types of data inputs, such as text, images, audio, and video. The script mentions Google's infrastructure for multimodal AI, suggesting that they are developing systems that can integrate various forms of data to create more immersive and realistic AI outputs.

💡AI Product Factory

The term 'AI Product Factory' is used in the script to describe Google's shift from a research-focused lab to a company that is focused on producing AI products for commercial use. This shift has implications for the pace of innovation and the approach to safety and commercialization within the company.

💡Brain Drain

Brain drain refers to the emigration of highly trained or intelligent people from a particular country or organization. In the context of the video, Google is said to be experiencing a brain drain, with many of its top researchers leaving to join other companies or start their own ventures, which could impact Google's ability to innovate in the AI space.

💡Open Source

Open Source in the context of AI refers to the practice of making the source code of AI models and tools freely available for others to use, modify, and distribute. The script discusses Meta's contribution to the open-source community by releasing a number of AI models and datasets, which can accelerate innovation and research.

💡Meta Chameleon

Meta Chameleon is a model released by Meta that is capable of reasoning about images and text using an early fusion architecture. It represents an advancement in multimodal AI, as it can process and generate a combination of text and images, which is a significant step towards more unified and efficient AI systems.
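
One way to picture the early-fusion idea is that text and images are mapped into a single discrete token vocabulary and interleaved in one sequence for one Transformer, rather than being routed through modality-specific modules. The toy tokenizers below are illustrative stand-ins (a real system would use a learned subword tokenizer and a learned image codebook, and the vocabulary sizes here are assumptions):

```python
import numpy as np

TEXT_VOCAB = 50_000           # toy sizes; real vocabularies differ
IMAGE_CODEBOOK = 8_192
IMG_OFFSET = TEXT_VOCAB       # image codes live after the text ids

def tokenize_text(text: str) -> list[int]:
    """Toy text tokenizer: one id per character."""
    return [ord(c) % TEXT_VOCAB for c in text]

def tokenize_image(image: np.ndarray, patch: int = 8) -> list[int]:
    """Toy image tokenizer: quantize each patch's mean brightness
    into a discrete code, standing in for a learned VQ codebook."""
    h, w, _ = image.shape
    codes = []
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            mean = image[y:y + patch, x:x + patch].mean()
            codes.append(IMG_OFFSET + int(mean * (IMAGE_CODEBOOK - 1)))
    return codes

# Early fusion: one interleaved sequence fed to a single Transformer.
image = np.random.rand(32, 32, 3)
sequence = tokenize_text("a photo of ") + tokenize_image(image) + tokenize_text(" at night")
print(len(sequence))  # text ids and image codes share one token stream
```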

💡Text-to-Video Model

A text-to-video model is an AI system that generates video content based on textual descriptions. The script highlights Runway's Gen 3 Alpha, which is praised for its ability to create highly realistic video content, including photorealistic humans, from text inputs.

💡Photorealistic Humans

Photorealistic humans in AI refer to the generation of human images or videos that are so realistic that they closely resemble actual photographs or footage of real people. The script emphasizes the impressive quality of photorealistic humans generated by Runway's text-to-video model, noting that they are difficult to distinguish from real humans.

💡AGI (Artificial General Intelligence)

AGI, or Artificial General Intelligence, is the hypothetical ability of an AI to understand, learn, and apply knowledge across a broad range of tasks at a level equal to or beyond that of a human. In the script, Elon Musk is quoted discussing Tesla's advancements towards AGI, suggesting that it is closer than many might think and could have profound implications for various applications, including autonomous vehicles and humanoid robots.

💡AI Watermarking

AI watermarking refers to techniques used to embed a digital signature or mark into AI-generated content, allowing for the identification of such content. The script mentions Meta's AudioSeal, a new technique for watermarking audio segments, which could be used to detect AI-generated speech within an audio snippet, raising questions about the authenticity and attribution of AI content.
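
The localization idea can be sketched as follows: embed a faint pseudorandom signature into chosen frames of a waveform, then scan frame by frame with a correlation detector to flag which segments carry the mark. This toy correlation scheme is for intuition only; AudioSeal's actual method uses trained neural embedder and detector networks rather than a fixed correlation test.

```python
import numpy as np

FRAME = 1_600  # 0.1 s frames at 16 kHz (assumed framing)
rng = np.random.default_rng(0)
SIGNATURE = rng.standard_normal(FRAME)   # shared secret pattern

def embed(audio: np.ndarray, marked: set[int], strength: float = 0.02) -> np.ndarray:
    """Add a faint signature to selected frames (e.g. AI-generated spans)."""
    out = audio.copy()
    for i in marked:
        out[i * FRAME:(i + 1) * FRAME] += strength * SIGNATURE
    return out

def detect(audio: np.ndarray, threshold: float = 0.01) -> list[int]:
    """Correlate each frame with the signature to localize marked spans."""
    n = len(audio) // FRAME
    scores = [audio[i * FRAME:(i + 1) * FRAME] @ SIGNATURE / FRAME for i in range(n)]
    return [i for i, s in enumerate(scores) if s > threshold]

speech = rng.standard_normal(FRAME * 10) * 0.1   # 1 s of toy 'speech'
watermarked = embed(speech, marked={3, 4, 5})
print(detect(watermarked))                        # [3, 4, 5] with this seed
```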

Highlights

Google DeepMind's update on video-to-audio generative technology that adds sound effects to silent clips.

Google's AI model can generate high-quality audio from prompts for various scenes, including howling wolves and drummers on stage.

The release of Google's video model, Veo, whose generated drummer clips sync with the audio's drum hits.

Google's shift from a research lab to an AI product factory, facing challenges with balancing safety and product release.

TikTok introduces Symphony, an AI suite to enhance content creation efficiency.

Meta's release of numerous open models to foster community innovation in AI.

Meta Chameleon, a model for joint text and image processing, and its potential impact on the AI community.

Runway's Gen 3 Alpha, a breakthrough in text-to-video models featuring photorealistic humans.

Hedra Labs' character generation model that creates emotionally reactive characters for content creation.

Elon Musk's vision for Tesla's AI advancements, including autonomous tasks and multilingual support.

Meta's commitment to open science and the release of research models and datasets to accelerate community innovation.

The potential for AI to blur the lines between fantasy and reality with advancements in character animation.

The ethical and practical implications of AI-generated content and its impact on digital authenticity.

The importance of data quality in AI model training and its effect on output realism.

The race between AI companies to release the most advanced models and the potential societal impacts.

Musk's prediction of AGI's arrival and its potential to usher in abundance in goods and services.

Transcripts

00:00

There were actually so many different AI stories that I nearly wasn't able to keep up, but I'm going to show you guys some of the most important ones, because these were pretty pivotal. Coming in at number one, we have Google DeepMind giving us a very fascinating update. Google DeepMind said they're sharing progress on their video-to-audio generative technology: it can give silent clips sound that matches the acoustics of the scene, accompanies on-screen action, and more, and here are four examples. Basically, if you have a video that doesn't have any sound effects, you can use Google's new model to add sound effects and a theme tune behind it. It's really cool. So what I'll do right now is show you a couple of the examples, and then we'll do a deep dive into why this is probably happening.

[Music] [Applause]

What's pretty fascinating about these examples is that we can actually see the prompts that were used as well, so it takes a combination of the prompts and, of course, the video. You can see right here the prompt for the audio: "wolf howling at the moon." If you just saw the wolf howling at the moon, that one was pretty nice; it actually sounded like a really high-quality scene. So I do think Google have shown us this demo in a way that makes me believe it's pretty much near ready, because the limitations they mention don't seem that serious, and after all this is just an audio model. Then of course we had this one, "a slow mellow harmonica plays as the sun goes down on the prairie," which I thought was really fascinating too, because the sound actually came across as realistic. The sound for "jellyfish pulsating underwater, marine life, ocean" didn't sound as great as it might have, but I do think it was pretty decent given what it was asked to do. And then of course there's the one I want to talk about the most, because the prompt for the audio was "a drummer on stage at a concert surrounded by flashing lights and a cheering crowd." The reason I liked this one so much is that the audio actually synced up to the hits of the beat. And consider that these output generations, if you didn't know, were themselves AI generated: all of these clips were generated by Google's in-house system, Veo. If you aren't familiar with Veo, around two weeks ago Google showed us some more of the capabilities of their video model, and right here you can see it in action, generating a drummer hitting the drum. Like I said, the most impressive thing is that the model didn't just produce a poor approximation of what was happening in the video; it managed to sync the hits up with the audio it generated, which I think is remarkable. They also have an additional web page where they explain how generating audio for video works. Essentially, the system uses the video pixels and the text prompts to generate rich soundtracks, and I think this is where Google actually pulls ahead, because most systems we currently have only take text prompts; you then have to generate as many clips as possible and hope one syncs up with what's on screen. Usually, if it's just an ambient scene, like a peaceful meadow or a relaxing park where you're hearing crickets, those things don't really require much on-screen syncing, but in certain scenarios Google's approach becomes very useful. So I'm wondering what kind of creative pipeline Google are going to build, because you can see they've got Veo and they've got this, and it seems they're building the entire infrastructure for their multimodal AI.

05:08

Now, if we're continuing to talk about Google, we need to talk about Google's major shift: Google recently made a major move from research lab to AI product factory. If you've been watching the channel, this is something I spoke about for quite some time: Google has everything they need to take on companies like ChatGPT/OpenAI, but the problem is they have a situation on their hands where they're trying to focus on safety rather than pushing out products. And the craziest thing about all this is that Google DeepMind has been consistently losing researchers because features wouldn't get shipped. I made a 20-minute video explaining the details of all the reasons and failings Google had, but if you didn't believe me, here's someone who used to work at Google and went to join Luma (you know how Luma just recently released their Dream Machine AI that people can actually use for text-to-video). He says: "Now you know why I left Google to join Luma. I was in the team that developed Veo early on, but I knew it would never be shipped to the masses for quite some time. The same for Sora, not until a company like Luma forces their hand. That is, at least I hope. Give me access." So he's basically saying: I knew Google weren't going to ship anything for a long time, so I decided to leave and join Luma. And remember, this isn't the first Google employee to do this: many have left to join OpenAI, and many have left to start their own startups. So Google has had this brain-drain problem, losing their best talent to other companies and to founders who don't like how Google's handling this AI craze. Essentially, you can see here it says that over one week in mid-May, two companies introduced AI products built using one of Google's major breakthroughs: OpenAI announced the new model that underpins ChatGPT, and Google announced AI Overviews. But remember, the Overviews launch didn't go well, which was pretty embarrassing. The article continues: the discontent about pushing too hard on commercialization is a mirror image of the internal critique from the last two years, when Google was struggling to bring generative AI to consumers; researchers who wanted to ship products departed for startups because they thought the company was moving too slowly. According to people familiar with the lab, Brain researchers have also mourned the loss of their brand, and some even welcomed the prospect of stronger leadership. Basically, some of them thought: the Google brand isn't what it used to be, and maybe we need new leadership if the current one can't navigate this AI industry. Overall, the article details how the company has combined its two AI labs to develop commercial services, a move that could undermine its long-running strength in foundational research. The problem is that Google is a lab that is extremely good at research and breakthroughs (like I've already said, they made the breakthrough that powers ChatGPT), but they're asking themselves: do we now spend more of our time commercializing some of our breakthroughs, rather than experimenting in the research labs on work that could potentially lead to more interesting things? It's very hard to set priorities at a company where the choice could decide the fate of your business for the next 10 to 20 years. On one side, if you don't prioritize product, people get frustrated and flock to ChatGPT or whatever the state-of-the-art system is. But if you don't make incredible breakthroughs, you're never going to reach that next level of AI where you truly have outstanding products that stand the test of time and have an incredible moat. It's a very hard position for Google to be in, especially since they're a bigger company, and they don't have the newness that lets a startup make mistakes without being critiqued. If OpenAI does something wrong, people shrug; they're the newer kid on the block. But Google is a behemoth, a company that's been around for years with an impressive reputation to uphold. So it will be interesting to see how Demis Hassabis and Sundar Pichai manage to work together to get Google off the ground. I do hope they manage to ship good products and keep a steady hold on their research initiatives, because someone once said (I don't remember who) that if there's any company that should achieve AGI, and that you'd probably trust with AGI, it's most certainly Google.

10:09

In a bit of strange news, we have TikTok introducing Symphony, their new creative AI suite. I don't usually cover AI social media tools (sometimes I do), but I wanted to cover a broad range of what's going on in the AI industry. Basically, Symphony is designed to elevate your content creation journey every step of the way, blending human imagination with AI-powered efficiency. The tool is an evolution of TikTok's Creative Assistant, an AI-powered virtual assistant that helps you make better videos overall. You know how people will look across social media trying to find trends, asking what's going on, what the best practices are, and what kind of ideas they should be making? This is a platform that uses generative AI to analyze all of those things and come up with something effective. You can see it in action right here: you have your product, the description, and the media assets, and you're able to create AI-driven TikToks in just seconds. Once you import your content, it lets you produce AI-generated videos pretty quickly. This is somewhat different from pure generation: it's not creating AI content from scratch, it's synthesizing your own content with AI to produce videos at mass scale. I think this is probably the right approach, because if you can get content out faster with the help of AI, that helps everyone; who wouldn't want to save time? Of course, people do not want AI slop on their timelines, which is why an approach like this, I guess, still works. Something else they have is translation for global reach: tailor your message for audiences around the world by translating the script and dubbing the voice-over in multiple languages with just a few clicks, which is going to bring down the global barrier. And they offer a prebuilt selection of AI avatars licensed for commercial use; like stock photos, these provide quick, accessible, cost-effective narration for brands, bringing products to life.

12:28

Next, we had Meta release a huge batch of open models, so many that it's hard to even fathom what's going on at Meta. These aren't game-changing models, but they will change the community: a lot of the community innovations we see are built off the back of open-source models, and this entire ecosystem of open-source development continues to thrive thanks to Meta. Take a listen to what they had to say, and then we'll dive into some of the more intricate details of what they released: "Today at Meta we are sharing some of our latest research models and datasets. We've shared some of the papers for this work, but by sharing more of the artifacts we should enable the community to innovate faster and also develop new research. This is part of our decades-long commitment to open science and sharing our work publicly. What's included in the release today? There's a multi-token prediction model that can reason about multiple outputs at a time, enabling faster inference. There is Meta Chameleon, a model that reasons about images and text using an early-fusion architecture. There is Meta AudioSeal, a new technique for watermarking audio segments. There's Meta JASCO, a technique for music generation that allows better conditioning on chords and tempo. And there's PRISM, a dataset that enables better diversity across geographic and cultural features. There are a few more things as well. I am really excited to continue our work towards open research, and I look forward to seeing what the community will build with these latest artifacts, and of course to sharing more with you over time."

So here you can see Meta sharing new research models and datasets from Meta FAIR, and this is pretty remarkable stuff. Meta seems to be one of the only companies backing the open-source arena from a position of strength. While companies like Google have released things like the 7-billion-parameter Gemma and other smaller models, Meta are taking a clear stand: I think Meta's goal is to lead the open-source community, and it makes sense, because what they've released is truly impressive. You can see right here they've released Meta Chameleon, a family of models that can take text and images as input and output any combination of text and images, with a single unified architecture for both encoding and decoding. This is very fascinating; take a look at the video they had on their page, because I think Meta is going to become one of the major players that sneaks up on everyone, especially since they're supposed to release Llama 3: "Meta Chameleon is a unified multimodal model with joint modeling of text and images in one Transformer. It's able to take any combination of interleaved text and images as its input and output, without the need for modality-specific modules. Most current late-fusion models use diffusion-based learning for image tasks and tokenization for language tasks. Built upon an early-fusion architecture, Meta Chameleon uses tokenization for both text and images, making for a more unified approach. We believe this approach can scale better than late-fusion or modular models while being easier to design and maintain."

[Music]

So yeah, Meta have released some pretty interesting stuff, AudioSeal being the first audio watermarking technique specifically designed for localized detection of AI-generated speech, making it possible to pinpoint AI-generated segments within a longer audio snippet. All of these releases, even the PRISM dataset, which increases the diversity of certain features, show Meta taking a different stance. People have realized that OpenAI has the LLM area down, while Meta are doing things that are truly innovative: they have V-JEPA (if you don't know, that's a unique architecture that apparently might lead us to systems that truly understand what's going on), and of course they have the open-source area, where people are going to build on their work, and then we're going to get innovations on top of that.

17:03

Now, if you were living under a rock, you may have missed one of the most important announcements in video generation: Runway introduced Gen 3 Alpha. Gen 3 Alpha is the first of an upcoming series of models trained by Runway on a new infrastructure built for large-scale multimodal training, and long story short, their text-to-video model is absolutely insane. I covered this earlier in the week; it was truly impressive. One of the key things about Runway that you should be paying attention to is their photorealistic humans. Everything about the system is really good, but I have to be honest with you: the photorealistic humans genuinely look better than OpenAI's Sora, and that isn't clickbait or exaggeration. I've looked at both sets of videos, and every time I look at the photorealistic humans coming from Runway, I struggle to see any true issues with the quality of the content. In terms of how it looks, it's extremely realistic, and I still struggle to believe that what I'm looking at is truly text-to-video. It's pretty strange that we now have text-to-video of such high quality that there aren't imperfections that would make you think otherwise. This thing is truly impressive in what it's able to do, because, like I said, I just can't spot enough mistakes in these clips to tell that they're AI generated, whereas with other models you can. The photorealistic humans seem to be a capability Runway specifically trained for, and I'm guessing that whatever training recipe they used involved a lot of high-quality datasets; as we know, data definitely affects the output. What we can see is that Runway is going to be the leader in photorealistic humans, and I think it's also going to be pretty fascinating, because this marks a new stage where not even I can tell whether something is AI generated. For sure, I'm going to be questioning nearly everything I see online, especially when it's an HD video of a human, because I won't be able to know whether it's real or AI generated. Unfortunately, we don't actually have access to Runway's model yet. It does look very interesting and very capable, as it covers a wide range of styles and can essentially merge things that haven't been merged before in a very effective way. So it will be interesting to see who releases first: will it be Runway, will it be Sora? We've got a fascinating few months ahead for the rest of the year in terms of releases and AI development generally, so it will be interesting to see what kinds of models come out of this area.

20:13

Now, in AI video news, we got: "Introducing the research preview of our foundation model, Character-1, available today at hedra.com." Basically, this is something that can do reliable headshot generation and tell stories, which, if you didn't know, is something AI particularly struggles with. We've seen AI systems do this with Microsoft's VASA-1, but Hedra have gone ahead and released theirs for the public to use for free. It's something that's going to allow next-level content creation with emotionally reactive characters. There's one example I want to show you that demonstrates how crazy this is, because what you can do with this kind of technology is pretty incredible. You can see here Uncanny Harry AI says: "Meet my mate Dave. He's had a belated Father's Day message for everyone, sound on please. I made Dave on Father's Day with a Midjourney image, ElevenLabs voice-to-voice, and a new tool called Hedra Labs. It's the closest thing to acting that I've seen from an AI-generated video." And out of all the examples, even the ones on Hedra's page, this by far seems like the most interesting I've seen so far: "My old dad, he was old school, you know, British stiff upper lip. He loved me and I loved him, but we didn't talk about it much; we didn't have that sort of relationship. So when he got cancer and my mum was looking after him, I'd see him, but I always thought I had time. One day I got the call, and I rushed to the hospital. I was going to tell him everything, but it was too late; he'd already passed. We never got the chance to tell him: thank you for being a good dad, thank you for teaching me how to be a good dad, thank you for teaching me how to be a man. If your dad's still around, go and tell him now that you love him, before it's too late." So yeah, that is pretty incredible. If we look at how the face is moving, how the skin is folding, it looks remarkably impressive, and it doesn't just look like they've isolated the face; you can see that the rest of the frame is moving as well. This is a truly pivotal moment. Maybe they'll make the cartoon versions more lifelike too, because of course, if you can just put in any realistic human's face and have it animated, that's going to present a lot of issues, and maybe there could even be some liability for Hedra themselves. This is the kind of technology that starts to blur the lines between fantasy and reality, and it raises a question: what are people going to be interacting with in the future when we're looking at this kind of content? If we imagine a stage where AI latency drops down to, say, a few milliseconds, are we going to be interacting over the internet with humans, or with these things that are basically digital twins? Something to think about.

23:10

And for the last clip, we have Elon Musk talking about Tesla's new announcements and, of course, his revised AGI date:

"So you'll be able to access Grok through Tesla, through your Tesla, and you'll also be able to ask it to do whatever you want. You could ask your car to go pick up a friend, or anything you can think of; the car will be able to do it." "Yeah, you'll be able to ask your Tesla to go pick up groceries, pretty much anything. Optimus is really going to be next level. You'll be able to skin Optimus in, you know, pretty much anything. People on the internet have asked me to make catgirls real, and actually you can make catgirls real if you have a robot." "You have a robot catgirl." "Yeah. Optimus will be able to pick up your kids from school, and Optimus will be able to be school if you want; it'll be able to teach kids anything. It'll support any language too. I think AGI will be next year, probably. If it's not next year, I'd say 2026 at the latest." "For AGI?" "At the latest." "Oh, hope it's nice to us." "So one way to define AGI is smarter than any human. I think we're less than 24 months away from that." "Yeah, please be nice to us." "AI humanoid robots will usher in a level of abundance that was hard to imagine. There will be no shortage of goods and services."


Related Tags
AI Technology, DeepMind, Video Generation, Audio Synth, Content Creation, AI Innovation, Google AI, Multimodal AI, AI Research, AI Products