Microsoft's new "Embodied AI" SHOCKS the Entire Industry! | Microsoft's Robots, Gaussian Splat & EMO

AI Unleashed - The Coming Artificial Intelligence Revolution and Race to AGI
29 Feb 2024 · 26:07

Summary

TLDR: This video focuses on shocking developments in the world of AI — in particular, the news that Figure has received massive investment from Nvidia, Jeff Bezos, Microsoft, OpenAI, and others, accelerating the commercial deployment of its humanoid robots. OpenAI will collaborate with Figure to develop next-generation AI models that extend robotic perception, reasoning, and interaction. The video also introduces a recent paper from China analyzing OpenAI's innovative video-generation model Sora, and touches on the latest advances in artificial intelligence and their impact on the world. Ultimately, the video suggests how the integration of AI and robotics could revolutionize the real world.

Takeaways

  • 😊 A video about progress in artificial intelligence.
  • 🤖 It explains Figure's humanoid robot.
  • 😮 There was a surprising announcement that OpenAI and Figure are collaborating.


Outlines

00:00

🤖 The Evolution of AI and Humanoid Robots

A shocking development in the world of AI is making waves on a subreddit: the company Figure has attracted massive investment from Nvidia, Jeff Bezos, Microsoft, OpenAI, and others. OpenAI announced a collaboration with Figure to develop multimodal models that extend robotic perception, reasoning, and interaction. Figure will leverage Microsoft Azure for AI infrastructure, training, and storage, aiming to accelerate the commercial deployment of its humanoid robots. A concrete demonstration shows how far the integration of AI and robotics has already progressed.

05:00

🌐 Innovative Rendering of 3D Scenes

A recent analysis shows that Sora far surpasses earlier models in its ability to generate 3D scenes: compared with previous text-to-video AI platforms, it makes striking progress in geometric consistency. The model has sparked active debate about its ability to simulate real-world phenomena, the key point being that it is not merely a video-generation tool but a world simulator — a new form of physics simulation. Sora implicitly learns physical rules and 3D transformations to achieve advanced video generation.
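The geometric-consistency measurement described above boils down to matching points between two rendered views and reporting the fraction of matches that fit a common camera motion (the paper's figures mark discarded matches in red and high-quality ones in green). The following is only a toy sketch of that idea, assuming a pure-translation motion rather than the paper's actual matching pipeline:

```python
import statistics

def consistency_score(pts_a, pts_b, tol=2.0):
    """Fraction of point matches consistent with one shared translation.

    pts_a/pts_b are matched (x, y) pairs from two views of a scene.
    A robust (median) translation is estimated, then matches farther
    than `tol` pixels from the prediction are discarded as outliers.
    """
    dx = statistics.median(b[0] - a[0] for a, b in zip(pts_a, pts_b))
    dy = statistics.median(b[1] - a[1] for a, b in zip(pts_a, pts_b))
    inliers = sum(
        1 for a, b in zip(pts_a, pts_b)
        if ((b[0] - a[0] - dx) ** 2 + (b[1] - a[1] - dy) ** 2) <= tol ** 2
    )
    return inliers / len(pts_a)

view_a = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 5)]
view_b = [(3, 4), (13, 4), (3, 14), (13, 14), (40, -7)]  # last match is bogus
print(consistency_score(view_a, view_b))  # 0.8 -> 4 of 5 matches survive
```

A geometrically consistent generator like Sora would score high on such a metric across view pairs, while a model with a poor internal 3D representation would not.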

10:01

🔍 The Future of AI Data Annotation

OpenAI's new project "Feather" appears to be a system for automated labeling and annotation of data such as images, audio, video, and text. In particular, examples show GPT-4 with Vision analyzing scenes from video games and medical data and accurately describing their content. The technology is expected to find applications across many industries — from analyzing security-camera footage to verifying medical procedures — and may also contribute to developing next-generation AI models.
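The annotation workflow described above — sample frames, have a vision model describe each one, keep the (frame, caption) pairs as training data — can be sketched in a few lines. This is a hypothetical illustration, not Feather's actual design; `caption_frame` stands in for a real vision-model call such as GPT-4 with Vision:

```python
def annotate_video(frames, caption_frame, every=30):
    """Pair sampled video frames with model-written captions.

    `frames` is any sequence of frame objects; `caption_frame` is a
    stand-in for a vision-model call. The returned (frame_index, caption)
    pairs are the kind of video/text training pairs the video describes
    feeding to a text-to-video model like Sora.
    """
    return [(i, caption_frame(f)) for i, f in enumerate(frames) if i % every == 0]

# Stub model: in practice this would be an API call carrying the frame image.
fake_model = lambda frame: f"player is at {frame}"
labels = annotate_video(["spawn", "mid", "objective", "fight"], fake_model, every=2)
print(labels)  # [(0, 'player is at spawn'), (2, 'player is at objective')]
```

The point of the design is scale: a human writing each caption would be slow and expensive, while the model-in-the-loop version turns raw footage into labeled pairs automatically.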

15:01

🎭 ディープフェイク技術の進化

Alibaba GroupとIntelligent Computing研究所によって開発された「Emote Portrait Alive」は、単一の参照画像から表情豊かなポートレートビデオを生成する技術です。このモデルは、音声クリップと画像を組み合わせて任意の長さのビデオを生成することができ、リアルタイムでの表情の変化やリップシンクにおいて顕著な進歩を遂げています。この技術は、エンターテイメント業界やセキュリティ分野での応用が期待されています。
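The "any length" property above follows directly from the audio clip driving generation: the number of frames to synthesize is fixed by the audio duration and the target frame rate. A minimal sketch of that bookkeeping (the sample rate and fps below are illustrative defaults, not values from the EMO paper):

```python
import math

def frames_for_audio(num_samples, sample_rate=16_000, fps=25):
    """How many video frames an audio-driven portrait model must generate.

    The output video length is dictated entirely by the audio clip,
    which is what lets one reference image drive a video of any duration.
    """
    duration_s = num_samples / sample_rate
    return math.ceil(duration_s * fps)

print(frames_for_audio(48_000))       # 3 s of audio -> 75 frames at 25 fps
print(frames_for_audio(16_000 * 60))  # 60 s of audio -> 1500 frames
```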

20:02

🕹 YouTubeのメタゲームとコンテンツ制作

YouTubeでの成功は、効果的なタイトルとサムネイルの使用に大きく依存しています。これは「メタゲーム」として知られ、最も効果的な戦略を採用することでコンテンツの視聴率を最大化します。クリエイターは、視聴者を引き付けるために衝撃的なタイトルや驚きの表情をサムネイルに使用することが多いですが、これは現在のYouTubeのアルゴリズムに最適化された戦略です。内容の質と視聴者からの反応が最終的にコンテンツの成功を決定します。

25:04

💡 クリエイター支援の重要性とYouTube戦略

YouTubeクリエイターは、視聴者に価値を提供するために努力していますが、成功するためには効果的なタイトルとサムネイルが不可欠です。これは、コンテンツが広く見られるようにするための「装備」であり、クリエイターの努力をサポートすることが視聴者にとって重要です。クリエイターを評価する際には、タイトルやサムネイルではなく、コンテンツの質とその提供する価値に焦点を当てるべきです。


Keywords

💡Singularity

The singularity refers to the moment when artificial intelligence surpasses human intelligence — a concept in which technological progress accelerates and brings about an unpredictable future. In this video the term appears in the context of introducing shocking events in the world of AI, suggesting the rapid evolution of AI technology and its impact on society.

💡Figure robot

The Figure robot is the specific humanoid robot mentioned in the video. Leveraging OpenAI and Microsoft Azure, the company aims to accelerate commercial deployment through the development of next-generation AI models. The robot is an example of embodied AI, with demonstrations highlighting its fully autonomous operation.

💡Multimodal model

A multimodal model is an AI model that can process and integrate multiple kinds of input data — text, images, audio, and so on. The video mentions that OpenAI, jointly with Figure, is developing models of this kind to extend robotic perception, reasoning, and interaction.

💡Autonomy

Autonomy is the ability of a robot or system to carry out tasks independently, without human intervention. The video emphasizes that the Figure robot is fully autonomous, operating without pre-scripted movements or teleoperation.

💡AGI (artificial general intelligence)

AGI refers to a form of AI that can perform any intellectual task at or above the level of a human. The video explains OpenAI's apparent approach of building AGI not as a single entity but as multiple pieces that combine to form a greater whole.

💡Sora

Sora is the text-to-video AI model from OpenAI mentioned in the video. The model outperforms other platforms, and its ability to simulate real-world phenomena is particularly emphasized. Sora is discussed in detail as a technology that realistically reconstructs 3D space.

💡Gaussian splatting

Gaussian splatting is the technique used in the video to render Sora's output as 3D scenes. It represents 3D space with Gaussians (normal distributions), each defining the position and extent, color, and transparency of one part of the scene.
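The idea of building an image out of colored, semi-transparent Gaussians can be shown in a toy example. This is only a sketch of the representation the video describes — isotropic 2D Gaussians composited onto a flat image — not the real 3D splatting renderer:

```python
import math

def splat(gaussians, size=64):
    """Alpha-composite isotropic 2D Gaussians onto an image, back to front.

    Each Gaussian carries the same four ingredients the video lists:
    a position, a spatial extent (sigma), a color, and an opacity.
    """
    image = [[[0.0, 0.0, 0.0] for _ in range(size)] for _ in range(size)]
    for (cx, cy), sigma, rgb, opacity in gaussians:
        for y in range(size):
            for x in range(size):
                # Gaussian falloff gives each splat a soft, stretched footprint
                a = opacity * math.exp(-((x - cx) ** 2 + (y - cy) ** 2)
                                       / (2 * sigma ** 2))
                px = image[y][x]
                for c in range(3):
                    px[c] = px[c] * (1 - a) + rgb[c] * a
    return image

img = splat([((20, 20), 6.0, (1.0, 0.0, 0.0), 0.9),   # red splat
             ((40, 40), 10.0, (0.0, 0.0, 1.0), 0.5)])  # blue splat
print(img[20][20])  # dominated by the red Gaussian centered there
```

A real splatting renderer does the same compositing with millions of anisotropic 3D Gaussians projected into the camera, which is why a scene built this way can be viewed from any angle.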

💡Physics simulation

Physics simulation is the process of modeling and simulating real-world physical laws on a computer. The video emphasizes Sora's ability to mimic real-world physical phenomena — in particular, that by implicitly learning 3D space and physical rules it achieves simulations close to reality.

💡Metagame

A metagame is the most effective set of strategies or behaviors within a particular game or environment. The video refers to the current "meta" for creating YouTube titles and thumbnails, describing the techniques used to capture viewers' attention.

💡Deepfake

A deepfake uses artificial intelligence to manipulate people's faces and voices, creating video or audio clips of things that never actually happened. The video discusses advances in this technology, particularly the improved ability to generate realistic footage.

Highlights

Figure is making headlines again with massive funding from Nvidia, Jeff Bezos, Microsoft, and OpenAI.

OpenAI and Figure signed a collaboration agreement to develop Next Generation AI models.

Figure leverages Microsoft Azure for AI infrastructure, training, and storage.

Investment from big firms and support from Microsoft Azure will ramp up Figure's timeline for humanoid commercial deployment.

Jeff Bezos invests heavily in robotics, not just through Amazon but also through Bezos Expeditions.

OpenAI is seemingly building AGI in pieces, which will eventually come together to form a greater whole.

OpenAI focuses more on building autonomous AI agents and a web search product, possibly similar to Perplexity.

Sora, an OpenAI video model, generates videos of stunning geometrical consistency.

Google's model Lumiere is anticipated to compete with Sora, but comparisons are yet to be made.

Sora's video generation capabilities suggest it's more of a world simulator, learning physical rules implicitly.

OpenAI has collected pieces of AGI, including technologies for creating more realistic and autonomous AI models.

Sam Altman discusses the need for a $7 trillion investment to build the infrastructure for AI services.

GPT-5 rumors hint at significant advancements in AI, automating many tasks and providing powerful analytics capabilities.

Feather, an OpenAI project, focuses on automated labeling and annotation, potentially revolutionizing data processing.

The integration of OpenAI and Figure highlights a significant step towards embodied AI, with Figure becoming a crucial part of OpenAI's AGI development.

Transcripts

play00:00

so the other day I'm scrolling through

play00:01

the singularity subreddit and a headline

play00:04

jumps out at me they're talking about

play00:06

something shocking and stunning that

play00:08

happened today in the world of AI now of

play00:11

course I click on it and what they

play00:12

revealed in there shocked me to my core

play00:15

well come back to that in just a bit but

play00:17

in other news figure is making the

play00:19

headlines once again we've already

play00:21

talked about the massive amounts of

play00:23

money that it's raising from Nvidia Jeff

play00:25

Bezos Microsoft openi Etc in addition

play00:29

openi and figure signed a collaboration

play00:31

agreement to develop Next Generation AI

play00:33

models here's open AI open AI plus

play00:36

humanoid robots we're collaborating with

play00:38

figure robot to expand our multimodal

play00:41

models to robotic perception reasoning

play00:44

and interaction now again we already

play00:46

knew that there's a lot of money going

play00:48

into this company from the big players

play00:51

but here's the new plot twist if you

play00:53

will so figure will leverage Microsoft

play00:55

Azure for AI infrastructure training and

play00:58

storage we are excited to partner with

play01:00

open Ai and Microsoft to to bring

play01:02

embodied AI to the real world and

play01:05

they're saying that this investment from

play01:06

these big firms as well as the added

play01:08

support from Microsoft Azure open AI

play01:11

this will ramp up figures timeline for

play01:13

humanoid commercial deployment and will

play01:16

be used for AI training manufacturing

play01:18

deploying more robots expanding

play01:20

engineering headcount advancing

play01:22

commercial deployment efforts here's

play01:24

February

play01:25

2024 this is the robot this is what that

play01:28

looks like so you can see it here up a

play01:30

box walking around so he's carrying an

play01:33

object this I got to say is an

play01:34

incredibly good demonstration of its

play01:37

abilities the coffee maker demo was it

play01:39

was a little bit hard to tell how

play01:41

complicated that was this is definitely

play01:43

a lot more compelling they're saying

play01:45

that this is fully autonomous so there's

play01:47

no teleoperation there's no prescripted

play01:51

movements now Jeff Bezos has been

play01:53

pouring a lot of money into robotics not

play01:55

just from kind of the Amazon fund the

play01:58

Amazon side but also through other

play01:59

investment arms of his as well so

play02:01

bezos's Expeditions is one of them and

play02:04

so this Amazon industrial Innovation

play02:05

fund is a 1 billion Venture investment

play02:07

program that's putting money into

play02:09

supporting robotics but definitely

play02:11

figure is the one that a lot of people

play02:13

are buzzing around there's definitely a

play02:15

lot of attention on it and the fact that

play02:17

opene is going to help develop the

play02:18

models behind it and Microsoft will use

play02:21

Azure to kind of power a lot of the AI

play02:24

infrastructure training and storage this

play02:26

kind of seems that figures now the deao

play02:29

of open AI robot the GPT robot in a

play02:32

previous video we talked about how open

play02:35

AI seemingly is building AGI not as one

play02:38

thing but rather in pieces these pieces

play02:40

they will come together to become one

play02:42

thing there's a similar Concept in these

play02:44

card games right where you have one one

play02:46

part representing an arm a leg the Torso

play02:48

and head Etc together when you get all

play02:50

of them on the board then the big thing

play02:52

sort of emerges so each of these things

play02:54

has its own use its own stats it's its

play02:57

own thing combined it kind of merges

play03:00

into this greater thing and that's kind

play03:02

of my take on how openai seemingly is

play03:05

building the AGI pieces openai focus

play03:08

more of its efforts on building

play03:10

autonomous AI agents Peter Willer open

play03:12

AI vice president of product remarked on

play03:14

X that the product new house described

play03:17

will change everything this thing that

play03:19

will change everything that really seems

play03:22

to be this autonomous agentic structure

play03:24

that they're developing right now call

play03:26

it this arm of AGI at the same time

play03:28

they're developing a web search product

play03:31

probably something similar to perplexity

play03:33

that may or may not be part of sort of

play03:36

Chad GPT it might be its own Standalone

play03:39

thing we of course seen Sora open AI

play03:41

video model capable of producing some

play03:44

pretty stunning footage that's uh almost

play03:47

near lifelike at this point here's a

play03:49

paper out of China just recently

play03:51

published Sora generates a video of

play03:53

stunning geometrical consistency

play03:56

stunning they say these are Gen 2 and

play03:58

paika the other two text to video AI

play04:01

platforms probably the most exciting

play04:02

ones up until now and here this massive

play04:06

massive thing is Sora right it's this

play04:09

massive Pentagon that's maxing out all

play04:11

the abilities that it's measured on

play04:13

speaking of pentagons or I guess in this

play04:15

case a hexagon but why is why is the

play04:18

North Pole on Saturn a hexagon it's a

play04:21

persistent hexagonal Cloud pattern

play04:23

around the North Pole of the planet

play04:24

Saturn the size of which is longer than

play04:27

the diameter of Earth probably nothing

play04:30

right but back to Sora it's

play04:32

significantly better than everything

play04:34

else out there now Google recently

play04:36

announced their model Lumiere so I'd be

play04:38

curious to see how that compares to Sora

play04:40

but so far we don't have access to them

play04:43

so here they're talking about gausin

play04:45

splatting renderings G in here you can

play04:48

think of as like little units from which

play04:50

the image is made so like pixels on a

play04:52

screen but instead of like 2D pixels for

play04:56

2D space we're using kind of a 3D

play04:58

representation so so in computer

play05:00

Graphics where in video games often

play05:01

times we use triangles to represent

play05:04

various scenes to represent various 3D

play05:06

models so here instead of triangles it's

play05:08

Gans here's kind of what that looks like

play05:10

this border is just for clarity but it's

play05:13

basically made up of the position like

play05:15

where it is covering so how it's

play05:16

stretched over some distance color and

play05:19

how transparent it is so here's multiple

play05:21

gossans drawn at once I God I hope I'm

play05:24

pronouncing that word right now what do

play05:25

you think 7 million gossans would look

play05:28

like well here that's what it would look

play05:31

like here's how that would look like if

play05:33

they're kind of more fully opaque but

play05:35

what's the point well the point is We

play05:36

Can Make Scenes like this we can create

play05:39

3D scenes where everything is kind of

play05:41

render on screen we're able to float

play05:44

around zoom in zoom out it's able to

play05:47

render these 3D spaces kind of from

play05:50

images you can give it a 2D image and

play05:52

it's able to kind of try to recreate

play05:54

that 3D space but p and Gen 2 don't

play05:56

really produce great results when this

play05:59

technique is is used to kind of render

play06:00

that c in 3D there's limited

play06:03

reconstruction scope which suggests that

play06:05

P and Gen 2 they kind of have a poor

play06:07

understanding those models have kind of

play06:09

a poor model of the 3D space they don't

play06:12

really get how it works Sora on the

play06:14

other hand seemingly does here's for

play06:17

example that uh that Coast as you can

play06:20

see here that 2D image that Sora renders

play06:23

is easily made into a 3D gausian sort of

play06:27

3D space Splats as they're called here's

play06:30

another example as you can see like the

play06:31

buildings kind of remain consistent and

play06:34

I mean you can tell what you're looking

play06:36

at you can see the different levels of

play06:38

of elevation that are shown in the image

play06:40

here's an example of that sort of

play06:43

western town I think it was so you can

play06:45

see here as we're kind of floating

play06:47

through it I mean you can see 3D space

play06:49

you know where the mountains are where

play06:50

the buildings are where the little creek

play06:52

is here's that sort of uh Museum scene

play06:56

right so the important thing to

play06:57

understand is that the video videos that

play06:59

sort of generate yeah I mean they're 2D

play07:02

images right or pixels in a 2d plane but

play07:05

when we try to recreate the 3D space

play07:08

we're able to recreate a lifelike

play07:10

realistic accurate 3D space and of

play07:13

course if you've been following up

play07:14

what's happening of Sora behind the

play07:15

scenes you kind of this is not a

play07:17

surprise right CU it's not just a little

play07:20

video producing machine it's a world

play07:23

simulator here they match sort of the

play07:25

geometric consistency between two views

play07:28

red represents discarded matching

play07:30

results green represents high quality

play07:32

matches Sora is really good Gen 2 and P

play07:35

not so much paa seems to be really good

play07:37

at keeping its logo in the same place

play07:39

and this paper points out that this

play07:41

newly developed Sora model has exhibited

play07:44

remarkable capabilities in video

play07:46

generation sparking intense discussions

play07:49

regarding its ability to simulate real

play07:51

world phenomena their point is we don't

play07:53

really have established metrics to

play07:54

evaluate its Fidelity how well does it

play07:57

create the real world physics the 3

play07:59

spaces Etc again the reason I bring this

play08:01

up is because I think people who think

play08:03

of Sora as just video generation might

play08:05

be missing the bigger picture it's more

play08:08

accurately something like a world

play08:10

simulator like a physics simulator but

play08:13

instead of something like an Unreal

play08:14

Engine where we kind of explicitly tell

play08:16

the engine we program the engine to

play08:18

create something some 3D structure or

play08:21

whatever other physics engine might be

play08:23

used in video games where we

play08:24

specifically kind of coded what what

play08:25

needs to be done sores soft physics

play08:28

simulation is an emerging property it

play08:30

learns implicitly the various things it

play08:32

needs to create those scenes 3D

play08:34

Transformations R trace renderings and

play08:37

physical rules so that's yet another

play08:39

part of sort of openi pieces of AGI

play08:43

which again some of openi employees kind

play08:44

of hinted at this but still this is kind

play08:46

of like conjecture right but if you're

play08:48

looking at what they're doing it

play08:50

certainly seems like they're like

play08:51

collecting all the various pieces and

play08:53

then Sam alman's quest to build much

play08:56

more compute much more various gpus and

play08:59

other the chips that are needed to power

play09:01

all the AI services that we're going to

play09:03

need right we've we've heard that7

play09:05

trillion doll figure quoted uh recently

play09:08

he came out he kind of explained that

play09:10

that's that's probably the number that

play09:11

the entire world all all the countries

play09:13

will need to invest over a period of how

play09:16

however many years right to build up the

play09:18

infrastructure and that whole entire

play09:20

sort of tech stack that we need to

play09:22

produce everything that we need so not

play09:24

just chips but everything everything

play09:26

everything that we need to sort of fully

play09:28

deploy all the AI services that we want

play09:30

to have right so that might be this

play09:33

piece then the much awaited GPT 5 let's

play09:35

say that's the head and torso so the

play09:37

rumors are that it will be able to

play09:39

automate a lot of work it will be

play09:41

smarter and better it's not going to be

play09:43

this revolutionary thing that will

play09:45

change everything right but it will be

play09:48

another powerful Leap Forward for those

play09:50

models and if you've seen if you've

play09:52

played around with Advanced data

play09:53

analytics you can imagine what that

play09:56

might look like for I mean anything you

play09:58

can do with Excel a lot of programming

play10:00

tasks a lot of writing tasks analytics

play10:03

Etc I've mentioned open eyes feather

play10:06

that they're still stubbornly refusing

play10:08

to let me in so this is what that looks

play10:10

like feather open.com so you need a

play10:13

feather account to continue but what is

play10:15

it here's the trademark application for

play10:17

it it's a systematic process systematic

play10:19

service using automated labeling and

play10:21

annotation of images audio video text

play10:23

and various other forms of data so I

play10:25

think it has something to do with so

play10:27

this is Microsoft an Interactive agent

play10:29

Foundation model so Microsoft Stanford

play10:32

UCLA but at the end here they described

play10:35

this agent sort of providing synthetic

play10:37

data labeling of these pictures right so

play10:40

they they show them a picture and it

play10:41

says this is what's happening in that

play10:43

picture the patient is awake and calm so

play10:45

this is for medical data for example

play10:47

they also do it for Minecraft what is

play10:49

happening in this Frame of Minecraft

play10:52

what do you think the next action will

play10:53

be here's another game that they're

play10:55

playing with it's a multiplayer game

play10:57

right and the model predict what's going

play10:59

to happen and they're using GPT 4 with

play11:02

vision to create these descriptions so

play11:04

here's a video a many frame video from a

play11:08

game called bleeding edge and we're

play11:09

asking GPT 4 Vision to give a simple but

play11:12

precise description of what the player

play11:13

is doing and GPT for vision outputs this

play11:16

so it describes what the player is doing

play11:18

it's running around interacting with

play11:19

different checkpoints fights against

play11:21

enemy players and captures the

play11:23

objectives so think about that for just

play11:25

a second here's a whole video that is

play11:28

completely sort of narrated by AI the

play11:31

descriptions the annotations are

play11:33

provided by AI the only sort of human

play11:35

involvement is writing this prompt right

play11:37

how useful would this be across many

play11:40

different Industries to review security

play11:42

camera footage to review what's

play11:44

happening in hospitals to make sure that

play11:46

the doctors and nurses are doing the

play11:48

correct procedures not only that but it

play11:50

creates really valuable data for

play11:52

training future AI models for example

play11:54

with Sora it sounded like what they did

play11:56

is they used something like this some

play11:58

something very very similar to annotate

play12:00

a bunch of various videos that Sora was

play12:03

trained on so they had a large amounts

play12:05

of video so they could have a human

play12:07

being sit there and kind of write here's

play12:08

what happens in this Frame here's what

play12:10

happens in this Frame right that would

play12:11

be slow super expensive might be not

play12:14

very accurate instead gp4 goes through

play12:17

and annotates everything then that data

play12:19

those those pairs right the video and

play12:21

the description that's given to Sora to

play12:24

train on and now it's able to do sort of

play12:27

the the opposite cuz before give it

play12:29

video with text descriptions and now

play12:31

we're able to give it just the text

play12:33

description and it'll spit out a video

play12:35

it will kind of make that video appear

play12:36

so all these pieces that open ey is

play12:39

building and collecting not only is it

play12:41

going to be automating a lot of things

play12:43

that we do helping people complete task

play12:46

faster being more effective at work but

play12:48

a lot of it also helps to build the next

play12:51

generation of AI models but opening eye

play12:54

has been kind of missing that that final

play12:55

piece the actual physical embodiment of

play12:58

of AI in the world AKA robots so this is

play13:02

from July 16th 2021 opening ey disbands

play13:06

its robotics research team after years

play13:09

of research into machines these hands

play13:11

that can solve a Rubik's Cube for

play13:13

example open AI said it shifted its

play13:15

focus to other domains where data is

play13:17

more readily available so this is a

play13:19

piece that they were going after but

play13:21

decided to not pursue it by the way

play13:23

since then since 2021 I think this piece

play13:26

became more readily available there's

play13:29

been a lot of progress on this front I

play13:30

think it's fair to say but now it seems

play13:32

like open Ai and Microsoft both are

play13:34

pouring more of their resources more of

play13:36

their talent all the tech that they have

play13:37

available into figure robot so open AI

play13:41

they've entered into a collaboration

play13:43

agreement to develop the Next Generation

play13:44

AI models for humanoid robots combining

play13:47

open AI research with figures deep

play13:49

understanding of robotic hardware and

play13:51

software and Microsoft Azure for various

play13:53

AI infrastructure training storage which

play13:55

kind of reminds me of this quote by run

play13:58

azure the endless stretch of digital

play14:00

ocean where leviathans swim the

play14:03

birthplace of AGI civilization in other

play14:06

news our ability to create deep fakes

play14:08

just got a whole lot better check this

play14:11

out so this is the reference image and

play14:13

that's the generated video here's that

play14:15

sort of image from Sora right where the

play14:18

ladies walking down the streets of Tokyo

play14:20

and here's I believe Mira Madi sea level

play14:23

executive at open AI we collaborate with

play14:26

and uh maybe we have sever of them and

play14:29

maybe they all have different

play14:31

competences and maybe we have a general

play14:34

one that kind of follows us around

play14:36

everywhere knows everything about uh you

play14:39

know my context what I've been up to

play14:41

today um what my goals are um sort of so

play14:45

I got to say that's pretty good so they

play14:47

use a lot of popular music in this so

play14:49

I'll link this page if you want to check

play14:50

it out cu the music is excellent I just

play14:52

can't I can't show it right cuz then

play14:54

I'll get hit with some copyright issues

play14:56

but you kind of got to see to believe it

play14:58

it's very accurate it's really good all

play15:01

right so there's no way I can play this

play15:02

but yeah it's realistic video it's

play15:04

realistic songs very realistic speech

play15:07

the thing you heard the term don't cry

play15:10

you don't need to cry just for from from

play15:12

a simple image which is really

play15:14

interesting because it captures her

play15:16

expression like that's a kind of a

play15:18

unique expression that she has and

play15:20

translates it to that entire sort of

play15:23

speech that she's doing the conversation

play15:24

crying is the most beautiful thing you

play15:26

can do I encourage people to cry I cry

play15:29

all the time and basically what to me

play15:32

this means is you can create images like

play15:34

this in mid journey of whatever

play15:36

characters you want and pretty much have

play15:38

fullblown videos stories that revolve

play15:41

around them and the paper behind this so

play15:43

it's the Alibaba group and the institute

play15:46

for intelligent Computing and it's

play15:47

called emo emote portrait alive

play15:50

generating expressive portrait videos

play15:52

with audio to video diffusion model

play15:54

under weak conditions and they're saying

play15:56

using a single reference image this is

play15:59

the reference image that you put in

play16:00

there they can create any duration video

play16:03

depending on the size of the audio clip

play16:05

so audio clip Plus image is turned into

play16:08

any length video so we don't have access

play16:11

to stuff like this yet like we can't use

play16:14

it yet but it's coming fast and it's

play16:16

probably very quickly going to be open

play16:18

source in the paper they actually

play16:20

mention in the related work kind of uh

play16:23

they mentioned stable diffusion so

play16:25

that's an open- Source model so I feel

play16:28

like once these once people see what

play16:30

it's capable of I mean they kind of

play16:32

describe what they did here so they talk

play16:33

about sort of what kind of data they've

play16:35

used 250 hours of footage and more than

play16:37

150 million images across multiple

play16:40

languages such as Chinese and English I

play16:42

would not be surprised if we see

play16:44

something like this you know either as a

play16:46

closed model with a monthly fee or

play16:49

perhaps even open source you know maybe

play16:51

before the year is out pabs has this

play16:54

kind of lip sync technology right now

play16:56

but this seems much much much better

play16:59

much more accurate able to run for

play17:02

longer periods of time I mean this to me

play17:03

is mind-blowing when I was a kid I feel

play17:06

like you heard the thing you heard the

play17:08

term like that looks like you know

play17:10

footage shot in whatever year that was

play17:12

shot in somebody doing an interview with

play17:14

just really high fidelity sound so AB

play17:17

absolutely stunning this isn't quite as

play17:19

good if you notice the lip movements me

play17:22

watching takes a night but it's not bad

play17:24

and keep in mind that this is probably

play17:26

difficult to replicate just because of

play17:27

the mascara and all that stuff this is

play17:29

like near perfect reality now you might

play17:32

be wondering okay but what was that

play17:34

shocked stunning thing that you saw that

play17:36

you clicked on that was so crazy well

play17:38

here it was right shocked and stunned

play17:41

clickbait titles with my name on it so

play17:44

my first thought was I am become meme

play17:47

shocker of Worlds and my second thought

play17:49

was why aren't I Schwarzenegger in this

play17:51

image but okay so I just wanted to very

play17:54

quickly kind of touch on this just to

play17:56

give my perspective because obviously

play17:58

this this is becoming a meme more and

play18:00

more people are sort of talking about it

play18:02

the reason I was trying to kind of avoid

play18:03

talking about it is because I think that

play18:06

that might actually make it worse but I

play18:08

kind of just wanted to tell people

play18:09

what's going on behind the scenes but

play18:12

okay I'll try to explain what's

play18:13

happening so a lot of you might have

play18:15

heard of Tim Ferris he was kind of one

play18:17

of the original Tech influencers I guess

play18:20

his 4-Hour workweek book was kind of

play18:23

groundbreaking at the time but in 1999

play18:25

he won the gold medal at the Chinese

play18:28

kickboxing National Championship while

play18:30

he had a martial arts background he

play18:32

probably wasn't the best person at that

play18:34

Championship but what he did was is he

play18:37

brought a technique that was fairly well

play18:39

known in the US in the wrestling and

play18:42

taekwan doe community and the trick was

play18:44

that the weigh-ins were the day prior to

play18:46

the competition so he used these

play18:48

dehydration techniques that I mean my

play18:51

understanding is it's very common in

play18:53

most martial arts competitions that you

play18:55

see on TV for example and so what he did

play18:57

was he he lost 28 lb in the 18 hours

play19:01

prior to weigh-in so he clocked in at

play19:03

165 lb at the weigh-in and then showed

play19:05

up on the day of ready to fight at 193

play19:09

lbs so of course it's pretty hard to

play19:11

fight somebody from three weight classes

play19:13

above you and then he proceeded to win

play19:15

by technical knockout and went home as

play19:17

the national champion now what do you

play19:20

think of that would you say that he

play19:21

cheated I mean technically everything

play19:23

was within the rules do you think what

play19:26

he did was unethical or do you what he

play19:28

did was you know disruptive and genius

play19:30

and wonderful and does your opinion

play19:32

change if I tell you that this spurred

play19:36

most of the fighters in the future

play19:38

championships of this nature to start

play19:40

using those same techniques it became

play19:42

the common practice now different people

play19:44

will have different opinions about it

play19:45

But there's actually a name for this in gaming: it's called the meta game. In essence, a meta, in gaming terminology, is the strategy generally agreed upon by the community, the one considered the most optimal way to win, with the best performance at a specific task. Some people even say it's an acronym for "most effective tactics available." You've probably seen it if you've played games: different seasons have different metas in League of Legends, DOTA, and the like, where different combinations of characters can be great at particular times.

For example, this is James Altucher. He used to be, I think, a master or grandmaster in chess as a kid, then stopped playing for a long time and came back as an adult, and he says it's really different now. I don't know if he used the word "meta," but I think the reason it's different now is that the meta game has changed, a lot of it due to AI. AI unearthed new strategies and new ways of playing, the new generation of chess players learned from that, and the game changes. With
YouTube, there are sort of two parts to how content gets shared, distributed, recommended, etc. The first is the title and thumbnail, the packaging if you will, the cover of the content. The second part is the actual video, the long-form content. If you nail both, your video does well, but if you mess up the title or your video, or if you have a great title and your video kind of sucks, then it doesn't do too well. And I've got to give credit to YouTube for this: I think their algorithm, while maybe not perfect, is pretty good at serving up interesting stuff to watch, especially if you're interacting with it quite a bit. If you're clicking on videos and watching them, over time it figures out, "okay, I know what you like, here's some of the stuff that other people similar to you like." So I've got to say they do a pretty good job, and if you're on YouTube watching, I'm sure you can point out some issues with it, but I feel like overall they kind of nailed it.

But here's a thing I think people who are not in the YouTube ecosystem might not understand. The people that make the videos, we have a lot of control over the content of the video. I can say or do pretty much whatever I want in the video, and as long as you like it and get some value out of it, it continues to get shared and shown across the network. If I sit here for an hour and draw a little bear or something, as long as you're enjoying that, Google's like, "yeah, dude, whatever, that's fine." As long as people are watching, liking, commenting, and subscribing, you can do whatever you want, within limits obviously. So here we have a lot of control over what we do, what we say, etc. But on the other side, the title and thumbnail, we don't have as much control; there's a very specific meta game.

Why do you think all those YouTubers put those stupid YouTube faces on their thumbnails? Do you think it's because they love making those faces? Can't they just smile and make a normal face instead of this sort of face? Why is this popping up in my feed? Why does MrBeast, one of the top YouTubers, have the same glazed-over eyes and weird smile in every single video? It's because those little weird faces and whatnot will pull a little more clicks, and that'll tip the balance in favor of that video; it'll do a little bit better. And what happens if you don't play the game? Well, then Tim Ferriss comes along and just kind of manhandles you out of the arena. Those are the rules of YouTube, and they are unflinchingly rigid.
Now, I wasn't the first to do those shock titles. I saw other people doing it, and it was extremely effective, and that's the reason a lot of people are doing it. The goal isn't to deceive you or annoy you or anything like that; the goal is just to make sure the video gets seen, and these shocking titles are just a kind of fun way to do that, to make sure it gets out there. To the people that find that annoying: just understand that's the current meta, and it too shall pass. But then there's going to be another meta, and it's probably going to annoy you still. Maybe the next one will be us making those stupid surprised faces going, "I can't believe this just happened," or whatever.

But the point I would like to make is this: for all the creators that you like, whether that's me or somebody else, the people that are informing and entertaining you, I just ask this: judge them on the long-form content they produce. That's what they have control over; that's what they make for you. You might notice I don't have too many sponsorships. I don't think I've had a single sponsorship on this channel. I'm not pitching you a paid course, I'm not pitching you a paid Discord channel; my stuff is just 100% content. Would you like it better if these were, you know, toned down, but I spent half the video pitching you some product? Probably not. So my take on this, and this is not just for AI but anything else: for the people whose stuff you like, engage with that content and judge it based on the long-form content. This is what we make for you.

The title and thumbnail, this is more like our lone little gladiator. It's a gladiator with a sword and shield, and we have to give it all the resources and equipment it needs to go out there and fight all the other YouTube videos with their thumbnails and titles. If we make a boring thumbnail and a boring title, that's like sending this guy out there with no armor or weapons; he has no chance. It would break my heart to do that, so I arm him with the sword of shock and the shield of stun and send him out there to battle all the other YouTube videos, and this makes sure that the video I made gets seen. It would break my heart to send him out there without equipment, so please do not ask me to.

So to all the people that were super cool about this and just having a lot of fun with it, I really appreciate you; thank you so much. To the people that maybe didn't like this so much but were still kind of cool about it, I hope this explains why we're doing it. It's not to be deceptive; it's just how the game is played, which is giving our little guy the best chance to win. I think the best thing you can do is just judge us by the content. Don't judge the book by its cover, as that old cliché goes: don't judge the video by the thumbnail. Those are two separate things. And to those of you that say that me using the word "shocking" in the title makes me the worst YouTuber you've ever heard of, I would just like to point out one thing: but you have heard of me. My name is Wes Roth, and thank you for watching.
