10 Things About OpenAI SORA You Probably Missed

The AI Advantage
22 Feb 2024 · 23:17

Summary

TLDR: Sora, the AI video generator released by OpenAI, is reshaping video production. It generates video from text prompts, cutting production costs and raising efficiency. Although Sora's output currently lacks audio and fine-grained editing, combining it with other AI tools points toward unified audio-visual generation. Sora heralds a new era of video production and is likely to have a profound impact on the traditional video industry.

Takeaways

  • 🎥 Sora is a video generator released by OpenAI on February 15, 2024 that creates video content from text prompts.
  • 🔍 The release was surrounded by heavy hype, but closer study reveals many capabilities that weren't obvious at first.
  • 🎶 Sora currently generates video only, with no music or sound effects, but ElevenLabs has released an audio generator that can create entire soundscapes from text prompts.
  • 💡 Combining audio and video generators will create a new class of audiovisual production tools, drastically lowering the cost for individuals to produce commercials and similar content.
  • 🌟 Sora's genuinely new capabilities include video extension and seamless loop generation, features that may become standard in video editing software.
  • 🚀 Sora dramatically lowers the barrier to video production, making high-quality video far easier and more accessible.
  • 📌 Sora cannot yet edit fine details, but as the technology matures, precise control and editing of video details should become possible.
  • 📖 From a single text prompt, Sora can generate a complete story, hinting that entire movies or TV shows may eventually be generated from text alone.
  • 🕒 Sora's stage of development resembles GPT-3 relative to ChatGPT, suggesting AI video will advance rapidly over the next few years.
  • 📷 Sora will change how stock footage and video libraries are produced, letting individuals and small teams generate custom footage at very low cost.
  • 🌐 Sora's technology may ultimately enable 3D world generation, with revolutionary implications for video production, game development, and beyond.

Q & A

  • Which company released Sora?

    -Sora was released by OpenAI.

  • When was Sora released?

    -Sora was released on February 15, 2024.

  • What key element was missing from Sora at launch?

    -Audio. All of the example videos were silent, with no background music or sound effects.

  • What did ElevenLabs release to complement Sora's video generation?

    -ElevenLabs released a new sound generator that can produce entire soundscapes from text prompts.

  • What is one of Sora's capabilities that was previously impossible?

    -Sora can extend a video: it generates brand-new footage that transitions seamlessly into the beginning of an existing clip.

  • How does Sora change the cost of video production?

    -It dramatically lowers costs: high-quality video work that used to demand extensive time and resources becomes fast and simple, to the point that one person can handle an entire production.

  • How editable is Sora's output?

    -Editability is currently limited: if a client asks for a detail to be changed, the whole scene may need to be regenerated. As the technology develops, more editing tools are expected to close this gap.

  • Can Sora generate a story from a single prompt?

    -Yes. Sora can generate a complete story from a single text prompt, demonstrating its potential for content creation.

  • Where does Sora sit in the development of AI video?

    -Sora is to AI video roughly what GPT-3 was to large language models: a powerful tool that still needs refinement and wider availability.

  • What are Sora's potential impacts on the video production industry?

    -Sora could change many aspects of the industry: lower costs, higher production efficiency, new content formats, and even shifts in how stock footage and licensing are managed and sold.

  • Can Sora generate 3D worlds?

    -Sora has the potential to: using a technique called Gaussian splatting, its videos can be converted into 3D models and used further in game engines such as Unity.

Outlines

00:00

🤖 The potential and challenges of the Sora AI video generator

This segment covers the release of Sora and the author Igor's deep dive into it. Sora, released by OpenAI, triggered enormous discussion. Igor spent extensive time researching its capabilities and potential, noting that audio matters as much as visuals. While Sora's videos have no music or sound effects, ElevenLabs released a sound generator that could be paired with Sora to produce full soundscapes. Igor predicts that AI will soon generate background music, sound effects, and character dialogue automatically, transforming the audiovisual production pipeline.

05:01

🎥 Sora's new capabilities and cost reduction

This section discusses Sora's genuinely new features, such as video extension and loop generation, and how it lowers production costs. Igor notes that Sora can seamlessly extend clips with brand-new frames, something previously impossible. It can also generate videos that loop indefinitely, which could become a new internet meme format. He also points out that Sora will sharply reduce the cost of producing high-quality video, making projects that once required substantial resources easy to realize.

10:03

🌐 A new era of video editing and personalized content

Here Igor examines Sora's impact on video editing and how stories can be generated from a single text prompt. He notes that Sora and similar AI tools will make editing far simpler, even generating content in specific formats and styles. He predicts that, as the technology matures, AI will be able to make fine-grained edits and adjustments to video based on user feedback.

15:04

📹 Future trends in video production and personal libraries

Igor discusses Sora's long-term impact on video production, particularly the creation of personalized video libraries. He predicts creators will use Sora to generate specific clips and backgrounds instead of buying or shooting real footage, dramatically cutting costs, boosting efficiency, and making production more personal and creative.

20:05

🌍 3D world building and Sora's open-ended possibilities

The final segment focuses on Sora's potential for 3D world building and simulation. Igor notes that Sora's realistic videos can be converted into 3D models for game engines such as Unity, and that a simple text prompt can produce worlds in a specific style, such as Minecraft. He is excited about where the technology is headed, and both thrilled and unnerved by the possibilities.

Keywords

💡Sora

Sora is the AI video generator released by OpenAI that creates video content from text prompts. In the video, Sora represents a major technical breakthrough in video production: it lowers the barrier to entry and could change how the entire industry works.

💡AI video generator

An AI video generator is a tool that uses artificial intelligence to produce video content automatically from text or other inputs. These tools signal a major shift in the video industry, enabling individuals and small teams to produce high-quality video.

💡Audio generation

Audio generation uses AI to produce sound and music from text prompts or other inputs. In video production, audio is as important as visuals, and audio generation makes sound design far more convenient and efficient.

💡Video editing

Video editing is the process of cutting, adjusting, and refining footage to achieve the intended look and narrative. With AI, editing becomes more efficient, as AI can automate complex editing tasks.

💡Cost reduction

Cost reduction means lowering the expense of a task or product through technical progress or efficiency gains. In the video, Sora dramatically lowers the cost and barrier of video production, putting high-quality video within reach of individuals and small teams.

💡Video production

Video production is the process of creating video content, spanning planning, shooting, and editing. With AI video generators like Sora, many tasks that traditionally required manual work can be automated, making production faster and easier.

💡Technical report

A technical report is a document that analyzes and evaluates a technology or product in detail, typically covering specifications, performance, and applications. In the video, OpenAI's technical report is used to understand Sora's capabilities and potential in depth.

💡Social media

Social media comprises platforms and apps that let users create, share, and exchange information, ideas, and images or videos. In the video, social media is a key channel for discussing and spreading news about AI technologies like Sora.

💡Video library

A video library is an organized collection of video content, typically used to supply footage to video creators. As AI advances, the concept may change fundamentally, since AI can generate large amounts of custom footage on demand.

💡3D world generation

3D world generation is the process of creating three-dimensional virtual worlds with computer graphics and AI. These techniques can simulate the real world or build entirely fictional environments, with broad applications in game development, virtual reality, and filmmaking.

Highlights

The Sora video generator was released by OpenAI on February 15, 2024.

Sora currently generates video only; all examples are silent, with no music or sound effects.

ElevenLabs released a new sound generator that creates entire soundscapes from text prompts.

Sora can extend videos, generating brand-new lead-in footage from scratch, which was previously impossible.

Sora can generate extra frames that make a video loop seamlessly.

Sora drastically lowers the cost of producing video, which could reshape the video production industry.

Sora's editability is limited: generated videos cannot yet be modified in fine detail.

Runway ML introduced a multi motion brush tool that animates only selected parts of a video.

Sora can generate an entire story from a single prompt.

Sora's arrival is like jumping straight from GPT-2 to GPT-3, signaling rapid progress in AI video.

Sora could spell the end of stock footage, since it can generate clips at negligible cost.

Sora can generate video in any format, from phone-portrait to widescreen.

Sora can feed 3D and world generation by converting video into 3D models.

Sora's videos are temporally consistent, making them usable for building 3D environments and characters.

Sora's release marks a new era of AI video generation and hints at what is yet to come.

Sora's capabilities and applications are evolving rapidly, with more breakthroughs likely ahead.

Sora's release demos can be found on OpenAI's official page for people to explore.

Transcripts

Sora, the video generator by OpenAI, released on February 15th, 2024, and I've spent pretty much every hour of my life since scouring the internet and researching what else this thing can do. It turns out there are a lot of things that weren't obvious in the middle of all the hype that accompanied the release of this AI video generator. I studied the technical report in detail, watched all the YouTube videos, and spent an unhealthy amount of time on Twitter looking for all the discussions and little findings people had. Matter of fact, since the release I didn't even leave the apartment.

If we haven't met yet, I'm Igor. I've made it my full-time calling to research what AI has to offer and how to put it to work in your everyday life. Before doing that with The AI Advantage, I ran a video production company for eight years in Central Europe, helping clients with everything from corporate training videos to directing smaller commercials and even shooting festival and nightclub videos. When it comes to videography I've really seen it all, and this topic sits exactly in the middle between technology and video production, so I can't wait to dive in. All right, without further ado, let's look at all the implications of Sora you might not have been aware of right away.

First of all, I want to talk about audio, because Sora only generates video. All the examples we saw were muted, without music or sound effects in the background. A lot of people rightfully pointed out that film is really 50/50: at the very least it's 50% visuals and another 50% audio. And there are many layers to that: you might have the actor's voice as one track, then the sound effects of things happening around them, and then Foley, the background sound that just persists. You're not consciously aware of it, but it's there, and if it isn't, the shot is missing something. So surely audio must be a complicated problem too, right? Well, not really, because ElevenLabs actually reacted to the Sora release with a new sound generator that can produce an entire soundscape from text prompts. We don't have access today, but if OpenAI hooked Sora up to an audio generator like this, you would have an audiovisual generator that creates full soundscapes. Have a quick listen.

And sure, a sound designer could do this manually, but again, if you're a one-man show producing a commercial, like I did so many times, you're doing everything yourself: planning, recording, editing, sound design, color grading, feedback rounds with the client, invoicing. Often there's no budget for a sound designer. So you can bet there are going to be models, whether Sora or others, that combine both and give you audiovisual outputs; that's not a question, it's just a straight fact at this point. And with tools like Suno AI already out there that can generate full songs, including lyrics, at decent quality, you'll be able to generate the background music, the background sound effects, and the voices in the scene, because voice generators are a thing and they're virtually indistinguishable already. Add the video component and we really have the full stack for audiovisual production. It's just a question of time now, and from my estimate it looks to be months, not years, until we get there.

My next point is all about the capabilities of Sora that are actually brand new, because a lot of what we saw just drastically reduces the cost of producing a clip like this, or an animated video like this. You might be aware that movies like this exist; they just cost a lot of money to make. So first, let's talk about the things that are genuinely new, not just a cost reduction, although that has its implications too and we'll get to that.

The first genuinely new capability: you can extend videos. This is beautifully outlined in the technical paper with the example of a San Francisco subway car. As you can see, the clip is the same in all three instances, but Sora backed up a little and extended the beginning of it. The video generated by Sora is different every single time, and it seamlessly transitions into the subway car. This was not possible until now; it generates that lead-in footage from scratch. You could argue that you could recreate the entire scene in 3D, build the preceding frames, and seamlessly transition into the original, but you have to realize that at some point this is going to become a feature in every editing package. You'll have just an image, it will be turned into a video, and then you can extend it to any duration: add a clip before, add a clip after. You'll be able to turn your old family photos into vivid memories, sort of. That is really scary, but it's going to be a thing, and you can bet that apps like Instagram will at some point, I don't know when, have a feature where you turn a photo into a video and then extend it indefinitely.
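
To make the mechanics concrete: "extending" a clip means conditioning the generator on existing frames and asking it to synthesize footage that leads into them. Below is a minimal sketch of that idea. Note that `generate_video` is a hypothetical stand-in; OpenAI has not published a Sora API, so the function name and its parameters are illustrative only.

```python
# Hypothetical sketch: no public Sora API exists, so generate_video is a
# stand-in for any video model that accepts conditioning frames.

def generate_video(prompt, num_frames, condition_frames, condition_position):
    raise NotImplementedError("placeholder for a future conditioned video model")

def extend_backwards(clip_frames, prompt, seconds=3, fps=24):
    """Synthesize `seconds` of new footage that ends exactly where `clip_frames` begins."""
    anchor = clip_frames[:fps]  # first second of the real clip, used as the target
    new_frames = generate_video(
        prompt=prompt,
        num_frames=seconds * fps,
        condition_frames=anchor,
        condition_position="end",  # the anchor is where the generated footage must land
    )
    return new_frames + clip_frames

# Run this three times with the same anchor and, as in the subway-car example,
# you get three different openings that all converge on the same clip.
```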

Another new capability: you'll be able to loop videos. This is also something you could kind of, but not really, achieve today, and definitely not in this form. You give it a video clip and it generates extra frames that let the footage loop seamlessly. I had a good chat with a friend about how this could be the new Rickrolling on the internet, because if you do this to a longer clip, you just don't realize it's looping and playing forever. You could send somebody a clip and it might take them minutes to notice that the whole thing is repeating over and over again. Anyway, this wasn't really possible before, although some people tried. In videography there was a whole trend a few years back of seamlessly transitioning one thing into another, like, for example... and my shirt is gone, magic. Those are the simplest ways to do it, but here we'll have the capability of generating brand-new frames, and things will be able to loop indefinitely.

play05:22

expect in editing software somewhere

play05:24

down the line but then there's a lot of

play05:25

the ones that are just simple cost

play05:27

reduction this is why people refer to it

play05:28

as the death of Hollywood in many cases

play05:30

now I don't know if that's an accurate

play05:31

assessment in my opinion I think they're

play05:33

going to use this Tech to Advantage to

play05:35

lower the prices of production and pump

play05:37

out even more content we'll also talk

play05:39

about that soon but let's finish up the

play05:40

segment and talk about the things that

play05:42

were already available but now it's just

play05:43

a 10,000x reduction cost for that

play05:45

calculation I see a subscription price

play05:47

that is somewhere around the GPT plus

play05:49

plan so what's going to be possible at

play05:50

this super low cost is first of all

play05:52

generating images we're able to do that

play05:53

with other image generators right sure

play05:55

these are hyper realistic and very high

play05:57

quality just like M journey and so but

play05:59

then it's capability to turn images into

play06:01

videos that is very very big in my

play06:03

opinion because it's going to make it so

play06:05

easy to craft compelling videos like I

play06:08

feel like most people that talk about

play06:09

this don't appreciate how much this is

play06:11

going to lower the barrier for entry for

play06:13

videography and high quality videography

play06:15

that is because you're going to get

play06:16

access to things like this so even if

play06:18

you've seen this before I think I have a

play06:19

bit of a different perspective here so

play06:20

look here on the left you have the Drone

play06:22

image here on the right you have this

play06:23

butterfly right and here in the middle

play06:24

you have the mix of the two where the

play06:25

Drone is flying through something like

play06:27

the Coliseum and then it morphs into a

play06:29

butterly fly and look I could do this

play06:30

today okay this just takes about 3 to 5

play06:33

hours of work dependent on your skill

play06:35

level you just go into after effects and

play06:36

you rotoscope out this butterfly meaning

play06:39

you go frame by frame that's 25 frames

play06:41

every single second and you make sure

play06:43

you animate a mask exactly in the form

play06:45

of the Butterflies wings and you redo

play06:47

that for every movement now yes there's

play06:49

tools that help you but a lot of times

play06:50

you're stuck with manual labor there so

play06:52

it might just turn out that the 3 to 5

play06:54

hour task turns it into 15 20

play06:57

hours and then you can bring the

play06:59

butterfly into here and morph it into

play07:01

the Drone with something like a morph

play07:03

cut inside of Premiere Pro now if none

play07:04

of that means anything to you that's

play07:06

fine I'm just saying hours of work are

play07:08

going to be done like

play07:10

this and this is just one simple example

play07:13

in many others a oneman crew could never

play07:15

do this right all these animation

play07:17

related examples where they turn an

play07:19

image into an animation like this are

play07:21

usually just not feasible for a oneman

play07:22

show it takes too much time to animate

play07:24

all the little things you might be able

play07:26

to do it for a few shots but if you do a

play07:27

whole one minute trailer you'll find

play07:29

that you spend 2 weeks at the computer

play07:31

if you really animate all the little

play07:32

details like in this shot and you have a

play07:34

lot of different shots so that's my

play07:35

second point it lowered the bar by a

play07:37

factor that is larger than most people

play07:40

realize I don't know if it's 1,000x or

play07:42

10,000x but a lot of these things were

play07:45

Unthinkable for small Crews or oneman

play07:47

shows and now they will be doable like

play07:49

for example before

play07:55

after Okay so this point is all about

play07:57

the editability of the video and here in

play07:59

Twitter Owen Fern went ahead and he

play08:01

criticized the fact that hey yes these

play08:03

Generations are absolutely incredible

play08:04

but what if the client has feedback and

play08:07

this is very very appropriate criticism

play08:09

in my opinion because clients always

play08:11

have feedback and if you're going to use

play08:12

this for job if this is supposed to be

play08:14

the death of Hollywood just between

play08:16

directors and producers there is so much

play08:18

feedback going on in the post-production

play08:20

of any advertisement movie heck even if

play08:22

it's an event video I had clients that

play08:24

went back and forth 10 times and gave

play08:26

feedback over and over again and I had

play08:28

to adjust things so one points out here

play08:30

that yeah there's going to be a lot of

play08:31

little details that will need to be

play08:32

changed about these scenes and with Sora

play08:35

you're not really able to go back and

play08:36

change little details right you're going

play08:38

to have to regenerate the whole scene

play08:40

and maybe you like the character here

play08:42

but you just don't like the fact that

play08:43

this is not a Thum it just looks like a

play08:45

fifth finger and we would like to give

play08:46

it a look of a Thum can we do that and

play08:48

his point is the answer has to be no and

play08:50

then you have a dissatisfied client

play08:52

which is a very fair point but as I've

play08:54

been following this very closely over

play08:56

the last months there's one tool and one

play08:58

research that needs to to be pointed out

play09:00

here okay first things first Runway ml

play09:02

the previous so to say leader in AI

play09:04

video a few weeks ago introduced a

play09:06

feature called multi motion brush tool

play09:08

which allowed you to use multiple

play09:10

brushes on the video to just animate

play09:12

specific parts now that is for animation

play09:15

but over in M journey and many other

play09:16

image generators you're able to do

play09:18

something called inpainting where you

play09:20

just paint in a little part of the image

play09:22

and then edit just that you can reprompt

play09:24

it so on images today you could actually

play09:27

go in and just paint in this Thum and

play09:29

say regenerate the Thum why would that

play09:32

not be possible on video eventually it

play09:34

will be and further than that bite Dan

play09:36

the creator of Tik Tok actually

play09:38

published a research paper less than a

play09:40

week ago about this so-called boxor okay

play09:42

so I didn't cover it on the channel

play09:44

because I like to cover things that are

play09:45

available today or truly truly

play09:47

revolutionary this kind of Falls in this

play09:49

in between zone of hey really

play09:50

interesting but it's not available and

play09:52

in my eyes probably not worth a

play09:54

dedicated video but look the whole point

play09:55

of this is you draw different boxes in

play09:57

the scene and thereby you can control

play09:58

the seen in great detail so if you

play10:00

select the balloon and say it's going to

play10:02

fly away in this direction and then you

play10:04

select a girl and she's going to run in

play10:05

a different direction exactly that is

play10:07

going to happen so between tools like

play10:09

the box imator and inating in mid

play10:11

Journey it's just a question of time

play10:12

where you're going to be able to use a

play10:14

mix of these tools and also in paint on

play10:17

top of AI video now sure there's going

play10:18

to be a temporal axis there right

play10:20

because on images you only have the X

play10:22

and Y AIS and in video there's also the

play10:24

time axis and sometimes you even have

play10:26

movement in zspace but between This

play10:28

research and painting I can totally see

play10:30

that happening for AI video 2 down the

play10:33

line plus as we know with prompt
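
The "temporal axis" point is easier to see written down. An image inpainting mask is a 2D boolean array; for video, the mask gains a time dimension and has to follow the subject from frame to frame. A purely conceptual numpy sketch (the dimensions and the drifting-patch motion are made up for illustration):

```python
import numpy as np

T, H, W = 48, 270, 480  # illustrative frame count and resolution

# Image inpainting: one 2D mask marks the patch to regenerate (say, the thumb).
image_mask = np.zeros((H, W), dtype=bool)
image_mask[100:140, 220:250] = True

# Video inpainting: the mask gains a time axis and must track the moving subject.
video_mask = np.zeros((T, H, W), dtype=bool)
for t in range(T):
    x = 220 + t  # the patch drifts right as the hand moves through the shot
    video_mask[t, 100:140, x:x + 30] = True

def composite(original, regenerated, mask):
    """Keep original pixels outside the mask; take regenerated ones inside it."""
    return np.where(mask[..., None], regenerated, original)  # (..., 3) RGB frames
```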

Plus, as we know from prompt engineering with today's language models, there's a lot of control available in the text prompt itself; you just have to be really detailed. If you look at a lot of these Sora prompts, they're good, but not as detailed as they could be; some of the best Stable Diffusion prompting is extremely detailed. In Midjourney and Stable Diffusion, a relatively simple prompt gives you varied results, whereas with a detailed one, even if you roll the dice and create a new scene, it comes out very similar. And referring back to Midjourney again: they just announced a new character tool that maintains character consistency based on a character you pick. All of these AI image features that we've been talking about and that I've been tracking regularly are going to come to video tools too; it will just take longer. But I absolutely believe we'll be able to feed all this little feedback into AI video, and that it will therefore actually become production-ready at some point.

My next point is something I didn't expect at the beginning: you can prompt stories into existence from a single prompt. Here's an example from Bill Peebles on the OpenAI team. He generated an entire story: two dogs should walk through NYC, then a taxi should stop to let the dogs pass at the crosswalk, then they should walk past a pretzel and hot dog stand, and finally they should end up at the Broadway signs. If you follow this channel, you might know how much context you can add to text prompts to achieve exceptionally accurate results from things like ChatGPT; if you added far more detail here, I believe it would be reflected in the output, and the story can develop from there. And since right now we already have tools that can manipulate someone's mouth so they appear to speak another language naturally, that will be possible here too. So you'll be able to create those long takes they have in movies, which are incredibly difficult to achieve; some films, like Dunkirk, took it so far that the movie flows seamlessly as if it were a single take. Sora can do it too, and that I didn't expect at the beginning; they didn't share this example right off the bat. I think it's actually very, very impressive. And if we can already generate stories from one simple text prompt, it's just a question of time until we arrive at something where you type in a prompt and get a full movie or a full show back. At some point it's just a question of having enough GPUs. This is obviously just a mockup, but it's something to think about, especially because this is the worst this tech is ever going to be.

And you know what, let's talk about that; it's actually my next point: where are we on the timeline? It was really helpful to read some of the discussions happening online to orient myself about where we actually stand today. Emad Mostaque of Stability AI had a fantastic take here: he compared this to the GPT-3 of video models. If you didn't know, GPT-3 was the predecessor of ChatGPT; it was available earlier, but the interface wasn't as intuitive and you actually had to prompt it differently, whereas ChatGPT used reinforcement learning from human feedback, meaning a lot of humans rated the outputs to make it more user-friendly. That's where this is right now. It's not yet at the ChatGPT point, where it becomes really easy to use and gains mass popularity, after which came GPT-4 with all the additional features, and it's just crazy capable now. He even said that all the image generators like Stable Diffusion were more comparable to GPT-2, where the quality of the output was not nearly as good as GPT-3. So, by analogy to large language models, this puts us somewhere in the middle of 2022, except the ChatGPT, GPT-4, Llama, and Mistral equivalents will come over the next few years, or sooner, at the pace we're moving.

And on this topic, there's another fantastic thread, by Nick St. Pierre here on X. He ran the exact prompts behind the Sora demos through Midjourney and then paired up the results, and the thing is, they're shockingly similar. People are already joking: is Midjourney just OpenAI in disguise? Probably they're just using very similar training data. But look at that: all of these examples are very similar. Now, I'm sure these are the ones that matched best, picked to create the illusion of it essentially being the same model; if you look closer, the beaver is very different. But the point is, these are not night and day. Sure, these helmets are completely different, but the cinematic look is very similar with slightly different color grading down here. Fair. The point I'm trying to make is that we literally skipped two to three years ahead in AI video, because what we had until now was something like GPT-1 or GPT-2... oh, that's hot... and now we've got the GPT-3: actually usable, creating useful outputs that are essentially hyper-realistic. But we're not even at the ChatGPT moment yet, where you get editability and things like the audio generation we talked about earlier; that is all yet to come. At this pace of development, though, we should probably be thinking in days, weeks, maybe months, not years or decades. I guess that poses the question: at which point in this development do we reach the Matrix? I don't know the answer. I'm turning 30 next month, and it does feel like it, or something akin to it, will happen in this lifetime. Who knows. Moving on.

Okay, my next point goes back to my original video, where I stated that this is going to be the death of stock footage. I've sold stock footage myself for almost a decade, and there's just no way people will keep paying $50 or $100 per clip when they can generate clips for a few cents. I think that one is obvious, but beyond it, this really got me thinking about what it means for video creation, especially for smaller crews and one-man shows: you're going to be able to generate entire video libraries for yourself. Hear me out. A video like this one has what we call the A-roll, the main story of the video: me talking, presenting all my findings to you. On top of that we have what we refer to as B-roll: the clips that add an additional layer of information, add visual interest, keep you more engaged, and really let us get the most out of this audiovisual medium. Right at this very moment you're consuming both audio and video at the same time, so we try to make the most of all these layers: I do my best to keep my speech and presentation concise because I value your time, and then in the editing we do our best to add as much information on top. Right now that's done with B-roll: we pay for various footage libraries to get shots that enhance our videos, and we pay for music libraries to add the right type of music and enhance the atmosphere.

But with models like Sora, this really changes the game, because you'll be able to generate an entire library for yourself, for that specific project, since the cost goes down so much. You'll be able to prompt things into existence that previously you had to research, download, and compile, and which usually don't even match, forcing color correction and color grading on top. Here, as you can see, from a single text prompt we got five video variations, and all of these can be upscaled with something like Topaz Video AI; that tool is paid, it costs a few hundred dollars, but it upscales 1080p clips to 4K with AI really effectively. Here you'll simply prompt them. And again, looking over at the AI imaging tools: all the features we see there are going to be available in the video tools. Something like a one-click upscale to 4K will be there. "Can you regenerate this?" or "can you generate four more just like this?" will be there. Think of the whole Midjourney interface in Discord applied to these videos: upscale, reroll, more like this, use a different version of the model. After a few minutes you'll have a whole library of B-roll to enhance your video.

Now, I as a video creator can't wait for this. I know that eventually the endpoint of all of this is the technology really replacing a lot of content, and who knows if I'll still be sitting here presenting the news to you if an AI can do it in real time, minutes after the release of something, in exactly the voice you prefer, while also respecting your context. In this video I kind of have to assume your knowledge level: at certain points I have to assume somebody has never created a video before, but some of you might be experienced directors who know all these concepts and how the industry works. Well, the AI is eventually going to create the explanation exactly for your context. But I digress. The point here is that, at least for the footage, at least for the production of this video, I could have a custom library enhancing all the visuals, and maybe we could be taking a trip through Tokyo right now while I present these ideas. There's going to be some point where I can just take my voice, use my digital avatar, let him walk through Tokyo, and explain these concepts in a very practical manner without ever leaving my desk. I don't think that's a stretch at this point. A week or two ago it seemed a bit unreal to think of lifelike AI video; the best we had were animations that were good and talking-head videos that looked okay, convincing for a second or two if you weren't looking for AI. But again, if this is the GPT-3 of AI video, what are the ChatGPT and the GPT-4 going to look like? That's what I'm already thinking about.

Some of these advanced capabilities are outlined in the technical paper too. It clearly states that you're going to be able to create videos in any format, from 1920×1080 to 1080×1920, so from phone format all the way to widescreen. And cropping into cinematic formats from there is easy: all you need to do is add black bars at the top and bottom and you have all the cinematic formats.
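
That cinematic crop is pure arithmetic: to present a 16:9 frame at, say, 2.39:1, you mask off the surplus height with equal bars top and bottom. A quick sketch of the bar calculation:

```python
def letterbox_bars(width: int, height: int, target_ratio: float = 2.39) -> int:
    """Height of each black bar needed to mask a frame down to target_ratio."""
    visible = round(width / target_ratio)  # image height that survives the crop
    if visible > height:
        raise ValueError("frame is already narrower than the target ratio")
    return (height - visible) // 2

print(letterbox_bars(1920, 1080))  # 138 -> ~138 px bars leave a 1920x803 (2.39:1) image
```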

So really, there's going to be a lot of variability, and you'll be able to get exactly the B-roll you need for your project. Then, eventually, AI is going to be writing the scripts and editing the video itself, modeled on all the other videos it has seen and how they were edited. That might take a long time, and we do so much manual work on these videos that I think there will always be a style, an expression, a handwriting to the post-production of a video. But it's crazy to see: a week ago, having a library of B-roll for one specific video meant going out and shooting it in the real world, or purchasing stock footage that ended up scattered and all over the place. Here you'll get the best of both worlds: great B-roll, all from the same scene, at virtually no cost. Or if you have some B-roll you already use, you'll be able to extend it; or maybe you have some phone pictures and you'll turn those into B-roll. It's really a whole new world for video production; I can't overstate that.

But it doesn't end there, and this brings me to my last point: 3D worlds and world generation. In the technical paper they actually refer to Sora as a world simulator, and I think that's a big claim, but also a justified one, because if you take some of the clips at face value, it's incredible. It's temporally consistent: these houses are not warping, you're moving through the scene like a drone would, you have these people on their horses going about their daily business. It's incredible. But what you have to realize is that beyond that, you can apply something like Gaussian splatting, which, simply put, is a technique that creates a so-called Gaussian splat, a 3D representation of the video. In even simpler terms, it turns a video into a 3D model, and this is what that looks like in practice. Now look, this is a simple video that wasn't even intended for this purpose, but you could easily imagine a drone shot where the drone parallaxes around the subject and captures it from all angles, and then you can create 3D objects of something that doesn't even exist. Right here, Manov Vision took exactly this drone clip, recreated it as a Gaussian splat, and then brought it into Unity, a real-time game engine, where you can animate the camera, insert characters, and do all sorts of things.
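
As a rough sketch of that video-to-3D pipeline: the exact toolchain used here isn't named beyond Gaussian splatting and Unity, but a common route is to dump frames from the clip, recover camera poses with COLMAP, and train a splat with the reference gaussian-splatting implementation. Paths, frame spacing, and the assumption that both tools are installed are illustrative.

```python
# Illustrative pipeline: AI-generated clip -> camera poses -> Gaussian splat.
# Assumes COLMAP is installed and graphdeco-inria/gaussian-splatting is checked out.
import os
import subprocess
import cv2

def extract_frames(video_path: str, out_dir: str, every_nth: int = 5) -> None:
    """Dump every Nth frame; structure-from-motion needs overlapping stills."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    i = saved = 0
    ok, frame = cap.read()
    while ok:
        if i % every_nth == 0:
            cv2.imwrite(f"{out_dir}/frame_{saved:05d}.jpg", frame)
            saved += 1
        i += 1
        ok, frame = cap.read()
    cap.release()

extract_frames("drone_shot.mp4", "workspace/input")

# Recover camera poses with COLMAP's one-shot reconstructor.
subprocess.run(["colmap", "automatic_reconstructor",
                "--workspace_path", "workspace",
                "--image_path", "workspace/input"], check=True)

# Train a Gaussian splat on the posed images, then export for a game engine.
subprocess.run(["python", "gaussian-splatting/train.py", "-s", "workspace"],
               check=True)
```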

The important fact here is that Sora doesn't have to do everything from A to Z. You can still have a human write the script, a human in front of a green screen acting it out, your favorite actors in these scenes. But it's going to be so much cheaper to produce, because you'll simply generate whole environments like this and then shoot everything in front of a green screen, at least until AI perfectly synthesizes the actors' voices, which, if you follow this channel, you know it already has. Then the last missing piece is really the human part: character consistency and the ability to edit little details so the result aligns with the vision of everybody involved in the movie's creation.

And if you take that thought experiment even a step further, you end up in Minecraft, because in the technical paper you can see clips that were not recorded from within Minecraft: they were generated by Sora simply by including the word "Minecraft" in the prompt. It saw so much Minecraft footage that it was able to recreate Minecraft perfectly. And if it can do that with Minecraft now, how long until it can do it with all of this world? I don't know, but I'm scared and excited at the same time. One thing is for sure: I want to stay on top of all of this, and I'm going to keep my eye on it. If you want to follow along for the ride, subscribe to this channel, and subscribe to our weekly newsletter; it's completely free and keeps you up to date once a week with all the revolutionary breakthroughs.

And that's really all I've got for today, except: if you want to try out Sora, there is actually a very, very limited demo on this page. If you haven't tried it yet, I recommend it, because it's the closest you can get to playing with the model. It's a little interface where you can change these variables, so you can go from an old man to an adorable kangaroo, and there are a few more variables you can change out here; okay, Antarctica. For now, this is the closest we get to playing with this thing. I hope you enjoyed this. Let me know which of these points was new or interesting to you, and if you have even more facts I might not have considered yet, leave those below too. And if you haven't seen the original video about the announcements and all the video clips they presented, that's over here. All right, I can't wait to see how this develops and what the competition comes up with. This is a whole new world, and I'm here for it. See you soon.
