10 Things About OpenAI SORA You Probably Missed

The AI Advantage
22 Feb 2024 · 23:17

Summary

TL;DR: Igor delves into the revolutionary capabilities of OpenAI's video generator, Sora, exploring its potential beyond the initial hype. He highlights Sora's unique features, such as extending videos and creating seamless loops, and the profound implications for audiovisual production, including cost reduction and the democratization of high-quality content creation. Igor also discusses emerging tools like ElevenLabs' sound generator, which, combined with Sora, could offer a comprehensive audiovisual experience. He predicts a future where AI can generate not just images but entire videos, transforming videography, content creation, and possibly the entire entertainment industry. This exploration offers insights into the current state and exciting future of AI in video production.

Takeaways

  • 😲 Sora, an AI video generator released by OpenAI on February 15th, 2024, has implications beyond the initial hype, such as pairing with audio and soundscape generators, extending and looping videos, and generating entire stories from a single text prompt.
  • 🤯 Sora's capabilities are comparable to the GPT-3 stage of AI development, skipping ahead 2-3 years from previous AI video models, but still not as user-friendly as ChatGPT.
  • 💸 AI video generation will drastically reduce the cost of video production, potentially leading to the 'death of Hollywood' as we know it, or at least a significant decrease in the cost of production.
  • 🖌️ Sora and future AI video models will enable detailed editing and inpainting of generated videos, allowing users to make granular changes to the output based on client feedback.
  • 🎥 AI video generators will enable users to create custom libraries of B-roll footage and music specifically tailored for their projects, eliminating the need for expensive stock footage.
  • 🌎 Sora is described as a 'world simulator', capable of generating temporally consistent 3D environments that can be translated into real-time game engines or Minecraft-like worlds.
  • ⏳ The development of AI video technology is progressing rapidly, and capabilities like audio generation, inpainting, and 3D world creation are expected to become available in the near future, potentially within months.
  • 🔍 AI video generators will enable users to search for specific elements within videos and extend or loop them seamlessly, creating new possibilities for creative expression.
  • 🎬 The emergence of AI video technology will necessitate a reconsideration of traditional video production roles, as AI takes on more tasks traditionally performed by humans.
  • 🚀 The potential of AI video technology is both exciting and daunting, with the possibility of AI generating entire movies or shows from a single text prompt being a potential future scenario.

Q & A

  • What is Sora and what does it generate?

    -Sora is a video generator created by OpenAI, designed to generate videos from textual prompts.

  • Why is audio considered important in film production according to the script?

    -Audio is deemed crucial in film production because it accounts for 50% of the experience, enhancing visuals with layers like actor voices, sound effects, and ambient sounds.

  • How does ElevenLabs relate to Sora's release?

    -ElevenLabs released a sound generator in response to Sora, aiming to complement Sora's video generation with audio creation for a full audiovisual experience.

  • What is the significance of being able to extend videos with Sora?

    -Extending videos with Sora represents a novel capability, allowing for the creation of seamless transitions and extensions of video content that were previously not possible without extensive manual work.

  • What is the potential impact of Sora on video editing costs?

    -Sora has the potential to drastically reduce video editing costs by simplifying complex processes like turning images into videos and creating high-quality content that would otherwise require significant time and resources.

  • How does Sora's editability challenge relate to client feedback?

    -Sora's current limitation in making detailed edits based on client feedback could be a challenge, as it may not allow for minor adjustments without regenerating entire scenes.

  • What future tool integration could improve Sora's editability?

    -Future tools could include features like inpainting and detailed prompting for video, similar to current AI image editing capabilities, to allow specific scene modifications without needing to regenerate everything.

  • How does Sora enable the creation of 'stories' from prompts?

    -Sora can generate coherent and detailed stories from single text prompts, creating sequences of events or actions in video form that unfold according to the input narrative.

  • What does the script suggest about the future of individual video libraries?

    -The script suggests that individuals will be able to generate bespoke video libraries tailored to specific projects, drastically lowering production costs and enhancing creative possibilities.

  • What implications does Sora have for the field of 3D world and world generation?

    -Sora's capabilities suggest it could act as a 'world simulator,' offering the ability to generate consistent and detailed 3D environments, which could revolutionize fields like gaming, virtual reality, and film production.

Outlines

00:00

📽️ Introduction to AI Video Generator Sora

The video discusses Sora, an AI video generator released by OpenAI on February 15, 2024. The speaker, Igor, a researcher in AI technology and a former video production company owner, shares his in-depth findings on Sora's capabilities beyond the initial hype. He has spent considerable time studying the technical report, watching YouTube videos, and scouring discussions on Twitter to uncover lesser-known aspects of this AI tool.

05:01

🔊 Sora and Audio Generation

Igor explains that while Sora currently only generates muted videos, audio generation is a crucial component of video production. He notes that ElevenLabs has already released a sound generator capable of creating entire soundscapes from text prompts. Igor predicts that OpenAI will likely integrate Sora with an audio generator, resulting in an audiovisual generator that can produce complete videos with background music, sound effects, and even synthesized voices, providing a full-stack solution for audiovisual production.

10:03

🆕 New Capabilities of Sora

Igor highlights two new capabilities of Sora that were not previously possible. First, Sora can extend videos, seamlessly generating frames before or after an existing clip, allowing users to expand the duration of a video. Second, Sora can create looping videos, generating additional frames that allow footage to loop indefinitely. These features open up new possibilities for creating animations and interactive content.

15:04

💰 Cost Reduction and Editing Capabilities

Igor discusses how Sora's capabilities lead to significant cost reductions in video production. He explains that tasks like rotoscoping and animating images, which previously required hours of manual labor, can now be achieved much more efficiently with AI. Igor also addresses concerns about editability, citing tools like Runway's Multi Motion Brush and inpainting techniques that allow for fine-tuning and editing of AI-generated videos.

20:05

🌎 World Generation and 3D Visualization

Igor explores Sora's potential for generating entire 3D worlds and environments. He discusses technologies like Gaussian splatting, which converts videos into 3D models, allowing for further manipulation and animation in game engines like Unity. Igor also mentions Sora's ability to recreate environments like Minecraft, suggesting that it may eventually be capable of generating entire virtual worlds. He expresses both excitement and apprehension about these possibilities, emphasizing the need to stay up to date with AI advancements.

Keywords

💡Sora

Sora is a new artificial intelligence (AI) video generator developed by OpenAI and released on February 15th, 2024. It is capable of generating realistic and high-quality videos from textual prompts. The video script revolves around the implications and capabilities of Sora, as the narrator (Igor) shares his extensive research and analysis on how this technology could revolutionize video production.

💡Audio

Audio refers to the sound component of multimedia, including voices, sound effects, background sounds, and music. The script discusses the importance of audio in film production, as it complements the visuals and creates a more immersive experience. Sora, initially, focused on generating video without audio, but the narrator mentions the potential integration with AI audio generators to create complete audiovisual outputs.

💡Editing

Editing refers to the process of reviewing, adjusting, and refining video footage to create a polished final product. The script addresses the editability of videos generated by Sora, as clients often provide feedback and require changes to specific details. The narrator explores solutions like inpainting, multi-motion brushes, and bounding box techniques to enable more precise editing of AI-generated videos.

💡B-roll

B-roll, also known as supplemental footage or cutaways, refers to additional video clips used to enhance or support the primary footage (A-roll) in video production. The script discusses how Sora could revolutionize the creation of B-roll by generating entire libraries of relevant footage for a specific project, drastically reducing the cost and effort required for video creators to obtain and use B-roll.

💡World Generation

World generation refers to the capability of Sora to create entire 3D environments and worlds from textual prompts. The script mentions how Sora is described as a "world simulator" in the technical paper, as it can generate temporally consistent scenes with characters and objects behaving naturally in a virtual space. This opens up possibilities for creating 3D models from video footage and integrating AI-generated environments with real-world elements.

💡Prompt Engineering

Prompt engineering refers to the practice of carefully crafting textual prompts to guide AI models like Sora to generate desired outputs. The script highlights the importance of detailed and specific prompts to achieve accurate and consistent results. As AI video tools evolve, prompt engineering will become increasingly crucial in controlling the generation process and incorporating client feedback.

💡Stock Footage

Stock footage refers to pre-recorded video clips that can be licensed and used in various video productions. The script suggests that the low cost and accessibility of AI video generation could make traditional stock footage obsolete, as creators will be able to generate their own custom footage tailored to their specific projects with minimal effort and expense.

💡Upscaling

Upscaling refers to the process of increasing the resolution of a video or image to a higher quality. The script mentions the use of AI upscaling tools like Topaz Video AI to enhance the resolution of Sora-generated videos from 1080p to 4K. This capability further improves the quality and usefulness of AI-generated footage for professional video production.

💡Deepfakes

Deepfakes refer to the use of AI technology to generate synthetic media, such as videos, images, or audio, that depict events or individuals in a realistic but fabricated manner. The script touches on the potential of AI voice synthesis and character consistency in Sora, which could lead to the creation of highly realistic deepfake videos. This raises concerns about the ethical implications and misuse of such technology.

💡Augmented Reality (AR)

Augmented Reality (AR) is a technology that overlays virtual elements, such as graphics or animations, onto the real-world environment. The script discusses the potential of using AI-generated video in AR applications, where digital avatars could present information or narrate concepts while appearing to walk through real-world environments captured by the user's device.

Highlights

Sora, the AI video generator by OpenAI, was released on February 15th, 2024, sparking significant interest in its capabilities beyond initial expectations.

ElevenLabs responded to Sora's release with a new sound generator capable of creating detailed soundscapes from text prompts, suggesting the possibility of fully generated audiovisual content.

Sora's ability to extend videos by generating new, seamless content before or after a given clip introduces a groundbreaking feature for video production.

The potential for looping videos created by Sora opens up new creative possibilities for content creators, including the idea of infinite, seamless video loops.

Sora's capabilities significantly lower the cost and technical barriers to producing high-quality video content, democratizing access to videography.

The integration of AI in video editing software is anticipated, offering features like video extension, looping, and possibly even detailed editing adjustments.

The ability to prompt entire narratives into existence with Sora marks a significant advancement in storytelling, potentially revolutionizing scriptwriting and content creation.

Current limitations in editing AI-generated videos, such as making minor adjustments, are expected to be overcome as technology evolves, mirroring advancements seen in AI image generation.

The comparison of Sora's current stage to the GPT-3 model of text AI suggests that we are on the brink of more advanced, intuitive video AI technologies.

Sora's release has prompted comparisons with existing AI technologies, indicating that AI video generation may have leapfrogged years ahead in terms of development.

The potential impact of Sora on the stock footage market and the ability for creators to generate custom video libraries for projects highlights a shift in how content is produced and sourced.

Sora's world simulator capabilities suggest a future where virtually any environment can be generated for video production, reducing the need for on-location shooting.

The prospect of generating 3D models from AI-generated videos opens new avenues for integrating AI content into gaming and virtual reality applications.

The rapid development pace of AI technologies like Sora raises questions about the future of content creation and the role of human creators in a predominantly AI-driven industry.

Sora's ability to generate content that closely mimics real-world environments and narratives from simple text prompts signifies a major leap forward in AI's creative potential.

Transcripts

00:00

Sora, the video generator by OpenAI, released on February 15th, 2024, and I've spent pretty much every hour of my life since scouring the internet and researching what else this could do. There's actually a lot that wasn't obvious in the middle of all the hype that accompanied the release of this AI video generator. I studied the technical report in detail, watched all the YouTube videos, and spent an unhealthy amount of time on Twitter looking for all the discussions and the little findings people had. Matter of fact, since release I didn't even leave the apartment.

If we haven't met yet, I'm Igor. I made it my full-time calling to research what AI has to offer and how to put it to work in your everyday life. Before doing that with The AI Advantage, I had a video production company that operated for eight years in Central Europe. I helped clients with everything from corporate video trainings to directing smaller commercials, and even shooting festival and nightclub videos. When it comes to videography, I've really seen it all, and this stuff sits exactly in the middle between technology and video production, so I can't wait to dive into all of it. All right, so without further ado, let's look at all the implications of Sora that you might not have been aware of right away.

01:03

Okay, so first of all I want to talk about audio, because Sora only generates video. All the examples we saw were muted, without music or sound effects in the background, and a lot of people rightfully pointed out that in film it's really 50/50: at the very least it's 50% visuals and another 50% audio. And there are many layers to that. You might have the actor's voice as one track, but then there are also sound effects of things happening around them, and then you have Foley, the background sound that just persists. You're not really consciously aware of it, but it's there, and if it's not there, the shot is missing something. So surely audio must be a complicated issue too, right? Well, not really, because ElevenLabs actually reacted to the Sora release and put out a new sound generator that can generate an entire soundscape from a text prompt. We don't have access today, but if OpenAI hooked Sora up to this audio generator, you would have an audiovisual generator where you create full soundscapes. Have a quick listen.

And sure, a sound designer could do this manually, but again, if you're a one-man show and you're producing a commercial, like I did so many times, you're doing everything yourself: planning, recording, editing, sound design, color grading, feedback rounds with the client, invoicing. Oftentimes you don't have the budget for a sound designer. So you can bet there are going to be models, I don't know if Sora or others, that combine both and give you audiovisual outputs. This is not a question; that's just a straight fact at this point. And with tools like Suno AI out there already that can generate full songs, including lyrics, at a decent quality, you're going to be able to generate the background music, the background sound effects, and the voices in the scene, because voice generators are a thing and they're virtually indistinguishable already. Now add the video component, and we really have the full stack for audiovisual production. It's just a question of time now, and from my estimate it looks to be months, not years, till we get there.
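To make that concrete, here's a minimal sketch of what prompting a sound generator like ElevenLabs' for a soundscape could look like from code. The endpoint path, header, and parameter names are my assumptions about the public HTTP API, not details confirmed in the video:

```python
# Minimal sketch: generating a soundscape from a text prompt with an
# ElevenLabs-style sound-generation API. The endpoint path, payload fields,
# and header name are assumptions and may differ from whatever interface
# actually exists; the API key and prompt are placeholders.
import requests

API_KEY = "YOUR_ELEVENLABS_API_KEY"  # placeholder

resp = requests.post(
    "https://api.elevenlabs.io/v1/sound-generation",  # assumed endpoint
    headers={"xi-api-key": API_KEY},                  # assumed auth header
    json={
        "text": "busy Tokyo street at night, rain, distant traffic, neon hum",
        "duration_seconds": 10,                       # assumed parameter name
    },
    timeout=60,
)
resp.raise_for_status()

# Assuming the API returns raw audio bytes that can be laid under a muted clip.
with open("soundscape.mp3", "wb") as f:
    f.write(resp.content)
```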

02:58

Okay, my next point is all about the capabilities of Sora that are actually brand new, because a lot of the stuff we saw just drastically reduces the cost of producing a clip like this, or an animated video like this. You might be aware that movies like this exist; it just costs a lot of money to produce them. So first of all, let's talk about the things that are actually brand new and not just a cost reduction, although that has its implications too, and we'll talk about that.

The first genuinely new thing is that you can extend videos. This is beautifully outlined in the technical paper with the example of a San Francisco subway car. As you can see, the clip is the same in all three instances, but they extended the beginning of it: the lead-in Sora generates is different every single time, and it seamlessly transitions into the subway car. This is something that was not possible up until now; it generates this video from scratch. I guess you could argue that you could recreate the entire scene in 3D, create the frames before it, and seamlessly transition in, but you have to realize that at a certain point this is going to become a feature in every editing software. You'll have just an image, it will turn it into a video, and then you can extend it to any duration, add a clip before, add a clip after. You'll be able to turn your old family photos into vivid memories, sort of. That is really scary, but it's going to be a thing, and you can bet apps like Instagram are at some point, I don't know when, going to have a feature where you turn a photo into a video and then extend it indefinitely.

Another new capability is that you're going to be able to loop videos. This is also something you could kind of, but not really, achieve today, and definitely not in this form. You give it a video clip, and it generates extra frames that let the footage loop seamlessly. I had a good chat with a friend, and we talked about how this could be the new Rickrolling on the internet: if you do this to a longer clip, you just don't realize that it's looping and playing forever. You could send somebody a clip, and it might take them minutes to realize the whole thing is repeating over and over again. Anyway, this was not really possible before, although some people tried it anyway. In videography there was this whole trend a few years back where people were trying to seamlessly transition one thing into another, like, for example... and my shirt is gone, magic. Those are the simplest ways to do it, but here we will have the capability of generating brand-new frames, and things will be able to loop indefinitely. So those are the new features you can expect in editing software somewhere down the line.
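For contrast with what Sora promises, here's a minimal sketch of the traditional "kind of, but not really" way to loop footage: crossfading the tail of a clip into its head with OpenCV. File names, frame rate, and overlap length are placeholder assumptions:

```python
# Minimal sketch of a traditional crossfade loop with OpenCV: blend the
# clip's tail into its head so playback wraps without a visible cut.
# Assumes the clip is longer than 2*N frames; paths and fps are placeholders.
import cv2

cap = cv2.VideoCapture("clip.mp4")
frames = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frames.append(frame)
cap.release()

N = 25  # overlap: roughly 1 second at an assumed 25 fps
out_frames = []
# The first N output frames dissolve from the tail into the head, so the
# jump from the last output frame back to the first is continuous.
for i in range(N):
    alpha = i / N  # 0 -> pure tail frame, 1 -> pure head frame
    out_frames.append(cv2.addWeighted(frames[len(frames) - N + i], 1 - alpha,
                                      frames[i], alpha, 0))
out_frames.extend(frames[N:len(frames) - N])  # untouched middle of the clip

h, w = out_frames[0].shape[:2]
writer = cv2.VideoWriter("loop.mp4", cv2.VideoWriter_fourcc(*"mp4v"), 25, (w, h))
for f in out_frames:
    writer.write(f)
writer.release()
```

The difference is that this only dissolves between existing frames; Sora-style looping would generate genuinely new connecting frames instead.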

05:25

But then there are a lot of capabilities that are just plain cost reduction, which is why people refer to this as the death of Hollywood in many cases. I don't know if that's an accurate assessment; in my opinion, they're going to use this tech to their advantage to lower production costs and pump out even more content. We'll talk about that soon, but let's finish up this segment with the things that were already possible but now come at something like a 10,000x cost reduction. For that calculation, I'm assuming a subscription price somewhere around the ChatGPT Plus plan.

So what's going to be possible at this super low cost? First of all, generating images. We can already do that with other image generators; sure, these are hyper-realistic and very high quality, just like Midjourney's. But then there's its capability to turn images into videos, and that is very, very big in my opinion, because it's going to make it so easy to craft compelling videos. I feel like most people who talk about this don't appreciate how much it's going to lower the barrier to entry for videography, and high-quality videography at that, because you're going to get access to things like this. Even if you've seen this before, I think I have a bit of a different perspective here. Look: on the left you have the drone image, on the right you have this butterfly, and in the middle you have the mix of the two, where the drone is flying through something like the Colosseum and then morphs into a butterfly. And look, I could do this today; it just takes about 3 to 5 hours of work depending on your skill level. You go into After Effects and you rotoscope out this butterfly, meaning you go frame by frame, that's 25 frames every single second, and you animate a mask exactly in the shape of the butterfly's wings, and you redo that for every movement. Yes, there are tools that help you, but a lot of the time you're stuck with manual labor, so the 3-to-5-hour task might turn into 15 or 20 hours. And then you bring the butterfly in here and morph it into the drone with something like a morph cut inside Premiere Pro. If none of that means anything to you, that's fine; I'm just saying hours of work are going to be done like this.
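As a sense of how much of that manual rotoscoping is already automatable, here's a minimal sketch using the open-source rembg segmentation library to cut a subject out of every frame. rembg and the file paths are my own illustrative choices, not tools mentioned in the video:

```python
# Minimal sketch: automated "rotoscoping" by running AI background removal
# on every frame of a clip. rembg is an open-source segmentation tool and
# my own illustrative choice; output is one RGBA PNG per frame with the
# subject cut out, which an editor could then composite elsewhere.
import cv2
from PIL import Image
from rembg import remove

cap = cv2.VideoCapture("butterfly.mp4")  # placeholder path
idx = 0
while True:
    ok, frame_bgr = cap.read()
    if not ok:
        break
    frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)  # OpenCV is BGR
    masked = remove(Image.fromarray(frame_rgb))  # RGBA, background removed
    masked.save(f"matte_{idx:05d}.png")
    idx += 1
cap.release()
```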

07:13

And this is just one simple example. In many others, a one-man crew could never do this at all. All these animation-related examples, where they turn an image into an animation like this, are usually just not feasible for a one-man show; it takes too much time to animate all the little things. You might be able to do it for a few shots, but if you do a whole one-minute trailer, you'll find you spend two weeks at the computer if you really animate all the little details like in this shot, and you have a lot of different shots. So that's my second point: it lowered the bar by a factor larger than most people realize. I don't know if it's 1,000x or 10,000x, but a lot of these things were unthinkable for small crews or one-man shows, and now they will be doable. Like, for example: before... after.

07:57

Okay, so this point is all about the editability of the video. On Twitter, Owen Fern went ahead and criticized the fact that, yes, these generations are absolutely incredible, but what if the client has feedback? And this is very appropriate criticism in my opinion, because clients always have feedback. If you're going to use this for a job, if this is supposed to be the death of Hollywood, well, just between directors and producers there is so much feedback going on in the post-production of any advertisement or movie. Heck, even on an event video I had clients that went back and forth ten times and gave feedback over and over again, and I had to adjust things. So he points out that there are going to be a lot of little details that will need to be changed about these scenes, and with Sora you're not really able to go back and change little details; you have to regenerate the whole scene. Maybe you like the character, but you just don't like the fact that this is not a thumb, it just looks like a fifth finger, and the client asks: we would like to give it the look of a thumb, can we do that? His point is that the answer has to be no, and then you have a dissatisfied client. Which is a very fair point, but as I've been following this space very closely over the last months, there's one tool and one piece of research that need to be pointed out here.

First things first: Runway ML, the previous, so to say, leader in AI video, introduced a feature a few weeks ago called the Multi Motion Brush, which lets you use multiple brushes on a video to animate specific parts. Now, that is for animation, but over in Midjourney and many other image generators you can do something called inpainting, where you paint over a little part of the image and edit just that; you can re-prompt it. So on images today you could actually go in, paint over this thumb, and say "regenerate the thumb." Why would that not be possible on video? Eventually it will be. And beyond that, ByteDance, the creator of TikTok, published a research paper less than a week ago about a system called Boximator. I didn't cover it on the channel because I like to cover things that are available today or truly revolutionary, and this falls into the in-between zone of really interesting but not available, and in my eyes probably not worth a dedicated video. But the whole point of it is that you draw different boxes in the scene, and thereby you can control the scene in great detail. If you select the balloon and say it's going to fly away in this direction, and then you select a girl and say she's going to run in a different direction, exactly that is going to happen. So between tools like Boximator and inpainting in Midjourney, it's just a question of time until you can use a mix of these tools and inpaint on top of AI video too. Now, sure, there's going to be a temporal axis to deal with: on images you only have the X and Y axes, while in video there's also the time axis, and sometimes you even have movement in Z-space. But between this research and inpainting, I can totally see that happening for AI video down the line.

Plus, as we know from prompt engineering for language-based models today, there's a lot of control you have in the text prompt; you just have to be really detailed. If you look at a lot of these prompts, they're good, but they're not as detailed as they could be; some of the best Stable Diffusion prompting is extremely detailed. In Midjourney and Stable Diffusion, if you keep your prompts relatively simple you're going to get varied results, but with a detailed prompt, even if you roll the dice and create a new scene, it's going to be very similar. And let's refer back to Midjourney again: they just recently announced a new character tool that maintains character consistency based on a character you pick. All of these AI image features that we've been talking about, and that I've been tracking regularly, are going to apply to video tools too; it's just going to take longer. But I absolutely believe we'll be able to implement all this little feedback in AI video, and therefore this will actually be production-ready at some point.

play11:19

ready at some point okay so my next

play11:20

Point here is that I didn't expect right

play11:22

in a beginning is that you can prompt

play11:24

stories into existence from a single

play11:26

prompt okay so here's an example from

play11:28

Bill PE from the open AI team and he

play11:30

generated an entire story of two dogs

play11:33

that should walk through NYC then a taxi

play11:35

should stop to let the dogs pass across

play11:37

walk then they should walk past the

play11:38

pretzel and hot dog stand and finally

play11:40

they should end up at Broadway signs and

play11:42

if you follow this channel you might

play11:44

know how much context you can add text

play11:45

prompts to achieve exceptionally

play11:47

accurate results from things like chat

play11:49

GPT if you added way more details here I

play11:51

believe they would be reflected in it

play11:52

and then the story can develop and as

play11:54

right now you already have tools that

play11:56

can manipulate someone's mouth to speak

play11:58

in another language so it looks

play11:59

naturally also that will be possible

play12:01

here so you will be able to create these

play12:03

long shots like they have in movies

play12:05

which are incredibly difficult to

play12:06

achieve I mean some movies like Dunkirk

play12:08

took it so far where they turned the

play12:10

movie into a single Take It All flows

play12:12

seamlessly and Sora is able to do it too

play12:15

and that I didn't expect at the

play12:16

beginning also they didn't share this

play12:17

example right off the bat I think this

play12:19

is actually very very impressive and if

play12:21

now we're already able to generate

play12:22

stories from a single Simple Text prompt

play12:25

it's just a question of time until we

play12:27

arrive at something like this where you

play12:29

just type in a prompt and you get a full

play12:30

movie back or a full show I mean at some

play12:32

point it's just a question of having

play12:34

enough gpus this is obviously just a

play12:35

mockup but something to think about

play12:37

especially because this is the worst

play12:39

teack is ever going to be and you know
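To illustrate the kind of extra detail I mean, here's a hypothetical expansion of that dogs-in-NYC prompt. Every specific below is my own invention, purely to show how a more detailed story prompt might be structured:

```python
# Hypothetical example of packing more context into a single story prompt,
# in the spirit of the dogs-in-NYC example. All details below are my own
# illustration of "more detailed prompting", not an actual Sora prompt.
story_prompt = (
    "Golden-hour light, handheld documentary style, 35mm film look. "
    "Two golden retrievers walk through NYC side by side. "
    "A yellow taxi stops at a crosswalk to let the dogs pass; "
    "the driver watches them, amused. "
    "They trot past a pretzel and hot dog stand, steam rising; "
    "the vendor leans out to look. "
    "The shot ends with both dogs sitting beneath the glowing "
    "Broadway signs as the crowd flows around them, one continuous take."
)
print(story_prompt)
```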

12:40

And you know what, let's talk about that point, because it's actually my next one: where are we in the timeline of all this? It was really helpful to look into some of the discussions happening online to orient myself in terms of where we actually are today. Emad Mostaque from Stability AI had a fantastic take here: he compared this to the GPT-3 of video models. If you didn't know, GPT-3 was the predecessor to ChatGPT. It was available before, but the interface was not as intuitive, and you actually had to prompt it differently than ChatGPT, which had reinforcement learning from human feedback, meaning a lot of humans rated the outputs to make the model more user-friendly. And that's where this is at right now. It's not at the ChatGPT point yet, where it becomes really easy to use and gains mass popularity; and then we got GPT-4 and all the additional features, and it's just crazy capable now. He even said that all the image generators, like Stable Diffusion, were more comparable to GPT-2, where the quality of the output was not nearly as good as GPT-3's. So in large-language-model terms, this puts us somewhere in the middle of 2022, because the ChatGPT, GPT-4, Llama, and Mistral equivalents will come over the next few years, or sooner at the pace we're moving.

And on this topic, there's another fantastic thread by Nick St. Pierre here on X. He ran the exact prompts behind Sora's videos through Midjourney and paired the prompts with the results, and the thing is, they're shockingly similar. People are already joking: is Midjourney just OpenAI in disguise? Probably they're just using very similar training data. But look at that, all of these examples are very similar. Now, I'm sure these are the ones that were the most similar, to create this illusion of it essentially being the same model; if you look closer, the beaver is very different. But the point is, these are not night and day. Sure, these helmets are completely different, but the cinematic look is very similar, with slightly different color grading down here, fair. The point I'm trying to make is that we literally skipped two to three years ahead in AI video, because what we had up until now was something like GPT-1 or GPT-2. Now we have the GPT-3, which is actually usable and can create useful outputs that are essentially hyper-realistic, but we're not even at the ChatGPT moment yet, where you get editability and things like the audio generation we talked about earlier. That is all yet to come, but again, at this pace of development we should probably be thinking in days, weeks, and maybe months, not years or decades. I guess that poses the question: at which point in this development do we reach the Matrix? I don't know the answer to that question. I'm turning 30 next month, and it does feel like it, or something akin to it, will happen in this lifetime. Who knows. Moving on.

15:14

Okay, so my next point goes back to my original video, where I stated that this is going to be the death of stock footage. I've been selling it myself for almost a decade, and there's just no way people are going to keep paying $50 or $100 per clip if they can generate clips for a few cents. I think that one is obvious, but beyond that, it really got me thinking about what this means for video creation, especially for smaller crews and one-man shows: you're going to be able to generate entire video libraries for yourself. Hear me out. Right now, in a video like this one, there's the A-roll, the main story of the video: me talking, presenting all my findings to you. And on top of that we have what we refer to as B-roll: clips that add an additional layer of information, add visual interest, keep you more engaged, and really let us get the most out of this audiovisual medium. Right at this very moment you're consuming both audio and video at the same time, so we try to make the most out of all these layers: I do my best to keep my speech and presentation concise because I value your time, and then in the editing we do our best to add as much information on top. Right now, that's done with B-roll, so we pay for various footage libraries for shots that enhance our videos, and we also pay for music libraries to add the right type of music to enhance the atmosphere of the video.

But with models like Sora, this really changes the game, because you're going to be able to generate an entire library for yourself, for that specific project, because the cost goes down so much. You're going to be able to prompt things into existence that you previously would have had to research, download, and compile, and usually they don't even match, so you have to do color correction and color grading on top of them. Here, as you can see, from a single text prompt we got five videos, and all of these can be upscaled with something like Topaz Video AI. That tool is paid, it costs a few hundred dollars, but it upscales 1080p clips to 4K with AI really effectively. Here, though, you're just going to be able to prompt them. And again, looking over at the AI imaging tools, all the features we see there are going to become available in the video tools: a one-click upscale to 4K will be there; "regenerate this" or "generate four more just like this" will be there. Think of the whole Midjourney interface in Discord applied to these videos: upscale, re-roll, more like this, use a different version of the model. After a few minutes, you'll have a whole library of B-roll that can enhance your video.
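Here's a hypothetical sketch of that workflow. There is no public Sora API at the time of this video, so generate_video below is an invented placeholder; the point is the shape of the idea, one list of prompts in, a folder of project-specific clips out:

```python
# Hypothetical sketch of the "generate your own B-roll library" workflow.
# generate_video() is an invented stand-in for whatever text-to-video
# interface eventually ships; all prompts and paths are illustrative.
from pathlib import Path

def generate_video(prompt: str, seconds: int = 5) -> bytes:
    """Placeholder for a future text-to-video API call; returns clip bytes."""
    # Swap in a real API call here once one exists.
    return b""

broll_prompts = [
    "slow push-in on a laptop covered in editing timelines, moody light",
    "aerial drone shot over Tokyo at dusk, neon reflections on wet streets",
    "macro shot of a camera lens racking focus, shallow depth of field",
]

out_dir = Path("broll_library")
out_dir.mkdir(exist_ok=True)
for i, prompt in enumerate(broll_prompts):
    clip = generate_video(prompt, seconds=5)
    (out_dir / f"broll_{i:02d}.mp4").write_bytes(clip)
```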

17:36

Now, as a video creator, I can't wait for this. I know that eventually the end point of all of this is the technology really replacing a lot of content, and who knows if I'll still be sitting here presenting the news to you if an AI can do it in real time, minutes after something is released, delivered in exactly the voice you prefer while also respecting your context. In this video, I kind of have to assume your knowledge level: at certain points I have to assume somebody has never created a video before, but some of you might be experienced directors who know all these concepts and how the industry works. Well, the AI is eventually going to be able to create the content exactly for your context. But I digress. The point here is that, at least for the footage, at least for the production of this video, I could have a custom library that enhances all the visuals, and maybe we could be taking a trip through Tokyo right now while I present these ideas. There's going to be some point where I can just take my voice, use my digital avatar, let him walk through Tokyo, and explain these concepts in a very practical manner without ever leaving my desk. I don't think that's a stretch at this point. A week or two ago, it seemed a bit unreal to think of lifelike AI video; the best we had were animations that were good, and talking-head videos that looked okay, convincing for a second or two if you weren't looking for AI. But again, if this is the GPT-3 of AI video, what are the ChatGPT and the GPT-4 going to look like? That's what I'm already thinking about.

Some of these advanced capabilities are outlined in the technical paper too. It clearly states that you're going to be able to create videos in any format, from 1920x1080 to 1080x1920, so from phone format all the way to widescreen. And cropping into cinematic formats from there is easy: all you need to do is add black bars at the top and bottom, and you have all the cinematic formats.
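The black-bar arithmetic is simple enough to show directly. A minimal sketch, assuming a 16:9 source and a 2.39:1 "scope" target:

```python
# Minimal sketch of the "add black bars" arithmetic: given a 16:9 frame,
# compute the letterbox bar height needed to present a cinematic aspect
# ratio such as 2.39:1. Pure arithmetic, no external tools assumed.
def letterbox_bars(width: int, height: int, target_ratio: float) -> int:
    """Height in pixels of each black bar (top and bottom)."""
    visible_height = round(width / target_ratio)
    return max(0, (height - visible_height) // 2)

# 1920x1080 (16:9) presented as 2.39:1 "scope":
print(letterbox_bars(1920, 1080, 2.39))  # ~138 px per bar
```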

19:15

So really, there's going to be a lot of variability, and you're going to be able to get exactly the B-roll you need for your project. Eventually, AI is going to be writing the scripts and editing the video itself, according to all the other videos it saw and how they were edited. That might take a lot of time, and we do so much manual work on these videos that I think there's always going to be a stylistic expression, a handwriting, to the post-production of a video. But it's crazy to see: a week ago, if you thought about having a library of B-roll for one specific video, you had to go out and shoot it in the real world, or you had to purchase stock footage, and then it was scattered all over the place. Here you're going to get the best of both worlds: great B-roll, all from the same scene, at virtually no cost. Or if you have some B-roll you already use, you'll be able to extend it, or maybe you have some phone pictures and you'll turn those into B-roll. It's really a whole new world for video production; I can't overstate that.

20:11

But it doesn't end there, and this brings me to my last point, which is 3D worlds and world generation, because in the technical paper they actually refer to Sora as a world simulator. I think that's a big claim, but it's also a justified one, because if you take some of the clips at face value, it's incredible: it's temporally consistent, these houses are not warping, you're moving through the scene like a drone would, you have these people on their horses going about their daily business. It's incredible. But what you have to realize is that beyond that, you can feed this into something like Gaussian splatting, which, simply put, is a technology that creates a so-called Gaussian splat, a 3D representation of the video. In even simpler terms, it turns a video into a 3D model, and this is what it looks like in practice. Now look, this is a simple video that wasn't even intended for this purpose, but you could easily imagine a drone shot where the drone parallaxes around the subject and captures it from all angles, and then you can create 3D objects of something that doesn't even exist. Right here, Manov Vision took exactly this drone clip, recreated it as a Gaussian splat, and then brought it into Unity, a real-time game engine, where you can animate the camera, insert characters, and do all sorts of things.
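As a concrete note on that pipeline: Gaussian-splatting tools generally start from the individual frames of a video plus estimated camera poses. A minimal first step might look like this sketch; the clip name is a placeholder, and the downstream tools named in the comments are my own examples, not ones from the video:

```python
# Minimal sketch of step one of a video-to-3D pipeline: dump a generated
# clip (e.g., an orbiting drone shot) into individual frames that a
# Gaussian-splatting trainer can consume. Paths are placeholders.
import cv2
from pathlib import Path

out = Path("splat_input/frames")
out.mkdir(parents=True, exist_ok=True)

cap = cv2.VideoCapture("drone_orbit.mp4")  # placeholder clip
idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if idx % 3 == 0:  # keep every 3rd frame; splatting wants varied views,
        cv2.imwrite(str(out / f"frame_{idx:05d}.jpg"), frame)  # not duplicates
    idx += 1
cap.release()
# Next steps (outside this sketch, my own examples): camera pose estimation
# with a tool like COLMAP, training the splat with a tool like nerfstudio,
# then importing the result into a game engine such as Unity.
```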

21:20

The important fact here is that Sora doesn't have to do everything from A to Z. You can still have a human write the script, you can still have a human acting it out in front of a green screen, you can have your favorite actors in these scenes, but it's going to be so much cheaper to produce, because you're just going to generate whole environments like this and shoot everything in front of a green screen. That is, until AI perfectly synthesizes the actors' voices, which, if you follow this channel, you know it already has. And then the last missing piece is really the human part: character consistency and the ability to edit little details so the result aligns with the vision of everybody involved in the movie's creation.

And if you take that thought experiment even a step further, you end up in Minecraft, because in the technical paper you can see clips that were not recorded from within Minecraft; they were generated by Sora by simply including the word "Minecraft" in the prompt. It saw so much Minecraft footage that it was able to recreate Minecraft perfectly. And if it can do that with Minecraft now, how long until it does it with all of this world? I don't know, but I'm scared and excited at the same time.

One thing is for sure: I want to stay on top of all of this, and I'm going to keep my eye on it. If you want to follow along for the ride, subscribe to this channel and to our weekly newsletter; it's completely free and keeps you up to date once a week with all the revolutionary breakthroughs. And that's really all I've got for today, except: if you want to try out Sora, there is actually a very, very limited demo on this page. If you haven't tried it yet, I recommend it, because it's the closest you can get to trying the model. It's this little interface where you can change a few variables, so you can go from an old man to an adorable kangaroo, and there are a few more variables you can change out here, okay, Antarctica. For now, this is the closest we get to playing with this thing. I hope you enjoyed this; let me know which of these points was new or interesting to you, and if you have even more facts I might not have considered yet, leave those below too. And if you haven't seen the original video about the announcement and all the video clips they presented, that's over here. All right, I can't wait to see how this develops and what the competition comes up with. This is a whole new world, and I'm here for it. See you soon.
