10 Things About OpenAI SORA You Probably Missed
Summary
TLDR: Igor delves deep into the revolutionary capabilities of OpenAI's video generator, Sora, exploring its potential beyond the initial hype. He highlights Sora's unique features, such as extending videos and creating seamless loops, and the profound implications for audiovisual production, including cost reduction and the democratization of high-quality content creation. Igor also discusses emerging tools like ElevenLabs' sound generator, which, combined with Sora, could offer a comprehensive audiovisual experience. He predicts a future where AI can generate not just images but entire videos, transforming videography, content creation, and possibly the entire entertainment industry. This exploration offers insights into the current state and exciting future of AI in video production.
Takeaways
- 😲 Sora, an AI video generator released by OpenAI on February 15th, 2024, has capabilities beyond the initial hype, such as extending and looping videos and generating entire stories from a single text prompt, with audio and soundscape generation expected to follow via tools like ElevenLabs' sound generator.
- 🤯 Sora's capabilities are comparable to the GPT-3 stage of AI development, skipping ahead 2-3 years from previous AI video models, but still not as user-friendly as ChatGPT.
- 💸 AI video generation will drastically reduce the cost of video production, potentially leading to the 'death of Hollywood' as we know it, or at least a significant decrease in the cost of production.
- 🖌️ Sora and future AI video models will enable detailed editing and inpainting of generated videos, allowing users to make granular changes to the output based on client feedback.
- 🎥 AI video generators will enable users to create custom libraries of B-roll footage and music specifically tailored for their projects, eliminating the need for expensive stock footage.
- 🌎 Sora is described as a 'world simulator', capable of generating temporally consistent 3D environments that can be translated into real-time game engines or Minecraft-like worlds.
- ⏳ The development of AI video technology is progressing rapidly, and capabilities like audio generation, inpainting, and 3D world creation are expected to become available in the near future, potentially within months.
- 🔍 AI video generators will enable users to search for specific elements within videos and extend or loop them seamlessly, creating new possibilities for creative expression.
- 🎬 The emergence of AI video technology will necessitate a reconsideration of traditional video production roles, as AI takes on more tasks traditionally performed by humans.
- 🚀 The potential of AI video technology is both exciting and daunting, with the possibility of AI generating entire movies or shows from a single text prompt being a potential future scenario.
Q & A
What is Sora and what does it generate?
-Sora is a video generator created by OpenAI, designed to generate videos from textual prompts.
Why is audio considered important in film production according to the script?
-Audio is deemed crucial in film production because it accounts for 50% of the experience, enhancing visuals with layers like actor voices, sound effects, and ambient sounds.
How does ElevenLabs relate to Sora's release?
-ElevenLabs released a sound generator in response to Sora, aiming to complement Sora's video generation with audio creation for a full audiovisual experience.
What is the significance of being able to extend videos with Sora?
-Extending videos with Sora represents a novel capability, allowing for the creation of seamless transitions and extensions of video content that were previously not possible without extensive manual work.
What is the potential impact of Sora on video editing costs?
-Sora has the potential to drastically reduce video editing costs by simplifying complex processes like turning images into videos and creating high-quality content that would otherwise require significant time and resources.
How does Sora's editability challenge relate to client feedback?
-Sora's current limitation in making detailed edits based on client feedback could be a challenge, as it may not allow for minor adjustments without regenerating entire scenes.
What future tool integration could improve Sora's editability?
-Future tools could include features like inpainting and detailed prompting for video, similar to current AI image editing capabilities, to allow specific scene modifications without needing to regenerate everything.
How does Sora enable the creation of 'stories' from prompts?
-Sora can generate coherent and detailed stories from single text prompts, creating sequences of events or actions in video form that unfold according to the input narrative.
What does the script suggest about the future of individual video libraries?
-The script suggests that individuals will be able to generate bespoke video libraries tailored to specific projects, drastically lowering production costs and enhancing creative possibilities.
What implications does Sora have for the field of 3D world and world generation?
-Sora's capabilities suggest it could act as a 'world simulator,' offering the ability to generate consistent and detailed 3D environments, which could revolutionize fields like gaming, virtual reality, and film production.
Outlines
📽️ Introduction to AI Video Generator Sora
The video discusses Sora, an AI video generator released by OpenAI on February 15, 2024. The speaker, Igor, a researcher in AI technology and a former video production company owner, shares his in-depth findings on Sora's capabilities beyond the initial hype. He has spent considerable time studying technical reports, watching YouTube videos, and scouring discussions on Twitter to uncover lesser-known aspects of this AI tool.
🔊 Sora and Audio Generation
Igor explains that while Sora currently only generates muted videos, audio generation is a crucial component of video production. He notes that ElevenLabs has already released a sound generator capable of creating entire soundscapes from text prompts. Igor predicts that OpenAI will likely integrate Sora with an audio generator, resulting in an audiovisual generator that can produce complete videos with background music, sound effects, and even synthesized voices, providing a full-stack solution for audiovisual production.
🆕 New Capabilities of Sora
Igor highlights two new capabilities of Sora that were not previously possible. First, Sora can extend videos, seamlessly generating frames before or after an existing clip, allowing users to expand the duration of a video. Second, Sora can create looping videos, generating additional frames that allow footage to loop indefinitely. These features open up new possibilities for creating animations and interactive content.
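As a rough mechanical illustration of what a "seamless loop" involves, here is a minimal non-AI sketch (my own toy example, not anything from Sora) that crossfades a clip's last frames into its first frames with NumPy; Sora's advantage is that it generates genuinely new transition frames instead of blending existing ones:

```python
import numpy as np

def make_seamless_loop(frames: np.ndarray, overlap: int) -> np.ndarray:
    """Crossfade the last `overlap` frames into the first `overlap`
    so the clip loops without a visible cut.
    frames: (T, H, W, C) float array in [0, 1]."""
    head, tail = frames[:overlap], frames[-overlap:]
    # Linear blend weights: tail fades out while head fades in.
    alpha = np.linspace(0.0, 1.0, overlap)[:, None, None, None]
    blended = (1 - alpha) * tail + alpha * head
    # Replace the head with the blend; keep the middle untouched.
    # The result's last frame flows into its first frame when repeated.
    return np.concatenate([blended, frames[overlap:-overlap]], axis=0)

# Toy example: 10 single-pixel "frames" with increasing brightness.
clip = np.linspace(0, 1, 10).reshape(10, 1, 1, 1)
looped = make_seamless_loop(clip, overlap=3)
```

A crossfade like this can ghost moving objects; a generative model sidesteps that by synthesizing the transition frames outright, which is what makes the looping feature new.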
💰 Cost Reduction and Editing Capabilities
Igor discusses how Sora's capabilities lead to significant cost reductions in video production. He explains that tasks like rotoscoping and animating images, which previously required hours of manual labor, can now be achieved much more efficiently with AI. Igor also addresses concerns about editability, citing research on tools like Runway's Multi Motion Brush and inpainting techniques that allow for fine-tuning and editing of AI-generated videos.
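To make the inpainting idea concrete, here is a hedged toy sketch (plain NumPy, nothing Sora-specific): regenerate content only where a user-painted mask is set, and keep every other pixel of the original frame untouched. Video inpainting would apply this per frame, with the added challenge of temporal consistency:

```python
import numpy as np

def composite_inpaint(original: np.ndarray, regenerated: np.ndarray,
                      mask: np.ndarray) -> np.ndarray:
    """Illustrates the inpainting idea: keep the original frame everywhere
    except the masked region, which is taken from a regenerated frame.
    original, regenerated: (H, W, C) arrays; mask: (H, W) boolean."""
    out = original.copy()
    out[mask] = regenerated[mask]  # swap in new content only under the mask
    return out

# Toy 4x4 grayscale "frames": fix only the top-left 2x2 patch.
orig = np.zeros((4, 4, 1))
regen = np.ones((4, 4, 1))
mask = np.zeros((4, 4), dtype=bool)
mask[:2, :2] = True
fixed = composite_inpaint(orig, regen, mask)
```

The point of the sketch is the workflow, not the model: client feedback like "fix the thumb" becomes a small mask plus a localized regeneration rather than rerolling the whole scene.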
🌎 World Generation and 3D Visualization
Igor explores Sora's potential for generating entire 3D worlds and environments. He discusses technologies like Gaussian splatting, which converts videos into 3D models, allowing for further manipulation and animation in game engines like Unity. Igor also mentions Sora's ability to recreate environments like Minecraft, suggesting that it may eventually be capable of generating entire virtual worlds. He expresses both excitement and apprehension about these possibilities, emphasizing the need to stay up-to-date with AI advancements.
Keywords
💡Sora
💡Audio
💡Editing
💡B-roll
💡World Generation
💡Prompt Engineering
💡Stock Footage
💡Upscaling
💡Deepfakes
💡Augmented Reality (AR)
Highlights
Sora, the AI video generator by OpenAI, was released on February 15th, 2024, sparking significant interest in its capabilities beyond initial expectations.
ElevenLabs responded to Sora's release with a new sound generator capable of creating detailed soundscapes from text prompts, suggesting the possibility of fully generated audio-visual content.
Sora's ability to extend videos by generating new, seamless content before or after a given clip introduces a groundbreaking feature for video production.
The potential for looping videos created by Sora opens up new creative possibilities for content creators, including the idea of infinite, seamless video loops.
Sora's capabilities significantly lower the cost and technical barriers to producing high-quality video content, democratizing access to videography.
The integration of AI in video editing software is anticipated, offering features like video extension, looping, and possibly even detailed editing adjustments.
The ability to prompt entire narratives into existence with Sora marks a significant advancement in storytelling, potentially revolutionizing scriptwriting and content creation.
Current limitations in editing AI-generated videos, such as making minor adjustments, are expected to be overcome as technology evolves, mirroring advancements seen in AI image generation.
The comparison of Sora's current stage to the GPT-3 model of text AI suggests that we are on the brink of more advanced, intuitive video AI technologies.
Sora's release has prompted comparisons with existing AI technologies, indicating that AI video generation may have leapfrogged years ahead in terms of development.
The potential impact of Sora on the stock footage market and the ability for creators to generate custom video libraries for projects highlights a shift in how content is produced and sourced.
Sora's world simulator capabilities suggest a future where virtually any environment can be generated for video production, reducing the need for on-location shooting.
The prospect of generating 3D models from AI-generated videos opens new avenues for integrating AI content into gaming and virtual reality applications.
The rapid development pace of AI technologies like Sora raises questions about the future of content creation and the role of human creators in a predominantly AI-driven industry.
Sora's ability to generate content that closely mimics real-world environments and narratives from simple text prompts signifies a major leap forward in AI's creative potential.
Transcripts
Sora the video generator by open AI
released on February 15th 2024 and I've
spent pretty much every hour of my life
scouring the internet and researching
what else this could do and there's
actually a lot of things that weren't
obvious in the middle of all the hype
that accompanied the release of this AI
video generator I studied a technical
report in detail watched all the YouTube
videos spent an unhealthy amount of time
on Twitter looking for all the
discussions and the little findings
people had matter of fact since release
I didn't even leave the
apartment
if we haven't met yet I'm Igor I made
it my full-time calling to research what
AI has to offer and how to put it to
work in your everyday life and before
doing that with The AI Advantage I had a
video production company that operated
for eight years in Central Europe I
helped clients with everything from
corporate video trainings to directing
smaller commercials and even shooting
festivals nightclub videos when it comes
to videography I've really seen it all
and this stuff is exactly in the middle
between technology and video production
so I can't wait to dive into all of this
all right so without further Ado let's
look at all the implications of Sora
that you might have not been aware of
right away okay so first of all I want
to talk about audio because Sora only
generates video right all the examples we
saw
were muted without music or sound
effects in the background and a lot of
people rightfully pointed out that hey
in film it's really 50/50 at the very
least it's 50% visuals and another 50%
audio and there's many layers to that
right you might have the actor's voice
as one track but then there's also sound
effects of things happening around them
and then you have Foley which is the
background sound that just persists
you're not really consciously aware of
it but it's there and if it's not there
the shot is missing something so surely
audio must be a complicated issue too
right well not really because ElevenLabs
actually reacted to the Sora release and
they released a new sound generator that
from text prompts is able to generate an
entire soundscape okay so today we don't
have access right but if open AI hooked
up Sora to this audio generator you
would have an audiovisual generator
where you create full soundscapes have a
quick listen
and sure a sound designer could do this
manually but again if you're a oneman
show and you're producing a commercial
like I did so so many times you're doing
everything yourself from planning to
recording editing doing the sound design
doing the color grading doing feedback
rounds with the client invoicing and
often times you don't have budget for a
sound designer so you bet that there's
going to be models I don't know if Sora
or others that combine both they're
going to give you audio visual outputs
this is not a question that's just a
straight fact at this point and with
tools like Suno AI out there already that
can generate full songs including lyrics
at a decent quality with AI well you're
going to be able to generate the
background music the background sound
effects the voices that are in the scene
because voice generators are a thing and
they're virtually indistinguishable
already right and now the video
components so we really have the full
stack for audiovisual production it's
just a question of time now and from my
estimate it looks to be months not years
till we'll get there okay my next point
is all about the capabilities of Sora
that are actually brand new because a
lot of the stuff that we saw just
drastically reduce the cost of what it
takes to produce a clip like this or an
animated video like this you might be
aware that movies like this exist right
it just cost a lot of money to produce
this so first of all let's talk about
the things that are actually brand new
and not just a cost reduction although
that has its implications too and we'll
talk about that but the things that are
actually new are first of all you can
extend videos okay so this is
beautifully outlined in a technical
paper here and it shows the example of a
San Francisco subway car so as you can
see this clip is the same in all three
instances but if you back up a little
bit they extended the beginning of it
okay so as you can see the video
generated by Sora is different every
single time and it seamlessly
transitions into the subway car so this
is something that was not possible up
until now okay it generates this video
from scratch now I guess you could argue
that you could recreate this entire
scene in 3D and then create the frames
before that and seamlessly transition
into it but you have to realize that at
a certain point this is going to become
a feature in every editing software
right you'll have just an image and it
will turn it into a video and then you
can extend it to any duration you can
add a clip before add a clip after
you'll be able to turn your old family
photos into Vivid memories sort of that
is really scary but it's going to be a
thing and you bet apps like Instagram at
one point I don't know when are going to
have a feature where you're going to be
able to turn a photo into video and then
extend that indefinitely another new
capability is you're going to be able to
loop videos okay and this is also
something that you could kind of but not
really achieved today definitely not in
this form okay you'll give it a video
clip and it will be generating extra
frames that will seamlessly let the
footage loop I had a good chat with a
friend and we kind of talked about how
this could be the new Rick rolling on
the
internet because if you do this to a
longer clip you just don't realize that
it's looping and that it's just playing
forever so you could send somebody a
clip and it might take them minutes to
realize that the whole thing is looping
and just repeating over and over again
anyway this is something that was
not really possible and some people went
ahead and tried this anyway in
videography there was this whole Trend a
few years back where people were trying
to seamlessly transition one thing into
another like for
example and my shirt is gone magic now
those are the simplest way to do it but
here we will have the capability of
generating brand new frames and things
will be able to Loop indefinitely okay
so those are the new features you can
expect in editing software somewhere
down the line but then there's a lot of
the ones that are just simple cost
reduction this is why people refer to it
as the death of Hollywood in many cases
now I don't know if that's an accurate
assessment in my opinion I think they're
going to use this tech to their advantage to
lower the prices of production and pump
out even more content we'll also talk
about that soon but let's finish up the
segment and talk about the things that
were already available but now it's just
a 10,000x cost reduction for that
calculation I see a subscription price
that is somewhere around the GPT plus
plan so what's going to be possible at
this super low cost is first of all
generating images we're able to do that
with other image generators right sure
these are hyper realistic and very high
quality just like Midjourney and so on but
then it's capability to turn images into
videos that is very very big in my
opinion because it's going to make it so
easy to craft compelling videos like I
feel like most people that talk about
this don't appreciate how much this is
going to lower the barrier for entry for
videography and high quality videography
that is because you're going to get
access to things like this so even if
you've seen this before I think I have a
bit of a different perspective here so
look here on the left you have the Drone
image here on the right you have this
butterfly right and here in the middle
you have the mix of the two where the
Drone is flying through something like
the Coliseum and then it morphs into a
butterfly and look I could do this
today okay this just takes about 3 to 5
hours of work dependent on your skill
level you just go into after effects and
you rotoscope out this butterfly meaning
you go frame by frame that's 25 frames
every single second and you make sure
you animate a mask exactly in the form
of the Butterflies wings and you redo
that for every movement now yes there's
tools that help you but a lot of times
you're stuck with manual labor there so
it might just turn out that the 3 to 5
hour task turns into 15 to 20
hours and then you can bring the
butterfly into here and morph it into
the Drone with something like a morph
cut inside of Premiere Pro now if none
of that means anything to you that's
fine I'm just saying hours of work are
going to be done like
this and this is just one simple example
in many others a oneman crew could never
do this right all these animation
related examples where they turn an
image into an animation like this are
usually just not feasible for a oneman
show it takes too much time to animate
all the little things you might be able
to do it for a few shots but if you do a
whole one minute trailer you'll find
that you spend 2 weeks at the computer
if you really animate all the little
details like in this shot and you have a
lot of different shots so that's my
second point it lowered the bar by a
factor that is larger than most people
realize I don't know if it's 1,000x or
10,000x but a lot of these things were
Unthinkable for small Crews or oneman
shows and now they will be doable like
for example before
after Okay so this point is all about
the editability of the video and here on
Twitter Owen Fern went ahead and he
criticized the fact that hey yes these
Generations are absolutely incredible
but what if the client has feedback and
this is very very appropriate criticism
in my opinion because clients always
have feedback and if you're going to use
this for a job if this is supposed to be
the death of Hollywood just between
directors and producers there is so much
feedback going on in the post-production
of any advertisement movie heck even if
it's an event video I had clients that
went back and forth 10 times and gave
feedback over and over again and I had
to adjust things so Owen points out here
that yeah there's going to be a lot of
little details that will need to be
changed about these scenes and with Sora
you're not really able to go back and
change little details right you're going
to have to regenerate the whole scene
and maybe you like the character here
but you just don't like the fact that
this is not a thumb it just looks like a
fifth finger and we would like to give
it a look of a thumb can we do that and
his point is the answer has to be no and
then you have a dissatisfied client
which is a very fair point but as I've
been following this very closely over
the last months there's one tool and one
research that needs to be pointed out
here okay first things first Runway ML
the previous so to speak leader in AI
video a few weeks ago introduced a
feature called Multi Motion Brush tool
which allowed you to use multiple
brushes on the video to just animate
specific parts now that is for animation
but over in Midjourney and many other
image generators you're able to do
something called inpainting where you
just paint in a little part of the image
and then edit just that you can reprompt
it so on images today you could actually
go in and just paint in this thumb and
say regenerate the thumb why would that
not be possible on video eventually it
will be and further than that ByteDance
the creator of TikTok actually
published a research paper less than a
week ago about this so-called Boximator okay
so I didn't cover it on the channel
because I like to cover things that are
available today or truly truly
revolutionary this kind of Falls in this
in between zone of hey really
interesting but it's not available and
in my eyes probably not worth a
dedicated video but look the whole point
of this is you draw different boxes in
the scene and thereby you can control
the scene in great detail so if you
select the balloon and say it's going to
fly away in this direction and then you
select a girl and she's going to run in
a different direction exactly that is
going to happen so between tools like
Boximator and inpainting in Midjourney
it's just a question of time
where you're going to be able to use a
mix of these tools and also inpaint on
top of AI video now sure there's going
to be a temporal axis there right
because on images you only have the X
and Y axes and in video there's also the
time axis and sometimes you even have
movement in z-space but between this
research and inpainting I can totally see
that happening for AI video too down the
line plus as we know with prompt
engineering today for language based
models there's a lot of control that you
have in the text prompt you just have to
be really detailed if you look at a lot
of these prompts they're good but
they're not as detailed as they could be
some of the best stable diffusion
prompting is extremely detailed also in
Midjourney in Stable Diffusion if you
keep your prompts relatively simple
you're going to get varied results even
if you roll the dice and create a new
scene it's going to be very similar plus
let's refer back to Midjourney again
they just recently announced a
new character tool where it's going to
maintain character consistency based on
a character that you pick in a tool so
all of these AI image features that
we've been talking about and I've been
tracking regularly they're going to
apply to video tools it's just going to
take longer but I absolutely believe
that we'll be able to implement all of
this little feedback into AI video and
therefore this actually being production
ready at some point okay so my next
point here one that I didn't expect right
in the beginning is that you can prompt
stories into existence from a single
prompt okay so here's an example from
Bill Peebles from the OpenAI team and
generated an entire story of two dogs
that should walk through NYC then a taxi
should stop to let the dogs cross a
crosswalk then they should walk past the
pretzel and hot dog stand and finally
they should end up at Broadway signs and
if you follow this channel you might
know how much context you can add to text
prompts to achieve exceptionally
accurate results from things like chat
GPT if you added way more details here I
believe they would be reflected in it
and then the story can develop and as
right now you already have tools that
can manipulate someone's mouth to speak
in another language so it looks
natural also that will be possible
here so you will be able to create these
long shots like they have in movies
which are incredibly difficult to
achieve I mean some movies like Dunkirk
took it so far where they turned the
movie into a single take it all flows
seamlessly and Sora is able to do it too
and that I didn't expect at the
beginning also they didn't share this
example right off the bat I think this
is actually very very impressive and if
now we're already able to generate
stories from a single Simple Text prompt
it's just a question of time until we
arrive at something like this where you
just type in a prompt and you get a full
movie back or a full show I mean at some
point it's just a question of having
enough gpus this is obviously just a
mockup but something to think about
especially because this is the worst the
tech is ever going to be and you know
what let's talk about that point that is
actually my next one so where are we in
the timeline of this okay it was really
helpful to look into some of the
discussions that are happening online to
orient myself in terms of where we
actually are today Emad Mostaque from
Stability AI actually had a fantastic
take here he compared this to the GPT-3
of video models so if you didn't
know GPT-3 was the predecessor to ChatGPT
okay it was available before but the
interface was not as intuitive and you
actually had to prompt it differently
unlike ChatGPT which had
reinforcement learning from human
feedback which means a lot of humans
rated the outputs to make it more
user friendly for humans and that's
where this is at right now okay it's not
at the ChatGPT point where it's going to
be really easy to use and it's going to
gain mass popularity and then we got
GPT-4 and all the additional features and
it's just crazy capable now and he even
said that all the image generators like
Stable Diffusion were more comparable to
GPT-2 where the quality of the output was
not nearly as good as GPT-3 so as in
large language models this puts us on
the timeline somewhere in the middle of
2022 because the ChatGPT GPT-4 Llama
and Mistral equivalents will come over
the next few years at the pace that we're
moving ahead right and on this topic
there's another fantastic thread by Nick
Samier here on X and he ran all the
exact prompts that Sora used
through Midjourney and then paired them
with the results and the thing is
they're shockingly similar right so
people are already joking that hey is
Midjourney just OpenAI in disguise probably
they're just using very similar training
data right but look at that all of these
examples are very similar now I'm sure
these are the ones that were the most
similar right to create this illusion of
it essentially being the same model here
I mean if you look closer the beaver is
very different but the point is these
are not night and day right sure these
helmets are completely different but the
Cinematic look is very similar with
slightly different color grading down
here fair but the point that I'm trying
to make here is that we literally
skipped two to three years ahead in AI
video because what we had up till now was
something like GPT-1 or
GPT-2 oh that's hot now we got GPT-3 that
is actually usable and can create useful
outputs that are essentially hyper
realistic but we're not even at the chat
GPT moment yet where you get editability
and things like audio generation that we
talked about here that is all yet to
come but again at this pace of
development we should probably be
thinking in days and weeks and maybe
months and not years or decades I guess
that poses the question at which point
in the development do we reach the
Matrix and I don't know the answer to
that question I'm turning 30 next month
and it does feel like it will happen in
this lifetime or something akin to that
right who knows moving on okay so my
next Point goes back to my original
video where I stated that you know this
is going to be the death of stock
footage I've sold it myself for almost a
decade and there's just no way people
are going to be paying $50 or $100 per
clip if they can just generate them for
a few cents and yeah I think that one is
an obvious one but beyond that it really
got me thinking about what this means
for video creation especially for the
smaller crews and oneman shows well
you're going to be able to generate
entire video libraries for yourself hear
me out so right now if you have a video
let's say this is the a roll right this
is the main story of the video me
talking presenting to you all my
findings and then on top of that we have
something that we refer to as B-roll
these are the clips that are there to
add an additional layer of information
they add visual interest keep you more
engaged and really allow us to get
the most out of this audiovisual medium
and right at this very moment you're
consuming both audio and video at the
same time so we're trying to make the
most out of all these layers I do my
best to keep my speech and presentation
concise because I value your time and
then in the editing we do our best to
add as much information on top and right
now that is done for B-roll so we pay
various libraries where we take these
shots that enhance our videos and we
also pay for various music libraries to
add the right type of music to enhance
the atmosphere of the video but with
models like Sora this will really change
the game because you're going to be able
to generate an entire library for
yourself for that specific project
because the cost goes down so much
you're going to be able to prompt things
into existence that beforehand you would
have to research download and compile
and usually they don't even match and
you have to do color correction and
color grading on top of them and here as
you can see from a single text prompt we
got five video frames and all of these
can be upscaled with something like
Topaz Video AI right that tool is paid
it costs a few hundred dollars but you can
upscale 1080p Clips to 4K with AI really
effectively but here you're just going
to be able to prompt them and then again
just looking over at all the AI Imaging
tools all the features that we see in
the Imaging tools are going to be
available to the video tools so
something like a one-click upscale to 4K
quality is going to be there can you
regenerate this or can you generate four
more just like this is going to be there
you can think about the whole Midjourney
interface in Discord being something
that you can do with these videos
upscale reroll more like this use a
different version of the model and
after a few minutes you'll have a whole
library of B-roll that can enhance your
video now I as a video creator can't
wait for this. I know that eventually the end point of all of this is the technology really replacing a lot of content, and who knows if I'll be sitting here presenting the news to you if an AI can do it in real time, minutes after the release of something, and you'll be able to get it exactly in the voice that you prefer while it also respects your context. So in this video I kind of have to assume your knowledge level: at certain points I have to assume that somebody has never created a video before, but some of you might be experienced directors who know all these concepts and know how the industry works. Well, the AI is eventually going to be able to create that exactly for your context. But I digress. The point here is that, at least for the footage, at least for the production of this video, I could have a custom library that is going to enhance all the visuals, and maybe we could be taking a trip through Tokyo as I present these ideas. There's going to be some point where I'm just going to be able to take my voice, use my digital avatar, let him walk through Tokyo, and explain these concepts in a very practical manner without ever leaving my desk. I don't think at this
point that is a stretch. A week or two ago it seemed a bit unreal to think of lifelike video; the best we had were animations that were good and talking-head videos that looked okay, convincing for a second or two if you weren't looking for AI. But again, if this is the GPT-3 of AI video, then what is the ChatGPT and the GPT-4 going to look like? That's what I'm already thinking about, and some of these advanced capabilities are outlined in the technical paper too. It clearly states that you're going to be able to create videos in any format, from 1920×1080 to 1080×1920, so phone format all the way to widescreen. And cropping into cinematic formats from there is easy: all you need to do is add black bars at the top and bottom and you have all the cinematic formats. So really, there's going to be a lot of variability, and you're going to be able to get exactly the b-roll that you need for your project.
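The black-bar arithmetic is simple to sketch. Here's a minimal example; the 2.39:1 "scope" target ratio and the helper name are my assumptions, not anything from Sora's paper:

```python
def letterbox_bar_height(width: int, height: int, target_ratio: float = 2.39) -> int:
    """Height of each black bar (top and bottom) needed to matte a frame
    down to a wider cinematic aspect ratio such as 2.39:1 'scope'."""
    visible = round(width / target_ratio)  # picture rows that stay visible
    return (height - visible) // 2         # remainder split into two bars

# Matting a 16:9 widescreen frame (1920x1080) to 2.39:1
print(letterbox_bar_height(1920, 1080))  # 138
```

So a 16:9 frame becomes "cinematic" by masking roughly 138 pixels top and bottom; the same formula works for any frame size and target ratio wider than the source.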
Eventually, AI is going to be creating the scripts and editing the video itself according to all the other videos it saw and how they were edited. I mean, that might take a lot of time, and we do so much manual work with these videos that there's always going to be a style, an expression, and a handwriting to the post-production of a video, I think. But it's crazy to see. A week ago, if you were thinking about having a library of b-roll for a specific video, well, you had to go out there and shoot it in the real world, or you had to purchase stock footage, and then it was scattered all over the place. Here you're going to be able to get the best of both worlds: you're going to get great b-roll, all from the same scene, and it's going to cost virtually nothing. Or if you have some b-roll that you already use, you're going to be able to extend it, or maybe you have some phone pictures and you're going to turn those into b-roll. It's really a whole new world for video production, and I can't overstate that. But it doesn't end there,
and this brings me to my last point, which is 3D worlds and world generation, because in the technical paper they actually refer to this as a world simulator. I think that's a big claim, but it's also a justified one, because if you take some of the clips at face value, it's incredible: it's temporally consistent, these houses are not warping, you're moving through the scene like a drone would, and you have these people on their horses going about their daily business. But what you have to realize is that beyond that, you can apply this in something like Gaussian splatting, which, simply put, is a technology that creates a so-called Gaussian splat, a 3D representation of the video. In even simpler terms, it turns a video into a 3D model, and this is what
it looks like in practice. Now look, this is a simple video that wasn't even intended for this purpose, but you could easily imagine a drone shot where the drone parallaxes around the subject and captures it from all angles, and then you can create 3D objects of something that doesn't even exist. So right here, Manov Vision took exactly this drone clip, recreated it as a Gaussian splat, and then brought it into Unity, a real-time game engine, where you can animate the camera, insert characters, and do all sorts of things. The important fact here is that Sora doesn't have to do everything from A to Z: you can still have a human write the script, you can still have a human in front of a green screen acting it out, you can have your favorite actors in these scenes, but it's going to be so much cheaper to produce, because you're just going to generate whole environments like this and shoot everything in front of a green screen, until AI perfectly synthesizes the actors' voices, which, if you follow this channel, you know it already has. Then the last missing piece is really the human part: character consistency and the ability to edit little details so it aligns with the vision of everybody involved in the movie's creation. And if you take
that thought experiment even a step further, you end up in Minecraft, because in the technical paper you can see clips that were not recorded from within Minecraft; they were generated by Sora simply by including the word Minecraft in the prompt. It saw so much Minecraft footage that it was able to recreate Minecraft perfectly, and if it can do that with Minecraft now, how long until it can do it with all of this world? I don't know, but I'm scared and excited at the same time. One thing is for sure: I want to stay on top of all of this, and I'm going to keep my eye on it. If you want to follow along for the ride, subscribe to this channel and subscribe to our weekly newsletter, which is completely free and keeps you up to date once a week with all the revolutionary breakthroughs. And that's really all I
got for today, except that if you want to try out Sora, there is actually a very, very limited demo here on this page. If you haven't tried it yet, I recommend it, because it's the closest you can get to trying the model. It's this little interface here where you can change these variables, so you can go from an old man to an adorable kangaroo, and there are a few more variables you can change out here: okay, Antarctica. For now, this is the closest we get to playing with this thing. So I hope you enjoyed this; let me know which one of these was new or interesting to you, and if you have even more facts that I might not have considered yet, leave those below as well. And if you haven't seen the original video about the announcements and all the video clips they presented, that is over here. All right, I can't wait to see how this develops and what the competition comes up with. This is a whole new world and I'm here for it. See you soon.