Google's LUMIERE AI Video Generation Has Everyone Stunned | Better than RunWay ML?
Summary
TL;DR: The script introduces Lumiere, Google's latest AI tool, capable of generating video from text. Lumiere not only converts text to video, but can also animate existing images, create videos in the style of a reference image or painting, and animate specific regions within images. The technology behind Lumiere, based on a space-time diffusion model, ensures temporal coherence in the videos. The script also explores how AI models generate video and whether they learn anything beyond surface statistics, which could change the future of video production and how AI interacts with the physical world.
Takeaways
- 🌟 Google released Lumiere, an AI tool that turns text into video.
- 🎨 Lumiere can animate existing images and create videos in the style of a specific image or painting.
- 🤖 Lumiere generates videos with improved temporal consistency, meaning scenes stay coherent across frames.
- 📜 Google published a paper explaining Lumiere's improvements, including a space-time diffusion model for realistic video generation.
- 🔮 Lumiere uses a Space-Time U-Net architecture that plans the generation of the entire video up front, rather than frame by frame.
- 🎭 Beyond text-to-video, Lumiere offers features such as animating specific regions within images and video inpainting.
- 📹 Lumiere can perform video stylization, changing the style of visual elements as needed.
- 🔍 The script discusses how AI models generate images and videos, and the debate over whether they learn only surface statistics or develop a deeper understanding.
- 🏆 In user studies, Lumiere outperforms other state-of-the-art models in user preference for both text-to-video and image-to-video generation.
- 🚀 AI video production is progressing rapidly, which could significantly impact the film and television industry in the coming decades.
- 🌐 Runway ML, another leading generative video model, is working on general world models to improve coherence and realism in generated content.
Q & A
What is Google's Lumiere and how does it work?
-Lumiere is an AI tool released by Google focused on generating video from text. It works through a neural network model that translates text into video, and it can also animate existing images and create videos in the style of a specific image or painting, among other capabilities.
What is the space-time diffusion model and how does it relate to Lumiere?
-The space-time diffusion model is the approach to realistic video generation used in Lumiere. It generates the entire temporal duration of a video at once, which helps maintain global temporal consistency, unlike other models that work frame by frame.
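To make the contrast concrete, here is a toy sketch (hypothetical code, not Lumiere's actual implementation) of the two strategies at the level of array shapes: a keyframe-plus-interpolation pipeline, where in-between frames are never generated jointly, versus denoising the full clip in one pass. The `denoise` function is a stand-in for a learned diffusion denoiser.

```python
import numpy as np

T, H, W = 13, 8, 8  # frames, height, width

def denoise(x):
    # placeholder for a learned denoising step
    return np.clip(x, 0.0, 1.0)

def keyframe_pipeline(noise):
    # generate every 4th frame, then fill the gaps by interpolation;
    # the in-between frames are never seen by the model together,
    # which is one source of temporal drift
    keys = denoise(noise[::4])          # frames 0, 4, 8, 12
    video = np.empty((T, H, W))
    for i in range(len(keys) - 1):
        for j in range(4):
            a = j / 4
            video[4 * i + j] = (1 - a) * keys[i] + a * keys[i + 1]
    video[-1] = keys[-1]
    return video

def full_clip_pipeline(noise):
    # denoise the whole (T, H, W) volume jointly, so every frame is
    # generated with knowledge of every other frame
    return denoise(noise)

rng = np.random.default_rng(0)
noise = rng.random((T, H, W))
print(keyframe_pipeline(noise).shape, full_clip_pipeline(noise).shape)
```

Both pipelines produce a `(T, H, W)` clip; the difference is only in which frames the model gets to reason about together.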
How does Lumiere improve temporal consistency in generated videos?
-Lumiere achieves greater temporal consistency through its Space-Time U-Net architecture, which generates the full duration of the video from the start, instead of creating distant keyframes and then resolving the sequence in between, which often results in drastic changes and a lack of coherence in the video.
What is video inpainting and how does Lumiere implement it?
-Video inpainting is a technique where a missing part of an image is filled in by the AI to complete the scene. Lumiere implements this by having the model infer and complete the scene based on cues from the visible portion of the image, enabling a more complete visual narrative.
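As a toy sketch of the underlying idea (illustration only; real video inpainting uses a diffusion model conditioned on the visible pixels, not a simple fill like this), missing pixels are reconstructed from cues in the visible part of the frame:

```python
import numpy as np

def naive_inpaint(frame, mask):
    """Fill masked-out pixels with the mean of the visible pixels.

    frame: (H, W) float array; mask: (H, W) bool, True where missing.
    """
    out = frame.copy()
    out[mask] = frame[~mask].mean()
    return out

frame = np.arange(16, dtype=float).reshape(4, 4)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True          # a missing 2x2 patch
restored = naive_inpaint(frame, mask)
```

A diffusion-based inpainter replaces the mean fill with generated content, but the contract is the same: visible pixels are kept, missing pixels are inferred from them.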
How does Lumiere handle animating specific regions within images?
-Lumiere can animate specific regions within an image, a feature known as 'cinemagraphs', where only certain elements of the image move, creating an illusion of life in an otherwise static scene.
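A minimal sketch of the cinemagraph idea (not Lumiere's method; here random noise stands in for generated motion): keep the image static everywhere except inside a chosen mask, where frames come from an animated source.

```python
import numpy as np

def cinemagraph(image, motion, mask):
    """image: (H, W); motion: (T, H, W); mask: (H, W) bool (animated region)."""
    T = motion.shape[0]
    frames = np.repeat(image[None], T, axis=0)
    frames[:, mask] = motion[:, mask]  # animate only the masked pixels
    return frames

rng = np.random.default_rng(1)
image = np.ones((6, 6))
motion = rng.random((5, 6, 6))
mask = np.zeros((6, 6), dtype=bool)
mask[2:4, 2:4] = True  # e.g. the smoke above a train
clip = cinemagraph(image, motion, mask)
```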
What is a 'world model' and how does it relate to the future of AI according to Runway ML?
-A world model is an AI system that builds an internal representation of an environment and uses it to simulate future events within that environment. Runway ML argues that the next major advancement in AI will come from systems that understand the visual world and its dynamics, and is promoting the development of models that simulate entire worlds and capture reality in greater depth.
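The definition above can be sketched in a few lines (illustrative only; the dynamics here are a hypothetical linear system, nothing like Runway's models): an internal state, a transition function that predicts the next state, and a rollout that simulates future events by applying the transition repeatedly.

```python
import numpy as np

A = np.array([[1.0, 0.1],
              [0.0, 1.0]])   # assumed internal dynamics (position, velocity)

def step(state):
    # transition function: predict the next internal state
    return A @ state

def rollout(state, horizon):
    # simulate `horizon` future states from the current internal state
    states = [state]
    for _ in range(horizon):
        states.append(step(states[-1]))
    return np.stack(states)

trajectory = rollout(np.array([0.0, 1.0]), horizon=10)
```

The point of the sketch is the shape of the idea: the model never touches the real environment during the rollout; it simulates futures entirely from its internal representation.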
How does Lumiere compare to other video generation models in terms of user preference?
-According to the studies mentioned in the script, videos generated by Lumiere are preferred by users over those from other current video generation models, such as Pika and Gen-2, in both text-to-video and image-to-video generation.
What is 'stylized generation' and how does Lumiere implement it?
-Stylized generation is the ability of an AI model to create videos or images in a specific style, such as that of a painting or a reference image. Lumiere implements this by using a reference image to influence the appearance and style of the generated video.
How does Lumiere help simplify video production for everyday people?
-Lumiere simplifies video production by letting people create high-quality content without traditional financial constraints. AI-generated video and voices, together with assistance in story writing, let anyone with creative talent build visual narratives without large investments.
How has the quality of AI video generation changed in the past year, according to the script?
-According to the script, the past year has brought significant advances in the quality of AI video generation. A year or a year and a half ago, AI-generated videos showed blocky shapes and lacked coherence between scenes, whereas videos generated by Lumiere and other current models are far more consistent and realistic.
What role do generative models play in the debate about AI understanding, and how does this relate to the research discussed?
-Generative models are AI models capable of creating original content from input data. Research such as the Harvard 'Beyond Surface Statistics' study seeks to understand whether these models learn more than surface statistics, that is, whether some deeper form of comprehension is developing in the AI beyond simple correlations between pixels and words.
Outlines
🚀 Launch of Lumiere, Google's new AI tool
Google has released Lumiere, an artificial intelligence tool that converts text into video. The model not only translates text to video, but can also animate existing images, create videos in the style of an image or painting, and animate specific regions within images. Lumiere uses a space-time diffusion model to improve temporal consistency. Videos generated from prompts like 'US flag waving' or 'a bear walking in New York' show strong consistency and quality. The section also covers the improvements described in Google's research and the complexity of how AI neural networks generate these videos.
🎨 Transforming video production with Lumiere
The script explores how Lumiere could change video production for films and TV series, letting people create high-quality content at home. It mentions a Harvard study titled 'Beyond Surface Statistics', which asks whether AI models learn more than surface statistics when creating images and videos. The study suggests these models may be developing a deeper understanding of scene geometry despite being trained only on 2D images, which could be a new dimension of deep learning.
🤖 Advances in video generation and world models
The script discusses advances in video generation with tools like Runway ML and how these platforms are introducing world models to improve the coherence and realism of AI-generated videos. Runway ML is conducting a long-term research effort to develop 'general world models' that understand the visual world and its dynamics, which could be the next major advancement in AI. Lumiere's results are also compared with other industry-leading models, demonstrating its superiority in consistency and quality.
📊 Comparing video generation models
The script presents a detailed comparison of the Lumiere, Pika, and Gen-2 video generation models, showing how Lumiere outperforms its competitors in aspects such as temporal consistency and image quality. Examples of prompts and how each model interprets them are included, highlighting Lumiere's ability to stay coherent and accurately represent the elements of the prompt in the generated video.
🌟 The coming revolution in AI content creation
The script concludes by highlighting the importance of AI tools like Lumiere for future content creation. It suggests the next generation of content creators could use these tools to produce high-quality art and narratives without traditional financial constraints. It also mentions the possibility of using simulations to create stories in which characters and settings develop organically, letting creators select the most impactful moments and narratives.
Keywords
💡Lumiere
💡Space-time diffusion model
💡Temporal consistency
💡Image-to-video
💡Video stylization
💡Cinemagraph
💡Video inpainting
💡Deep learning
💡General world models
💡Content generation
Highlights
Google launches its new artificial intelligence tool Lumiere, a text-to-video AI model.
Lumiere can animate existing images and create videos in the style of an image or painting.
Google published a paper on Lumiere's improvements, including a space-time diffusion model for realistic video generation.
Lumiere offers temporal consistency across frames, a significant improvement over other models.
Lumiere's ability to animate still images is shown through examples such as a bear walking in New York.
Lumiere can generate styles from a reference image, creating videos with stylistic consistency.
A new Space-Time U-Net architecture is introduced that plans the generation of the entire video at once, instead of frame by frame.
Lumiere also supports video stylization, transforming source videos into different styles.
Lumiere's cinemagraph feature animates only certain parts of an image, such as the smoke from a train.
Lumiere's video inpainting uses AI to complete missing parts of an image, such as green leaves.
Lumiere can change a character's clothing across several shots based on a text prompt.
The script questions how AI models turn concepts into images, suggesting they may be learning something deeper than surface statistics.
The 'Beyond Surface Statistics' study explores whether AI models learn a deeper understanding of objects and position.
AI models appear able to build internal representations related to scene geometry, despite being trained only on 2D images.
Runway ML, another text-to-video AI model, enables the creation of films generated entirely by AI.
Runway ML is introducing general world models to improve coherence and realism in video generation.
Lumiere is compared with other state-of-the-art models, showing a significant user preference for the quality of its generated videos.
Lumiere offers a significant improvement in the consistency and realism of AI-generated videos compared with earlier technology.
The pace of progress in AI suggests a future where people can easily create Hollywood-style films using AI tools.
Transcripts
And just like that, out of the blue, Google drops its latest AI tool: Lumiere. Lumiere is at its core a text-to-video AI model: you type in text, and the AI neural nets translate that into video. But as you'll see, Lumiere is a lot more than just text-to-video. It allows you to animate existing images, create video in the style of an image or painting, and do things like video inpainting and animating specific sections within images. So let's look at what it can do and the science behind it. Google published a paper describing what they improved, and I'll also show you why the artificial brains that generate these videos are much weirder than you can imagine.

This is Lumiere from Google Research: 'A Space-Time Diffusion Model for Realistic Video Generation'. We'll cover the space-time diffusion model a bit later, but right now, this is what they're unveiling. First of all, there's text-to-video. These are videos produced from various prompts, like 'US flag waving on massive sunrise clouds', 'funny cute pug dog feeling good, listening to music with big headphones and swinging head', 'snowboarding Jack Russell Terrier', and so on. I've got to say, these are looking pretty good. If these are good representations of the sort of output we can get from this model, this would be very interesting. For example, take a look at this one: an astronaut on the planet Mars making a detour around his base. This is looking very consistent. This one looks like a tablet, a medicine tablet of some sort, floating in space, but everything is looking very consistent, which is what they're promising in their research. It looks like they found a way to create a more consistent shot across different frames — temporal consistency, as they call it.

Here's image-to-video. As you can see, this one is nightmarish, but that's the only scary-looking one; everything else is looking really good. They're taking images and turning them into little animations: a bear walking in New York, for example, or Bigfoot walking through the woods. These started as an image that then gets animated, and they're looking pretty good. Here are the Pillars of Creation animated — that's pretty neat, kind of a 3D structure.
They're showing stylized generation: using a target image to make something colorful or animated. Take a look at this elephant. One thing that jumps out at me is that it's very consistent; there's no weirdness going on. In a second we'll look at other leading AI models that generate video, and I've got to say this one is probably the smoothest-looking. Here's another one: this is the style reference image, so they want this style, and then they say 'a bear twirling with delight', for example, and it creates a bear twirling with delight, or a dolphin leaping out of the water, in the style of that image. Here are the same or similar prompts with a different style reference, and it captures the style pretty well — that kind of neon, phosphorescent glowing thing.

They introduce a Space-Time U-Net architecture, and we'll look at that towards the end of the video, but basically it sounds like it creates the idea of the entire video at once. While other models seem to go frame by frame, this one has an idea of what the whole thing is going to look like at the very beginning.

There's also video stylization: here's a lady running — this is the source video — and the various crazy things you can turn her into, and the same with a dog, a car, and a bear.

Cinemagraphs is the ability to animate only certain portions of an image, like the smoke coming out of this train. This is something Runway ML, I believe, recently released, and it looks like Google is hot on their heels, creating basically the same capability.

Then we have video inpainting: if a portion of an image is missing, you're able to use AI to guess what it would look like. Here, where the hand comes in, that is very interesting, because it seems kind of advanced. Notice that at the beginning he throws the green leaf into the missing portion of the image, and then you see him coming back into the visible part of the image throwing a green leaf or two, so the model makes the assumption that the things there will also be green leaves. Interestingly, though, I do feel like I can spot a mistake: the leaves that are already in there are fresh-looking, as opposed to the cooked ones on this side. It knows to put in green leaves as the guy throws them, so they're fresh and match the fresh leaves here, but it misses the point that these are cooked leaves and those are fresh. Still, it's very impressive that it's able to guess what's happening in that moment. And this is where, if you've been following some of the latest AI research, these neural nets get a little bit weird — we'll come back to that at the end. How are they able to predict certain things, like what happens here? No one codes it to know that this is probably a cake of some sort; nobody tells it what this thing is. It guesses from clues it sees on screen, but how it does that is really, really weird, let's just say that.

This is pretty impressive: here we're able to change the clothes the person is wearing throughout these shots. Notice the hat and the face remain consistent across all the shots, whereas the dress changes based on a text prompt. As you watch this, think about where video production for movies and serial TV shows is going to be in 5 to 10 years. Will something like this allow everyday people sitting at home to create stunning Hollywood-style movies with whatever characters and settings they want? With AI-generated video and AI voices, we could create a movie starring Hugh Hefner as a chicken, for example.
Really fast: this is another study, called 'Beyond Surface Statistics', out of Harvard. It has nothing to do with the Google project we're looking at, but this paper tries to answer the question of how these models create images and videos. As it says here, these models are capable of synthesizing high-quality images, but it remains a mystery how these networks transform, say, the phrase 'car in the street' into a picture of a car in a street. In other words, when a person says 'draw a picture of a car in a street', or a video of one, how does the model do it? How does it translate that into a picture? Do these models simply memorize superficial correlations between pixel values and words, or are they learning something deeper, such as an underlying model of objects like cars and roads and how they are typically positioned? There's a bit of an argument going on in the scientific community about this. Some AI scientists say it's just surface-level statistics: the models are memorizing where the pixels go and are able to reproduce certain images. Others say no, there's something deeper going on, something new and surprising that these AI models are doing.

So what the researchers did is take a model that was fed nothing but 2D images — images of cars, people, ships, and so on. That model wasn't taught anything about depth, like where the foreground or background of an image is; it wasn't taught what the focus of the image is, or what a car is. Here's the decoded image, showing how the model gets from step 1 to, finally, step 15, where you can see this is a car. A human could point at this and say, 'that's a car'. What in the image is closest to you, the person taking the picture? Probably this wheel: this is the foreground, this is the main object, and that's the background, far away. But the reason you can look at this image and know all that is that you've seen these objects in the real, 3D world; you can probably imagine how this scene would look if you were standing off to the side, looking at it from another direction. The AI model that made this has no idea about any of that; all it has seen is a bunch of 2D images, just pixels arranged on a screen.

And yet when we dive in to try to understand how it builds these images from scratch, this is what we start to notice. Early on, while it's building the image, this is what the depth of the image looks like: very early on it knows that this thing is in the foreground, close to us, and this blue region is the background, far from us. Looking at this early step, you can't possibly tell what the image is going to be; you can't tell until much, much later. Maybe here you can begin to see some of the lines, and the wheels, and guess what it is, but at the beginning you have no idea. And yet the model already knows that something here is in the foreground and something is in the background, and towards the end it knows this is close and this is far. There's also the salient object, meaning the focus, the main object. It knows the main object is here; it doesn't know what a car is, or what an object is — it just knows this is the focus of the image. Again, only much later do we realize that, yes, in fact, this is the car.
And this is the conclusion of the paper: their experiments provide evidence that the Stable Diffusion model (an image-generating AI), although trained solely on two-dimensional images, contains an internal linear representation related to scene geometry. In other words, after seeing thousands or millions of 2D images, inside its neural network it seems — and again, a lot of people dispute this, but some of this research makes it seem — like it's developing something that allows it to build a 3D representation of the image, even though it's never been taught what 3D means. It uncovers a salient object, the main central object it needs to focus on, versus the background of the image, as well as information related to relative depth. And these representations emerge early: before it starts painting the colors, the little shapes, the wheels, and the shadows, it first starts thinking about the 3D space on which it's going to paint that image. They say these results add nuance to the ongoing debates — and there are a lot of ongoing debates — about whether generative models can learn more than just surface statistics. In other words, is there some sort of understanding going on, maybe not human-like understanding, or is it just statistics, or is something deeper happening?
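The paper's technique is linear probing: fit a *linear* readout that predicts a per-pixel property (foreground vs background, relative depth) from the model's internal activations; if a linear probe succeeds, the information is plausibly encoded in the representation. Here is a sketch of that idea with synthetic data standing in for the diffusion model's activations (hypothetical code, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

n, d = 2000, 32                      # pixels sampled, feature dimension
w_true = rng.normal(size=d)          # pretend depth info is linearly encoded
feats = rng.normal(size=(n, d))      # stand-in for internal activations
labels = feats @ w_true > 0          # stand-in for foreground/background

def fit_linear_probe(X, y):
    # least-squares linear probe: regress {-1, +1} targets on features
    t = np.where(y, 1.0, -1.0)
    w, *_ = np.linalg.lstsq(X, t, rcond=None)
    return w

w = fit_linear_probe(feats, labels)
accuracy = np.mean((feats @ w > 0) == labels)
```

In the real study the features come from intermediate layers of Stable Diffusion at early denoising steps; high probe accuracy there is the evidence that depth and saliency are represented before the image is visually recognizable.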
This is Runway ML, one of the other leading text-to-video AI models, and you might have seen its images. People have made full movies with it — maybe not hour-long, but 10- or 20-minute movies that are entirely AI-generated. It's similar to what Google is offering, although after looking at Google's work and then this, Google's does seem a little more consistent; there seems to be a little less shifting of shapes, and it's just a bit more consistent across time. They have a lot of the same features, like stylization from a reference video using an image as the style reference.

The interesting thing is that in the last few months — it looks like December 2023 — Runway ML introduced something they call General World Models. They're saying: we believe the next major advancement in AI will come from systems that understand the visual world and its dynamics, and they're starting a long-term research effort around what they call general world models. Their whole idea is that instead of video AI models creating little clips with isolated subjects and movements, a better approach would be for the neural networks to build some sort of world model to understand the images they're making, and to use that to almost create a little world. For example, if you're creating a clip with multiple characters talking, the AI model would almost simulate the entire world, with the rooms and the people; the people would talk to each other, and it would just take that clip — but it would basically create much more than a clip. If a bird is flying across the sky, it would be simulating the wind and the physics and all of that to capture the movement of the bird and create realistic images and video.

They're saying a world model is an AI system that builds an internal representation of an environment and uses it to simulate future events within that environment. For example, for Gen-2 — their video model — to generate realistic short video, it has developed some understanding of physics and motion, though it's still very limited, struggling with complex camera controls or object motions, among other things. But they believe, and a lot of other researchers do as well, that this is the next step for getting better at creating video and at teaching robots how to behave in the physical world — like NVIDIA's foundation agent, for example. We need to create bigger models that simulate entire worlds, and then pull out of those worlds what we need, whether that's an image, text, or a robot's ability to open doors and pick up objects.
All right, but now back to Lumiere: 'A Space-Time Diffusion Model for Video Generation'. Here they have a number of examples of text-to-video, image-to-video, stylized generation, and so on. In Lumiere they're trying to build a text-to-video diffusion model that can create videos portraying realistic, diverse, and coherent motion — a pivotal challenge in video synthesis. The new thing they introduce is the Space-Time U-Net architecture, which generates the entire temporal duration of the video at once. In other words, it thinks through what the entire video is going to look like from the beginning, as opposed to existing video models, which synthesize distant keyframes followed by temporal super-resolution — basically meaning they start with some frames and then create the others. They're saying that makes global temporal consistency difficult: the object looks a certain way in the first second of the video, but by second five it's completely different. And here they're basically comparing two clips, one from Imagen Video and one from Lumiere.
They sample a few clips and look at the XT slice. You can think of the XT slice like this: in stocks, you have the price of a stock over time. Here, X is the spatial dimension — where things are in space on the image, for example along its width — and T is the temporal dimension, time. So the XT slice takes one line of the image (this green line) and tracks it across the entire duration, showing how consistent it is over time. As you can see, this one goes pretty well, then it messes up and gets crazy, and then goes back to doing okay; whereas Lumiere's is pretty good — maybe some funkiness in one frame, but pretty good. Same thing here: pretty good, maybe a little funkiness, but overall very good; whereas in the Imagen Video clip there's kind of a lot of nonsense happening. Here you can't tell how many legs the animal has, or whether it's missing one, whereas in the Lumiere clip you can see each of the legs pretty distinctly, and their positions remain consistent across time, or at least consistently easy to see. I've got to say, I can't wait to get my hands on it. As of right now I don't see a way to access it — this is just a preview — but hopefully they'll open it up for testing soon and we'll be able to check it out.
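The XT slice described above is a simple array operation: fix one row of pixels (a horizontal scanline through the frame) and stack that row across every frame. The result is a 2D image whose axes are space (X) and time (T); smooth streaks indicate temporal consistency, while jitter indicates flicker. A minimal sketch (hypothetical clip data):

```python
import numpy as np

def xt_slice(video, y):
    """video: (T, H, W) grayscale clip; y: row index of the scanline."""
    return video[:, y, :]          # shape (T, W)

rng = np.random.default_rng(0)
video = rng.random((24, 64, 64))   # hypothetical 24-frame clip
slice_img = xt_slice(video, y=32)
```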
Interestingly enough, they actually compare how well their model performs against the other state-of-the-art models in the industry. The two I'm familiar with are Pika and Gen-2 — those are the two I've used — and they're saying their video is preferred by users in both text-to-video and image-to-video generation. Blue is theirs and orange is the baseline, and it seems like there are pretty big differences in every single one. This one is video quality, where it beats out every single other model. This is text alignment, which probably means how true the video is to the prompt — if you type in a prompt, how accurately the result represents it. It looks like Imagen Video is the closest, but Lumiere beats out most of the others by quite a bit. And in video quality for image-to-video it beats them all as well, with Gen-2 probably being the next best.

They also provide a side-by-side comparison. For example, the first prompt is 'a sheep to the right of a wine glass'. This is Pika — not great, because there's no wine glass. Here's Gen-2, consistently putting the glass on the left. AnimateDiff just has two glasses and maybe a reflection of a sheep. Imagen Video: same thing, the glass is on the left. ZeroScope: no glasses that I can see, although they do have sheep. And of course Lumiere, the Google one, seems to nail it every single time — the glass is on the right. Though I've got to say Gen-2 is great too; it just confused left and right. Actually, in the image-to-video case I feel like Gen-2's sheep is much better quality — that's a good-looking sheep. I should probably rephrase that: that's a well-rendered sheep. Versus Imagen Video, which produces a weird-looking thing that could almost be a horse or a cow if you just look at the face, while Google's is again excellent.

Here's 'teddy bear skating in Times Square': this is Google's, this is Imagen Video — again, weirdness happening — and that's Gen-2, again pretty good, though the bear is facing away. I also just noticed they took 'skating' to mean ice skates, whereas here it looks like roller skates, a skateboard, and so on. It looks like in the study they just showed people two videos and asked whether they liked the left or the right more, based on motion and quality.
Well, if you're an aspiring AI cinematographer, this is really good news: consistent, coherent images that can create near-lifelike scenes at this point. I'm sure some people will complain about things, but you've got to realize how quickly this stuff is progressing. Just to give you an idea, this is what AI-generated video looked like about a year ago — can you tell it's improved just a little bit? I'm not sure exactly when this was made, but I'd say a year to a year and a half ago, and this thing gets nightmarish. This is what I mean by weird blocky shapes and things not being consistent across scenes — what are we even looking at here? Is this a mouth? Is this a building? And here's something from about four months ago, from Pika Labs. As you can see, it's much better, much more consistent. The humans maybe look a little weird, but it's better; it can put you in the moment. If you're telling a story that isn't necessarily about everything looking realistic, something like this can be created pretty easily, and since it's new and novel, this might become a whole new movement, a new genre of filmmaking — new, exciting, and never before seen. Most importantly, it's easy to create at home with a few AI tools, so anybody out there with the creative talent to tell the stories in their mind, without being limited financially by capital, is going to be able to create AI voices and AI footage, and maybe even have ChatGPT help with some of the story writing.

And once more, the next generation of things people are working on is the simulation approach, where you create the characters and then let them loose in a world. They get simulated, the stories play out in that world, and then you pick and choose what to focus on — which scenes and which characters you want to bring to the front. You basically act as the world-builder: you build the worlds, the characters, and the narratives, and AI assists you in creating the visuals, the voices, and so on. You can be 100% in control, or control only the things you want and let the AI generate the rest. So to me, if you're interested in movie-making and you like these sorts of styles — which, by the way, will quickly become much more realistic — I'd be looking at this right now, because right now is the time that it's emerging into the world and getting really good, and by next year it's going to be a lot better.

Well, my name is Wes Roth, and thank you for watching.