The First High Res FREE & Open Source AI Video Generator!
Summary
TLDR: The video explores the emerging field of AI video generation, highlighting Google's Imagen Video and RunwayML's Gen 2. It introduces Potat1, an open-source, high-resolution text-to-video model that improves on the ModelScope AI video generator. With 1024×576 output and the potential to run on a local GPU, Potat1 offers a promising alternative to Gen 2 and invites community-driven improvement and experimentation in AI video generation.
Takeaways
- 😀 AI text generation, like chatbots, is currently the most popular form of AI, followed closely by text-to-image generation and manipulation.
- 🌟 The next frontier in generative AI is AI video generation, with Google's Imagen Video and RunwayML's Gen 2 being notable examples.
- 🚀 RunwayML's Gen 2 is a multi-modal system that can generate videos from text, images, or video clips, but it's not open source and has limited public access.
- 🌐 An open-source competitor to Gen 2 has emerged, called Potat1, which is based on the ModelScope AI video generator and offers higher frame rates and resolutions.
- 📹 Potat1 can generate 1024×576 videos, marking a significant leap into HD territory for open-source text-to-video models.
- 💧 Despite the higher resolution, Potat1's videos still sometimes include watermarks, like the base ModelScope videos.
- 🔗 The GitHub repository for Potat1 is available, allowing users to run the model on their own machines and access the training scripts.
- 🔄 Potat 2 is in development, promising potentially higher-resolution and more coherent video generation.
- 🎥 Video generation with Potat1 is slow, especially on Google Colab, but can be faster with better hardware or paid tiers.
- 🤖 The Potat1 model shows promise in coherency and resolution, making it a strong open-source alternative to proprietary models like Gen 2.
Q & A
What are the two main types of AI that are currently popular?
-The two main types of AI that are currently popular are AI text generation, such as chatbots, and text-to-image generation and manipulation.
What is the next level of AI image generation mentioned in the script?
-The next level of AI image generation mentioned is AI video generation.
What is Google's contribution to AI video generation as mentioned in the script?
-Google's contribution to AI video generation is Imagen Video, a research model demonstrating high-resolution, high-frame-rate video generation, though it is not publicly accessible.
What is Runway ML's Gen 2 and why is it significant?
-Runway ML's Gen 2 is a multi-modal system that can generate novel videos from text, images, or video clips. It is significant because it is one of the few AI video generation tools that is accessible to the public.
Why is open-source software important in the context of AI video generation?
-Open-source software is important because it allows for modification and building upon existing video generators, expanding the possibilities and improving the technology.
What is Potat1 and how does it relate to AI video generation?
-Potat1 is an open-source 1024×576 text-to-video model announced by the developer camenduru. It is significant because it breaks into HD territory for open-source text-to-video models.
What are the main features of Potat1 that make it competitive with RunwayML's Gen 2?
-Potat1 is competitive with RunwayML's Gen 2 because of its higher-resolution, higher-frame-rate video generation and because it is fully open source.
What is the significance of the model being able to generate videos at 1024×576?
-It represents a step into HD territory for open-source text-to-video models, offering higher quality than previous open models.
What is the role of the GitHub repository in the context of Potat1?
-The GitHub repository hosts the source code and training scripts for Potat1, allowing users to modify and improve the model.
How can users try out Potat1 without installing anything on their own machine?
-Users can try out Potat1 for free using Google Colab, which provides a simple setup and generates videos without any local installation.
What are some of the limitations or challenges mentioned in the script regarding AI video generation?
-Some limitations or challenges include the generation time, which can be slow, and the complexity of setting up the model locally, which may require experience with installing GitHub repos and running Python.
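The video only uses the ready-made Colab notebook and never shows the underlying code, but a minimal sketch of how a ModelScope-derived model like this can be driven from Python with Hugging Face diffusers might look like the following. The model id "camenduru/potat1" and the assumption that the checkpoint loads as a standard diffusers pipeline are mine, not something the video confirms:

```python
# Minimal text-to-video sketch with Hugging Face diffusers (model id is an assumption).
# pip install diffusers transformers accelerate torch
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained(
    "camenduru/potat1",          # assumed diffusers-format checkpoint of Potat1
    torch_dtype=torch.float16,   # half precision to fit the ~15 GB Colab GPU
).to("cuda")

result = pipe("lemon character dancing on the beach, bokeh", num_frames=20)
frames = result.frames[0]        # recent diffusers return a batch of clips;
                                 # on older releases use result.frames directly
export_to_video(frames, "lemon.mp4", fps=24)  # fps arg needs a recent diffusers release
```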
Outlines
🚀 Introduction to AI Video Generation
The paragraph introduces AI video generation as the next frontier in AI technology, following the popularity of AI text and image generation. It mentions Google's Imagen Video and Runway's Gen 2 as existing technologies but highlights their limited accessibility. It then introduces Potat1, an open-source AI video generator that produces higher-resolution videos than previous open models. It also discusses the potential of open-source software, Potat1's availability via a Twitter announcement, and the promise of future improvements with Potat 2.
🍓 Exploring Potat1's Video Generations
This paragraph delves into the capabilities of Potat1, showcasing demo videos such as animated fruits, a trippy line-art drawing, and an astronaut in a world of blobs. It emphasizes the higher resolution and coherency of Potat1's videos compared to the base ModelScope model. It also covers access to the model through Google Colab, the availability of training scripts, and the potential for community contributions through GitHub and Discord.
🛠 Setting Up and Using Potat1 on Google Colab
The paragraph walks through setting up and using Potat1 on Google Colab: the video generation process, how long it takes, and the resolution achieved. It also touches on the potential for higher frame rates and generating longer videos with more frames, and includes a brief tutorial on the Colab interface, installing the necessary packages, and generating a video from a sample prompt.
🌐 Open Source Potential and Future of AI Video Generation
The final paragraph discusses the open-source nature of Potat1 and its implications for the future of AI video generation. It mentions Potat1's integration into Blender and the ease of use that offers. It concludes with an invitation for viewers to share their creations and to look forward to the next generation, Potat 2.
Keywords
💡AI Text Generation
💡Text to Image Generation
💡AI Video Generation
💡RunwayML's Gen 2
💡Open Source
💡ModelScope AI Video Generator
💡Potat1
💡Coherency
💡Google Colab
💡Resolution
Highlights
AI video generation is emerging as the next frontier in AI technology.
Google's Imagen Video paper showcases high-resolution, high-frame-rate capabilities.
RunwayML's Gen 2 stands out as a multi-modal system for video generation.
The limitations of Gen 2 include restricted access and a lack of open-source availability.
Open-source AI video generators like ModelScope are gaining traction.
Potat1 is introduced as an open-source, high-definition text-to-video model.
Potat1 is based on ModelScope and outputs a higher resolution than Gen 2.
The video generation process includes customizable parameters like FPS and frame count.
Potat1 can be run on Google Colab with 15 GB of GPU memory, making it accessible to the public.
The video generation quality is competitive with RunwayML's Gen 2, despite being in early stages.
The open-source nature of Potat1 allows for community-driven improvements and modifications.
Potat1's coherency and resolution are significant advancements in AI video generation.
The video generation process can be time-consuming, especially on free platforms like Google Colab.
Potat1's potential for integration into software like Blender expands its usability.
The community is encouraged to experiment with Potat1 and share their creations.
Potat 2 is in development, promising further enhancements in AI video generation.
The video concludes with a call to action for viewers to engage with the AI technology and share their experiences.
Transcripts
As we drift through the new AI explosion happening in our society, two main kinds of AI keep popping up. The first is obviously AI text generation, such as ChatGPT, probably the most popular form of AI at the moment. The very close second is text-to-image generation and manipulation. This form of generative AI is super popular: Midjourney, DALL·E, Bing, it's all really great stuff. But what's the next step, the next level of AI image generation? That happens to be AI video generation.

We've seen a lot from AI video generation so far. Google's mind-blowing Imagen Video comes to mind, but of course we don't have access to any of that; it was just a cool paper Google released with some higher-resolution, higher-frame-rate video. The only one we can actually use is Runway Research's Gen 2, a multi-modal system that can generate novel videos from text, images, or video clips, although here it just does plain text-to-video. One of the main issues is that most people still do not have access to the Gen 2 app; the only real access anyone has is through RunwayML's Discord server. And while this really is the best video generation the public has at the moment, you can only generate four-second videos at a decently low frame rate, and of course it's not open source. As we saw with the Stable Diffusion image generator, when things are open source the possibilities open up: open-source software allows for the modification and extension of these video generators.
So, viewers, I'm happy to share with you today that we actually have a Gen 2 competitor that is fully open source. It's based on the ModelScope AI video generator, which was all right; it was the very, very first baby steps of AI video. All of this is still baby steps, but ModelScope is definitely behind Gen 2. The open-source, free AI video generator I'm showing you today is actually fairly competitive with RunwayML's Gen 2: we're talking higher-frame-rate, higher-resolution video generations.
Our journey starts here on Twitter, where camenduru is happy to announce the first open-source 1024×576 text-to-video model, known as Potat1. We're now actually breaking into HD territory for these open-source text-to-video models, which is awesome. It is heavily based on ModelScope, the open-source model that produced rather cruddy, blurry little videos, often with watermarks. You still get watermarks sometimes with this, but it's much higher resolution, and I think the quality is substantially better. There are lots of other credits here too: ModelScope is the main one, plus Lambda Labs and some other devs. You can try it through this link right here, and he also says that Potat 2 is in the oven, meaning he's working on a better version, potentially a higher-resolution, more coherent one.
Either way, the demo video we have here is only about a second long, but it actually looks pretty promising: a nice still background that isn't moving too much, and all these colorful, beautiful 3D-animated fruits flying through the air. It's a pretty good little demo video; this is something I would expect to see out of Gen 2, and I think that's the quality we're looking at. Going down, we've got a GitHub link. Of course, this whole thing is open source: you can run it on your own machine, anyone can modify it, and training scripts are also available, which is really cool.
We've got another example here of an astronaut that seems to be jumping through a world of fuzzy little blobs. It's a really trippy, weird video, but it would be cool for a music video or something. Again, this is something I'd expect to see out of the likes of RunwayML's Gen 2, but it comes from a fully open-source, free-to-download-and-use model. The higher resolution really is kicking butt here; that's what we needed, some high-resolution, high-fidelity, high-frame-rate text-to-video, and we're slowly but surely getting there. I'd be much happier for us to get there with open-source models, because open source means anyone can modify it, anyone can make it better, and anyone can have access to it. The VRAM requirements to run it locally are actually not too bad: you can run this completely free in Colab with the 15 GB of GPU memory supplied by Google Colab, and if you have a graphics card with more than 15 GB of VRAM, you could potentially run it at home. Nvidia was kind enough to send me an RTX 4080 with 16 GB of VRAM, so I could run this thing on my own GPU at home without going to a server or a service like Gen 2.
We have another one: a little video of fruits seemingly bouncing off the ground. You can see the fruits actually blur as they get closer to the camera, which is a sign of good coherency. The fruits are clearly strawberries and maybe some avocados; I'm not really sure exactly what's going on, but they all seem to be bouncing and landing around, and it's a pretty decent-looking video. There's also something a little more trippy, almost deserving a flash warning; it's pretty crazy, but it looks like an art line drawing you'd see in a cool music video. There isn't a Hugging Face demo, which always seems to be the easiest way to try these things out, but there is a Colab demo, which is also fairly easy. Keep in mind, viewers, this is a prototype model; it was trained with lambdalabs.com on an A100 GPU for 10,000 training steps.
Again, you have full access to the dataset and configs, and to all the fine-tuning tools here: two text-to-video fine-tunes, a Video-BLIP2 preprocessor, and PySceneDetect. Of course, here are the links to the base model, ModelScope, and you can try it completely free with this little Colab. Here's the link to the GitHub, and if you have trouble figuring out how to run it, they actually have a Discord server, which is nice. There are a bunch of different Colabs; the main one is going to be your Potat1 text-to-video Colab. There's also a little tutorial video, which I didn't find super helpful, but there are some nice examples to look at down here, and they are higher resolution and more coherent than your base ModelScope videos.
This first one is a giraffe underneath a microwave, and you can see the giraffe is inside the microwave. This is a throwback, as a lot of these prompts are, to other text-to-video generators we've seen from Meta AI and Google AI. The giraffe and the microwave came out all right; it's literally just a giraffe sitting in a microwave. We've got a goldendoodle playing in a park by a lake; this one's all right as well, not super coherent. We've got a panda bear driving a car, and this one's honestly pretty good: he's just kind of sitting there, the car is very still and very coherent, which I like, and he makes sense in the passenger seat. There's a teddy bear running in New York City; it kind of looks like he's hobbling around on his butt through the city, but I've got to say, the actual panning motion through the city looks pretty decent. This is one of the weaker generations for sure, though, and it's definitely a very complex shot: a drone fly-through of a fast food restaurant on a dystopian alien planet. If you click it, it actually does look a lot like drone footage in terms of how smoothly it pans into the restaurant, and you can definitely tell it's some sort of building on a dystopian alien planet, although the building isn't all that coherent. We've also got a dog wearing a superhero outfit with a red cape flying through the sky, and this one is really good: you can see him do a full 180-degree turn, and the whole body looks pretty coherent. My favorite part is the physics on the cape: it flies and floats around and makes sense. This generation is impressive, and seeing something this coherent come out of the model shows that it's definitely doing good work. We still have the Shutterstock logo here; that's really just an artifact inherited from the ModelScope model this is based on, but again, ModelScope is open source, which is what allows us to improve upon it with projects like Potat1. I really think this dog generation, with the cape flying around, the full 180 turn, and the coherent background, is truly marvelous. That is a good generation.
generation you've also got three more
here monkey learning to play the piano
there definitely looks like a little
monkey and he's just kind of scrabbling
around on the piano not supers coherent
but the background is Rock Solid the
piano is Rock Solid and the monkey looks
decent I would say and we've got a
litter of puppies running through the
yard it was able to do multiple puppies
at once which is pretty good but they're
kind of morphing in each other this is a
pretty weak generation and uh yeah it's
just a little bit scrabbling weird and
finally we've got a robot dancing in
Times Square and this one honestly came
out shockingly good as well you can see
Times Square moving in the background in
a cinematic way the light is reflecting
off the floor in a realistic way that
that's what you would expect essentially
from all these screens in the background
of Times Square and the robot does kind
of seem to be just standing there
dancing maybe a little bit the robot is
coherent throughout the whole video
though and it's doing a nice panning so
So while these are pretty low-frame-rate, very short videos, they look very promising in terms of their coherency, and for me personally that is the most important thing to nail down with text-to-video before we start improving frame rates and overall generation time. We want to get that coherency down, and this is looking a lot more coherent than your base ModelScope video; it's looking almost as coherent as Gen 2 in a lot of situations. And again, it's fully open source, which really is the crown jewel of this whole Potat1 text-to-video generator.
So, viewers, as I said earlier, you can use this entire thing for free on Google Colab, which is awesome. The Colab is super simple to set up, and I'll show you how in a little bit, but it does take quite a long time, about ten minutes, to actually generate a video. That's on Google Colab, though; it could be a lot faster if you pay for a better Colab tier or run it at home on your own GPU. Here's where you put the prompts in. It also supports negative prompts, which is good, plus the number of steps per frame, a guidance scale, the total FPS (in this case 24), and the total number of frames, so you can generate longer videos if you want; it's just going to take a really long time in Colab.
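Those knobs line up closely with the arguments a diffusers-style text-to-video pipeline exposes. As a hedged sketch, reusing the `pipe` object from the example in the Q&A section above (the argument names are Hugging Face diffusers conventions; whether this Colab uses diffusers internally is an assumption):

```python
# Mapping the Colab's controls onto pipeline arguments (names assumed from diffusers)
result = pipe(
    prompt="duck",
    negative_prompt="watermark, text, low quality",  # negative prompts are supported
    num_inference_steps=50,  # steps per frame; fewer is faster but blurrier
    guidance_scale=9.0,      # how strongly the prompt steers each denoising step
    num_frames=20,           # more frames means a longer clip and a longer wait
)
export_to_video(result.frames[0], "duck.mp4", fps=24)  # total FPS, 24 in the video
```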
I did two videos with the base prompt of "duck", so let's see how those turn out.
All right, here is our first generation, actually at 24 FPS. It's not too bad; it looks pretty rough and rudimentary, but the resolution is decently high. It's a little tough to look at, but I do like the way the ducks are flapping their wings around, considering this is brand-new, baby-stage technology. As you can see, viewers, it's 24 FPS at 1024×576 resolution, and what's really cool is that's a higher resolution than you get with Gen 2: Gen 2 outputs are 768×448, so you're getting somewhat higher resolution than even regular Gen 2, although the coherency might not fully be there yet.
So, viewers, when you first open the Google Colab notebook, the first thing to do is run the first play button at the very top. This sets up the requirements to make everything work: it installs a bunch of different GitHub repos and essentially installs the AI onto the Colab notebook. Once that's done, it's very simple: type in your prompt (in this case I'm doing "lemon character dancing on the beach, bokeh"), click the little run button (and "run anyway" if prompted), and eventually it will start generating frames. You can change the number of frames if you want; that just increases the length of the video. It's a very simple Colab to use and set up. You'll see a warning that says it could not find TensorRT; that's okay, it will still generate. Again, it just takes a really long time to generate these videos, but that's okay.
By the way, if you make anything really cool, please feel free to share it on my Discord server; I love to see all the cool generations and things you create with this new AI technology. You could even turn the FPS up, or the total number of steps, to get a clearer generation; it's all just going to take longer. As you can see, the GPU RAM starts to climb (again, we have that 15-gigabyte total), and we slowly begin to generate our first frame. This is why it takes so long: it takes about 11 seconds to generate a single step, and there are 50 steps per frame, so for 20 frames you can see it starts to get pretty long. But that's kind of how it is when we're using this very limited, completely free Google Colab; on your own GPU at home this might be blazing fast.
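For a rough sense of scale: if, as in most diffusion video pipelines, all frames are denoised together at each sampling step (an assumption about how this Colab works, not something the video states), the step count rather than the frame count dominates, and the quoted numbers roughly match the ten-minute figure mentioned earlier:

```python
# Back-of-envelope estimate from the numbers quoted in the video (not measurements)
seconds_per_step = 11   # observed time per sampling step on the free Colab GPU
num_steps = 50          # default number of sampling steps

total_seconds = seconds_per_step * num_steps
print(f"~{total_seconds} s ≈ {total_seconds / 60:.0f} min per clip")  # ~550 s ≈ 9 min
```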
So while this generates, I'm going to see if I can run this on my own machine at home. This is fairly involved if you're not used to running this stuff yourself: you'll have to download Python and use that Python runtime to pip-install all of these different GitHub repos. But let's give it a shot and see if I can make it work; I'm an inexperienced person when it comes to this.
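The video never shows a working local install, but for reference, a minimal sketch of the memory-saving options Hugging Face diffusers offers on roughly 16 GB cards might look like this (the model id and the exact VRAM savings are assumptions; measure on your own hardware):

```python
# Memory savers in diffusers for ~15-16 GB cards; model id is an assumed checkpoint.
# pip install torch diffusers transformers accelerate
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("camenduru/potat1", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()  # keep only the active sub-module on the GPU
pipe.enable_vae_slicing()        # decode frames in slices to lower peak VRAM
```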
So, viewers, here is my lemon character dancing on the beach. It's a little disturbing, I won't deny. I did try to get this set up and working on my own computer, but I wasn't able to figure it out in the limited time I had today. If you viewers want me to show you how to install this thing locally, please leave a comment down below and I'll do a dedicated video on it; if you're more accustomed to this kind of stuff, it's going to be a lot easier. If you want faster generation, you can lower the total number of steps. By the way, once you do a new generation, your old MP4 files get deleted. We could have each of these video generations use just one single step, and when we rerun it, it generates a lot faster, but only one step per frame really isn't going to give you a very clear video, as we'll see: if we run this thing on only a single step, you just get a very blurry, basic image with no real clear content. I think it's fair to say, viewers, that this thing really is meant to be used by someone with a bit of experience installing GitHub repos on their own system, and I know that's a lot of you viewers at home.
As you can see, camenduru is happy that people are using Potat1 to create stunning videos. Here's another example of a few videos combined together to create something a little more than just one clip. If we play it, you can see it's all video of a very similar area, like mountains with a waterfall, and it looks pretty decent; not super coherent, of course, and not nearly as good as regular text-to-image, but these are the baby steps of text-to-video. It's really nice that we now have an alternative to Gen 2, in some ways, that is fully open source and fully modifiable; there are no holds barred when you use this thing, and it creates some decently high-resolution video footage. You can even do clips longer than a second if you're willing to wait; the generation time really is the main downside of this thing.
Viewers, one more thing: Potat1 is also available in Blender, thanks to tin2tin. You can essentially integrate it directly into Blender, and it's a lot easier to use than the typical setup; maybe I'll do a video on how to integrate it into Blender, because that's probably a bit easier than trying to install it through Python on your own system. Viewers, I love covering really cool new cutting-edge AI projects like this one, especially when they're open source and free for all to download and use. Let me know if you create anything cool with this; I'd love to see some videos in my Discord server, which is linked in the description. And do you think this is better than Gen 2? Do you prefer it over Gen 2 just because it's open source? I'm very excited to see how the next generation, known as Potat 2, turns out. That's going to be it for me today; tune in at the end of the week for a larger AI news recap, and I'll see you in the next video. Goodbye!