FREE and Unlimited Text-To-Video AI is Here! 🙏 Full Tutorials (Easy/Med/Hard)
Summary
TLDR: The video showcases two text-to-video generation products: RunwayML's Gen 2, a cutting-edge freemium service that caps how many seconds of video you can generate, and an open-source project by potat1 that can run on a local machine or Google Colab. Gen 2 impresses with its accuracy despite minor flaws, while the open-source option, though limited to clips of one to two seconds, produces comparable quality. The video also guides viewers through setting up the open-source project locally with Anaconda and discusses the challenge of maintaining video quality at longer durations.
Takeaways
- 🎥 The video discusses the emerging reality of text-to-video technology and showcases two different products in this field.
- 🔍 One product is RunwayML's Gen 2, a closed-source service that spent time in private beta and is now available for public use, with limits on how much video can be generated.
- 🆓 RunwayML's Gen 2 is free to use but has a credit system that limits the amount of video generated, with each second using five credits.
- 🦆 The video creator tests Gen 2 by inputting 'ducks on a lake' and generates a short video clip, noting the quality and some minor inaccuracies.
- 📈 Gen 2 is highlighted as being on the cutting edge of text-to-video technology, outperforming other similar products.
- 💻 The second product is an open-source text-to-video project by potat1, which can be run on a local computer or Google Colab.
- 🔗 Links to the project's Hugging Face page and GitHub repository are provided for interested users to explore and use the technology.
- 🚀 The video uses Google Colab to demonstrate how easy it is to set up the open-source project and generate a short video from a text prompt.
- 🚨 A limitation of the open-source project is the short video length due to memory constraints and quality degradation with longer videos.
- 🛠️ The video provides a detailed guide on setting up the open-source text-to-video project on a local machine, especially for those with an Nvidia GPU.
- 🔄 The creator discusses the challenges of maintaining video quality for longer durations and mentions ongoing efforts to improve this aspect of the technology.
- 👍 The video concludes with an invitation for viewers to try the technology, seek help in Discord communities, and subscribe for more content.
Q & A
What is the main topic of the video?
-The main topic of the video is the demonstration and discussion of two text-to-video generation products: a closed-source product called Gen 2 by RunwayML, and an open-source project.
What is RunwayML's Gen 2 product and how does it work?
-RunwayML's Gen 2 is a text-to-video generation product that has been in private beta and is now available for public use. Each second of generated video costs five credits, and the total is capped: the 410 credits shown in the demo work out to 82 seconds of video (410 ÷ 5).
How much does it cost to use RunwayML's Gen 2 product?
-Basic use of RunwayML's Gen 2 is free but capped in the number of seconds of video that can be generated. A subscription at twelve dollars per editor per month adds upscaled resolution, watermark removal, shorter wait times, and 125 seconds of generated video per month.
What is the open-source text-to-video project mentioned in the video?
-The open-source text-to-video project mentioned is by potat1 and is available on its Hugging Face page and GitHub. The GitHub page offers several Google Colab notebooks that use different text-to-video libraries; the video uses the zeroscope v1.1 notebook.
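For a feel of what such a notebook does under the hood, here is a minimal sketch using Hugging Face's diffusers library. This is not the project's own inference script: the model id below is the publicly documented damo-vilab/text-to-video-ms-1.7b, used here as a stand-in for the potat1/zeroscope weights the video actually clones, with settings mirroring the Colab demo (33 steps, 24 frames, prompt "ducks on a lake"):

```python
# Minimal text-to-video sketch with diffusers -- an assumed stand-in for the
# project's own Colab/inference script, not the exact code from the video.
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained(
    "damo-vilab/text-to-video-ms-1.7b",  # stand-in model id, not potat1
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.enable_model_cpu_offload()  # helps fit the model on smaller / free-tier GPUs

# Settings mirror the Colab demo: 33 steps, 24 frames (about one second of video)
result = pipe("ducks on a lake", num_inference_steps=33, num_frames=24)
frames = result.frames  # on newer diffusers versions this may be result.frames[0]

print("saved to", export_to_video(frames))
```

As in the video, the practical ceiling is memory: raising num_frames much further on a free Colab GPU runs out of memory before quality becomes the limiting factor.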
What are the limitations of the open-source text-to-video project when generating videos?
-The open-source project is limited in the length of video it can generate. At the demo settings of 24 frames, the output is just under one second; raising the frame count too far exhausts memory on Google Colab and degrades video quality.
How can one run the open-source text-to-video project locally?
-To run the open-source text-to-video project locally, one needs Anaconda for Python version management, a conda environment (the video uses Python 3.10.11), the PyTorch libraries (pytorch, torchvision, torchaudio), the two cloned repositories (the text-to-video fine-tuning code and the model from Hugging Face), the modules from requirements.txt, and a CUDA-capable Nvidia GPU.
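The pre-flight "checker" script mentioned above is not shown in full in the video; a minimal sketch consistent with what it is described as doing (print the torch version and confirm CUDA is available) might look like this:

```python
# checker.py -- a minimal sketch of the check described in the video;
# the repo's actual script may differ.
import torch

print("torch version:", torch.__version__)           # e.g. 2.0.x+cu118
print("CUDA available:", torch.cuda.is_available())  # should print True
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))     # confirms the Nvidia GPU is visible
```

If this prints False, the usual culprit is a CPU-only torch build or a CUDA/driver mismatch, which is exactly the class of problem the video uses conda environments to avoid.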
What is the issue with increasing the video length in the open-source project?
-Increasing the video length beyond one to two seconds results in severe degradation of video quality. According to the project's Discord, the models are trained on one-to-two-second clips, which is why longer generations fall apart.
What does the video creator suggest for those who need help with setting up the text-to-video projects?
-The video creator suggests joining their Discord for assistance and also recommends joining the Discord of the open-source project for further help and support.
What is the video creator's opinion on the current state of text-to-video technology?
-The video creator is impressed with the current state of text-to-video technology, considering it to be on the cutting edge and showing excitement for the progress being made in the field.
How can viewers support the video creator?
-Viewers can support the video creator by liking and subscribing to their content, which helps in the visibility and growth of their channel.
Outlines
🚀 Introduction to Text-to-Video Technologies
The script introduces two text-to-video generation products: RunwayML's Gen 2, a closed-source product that has recently become publicly available for free with limits on video length, and an open-source project by potat1 that can be run locally or on Google Colab. The video showcases Gen 2's cutting-edge text-to-video performance despite minor inaccuracies, such as a duck appearing to have two heads, covers the cost structure for Gen 2's premium features, and provides a link to the product's website.
🔍 Exploring Open-Source Text-to-Video with Hugging Face
The script discusses an open-source text-to-video project found on its Hugging Face page and GitHub repository, which offers several Google Colab notebooks using different libraries. The focus is on the zeroscope v1.1 text-to-video Colab. The process involves running the notebook, entering a prompt, and setting parameters for video generation, though video length is limited by memory constraints and quality degradation on longer clips. The author then runs the project locally on a Windows machine with an Nvidia GPU, detailing the steps for setting up the environment with Anaconda, installing the necessary libraries, and running the inference script. The section closes with a discussion of the model's current limits on longer videos, mentions ongoing efforts to improve quality, and invites viewers to join the Discord communities for help and updates.
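The 1-, 2-, and 3-second experiments from this walkthrough (24, 48, and 72 frames at roughly 24 fps) can be approximated by raising num_frames in a text-to-video pipeline. Below is a sketch reusing the same assumed diffusers stand-in as above; the video itself simply reruns the repo's inference script with the frame parameter changed:

```python
# Sketch of the clip-length experiment -- assumed diffusers stand-in,
# not the repo's own inference.py.
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained(
    "damo-vilab/text-to-video-ms-1.7b",  # stand-in for the potat1/zeroscope weights
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.enable_model_cpu_offload()

# 24 frames ~ 1 s, 48 ~ 2 s, 72 ~ 3 s, mirroring the video's tests
for num_frames in (24, 48, 72):
    result = pipe("ducks on a lake", num_inference_steps=33, num_frames=num_frames)
    print(f"{num_frames} frames ->", export_to_video(result.frames))

# Past the one-to-two-second range the models were trained on, quality
# degrades: the ducks start popping in and out of the frame.
```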
Keywords
💡Text-to-Video
💡RunwayML's Gen 2
💡Open Source
💡Google Colab
💡Hugging Face
💡CUDA
💡Anaconda
💡PyTorch
💡Inference
💡Discord
💡Model Degradation
Highlights
Introduction of two different text-to-video products: one closed source and one open source.
RunwayML's Gen 2 product is now publicly available for free with limitations on video length.
Gen 2 is on the cutting edge of text-to-video technology, outperforming other solutions.
Demonstration of Gen 2 generating a 4-5 second video from the text prompt 'ducks on a lake'.
The duck in the video appears with two heads, showcasing the current limitations of text-to-video generation.
Information on pricing for Gen 2, including credits for video generation and editor subscription benefits.
Introduction to an open-source text-to-video project by potat1 available on Hugging Face and GitHub.
Instructions on how to use Google Colab for text-to-video generation with the open-source project.
Limitation of the open-source project in generating long videos due to memory and quality issues.
The process of setting up the open-source project locally on a Windows machine with an Nvidia GPU.
Use of Anaconda for Python version management to avoid module version mismatch issues.
Steps to create a conda environment and install necessary libraries for the project.
Cloning the required repositories and setting up the environment for the text-to-video generation.
Running a script to check for the correct version of torch and CUDA availability.
Execution of the inference script for local text-to-video generation and monitoring GPU usage.
Observation of video quality degradation when increasing video length beyond the model's training range.
Community support available through Discord for both Gen 2 and the open-source project.
Encouragement for viewers to try different models for text-to-video generation and share their findings.
Call to action for viewers to like, subscribe, and engage with the content for more updates.
Transcripts
text-to-video is finally becoming a
reality some of the things that I've
been seeing get created by people who
are using text to video are absolutely
incredible so I'm going to show you two
different products one is closed source
and it's really impressive the other is
a brand new open source project that you
can run on your local computer or Google
Colab I'm going to show you all of
these let's go so first is runwayml's
Gen 2 product Gen 2 has been in the
works for a while it's had a private
beta for a while but now anybody can use
it it's free but you are limited in the
number of seconds of video that you can
generate so let's try it out I'm going
to say ducks on a lake
generate now you can see up in this
corner I now have 82 seconds of video
left and it says each second of video
generation uses five credits and you
have 410 credits left Gen 2 is
definitely on the cutting edge of text
to video and does outperform everything
else and here we go it's done so each
video is about four to five seconds
let's play it I mean that looks pretty
good there's not a lot of movement but
it certainly looks very accurate this
duck looks like it has two heads but
overall for a text to video which is in
its earliest stages this is impressive
so play around with this you can get
this at runwayml.com it's free I think
you get new credits every month but
after that you do have to pay for it and
for the pricing it's twelve dollars per
editor per month you get upscale
resolution you get to remove their
watermarks you get shorter wait times
and 125 seconds of generated video every
month it may not sound like a lot but
the amount of processing power it takes
to make these videos is substantial and
you'll see that shortly when I run it on
my local machine next is an open source
text-to-video project by potat1 and I'll
drop all the links to these things in
the description below so this is the
Hugging Face page and if we scroll to
the bottom we could go to their GitHub
page and on their GitHub page they give
us a bunch of different Google Colab
versions that use different
text-to-video libraries I'm going to use
the zeroscope v1.1 text-to-video Colab
so here it is I already started running
it the first thing you need to do is
just click this play button and that's
going to install all the libraries that
you need and also clone the two repos it
really could not be easier then down
here this is where we're going to start
entering our prompt so you can have
prompt I'm going to say ducks on a lake
similar to Gen 2 no negative prompt
number of steps 33 I'm going to leave
that guidance scale 23 frames per second
I'm going to leave that and number of
frames 24. now here's a big limitation
at 24 total frames and 30 frames per
second this is coming in at less than
one second of video you can certainly
increase it but what I've found is that
if you increase it too much first of all
you run out of memory on Google
Colab and second of all the quality
degrades really quickly I'm still trying
to figure out how to maintain the
quality of videos that are longer
because on my local machine I can
actually create longer videos because I
have a pretty beefy GPU so once I figure
that out I'll create an update video and
I'll show you but for now let's run it
push play and here we go now it's going
to give us a warning that's okay we can
ignore that and this does take a little
while and here you can actually see it
running and processing each frame and it
says we're at about two seconds per it I
think that means iteration but I'm not
sure if you know leave a comment in the
description below and let me know okay
it's finished you're going to see this
little check mark now to find the video
that you just created you want to click
this little folder icon on the left side
and then you're going to go to outputs
and then here it is and I'm going to
right click and click download I'm going
to save it to my desktop and let's open
it up and see how it looks and there it
is so again it's only one second of
video Let's have it on repeat now it's
pretty comparable to Gen 2 but you can't
have very long videos and I'm going to
show you that okay next I'm going to
show you how to get this running locally
I'm on a Windows machine and I have an
Nvidia GPU so that's what I'll be using
the first thing you're going to need is
Anaconda and then that is python version
management and it'll alleviate us of all
those python version and module version
mismatch issues and again I know a lot
of you struggle with that I do too so
please use Anaconda it makes things so
much easier so the first thing I'm going
to do is create a folder called content
I'm going to name it content too because
I already have a Content but you can go
ahead and name it whatever you want from
there now we're going to create our
conda environment and we're going to use
Python version
3.10.11 which is what I have found works
with all of these tensorflow libraries
and all the other machine learning
and AI libraries that we need and also
it works with CUDA hit enter so it's
giving me a warning do I want to remove
the existing environment yes I do you
probably won't see that then it asked me
to proceed if I want to install all of
these new packages yes I do and there we
go so then I'm going to highlight this
line and we're going to activate our
conda environment with conda activate
myenv hit enter and there we could see
myenv next I'm going to make sure we have
all of the torch libraries necessary to
run this so I'm going to say conda
install pytorch torchvision torchaudio
we may not need torchaudio but I
included it because I had that text to
audio library that I was working with as
well so I'm going to go ahead and
install it all of these scripts all of
these commands will be in a link in the
description below and I'm going to
confirm yes I want all of these
installed all right there it's finished
the next thing we're going to do is
clone the two repos that we need to get
this running first we're going to clone
the text to video fine-tuning library
hit enter and it's done next we're going
to actually clone the model and this is
git clone and we're going to grab it
from Hugging Face okay that's done that
took a little while next we're going to
change directory into the text to video
fine tuning folder and then from there
we're going to run pip install -r
requirements.txt and that's going to
install all the modules that we need for
these scripts okay that finished so one
thing I want to do before I run the
inference script is make sure that I
have CUDA installed and it's working and
I want to run this little Checker script
that makes sure that we have the right
version of torch and CUDA and that CUDA
is available so I'm going to write
python checker.py and there we go it
gives me the version and that it is true
and available all right and the last
thing we have to do is run the inference
file so it's python inference.py and
then we pass it in a bunch of different
variables and we want to make sure that
we enter all the correct paths to the
model and the repo so to do that we're
going to come in here we're going to
right click on it and we're going to say
copy as path and that'll go in this
first command where it says -m so
I'm just going to paste it in there and
next we need the output folder and
that's going to be right here already
and I'm going to just make sure that
this outputs folder is created so I go
into here and there's no outputs folder
so I'm just going to create new and then
call it outputs and now it should work
enter and there we go it's working and
if we look at our monitor we can see
that the GPU is running it and that's it
so let's take a look at what it looks
like so we go to the outputs folder
and there it is ducks on a lake and I
think this looks really good the only
problem is it's only one second now we
can start to increase it but what I've
found is that if we increase it past two
seconds of video we really start to see
a severe degradation in the quality I
jumped into the Discord of this project
and that's because they said the models
are trained on one to two second videos
and that makes a lot of sense they're
working on this problem right now and in
fact they gave me a suggestion of a new
model I should try that model can be
found right here and so I haven't tried
it yet I'm going to try it out if I get
it working I'll create another video on
how to do that but now let me show you
one more about what it looks like at 48
frames so we changed that last parameter
to 48 we hit enter and there it goes
it's running all right it's finished
let's take a look at what that one looks
like now so here's the second one and
this is two seconds now so it still
looks pretty good now let me show you
what happens when we move it up to three
seconds okay it's done let's take a look
so here it is it actually still looks
pretty decent but you can tell the ducks
are starting to pop in and out of
nowhere and then once we increase it
from here we're going to see a complete
degradation of the video quality but
they're working on it and the progress
is so exciting so hopefully you get this
working if you need any help jump in my
Discord I'm happy to help out also jump
in camenduru's Discord they'll help you
out as well there's a bunch of different
models that you can try for text to
video and some of them are going to do
better than others but this is great
progress and completely local and open
source if you like this video please
consider giving me a like and subscribe
and I'll see you in the next one