FREE and Unlimited Text-To-Video AI is Here! 🙏 Full Tutorials (Easy/Med/Hard)

Matthew Berman
12 Jun 2023 · 08:09

Summary

TLDR: The video showcases two text-to-video generation products: RunwayML's Gen 2, a cutting-edge, freemium service with limitations on video length, and an open-source project by potat1, which allows local machine processing. Gen 2 impresses with its accuracy despite some minor flaws, while the open-source option, though limited to short videos, demonstrates comparable quality. The video also guides viewers on setting up the open-source project locally using Anaconda and discusses the challenges of maintaining video quality at longer durations.

Takeaways

  • 🎥 The video discusses the emerging reality of text-to-video technology and showcases two different products in this field.
  • 🔍 One product is RunwayML's Gen 2, which is closed source and has been in development with a private beta phase, now available for public use with limitations on video length.
  • 🆓 RunwayML's Gen 2 is free to use but has a credit system that limits the amount of video generated, with each second using five credits.
  • 🦆 The video creator tests Gen 2 by inputting 'ducks on a lake' and generates a short video clip, noting the quality and some minor inaccuracies.
  • 📈 Gen 2 is highlighted as being on the cutting edge of text-to-video technology, outperforming other similar products.
  • 💻 The second product is an open-source text-to-video project by potat1, which can be run on a local computer or Google Colab.
  • 🔗 Links to the project's Hugging Face page and GitHub repository are provided for interested users to explore and use the technology.
  • 🚀 A Google Colab notebook is used to demonstrate how easily the open-source project can be set up to generate a short video from a text prompt.
  • 🚨 A limitation of the open-source project is the short video length due to memory constraints and quality degradation with longer videos.
  • 🛠️ The video provides a detailed guide on setting up the open-source text-to-video project on a local machine, especially for those with an Nvidia GPU.
  • 🔄 The creator discusses the challenges of maintaining video quality for longer durations and mentions ongoing efforts to improve this aspect of the technology.
  • 👍 The video concludes with an invitation for viewers to try the technology, seek help in Discord communities, and subscribe for more content.

Q & A

  • What is the main topic of the video?

    -The main topic of the video is the demonstration and discussion of two text-to-video generation products: a closed-source product called Gen 2 by RunwayML, and an open-source project.

  • What is RunwayML's Gen 2 product and how does it work?

    -RunwayML's Gen 2 is a text-to-video generation product that was in private beta and is now available for public use. It requires credits for video generation, with each second of video using five credits and a limit on the total seconds of video that can be generated.

  • How much does it cost to use RunwayML's Gen 2 product?

    -Basic use of RunwayML's Gen 2 is free, but the number of seconds of video that can be generated is limited. For more features, such as upscaled resolution, watermark removal, and more generated video per month, there is a subscription fee of twelve dollars per editor per month.

  • What is the open-source text-to-video project mentioned in the video?

    -The open-source text-to-video project mentioned is by potat1 and is available on its Hugging Face page and GitHub repository. It offers notebooks that use different text-to-video libraries and can be run on Google Colab.

  • What are the limitations of the open-source text-to-video project when generating videos?

    -The open-source project has limitations in terms of the length of the video it can generate. Increasing the number of frames can lead to memory issues on Google Colab and a degradation in video quality.

  • How can one run the open-source text-to-video project locally?

    -To run the open-source text-to-video project locally, one needs to have Anaconda for Python version management, clone the necessary repositories, install the required libraries and modules, and ensure that CUDA and a compatible GPU are available for processing.

  • What is the issue with increasing the video length in the open-source project?

    -Increasing the video length beyond one to two seconds in the open-source project can result in a severe degradation in video quality. The models are trained on short video clips, which is why there is a limitation in generating longer videos.

  • What does the video creator suggest for those who need help with setting up the text-to-video projects?

    -The video creator suggests joining their Discord for assistance and also recommends joining the Discord of the open-source project for further help and support.

  • What is the video creator's opinion on the current state of text-to-video technology?

    -The video creator is impressed with the current state of text-to-video technology, considering it to be on the cutting edge and showing excitement for the progress being made in the field.

  • How can viewers support the video creator?

    -Viewers can support the video creator by liking and subscribing to their content, which helps in the visibility and growth of their channel.

Outlines

00:00

🚀 Introduction to Text-to-Video Technologies

The script introduces two text-to-video generation products: a closed-source product from RunwayML called Gen 2, which has recently become publicly available for free with limitations on video length, and an open-source project by potat1, which can be run locally or on Google Colab. The video showcases Gen 2's capabilities, highlighting its cutting-edge text-to-video performance despite minor inaccuracies, such as a duck appearing to have two heads. The script also covers the cost structure for Gen 2's premium features and points viewers to the product's website.

05:00

🔍 Exploring Open-Source Text-to-Video with Hugging Face

The script discusses an open-source text-to-video project found via its Hugging Face page and GitHub repository, which offers several Google Colab notebooks built on different text-to-video libraries; the focus is on the Zeroscope v1.1 text-to-video Colab. The process involves running the notebook, entering a prompt, and setting generation parameters, though video length is limited by Colab memory constraints and by quality degradation in longer clips. The author then shares their experience running the project locally on a Windows machine with an Nvidia GPU, detailing the steps for setting up the environment with Anaconda, installing the necessary libraries, and running the inference script. The section concludes with the model's current limitations when generating longer videos, ongoing efforts to improve quality, and an invitation to join Discord communities for further assistance and updates.
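
For readers who want the local route at a glance, here is a minimal sketch of that workflow, assuming a machine with an Nvidia GPU. The environment name, repository URLs, and most inference flags are placeholders or assumptions (the video points to the actual links in its description), so substitute the real values and check the script's help output.

```bash
# Minimal sketch of the local setup described in the video (names/URLs are placeholders)
conda create -n text2video python=3.10.11
conda activate text2video

# GPU-enabled PyTorch; the CUDA selector is an assumption -- match it to your driver (see pytorch.org)
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

# Clone the fine-tuning code and the model weights (use the links from the video description)
git clone <text-to-video-finetuning-repo-url>
git clone <hugging-face-model-repo-url>
cd <text-to-video-finetuning-folder>
pip install -r requirements.txt

# Sanity check: torch version and CUDA availability (stand-in for the video's checker script)
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"

# Run inference; only -m (model path) is named in the video, the other flags are assumptions
mkdir outputs
python inference.py -m "<path-to-cloned-model>" -p "ducks on a lake" -o outputs
```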

Keywords

💡Text to Video

Text-to-video technology is the process of generating video content from textual descriptions. It is a rapidly advancing field in AI, where a text prompt given to the system is turned into visual content. In the video, the host discusses two different products that use this technology, illustrating its potential and current capabilities.

💡RunwayML's Gen 2

RunwayML's Gen 2 is a specific product mentioned in the script that is part of the text to video technology wave. It is described as being on the cutting edge, offering a free service with limitations on the video length that can be generated. The script provides an example of generating a video with the prompt 'ducks on a lake', showcasing the product's ability to translate text into visual content.

💡Open Source

Open source refers to a type of software where the source code is available to the public, allowing anyone to view, modify, and distribute the software. In the context of the video, an open source text to video project is introduced, which viewers can run on their local computers, emphasizing community contribution and accessibility.

💡Google Colab

Google Colab is a cloud-based platform provided by Google for machine learning and data analysis. It is mentioned in the script as a platform where one can run the open source text to video project without installing anything locally, highlighting its ease of use and accessibility for experimentation.
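
Since the limitation the video runs into on Colab is GPU memory, it can help to check which GPU (and how much VRAM) the runtime was assigned before raising the frame count. This is a generic Colab habit, not a step shown in the video:

```bash
# In a Colab notebook cell, prefix with "!"; shows the assigned GPU and its memory usage
nvidia-smi
```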

💡Hugging Face

Hugging Face is an organization known for its contributions to the machine learning community, particularly in the domain of natural language processing. In the script, it is mentioned as the source of the open source text to video project, indicating its role in facilitating AI advancements.
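
Model repositories on Hugging Face store their weight files with Git LFS, so cloning one locally (as the video does for the model) generally requires LFS to be set up first; the repo URL below is a placeholder for the link given in the video description:

```bash
# Git LFS is needed to pull the actual weight files when cloning a model repo
git lfs install
git clone <hugging-face-model-repo-url>
```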

💡CUDA

CUDA (Compute Unified Device Architecture) is a parallel computing platform and application programming interface (API) model created by Nvidia. It is discussed in the script as a requirement for running the text to video project locally, emphasizing the need for a capable GPU to handle the computational demands of AI models.
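
The contents of the "checker" script the video runs are not shown; a minimal equivalent check, assuming PyTorch is already installed, fits in a one-liner:

```bash
# Prints the torch version, whether CUDA is usable, and the detected GPU name
python -c "import torch; print(torch.__version__, torch.cuda.is_available(), torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'no GPU')"
```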

💡Anaconda

Anaconda is a distribution of Python for scientific computing that simplifies package management and environment handling. The script describes using Anaconda to manage Python versions and dependencies for the text-to-video project, showcasing its utility in avoiding version-mismatch issues and streamlining development workflows.
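
Beyond creating an environment (see the setup sketch after the Outlines section), the prompts the video runs into, such as being asked whether to remove an existing environment, map onto a couple of everyday conda commands; the environment name here is illustrative:

```bash
# List existing environments, remove a stale one, then recreate it
conda env list
conda env remove -n text2video
conda create -n text2video python=3.10.11
```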

💡PyTorch

PyTorch is an open-source machine learning library based on the Torch library. It is mentioned in the script as one of the necessary libraries for running the text to video project locally, indicating its importance in building and training AI models.
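
The video installs the Torch family through conda (see the setup sketch after the Outlines section); pip works too, via the official CUDA wheel index. The cu118 tag is an assumption, so choose the build matching your driver on pytorch.org:

```bash
# pip alternative to the conda install used in the video; cu118 is an assumption
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
```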

💡Inference

Inference in the context of AI refers to the process of making predictions or decisions based on a trained model. The script describes running an inference script to generate video content from text, demonstrating the application of a pre-trained model to new data.
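
Only the -m (model path) flag is explicitly named in the video; the other options below are illustrative assumptions rather than the script's documented interface, so consult the cloned repository's `python inference.py --help` for the real flag names:

```bash
# Illustrative invocation; apart from -m, the flag names are assumptions
python inference.py -m "<path-to-cloned-model>" -p "ducks on a lake" -o outputs --num-frames 24
```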

💡Discord

Discord is a VoIP, instant messaging, and digital distribution platform that is mentioned in the script as a place where the host and other users can discuss the text to video technology, seek help, and share experiences, indicating its role as a community hub for enthusiasts and developers.

💡Model Degradation

Model degradation refers to a decrease in the performance or quality of an AI model's output. In the script, it is discussed in the context of increasing video length, where longer videos result in lower quality due to the models being trained on shorter clips, illustrating the current limitations in text to video generation.
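
The trade-off is easy to quantify: clip duration in seconds equals the number of frames divided by the frames-per-second setting, so the 24-frame example at 30 fps comes out to roughly 0.8 seconds; staying near the one-to-two-second range the models were trained on therefore means keeping the frame count correspondingly low.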

Highlights

Introduction of two different text-to-video products: one closed source and one open source.

RunwayML's Gen 2 product is now publicly available for free with limitations on video length.

Gen 2 is on the cutting edge of text-to-video technology, outperforming other solutions.

Demonstration of Gen 2 generating a 4-5 second video from the text prompt 'ducks on a lake'.

The duck in the video appears with two heads, showcasing the current limitations of text-to-video generation.

Information on pricing for Gen 2, including credits for video generation and editor subscription benefits.

Introduction to an open-source text-to-video project by potat1 available on Hugging Face and GitHub.

Instructions on how to use Google Colab for text-to-video generation with the open-source project.

Limitation of the open-source project in generating long videos due to memory and quality issues.

The process of setting up the open-source project locally on a Windows machine with an Nvidia GPU.

Use of Anaconda for Python version management to avoid module version mismatch issues.

Steps to create a conda environment and install necessary libraries for the project.

Cloning the required repositories and setting up the environment for the text-to-video generation.

Running a script to check for the correct version of torch and CUDA availability.

Execution of the inference script for local text-to-video generation and monitoring GPU usage.

Observation of video quality degradation when increasing video length beyond the model's training range.

Community support available through Discord for both Gen 2 and the open-source project.

Encouragement for viewers to try different models for text-to-video generation and share their findings.

Call to action for viewers to like, subscribe, and engage with the content for more updates.

Transcripts

00:00
Text to video is finally becoming a reality. Some of the things that I've been seeing get created by people who are using text to video are absolutely incredible, so I'm going to show you two different products. One is closed source and it's really impressive; the other is a brand new open source project that you can run on your local computer or Google Colab. I'm going to show you all of these. Let's go.

00:19
So first is RunwayML's Gen 2 product. Gen 2 has been in the works for a while, it's had a private beta for a while, but now anybody can use it. It's free, but you are limited in the number of seconds of video that you can generate. So let's try it out. I'm going to say "ducks on a lake", generate. Now you can see up in this corner I now have 82 seconds of video left, and it says each second of video generation uses five credits and you have 410 credits left. Gen 2 is definitely on the cutting edge of text to video and does outperform everything else. And here we go, it's done. So each video is about four to five seconds. Let's play it. I mean, that looks pretty good. There's not a lot of movement, but it certainly looks very accurate. This duck looks like it has two heads, but overall, for text to video, which is in its earliest stages, this is impressive.

01:07
So play around with this. You can get this at runwayml.com. It's free, and I think you get new credits every month, but after that you do have to pay for it. For the pricing, it's twelve dollars per editor per month: you get upscaled resolution, you get to remove their watermarks, you get shorter wait times, and 125 seconds of generated video every month. It may not sound like a lot, but the amount of processing power it takes to make these videos is substantial, and you'll see that shortly when I run it on my local machine.

01:36
Next is an open source text-to-video project by potat1, and I'll drop all the links to these things in the description below. This is the Hugging Face page, and if we scroll to the bottom we can go to their GitHub page, where they give us a bunch of different Google Colab versions that use different text-to-video libraries. I'm going to use the Zeroscope v1.1 text-to-video Colab. So here it is, I already started running it. The first thing you need to do is just click this play button, and that's going to install all the libraries that you need and also clone the two repos. It really could not be easier. Then down here is where we're going to start entering our prompt. I'm going to say "ducks on a lake", similar to Gen 2, no negative prompt, number of steps 33, I'm going to leave that, guidance scale 23, frames per second I'm going to leave that, and number of frames 24.

02:22
Now here's a big limitation: at 24 total frames and 30 frames per second, this is coming in at less than one second of video. You can certainly increase it, but what I've found is that if you increase it too much, first of all you run out of memory on Google Colab, and second of all the quality degrades really quickly. I'm still trying to figure out how to maintain the quality of longer videos, because on my local machine I can actually create longer videos since I have a pretty beefy GPU. Once I figure that out I'll create an update video and show you, but for now let's run it. Push play, and here we go. Now it's going to give us a warning; that's okay, we can ignore that, and this does take a little while. Here you can actually see it running and processing each frame, and it says we're at about two seconds per "it". I think that means iteration, but I'm not sure; if you know, leave a comment in the description below and let me know. Okay, it's finished, and you're going to see this little check mark. Now to find the video that you just created, you want to click this little folder icon on the left side, then go to outputs, and there it is. I'm going to right click, click download, save it to my desktop, and let's open it up and see how it looks. And there it is. So again, it's only one second of video. Let's have it on repeat. Now it's pretty comparable to Gen 2, but you can't have very long videos, and I'm going to show you that.

03:36
Okay, next I'm going to show you how to get this running locally. I'm on a Windows machine and I have an Nvidia GPU, so that's what I'll be using. The first thing you're going to need is Anaconda, which handles Python version management and will alleviate all of those Python version and module version mismatch issues. I know a lot of you struggle with that, I do too, so please use Anaconda, it makes things so much easier. The first thing I'm going to do is create a folder called content. I'm going to name it content2 because I already have a content folder, but you can go ahead and name it whatever you want. From there we're going to create our conda environment, and we're going to use Python version 3.10.11, which is what I have found works with all of these TensorFlow libraries and all the other machine learning and AI libraries that we need, and it also works with CUDA. Hit enter. It's giving me a warning, do I want to remove the existing environment? Yes I do; you probably won't see that. Then it asks me to proceed if I want to install all of these new packages, yes I do, and there we go. Then I'm going to highlight this line and we're going to activate our conda environment with conda activate, hit enter, and there we can see the environment is active.

04:40
Next I'm going to make sure we have all of the torch libraries necessary to run this, so I'm going to say conda install pytorch torchvision torchaudio. We may not need torchaudio, but I included it because I had that text-to-audio library that I was working with as well, so I'm going to go ahead and install it. All of these scripts, all of these commands, will be in a link in the description below. I'm going to confirm, yes, I want all of these installed. All right, it's finished. The next thing we're going to do is clone the two repos that we need to get this running. First we're going to clone the text-to-video fine-tuning library, hit enter, and it's done. Next we're going to actually clone the model, and this is git clone, and we're going to grab it from Hugging Face. Okay, that's done; that took a little while. Next we're going to change directory into the text-to-video fine-tuning folder, and from there we're going to run pip install -r requirements.txt, which is going to install all the modules that we need for these scripts. Okay, that finished.

05:30
One thing I want to do before I run the inference script is make sure that I have CUDA installed and working, so I want to run this little checker script that makes sure we have the right version of torch and CUDA and that CUDA is available. I'm going to write python checker.py, and there we go, it gives me the version and that it is true and available. All right, and the last thing we have to do is run the inference file, so it's python inference.py, and then we pass it a bunch of different variables, and we want to make sure that we enter all the correct paths to the model and the repo. To do that we're going to come in here, right click on it, and say copy as path, and that'll go in this first command where it says -m, so I'm just going to paste it in there. Next we need the output folder, and that's going to be right here already, and I'm going to make sure that this outputs folder is created. I go in here and there's no outputs folder, so I'm just going to create a new one and call it outputs, and now it should work. Enter, and there we go, it's working, and if we look at our monitor we can see that the GPU is running it.

06:32
And that's it, so let's take a look at what it looks like. We go to the outputs folder, and there it is, ducks on a lake, and I think this looks really good. The only problem is it's only one second. Now, we can start to increase it, but what I've found is that if we increase it past two seconds of video, we really start to see a severe degradation in the quality. I jumped into the Discord of this project, and that's because, they said, the models are trained on one to two second videos, which makes a lot of sense. They're working on this problem right now, and in fact they gave me a suggestion of a new model I should try. That model can be found right here; I haven't tried it yet, but I'm going to, and if I get it working I'll create another video on how to do that.

07:11
But now let me show you one more, what it looks like at 48 frames. We change that last parameter to 48, hit enter, and there it goes, it's running. All right, it's finished, let's take a look at what that one looks like. So here's the second one, and this is two seconds now, and it still looks pretty good. Now let me show you what happens when we move it up to three seconds. Okay, it's done, let's take a look. It actually still looks pretty decent, but you can tell the ducks are starting to pop in and out of nowhere, and once we increase it from here, we're going to see a complete degradation of the video quality. But they're working on it, and the progress is so exciting.

07:47
So hopefully you get this working. If you need any help, jump into my Discord, I'm happy to help out. Also jump into camenduru's Discord; they'll help you out as well. There are a bunch of different models that you can try for text to video, and some of them are going to do better than others, but this is great progress, and completely local and open source. If you like this video, please consider giving me a like and subscribe, and I'll see you in the next one.

Related Tags
Text-to-Video, AI Technology, Innovation, Video Generation, RunwayML, Open Source, Content Creation, Machine Learning, Software Tutorial, Tech Review