Civitai Beginners Guide To AI Art // #1 Core Concepts

Civitai
29 Jan 2024 · 11:29

Summary

TL;DR: This video serves as a beginner's guide to AI art, focusing on Stable Diffusion. Hosted by Tyler, it covers essential AI art concepts, software installation, and terminology such as text-to-image, image-to-image, inpainting, upscaling, and more. The guide explains resources like models, checkpoints, and LoRAs, extensions like ControlNet, and interfaces such as Automatic1111, along with the significance of prompts, upscaling, and training data. The video aims to equip viewers with the knowledge needed to create AI-generated images locally on their own machines.

Takeaways

  • 🎨 The video introduces the basics of AI art, focusing on generating images using Stable Diffusion software.
  • 📜 Key AI art concepts like text-to-image, image-to-image, and batch image-to-image are explained, helping users understand how to generate images.
  • 🖼️ The video introduces 'inpainting,' which involves using a painted mask area to add or remove objects from images, similar to Photoshop's generative fill.
  • 🎥 Concepts like text-to-video (text to vid) and video-to-video (vid to vid) are introduced for generating videos using text prompts or transforming existing videos.
  • 📝 'Prompt' refers to the text input that guides the AI in generating images, while 'negative prompt' tells the AI what to exclude from the image.
  • 📈 'Upscaling' enhances low-resolution images or videos into higher resolution using built-in AI models or external software like Topaz Photo AI.
  • 🛠️ Models, also called checkpoints, are critical in AI art generation. Different models produce different image styles, and users are advised to choose models wisely.
  • 🔐 Safetensors (.safetensors) files are preferred over checkpoint (.ckpt) files for models, as they cannot carry the executable code that makes pickled checkpoints a malware risk.
  • 🖌️ ControlNet, an important extension in Stable Diffusion, is used to process image structures, depth, and poses, essential for tasks like image-to-image generation.
  • 🧠 Tools like Deforum and techniques like the Enhanced Super-Resolution GAN (ESRGAN) and AnimateDiff are introduced for generating smooth videos, enhancing image resolution, and adding motion to text-to-image generations.

Q & A

  • What is AI art, and how is it generated?

    -AI art refers to the creation of images, videos, or other media using artificial intelligence. It's typically generated by feeding a text prompt into a model like Stable Diffusion, which creates an image based on the input text.

  • What is Stable Diffusion, and what is its role in AI art generation?

    -Stable Diffusion is a machine learning model used for generating images based on text inputs. It can also handle tasks like transforming existing images, adding elements to images, and generating videos. It serves as the engine behind much of the AI art creation process.

  • What is the difference between 'text-to-image' and 'image-to-image' generation?

    -Text-to-image generation creates a visual output solely based on a text prompt, whereas image-to-image takes an existing image as input, using the text to alter or enhance the original image.

  • What is inpainting, and how is it used in AI art?

    -Inpainting involves masking specific areas of an image and using AI to add or remove elements from those parts. It's similar to Photoshop’s generative fill but is done locally using software like Stable Diffusion.
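
    As a concrete illustration (the video itself uses GUI tools, not code), a minimal inpainting sketch with the open-source diffusers library might look like the following; the file names are hypothetical, and the mask is a pre-drawn image where white marks the region to regenerate:

```python
import torch
from diffusers import StableDiffusionInpaintPipeline
from diffusers.utils import load_image

# Load a Stable Diffusion checkpoint fine-tuned for inpainting
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

image = load_image("room.png").resize((512, 512))      # hypothetical source image
mask = load_image("room_mask.png").resize((512, 512))  # white = area to repaint
result = pipe(prompt="a potted plant on the table",
              image=image, mask_image=mask).images[0]
result.save("room_inpainted.png")
```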

  • What are prompts and negative prompts?

    -Prompts are the text inputs that describe what you want the AI to generate, while negative prompts describe what you do not want to appear in the final image, helping refine the result.

  • What is upscaling, and why is it important in AI art?

    -Upscaling is the process of increasing the resolution of an image, making it larger and more detailed. It is often the final step before sharing AI-generated images to ensure they have high quality.

  • What are checkpoints and safe tensors in the context of Stable Diffusion?

    -Checkpoints, now often called models, are files that store the machine learning model used for generating images. Safetensors is a safer file format for the same data, as it cannot carry the malicious code that pickled checkpoint files can.

  • What is a 'LoRA,' and how is it used in Stable Diffusion?

    -LoRA stands for Low-Rank Adaptation, which is a model trained on a small, specific dataset to generate images with a particular character, style, or concept. LoRAs are used to fine-tune outputs in a targeted way.

  • What are 'textual inversions' and 'embeddings'?

    -Textual inversions and embeddings are smaller datasets used to fine-tune specific aspects of an image, such as fixing common issues like incorrect hand or eye shapes, or focusing on details like specific objects or faces.
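
    As an illustration, diffusers can load a published textual-inversion embedding and expose it as a new token the prompt can use. This is a sketch assuming one public example concept repository, not one mentioned in the video:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# The embedding adds a single new token that the prompt can reference
pipe.load_textual_inversion("sd-concepts-library/cat-toy")
image = pipe("a photo of a <cat-toy> on a beach").images[0]
image.save("concept.png")
```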

  • What is a VAE (Variational Autoencoder), and how does it affect image generation?

    -A VAE is a file used to enhance the quality of generated images by improving details, colors, and sharpness. Some models have VAEs built-in, while others require you to add one manually to get the best results.
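
    A minimal sketch of swapping in a standalone VAE with diffusers, assuming the commonly used ft-MSE decoder published for SD 1.5:

```python
import torch
from diffusers import AutoencoderKL, StableDiffusionPipeline

# Load an improved standalone VAE and attach it to the pipeline,
# replacing the decoder bundled with the base checkpoint
vae = AutoencoderKL.from_pretrained(
    "stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16
)
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", vae=vae, torch_dtype=torch.float16
).to("cuda")
```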

Outlines

00:00

🎨 Introduction to AI Art and Stable Diffusion

Tyler introduces the official beginner’s guide to AI art, covering key concepts and terminology related to AI art and Stable Diffusion. This video series aims to teach viewers how to generate AI images on their local machines, navigate the necessary software, and download resources from civitai.com. Tyler outlines the core topics, including text-to-image and image-to-image generation, the importance of prompts, and commonly used software like Automatic1111, Fooocus, ComfyUI, and Easy Diffusion.

05:01

🖼️ Types of AI Image Generation

This section breaks down the different types of AI image generation, beginning with text-to-image, which involves creating images from a textual prompt. Tyler also explains image-to-image and batch image-to-image, where an existing photo serves as a base for further modifications using AI. He introduces the concept of ControlNet for enhancing image generation and describes inpainting, text-to-video, video-to-video, and how prompts and negative prompts shape the final AI-generated outputs.

10:01

📈 Upscaling and Final Touches in AI Art

Tyler discusses the process of upscaling low-resolution images into higher-resolution formats, commonly done with built-in AI models or external software like Topaz Photo AI. He emphasizes that upscaling is typically the last step before sharing AI-generated images online, ensuring the best possible resolution and visual appeal.

🧠 AI Models and Checkpoints

Here, Tyler introduces the concept of AI models and checkpoints, explaining their role in generating outputs. Checkpoints, or models, are built from millions of images and shape the style of the final image. He also explains the difference between checkpoint (.ckpt) files and safetensors files, advising the use of safetensors files to avoid malicious code. The importance of training data and the evolution from Stable Diffusion 1.5 to Stable Diffusion XL 1.0 is also highlighted, showcasing how different datasets improve AI model quality.

🤖 LoRAs and Textual Inversions in AI Models

Tyler delves into specialized AI models like LoRAs, which are trained on smaller datasets for specific purposes, such as generating images with a particular character or style. He also explains textual inversions and embeddings, which are used to fix specific image features like hands and eyes. Additionally, Tyler touches on VAEs (Variational Autoencoders), which enhance image detail and vibrancy, ensuring better final results in terms of color and sharpness.

🔧 Essential AI Extensions for Stable Diffusion

This section covers essential extensions for Stable Diffusion, starting with ControlNets, which are used to read image structures and character positions. Tyler explains that ControlNets are vital for tasks like image-to-image and video-to-video generation. He also introduces Deforum, a community known for building AI tools, and explains the role of Deforum's popular extension for Automatic1111, used for smooth video generation. Other notable tools, like the Enhanced Super-Resolution GAN (ESRGAN) for upscaling and AnimateDiff for adding motion to images, are briefly discussed.

📚 Conclusion and Glossary Resources

Tyler wraps up the video by summarizing the core concepts and terminology covered throughout the guide. He encourages viewers to refer to the Stable Diffusion glossary in civitai.com’s education hub for further learning and clarification on any terms they may encounter while using AI art tools. The video series aims to provide a strong foundation for beginners to confidently navigate AI art creation.

Keywords

💡Text to Image

Text to Image refers to generating an image from a descriptive text prompt using AI. In the video, it is explained as the most common method of AI art generation, where you input text that describes the desired image and the AI creates the image based on that input. This is the foundation of tools like Stable Diffusion.
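
For readers who prefer code to a GUI, here is a minimal text-to-image sketch using the open-source diffusers library (the video itself works through interfaces like Automatic1111; the model ID and prompt below are just examples):

```python
import torch
from diffusers import StableDiffusionPipeline

# Load an SD 1.5 checkpoint from the Hugging Face hub (example model ID)
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# The text prompt is the only input: the image is generated from nothing else
image = pipe("a watercolor painting of a lighthouse at sunset").images[0]
image.save("lighthouse.png")
```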

💡Image to Image

Image to Image is the process of taking an existing image and using it as a base to generate a new image with AI, based on a text prompt. This technique allows users to modify or enhance existing images, often with the help of additional tools like ControlNet. The script explains how it’s used for refining a specific photo or batch of images.
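
A hedged sketch of the same idea in diffusers, where the input photo is loaded first and `strength` controls how far the output may drift from it (the file name is hypothetical):

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from diffusers.utils import load_image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

init = load_image("reference_photo.jpg").resize((512, 512))  # hypothetical input
# strength near 0 keeps the input almost intact; near 1 repaints it entirely
result = pipe(prompt="oil painting portrait", image=init, strength=0.6).images[0]
result.save("portrait.png")
```

A batch image-to-image run is simply this same call repeated over every file in a folder.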

💡Stable Diffusion

Stable Diffusion is an AI model used for generating images from text prompts. It is the core technology discussed in the video, and it operates by interpreting prompts and creating images through diffusion models. The script mentions versions like SD 1.5 and XL 1.0, highlighting how these models differ in training data and output quality.

💡ControlNet

ControlNet is an extension for Stable Diffusion that adds more control to the image generation process by reading and manipulating structures like depth and character positions in images. The video emphasizes how it is crucial for tasks like Image to Image and Video to Video generation, allowing more precise control over the outputs.
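
A minimal sketch of pose-guided generation with diffusers, assuming a pose map has already been extracted (for example, by an OpenPose preprocessor; the file name is hypothetical):

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# An OpenPose ControlNet reads a stick-figure "dummy" and locks the output pose
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

pose = load_image("pose_map.png")  # hypothetical pre-extracted pose skeleton
image = pipe("a knight in armor, dramatic lighting", image=pose).images[0]
image.save("knight.png")
```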

💡Prompt

A Prompt is the text input given to the AI model to instruct it on what kind of image to generate. In the context of the video, prompts are essential for all forms of AI image generation. The script describes how detailed prompts can shape the outcome and how there are positive and negative prompts, the latter telling the AI what to avoid in the image.

💡Upscaling

Upscaling is the process of increasing the resolution of an image, making a low-resolution image clearer and more detailed. The video describes this as one of the final steps in image generation, often using external software or AI tools, to ensure images are ready for sharing or printing. It is an essential step before publishing AI-generated images.
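
One programmatic route, among many (the video mentions built-in upscalers and Topaz), is the Stable Diffusion 4x upscaler pipeline in diffusers; this is a sketch with a hypothetical input file:

```python
import torch
from diffusers import StableDiffusionUpscalePipeline
from diffusers.utils import load_image

pipe = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16
).to("cuda")

low_res = load_image("generation_512.png")  # hypothetical 512x512 output
# This upscaler is prompt-guided: describe the image to steer the added detail
big = pipe(prompt="sharp, detailed photo", image=low_res).images[0]  # 4x larger
big.save("generation_2048.png")
```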

💡Checkpoints

Checkpoints, also called models, are files that contain the machine learning algorithms used in Stable Diffusion to generate images. The script explains that they are the result of training on large datasets of images, and the choice of checkpoint/model determines the style and quality of the generated output.

💡Safetensors

Safetensors is a newer file format for Stable Diffusion models, designed to be safer by eliminating the arbitrary-code-execution risk of older checkpoint (.ckpt) files. The video suggests users opt for safetensors files when downloading models to avoid security risks.
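
The difference is visible at load time: a .ckpt is a Python pickle, which can execute code while being unpickled, whereas a .safetensors file is a plain tensor container. A sketch, with placeholder file names:

```python
import torch
from safetensors.torch import load_file

# Unpickling a .ckpt can run arbitrary code embedded in the file,
# which is exactly the risk the video warns about
state_dict = torch.load("model.ckpt", map_location="cpu")  # risky if untrusted

# Loading a .safetensors file only reads tensor data; no code ever runs
state_dict = load_file("model.safetensors")
```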

💡LoRA

LoRA, or Low-Rank Adaptation, is a technique for training smaller, more focused AI models that specialize in generating specific styles, characters, or concepts. The video explains that LoRA models are often used to add very particular traits to image outputs, such as generating a specific anime character or replicating a certain artistic style.
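
In diffusers, attaching a LoRA on top of a base checkpoint is a one-line call; a sketch where the LoRA repo name is hypothetical and the scale sets how strongly it steers the output:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Attach low-rank weight deltas on top of the base model (hypothetical LoRA)
pipe.load_lora_weights("some-user/example-character-lora")
image = pipe(
    "portrait of the character in a forest",
    cross_attention_kwargs={"scale": 0.8},  # LoRA strength: 0 = off, 1 = full
).images[0]
```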

💡Negative Prompt

A Negative Prompt is the opposite of a regular prompt—it tells the AI what to exclude from the image. The video highlights its importance in controlling the generation process by ensuring certain unwanted elements or styles don’t appear in the final image, refining the accuracy of outputs.
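
Programmatically, the negative prompt is just a second string on the same generation call; a minimal diffusers sketch:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="studio portrait of an astronaut, 85mm photo",
    negative_prompt="blurry, low quality, extra fingers, watermark",
    guidance_scale=7.5,  # how strongly generation follows the prompts
).images[0]
image.save("astronaut.png")
```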

Highlights

Introduction to AI art and Stable Diffusion, guiding beginners from zero to generating their first AI images.

Overview of the core concepts and terminology of AI art, including text-to-image, image-to-image, and batch image generation.

Explanation of image generation methods such as text-to-image and image-to-image using reference photos or existing images.

Introduction to inpainting, the process of modifying parts of an image using a painted mask, similar to Photoshop's generative fill.

Text-to-video and video-to-video generation explained, using text prompts to create videos or alter existing video footage.

Understanding prompts and negative prompts: the core text inputs used to guide AI in generating images and excluding unwanted elements.

Overview of upscaling, transforming low-resolution images (e.g., 512x512) into higher resolution versions (e.g., 1080x1080).

Introduction to models and checkpoints, including different model types, and how they influence image style and quality.

Explanation of safetensors files replacing checkpoint (.ckpt) files to reduce the risk of downloading malicious content.

Description of training datasets like LAION-5B and how they are used to train AI models for image generation.

Introduction to Stable Diffusion 1.5 and its continued popularity despite the release of Stable Diffusion XL 1.0.

Discussion of LoRA (Low-Rank Adaptation) and its use in fine-tuning models for specific styles, people, or characters.

Explanation of textual inversions and embeddings for fixing common image generation problems like bad hands or eyes.

Introduction to ControlNets for guiding image structure, depth, and character positions in image-to-image or video generation.

Overview of key tools and extensions like ControlNets, Deforum for smooth video generation, and ESRGAN for upscaling.

Transcripts

[Music]

Welcome to civitai.com's official beginners guide to AI art. My name is Tyler, and throughout this series I will be your guide as we go from zero to generating our first AI images. Throughout these videos you can expect to learn about the core concepts and the terminology behind AI art and Stable Diffusion. We're going to discuss and walk through how to install the various pieces of software and programs you will need to generate AI images on your own local machine, and we're going to learn how to navigate these programs as well as how to properly download and store resources from the civitai.com resource library.

Before we get to installing anything, there are a lot of core concepts and terminology used throughout AI art and Stable Diffusion that, if you're new to all this, really might be overwhelming or unfamiliar. So in this video we're going to discuss some common terms, abbreviations, and concepts that you will encounter as you're browsing websites like Civitai and interacting with software like Automatic1111, Fooocus, ComfyUI, or Easy Diffusion.

So let's get started by discussing the various types of image generation that you'll be doing throughout your time making AI images, starting with our very first concept and the most common, which is text-to-image. You're going to see this term a lot, and it refers to taking a text prompt and generating an image out of nothing, using only the text and telling the AI exactly what you would like to see in your image.

Then we have image-to-image and batch image-to-image. This is the process of taking an existing image or a reference photo, for example a photo of myself or a photo of a friend, and using that photo as the input for the AI to then take your prompt, reference the photo, and build the output image on top of the already existing photo. For this you'll be using something called a ControlNet, which we will talk about in the extensions part of this video. Image-to-image is doing so with only one single image, whereas batch image-to-image is taking a folder of images and running them through the diffusion process all at the same time.

Next we have inpainting, which is the practice of using a painted mask area to add or remove objects from an image. Think of this as generative fill from Photoshop, except it lives locally in your Stable Diffusion software and you get to paint right on your image with a brush tool, typing into the prompt exactly what you want to happen in the part of the image that you painted.

Next we have text-to-video and video-to-video, or as you'll see them referred to, text-to-vid or vid-to-vid. These are the processes of taking a text prompt and getting a video output with motion, or taking an existing video input and transforming that video utilizing your prompt.

Next we have the most important part: the prompt and the negative prompt. The prompt is the text input that you give your Stable Diffusion based software, or any AI image generation software in general, to tell it exactly what you would like it to output in your image. The negative prompt does the reverse: this is where you take your text input and tell Stable Diffusion what you do not want in your photo.

Next we have upscaling. Upscaling is the process of taking low-resolution media (think an image that is a 512 x 512 small little square) and converting it to high-resolution media (think a square that is 1080 x 1080). This is usually done by enhancing the existing pixels, and most of the time we are now doing this through either AI models that are built into our Stable Diffusion software and interfaces, or external programs like Topaz Photo AI or Topaz Video AI, to upscale our images and videos before we go and share them on the internet or post them wherever we want. The upscaling process is usually going to be the last step before you're ready to share your images.

These are the core concepts that you will be utilizing anytime you sit down to generate something with Stable Diffusion. Next we're going to dive into the models, assets, and resources that you're going to come across on a regular basis.

So to start off: checkpoints. Checkpoints are now more commonly referred to as models, but you will see these terms used interchangeably as you go from site to site looking for different models or checkpoints to use in your generations. A model is the product of training on millions of images scraped from all over the web, and this file drives our results for text-to-image, image-to-image, and text-to-video generations. This is the heartbeat of everything you will be doing in Stable Diffusion. Typically your model will dictate the overall style that you will get out of your image. Some models are really great all-arounders, some are very strictly trained on anime, and some are very strictly trained on realistic images. Choosing the right model is vital to getting the image that you would like out of Stable Diffusion.

All right, let's move on to checkpoints and safetensors. Checkpoints are a file format created by PyTorch Lightning; a checkpoint contains a machine learning model which is used by Stable Diffusion to generate our image outputs. The checkpoint, or ckpt, file has been superseded and mostly replaced by safetensors files. Safetensors files are essentially the same thing, except they are less susceptible to having malicious code put in them. So whenever possible, you would want to look for the safetensors version of a model rather than a ckpt. This is also why it is good to read reviews before you download any models and install them onto your machine's hard drive; you want to make sure that you're not downloading anything malicious.

Now, anytime you hear the term training data, it's referring to a set of many images that are used to train a Stable Diffusion model, LoRA, or embedding. LAION-5B is a large-scale dataset for research purposes that contains 5.85 billion CLIP-filtered text-image pairs. This is the dataset that Stable Diffusion was trained on, which brings us to Stable Diffusion 1.5, also referred to all over the internet as SD 1.5. This is a latent text-to-image model trained for 595,000 steps at a resolution of 512 x 512 on images from the LAION-5B dataset. It has now been superseded by Stability AI's latest release, Stable Diffusion XL 1.0; however, a lot of the community still uses Stable Diffusion 1.5 because of its flexibility and the sheer amount of resources that are available for SD 1.5.

Next up we have LoRA (L-O-R-A), which stands for Low-Rank Adaptation. A LoRA is essentially a model, but trained on a much, much smaller dataset geared towards a very specific thing. This thing could be a person, a style, or a concept. So you will find many LoRAs trained on specific anime characters, so that when you include the LoRA in your image generation process it is going to push your image output to have that specific character in the final image.

Textual inversions and embeddings: these are similar to LoRAs, but they're trained on even smaller datasets and really geared towards capturing concepts such as fixing bad hands, fixing bad eyes, objects, and specific faces.

Next we have VAEs. VAEs are optional, detail-oriented files that sometimes come built into your models, or, more often, you will have to include a VAE next to your model for your image generation. You can think of VAEs as the final touch to getting a really crisp, sharp, colorful image. With some models, without the use of a VAE the colors will feel very dull and washed out, or they will have less detail. So you either want to make sure that the model you are currently running has a VAE built into it, or, if not, use your own VAE alongside it.

That just about covers the model section. Now let's jump into some of the most important and common extensions that you will encounter while you're using Stable Diffusion.

All right, so our first extension, and quite possibly one of the most important things you will come across while using Stable Diffusion if you want to do anything outside of just basic text-to-image prompting, is ControlNets. ControlNets consist of a bunch of different models that are trained on specific datasets to read different structures of an image, such as straight lines, depth, and character position, where it will actually place a dummy over the character in your photo so that you can then take that dummy and generate a whole new person on top of the exact pose that that person was in. ControlNets are essential if you want to do anything involving image-to-image or video-to-video.

Next we have Deforum. Deforum is a community of AI image synthesis developers, enthusiasts, and artists that builds a large set of generative AI tools. They are most commonly known for their super popular Automatic1111 extension that can take a text prompt and generate a really smooth video output into which you can also keyframe specific zooming, panning, and turning motions.

Next we have ESRGAN, the Enhanced Super-Resolution Generative Adversarial Network. ESRGAN is a technique that is used to generate high-resolution images from low-resolution pixels; think upscaling a 720p image up to 1080p. This model is commonly found in a lot of Stable Diffusion interfaces.

Next we have AnimateDiff. AnimateDiff is a technique used to inject motion into text-to-image and even image-to-image generations.

These are all of the core concepts, terminology, and terms that you will come across during your time using Stable Diffusion. If at any point you get lost, need some extra help figuring out what something means, or need something to refer to, feel free to visit our Stable Diffusion glossary in the civitai.com education hub. We'll see you guys in the next video.
