How to make AI Faces. ControlNet Faces Tutorial.

Sebastian Kamph
28 Apr 2023 · 14:42

TLDR: This tutorial demonstrates how to manipulate faces using Stable Diffusion and ControlNet. The video begins by showing the kind of output ControlNet can produce and offers tips to improve the process. The presenter explains the difference between the 'face' and 'face only' preprocessors and their impact on the body's pose. Using ControlNet 1.1 with Stable Diffusion 1.5 models, viewers learn how to adjust the control settings for better results. The video also addresses common issues with the face preprocessor and provides solutions, such as using negative styles and more specific prompts to achieve the desired output. The tutorial further explores variations in facial poses and the use of different ControlNet models, including the 'openpose full' model for character images. It concludes with a discussion of the MediaPipe Face model as an alternative option, highlighting the additional detail it captures around the eyes and eyebrows. The presenter encourages viewers to experiment with different models to find the best fit for their needs.

Takeaways

  • 🎨 **Control Faces in Stable Diffusion**: The tutorial demonstrates how to control faces using ControlNet in Stable Diffusion, producing varied output results from a single input image.
  • 📌 **Different Preprocessors**: The 'face' and 'face only' preprocessor options determine how much of the body is captured along with the face.
  • 🔄 **ControlNet Versions**: The video uses ControlNet 1.1 with Stable Diffusion 1.5 models, noting that models for Stable Diffusion 2.1 are also available.
  • 🎭 **Controlled Pose Adjustments**: By using the face control, one can adjust the pose of the face, including the direction of the head and upper torso.
  • 🚫 **Negative Styles for Corrections**: Introducing negative styles can help correct issues like distorted teeth or other facial irregularities in the generated images.
  • 🗣️ **Text Prompting for Specific Actions**: If a particular action like 'woman shouting' isn't captured in the output, it can be directly prompted in the text input for better results.
  • 🧑‍🤝‍🧑 **Combining Styles for Coherent Results**: Combining prompts with styles like 'digital oil painting' or 'skin enhancer' can lead to more consistent and desired outputs.
  • 🛠️ **Adjusting Control Steps for Variations**: Varying the starting and ending control steps between 0 and 1 changes how much of the generation is constrained, creating variations in the final render.
  • 🤹 **Randomness for Creative Outputs**: By adjusting the ending control step, one can introduce a degree of randomness or chaos into the image generation process.
  • 👽 **OpenPose Full for Character Images**: For full character images, using the 'openpose full' model allows for more powerful style changes and better control over the face and body.
  • ✍️ **Manual Touch-ups for Perfection**: If faces are not generated correctly, manual inpainting can fix them, combined with image-to-image upscaling and refinement.

Q & A

  • What is the main topic of the tutorial?

    -The main topic of the tutorial is how to control faces inside of Stable Diffusion using ControlNet.

  • What are the two different preprocessor options mentioned for working with faces in ControlNet?

    -The two different preprocessor options mentioned are 'face' and 'face only'.

  • What does the 'face only' preprocessor option allow?

    -The 'face only' preprocessor option allows the body to take any shape around the face, without restricting the pose of the upper torso or shoulders.
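
The video does this in the AUTOMATIC1111 WebUI, but the same annotators are exposed by the controlnet_aux Python package. A minimal sketch of the two options; the flag combinations are my reading of how 'face' and 'face only' map onto the detector's arguments:

```python
from PIL import Image
from controlnet_aux import OpenposeDetector

# Downloads the OpenPose annotator weights from the Hugging Face Hub.
detector = OpenposeDetector.from_pretrained("lllyasviel/Annotators")

source = Image.open("woman_shouting.png")  # hypothetical input image

# 'face': facial keypoints plus the body skeleton, so the head
# direction and upper torso are pinned down in the output.
face_map = detector(source, include_body=True, include_face=True)

# 'face only': facial keypoints alone; the body around the face
# is left free to take any shape in the final render.
face_only_map = detector(source, include_body=False, include_face=True)

face_map.save("pose_face.png")
face_only_map.save("pose_face_only.png")
```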

  • What is the purpose of using negative styles in the generation process?

    -Negative styles are used to fix issues in the generated images, such as distorted facial features, by telling the model what to steer away from.
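
In the WebUI these styles expand into a negative prompt; the diffusers-library equivalent is the negative_prompt argument. A minimal sketch, with placeholder terms standing in for the presenter's actual negative styles:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# The negative prompt lists what the model should avoid; these
# terms are placeholders, not the presenter's exact styles.
image = pipe(
    prompt="portrait photo of a woman shouting",
    negative_prompt="deformed, disfigured, bad teeth, blurry, low quality",
    num_inference_steps=25,
).images[0]
image.save("woman_negative_styles.png")
```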

  • How can you prompt the AI to generate images with specific facial expressions?

    -You can prompt the AI by including the desired facial expression in the text input, such as 'woman shouting', which helps the AI understand the intended output better.

  • What is the significance of using the 'OpenPose' model in ControlNet?

    -The 'OpenPose' model allows for more detailed control over the body, including the face, upper torso, and even hands, making it particularly powerful for changing styles and poses.

  • How does changing the ending control step affect the generated images?

    -The ending control step determines how far into the generation process the ControlNet guidance applies. Ending it early leaves the remaining steps unconstrained, mixing the controlled pose with new randomness and producing variations.
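
In the diffusers library the starting and ending control steps correspond to the control_guidance_start and control_guidance_end arguments, given as fractions of the total steps. A sketch of the variation trick, assuming the pose map produced earlier:

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

pose_map = load_image("pose_face.png")  # preprocessed keypoint image from earlier

# Ending at 1.0 keeps the pose locked for the whole render; ending
# earlier releases the later denoising steps, mixing the controlled
# pose with fresh randomness for looser variations.
for end in (1.0, 0.6, 0.3):
    image = pipe(
        "portrait of a woman shouting",
        image=pose_map,
        control_guidance_start=0.0,
        control_guidance_end=end,
        num_inference_steps=25,
    ).images[0]
    image.save(f"variation_end_{end}.png")
```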

  • What is the recommended approach if the generated faces are not satisfactory?

    -If the generated faces are not satisfactory, one can inpaint over the face manually, or use image-to-image upscaling followed by inpainting for better results.
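
The presenter does this from the WebUI's inpaint tab; a rough diffusers equivalent, assuming a white-on-black mask has been painted over the face region:

```python
import torch
from diffusers import StableDiffusionInpaintPipeline
from diffusers.utils import load_image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

render = load_image("render.png")    # the generation with a bad face
mask = load_image("face_mask.png")   # white where the face should be redone

# Only the masked region is regenerated; the rest of the image is kept.
fixed = pipe(
    prompt="detailed face of a woman, sharp focus",
    image=render,
    mask_image=mask,
    num_inference_steps=30,
).images[0]
fixed.save("render_fixed_face.png")
```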

  • What is the difference between the ControlNet 1.1 face models and the MediaPipe Face model?

    -The MediaPipe Face model provides more detail around the eyes, eyebrows, and mouth than the ControlNet 1.1 face models, offering an alternative for users to achieve better results.
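
controlnet_aux also ships a MediaPipe-based face annotator, and a community ControlNet trained on it is published as CrucibleAI/ControlNetMediaPipeFace. A sketch; the subfolder name is my assumption from that repository's layout:

```python
import torch
from controlnet_aux import MediapipeFaceDetector
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# The MediaPipe annotator traces eye, eyebrow, and mouth contours,
# giving a denser face map than the OpenPose keypoints.
face_map = MediapipeFaceDetector()(load_image("woman_shouting.png"))

controlnet = ControlNetModel.from_pretrained(
    "CrucibleAI/ControlNetMediaPipeFace",
    subfolder="diffusion_sd15",  # assumed layout of the SD 1.5 weights
    torch_dtype=torch.float16,
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

image = pipe("portrait of a woman shouting", image=face_map).images[0]
image.save("mediapipe_face.png")
```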

  • Why is it important to test different models and options in ControlNet?

    -It is important to test different models and options because different scenarios and desired outcomes may require specific settings or models to achieve the best results.

  • How does the tutorial help users troubleshoot common issues with the face preprocessor?

    -The tutorial provides specific strategies such as using negative styles, changing the text prompt to match the desired facial expression, and adjusting the control steps to troubleshoot and improve the face preprocessor's output.

  • What is the role of the Stable Diffusion 1.5 models in the process described?

    -Stable Diffusion 1.5 models serve as the base for generating images in the tutorial. The ControlNet models must match the version of the base model, so ControlNet 1.1 models are paired with Stable Diffusion 1.5.
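
For readers who script instead of using the WebUI, a minimal sketch of that pairing with the diffusers library: the ControlNet 1.1 OpenPose weights, which were trained against Stable Diffusion 1.5, loaded alongside an SD 1.5 base checkpoint:

```python
import torch
from controlnet_aux import OpenposeDetector
from diffusers import (ControlNetModel, StableDiffusionControlNetPipeline,
                       UniPCMultistepScheduler)
from diffusers.utils import load_image

# Preprocess: extract face and body keypoints from the source photo.
detector = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
pose_map = detector(load_image("woman_shouting.png"),
                    include_body=True, include_face=True)

# The 1.1 weights are version-matched to SD 1.5; pairing them
# with an SD 2.1 base model would fail.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")

image = pipe("portrait of a woman shouting", image=pose_map,
             num_inference_steps=25).images[0]
image.save("controlled_face.png")
```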

Outlines

00:00

😀 Introduction to Controlling Faces in Stable Diffusion

The video begins with the host introducing the audience to the process of controlling faces within Stable Diffusion. They demonstrate how to turn an input image into a range of output results using ControlNet. The host shares a few tricks to improve the process and invites viewers to install ControlNet if they haven't already, with a link in the video description. The tutorial continues with a practical example: an image of a woman shouting is loaded into ControlNet, and the options for preprocessing the face are explored. The difference between the 'face' and 'face only' modes is explained, highlighting how the choice affects the body's pose in the final image. The host also discusses the compatibility of ControlNet versions with Stable Diffusion models and concludes the section by generating four images from the prompt 'woman', adjusting the control settings to improve the results.

05:02

🎨 Enhancing Image Quality with Styles and Prompting

The host addresses common issues with the face preprocessor in Stable Diffusion and offers solutions to improve image quality. They suggest applying the negative styles linked in the video description to the first render; in the example given, this noticeably improves image quality, including better teeth detail. An alternative fix is also discussed: prompting the model with a more specific description, such as 'woman shouting', to match the control image. The section concludes with a demonstration of how combining prompts with styles like 'digital oil painting' and 'skin enhancer' further refines the generated images, producing outputs that closely match the desired pose and facial expression.

10:03

🚀 Exploring OpenPose Models for Character Control

The video moves on to the use of OpenPose models for controlling characters in Stable Diffusion. The host explains the difference between the 'openpose full' and 'face only' models, noting that the latter allows more variability in the body's pose. They demonstrate how changing the ending control step introduces variations into the generated images, creating a range of poses that are similar but not identical. The host also covers the 'MediaPipe Face' model as an alternative to ControlNet 1.1, highlighting its ability to capture more detailed facial features. The section concludes with a practical example of using the OpenPose model to generate images of a woman with a painted face, showing how to correct facial imperfections in the generated images with inpainting.

Keywords

ControlNet Faces

ControlNet Faces refers to the ControlNet features that let users manipulate and control the facial features and poses of generated images. In the video, they are used to create outputs with different facial expressions and poses from a given input image.

Stable Diffusion

Stable Diffusion is the AI image-generation software in which the face manipulation takes place. It generates images from textual descriptions, and in this context it is used to create faces with specific features and poses as directed by ControlNet.

Preprocessor

In the context of the video, a preprocessor is a tool within the software that prepares the image for further manipulation. It outlines key facial features such as the mouth, nose, and eyes with dots, which can then be adjusted to control the pose of the face.

Face Only

This refers to the preprocessor setting that detects only the face and ignores the rest of the body. It is used when the generated image should be free to place any body shape around the face, without restrictions.

ControlNet 1.1

This denotes the version of the ControlNet models used in the video. ControlNet 1.1 models are used in conjunction with Stable Diffusion 1.5 base models to achieve the desired facial poses and expressions.

Control Weight

Control Weight is the parameter that determines how strongly the ControlNet conditioning influences the final output image. In the video it is mentioned alongside the starting and ending control steps, which set where in the generation process that influence begins and ends.
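
In diffusers terms the Control Weight slider maps to the controlnet_conditioning_scale argument. A fragment, reusing the pipe and pose_map set up in the ControlNet sketches above:

```python
# Weight 1.0 follows the keypoints strictly; lower values let the
# prompt and sampler override the pose more freely.
image = pipe(
    "portrait of a woman",
    image=pose_map,
    controlnet_conditioning_scale=0.8,  # the WebUI's Control Weight slider
).images[0]
```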

Negative Styles

Negative Styles are preset negative prompts that exclude unwanted features or characteristics from the generated images. In the video, they are used to correct issues with the generated faces, such as distorted teeth.

Text Input

Text Input is a feature that allows users to describe what they want the AI to generate using words. In the video, text input is used to prompt the AI to create images of a 'woman shouting' based on the control image provided.

OpenPose

OpenPose is a ControlNet model that is particularly good at capturing and replicating poses, including facial expressions and body language. It is used in the video to create images of a man walking and a woman Viking warrior.

MediaPipe Face

MediaPipe Face is an alternative face detection and annotation tool mentioned in the video. It is noted for capturing more detail around the eyes, eyebrows, and mouth than the ControlNet 1.1 face models.

Image Upscaling

Image Upscaling is a process that increases the resolution of an image while maintaining or enhancing its quality. In the video, it is suggested as a next step for improving the generated images after the initial facial manipulation.
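
The image-to-image upscale the presenter alludes to can be approximated in diffusers by enlarging the render and repainting it at low strength; a sketch under those assumptions:

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from diffusers.utils import load_image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Enlarge the 512px render, then let img2img re-add fine detail.
small = load_image("controlled_face.png")
big = small.resize((small.width * 2, small.height * 2))

# Low strength preserves the composition; the pass mainly sharpens.
upscaled = pipe(
    prompt="portrait of a woman shouting, highly detailed",
    image=big,
    strength=0.3,
).images[0]
upscaled.save("controlled_face_2x.png")
```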

Highlights

The tutorial demonstrates how to control faces inside of Stable Diffusion using ControlNet.

Different input images can produce varied output results with the use of ControlNet.

The 'face' and 'face only' preprocessor options allow control over the pose of the face and the direction of the head and torso.

Using ControlNet 1.1 with Stable Diffusion 1.5 models is recommended for portrait images.

The starting control step is set to 0 and the ending control step to 1 for the initial image generation.

Negative styles can be applied to improve image quality, especially when encountering issues with the face preprocessor.

Prompting the AI with specific actions, like 'woman shouting', can help achieve desired outputs even if not present in the control image.

Combining text prompts with styles like 'digital oil painting' can enhance the output images to be more in line with expectations.

Changing the ending control step can introduce variations in the generated images, providing a base for further creativity.

The 'openpose full' model is particularly powerful for changing styles and poses, especially when dealing with full character images.

In cases where the face is not clearly visible, alternative prompts like 'woman Viking Warrior' can be used to achieve better results.

Image-to-image upscaling and inpainting techniques can further enhance the quality of the generated faces.

MediaPipe Face is an alternative to the ControlNet 1.1 face models, offering more detailed control around facial features.

Testing different models is crucial to find the best fit for a specific use case or desired outcome.

The tutorial provides a comprehensive guide on using ControlNet for generating high-quality AI faces with various poses and styles.

The importance of using the right model and settings is emphasized for achieving the most accurate and desired facial expressions and poses.

The video includes a step-by-step process for beginners to understand and apply ControlNet for face generation.

The presenter shares personal tricks and tips to overcome common issues encountered during the face generation process.