Which is better? Midjourney v6 vs. DALL-E 3 vs. Stable Diffusion XL

WesGPT
25 Dec 2023 14:07

TLDR: This video compares the image generation capabilities of three leading AI models (DALL-E 3, Stable Diffusion XL, and Midjourney v6) across five categories: cartoon images, photorealistic humans, architecture, seamless patterns, and logos. The video presents a single prompt for each category and evaluates the outputs on adherence to the prompt, style, and overall appeal. DALL-E 3 tends to produce more illustrative images, Midjourney v6 leans towards photorealism, and Stable Diffusion XL offers a mix of both styles. The video also includes a fun comparison with an older model, DALL-E 2, highlighting the significant advances in AI image generation. Viewers are encouraged to guess which model generated each image and to share their preferences in the comments.

Takeaways

  • 📈 **Midjourney v6, DALL-E 3, and Stable Diffusion XL** are compared in image generation across five categories.
  • 💡 **DALL-E 3** is available on the Plus plan within ChatGPT and focuses on generating images based on text prompts.
  • 🔍 **Stable Diffusion XL** is the newest model from Stable Diffusion and can be accessed via their API or beta.dreamstudio.a.
  • 💻 **Midjourney v6** is accessed through Discord after purchasing a subscription, with the basic plan starting at $10 per month.
  • 🎨 **Image Categories** include cartoon images, photorealistic humans, architecture, seamless patterns, and logos.
  • 🐙 **Cartoon Image Prompt** was 'underwater adventure' with a cheerful octopus, treasure chests, and playful fish.
  • 🎷 **Photorealistic Human Prompt** described a middle-aged black male playing a saxophone on a busy city sidewalk.
  • 🏰 **Architecture Prompt** was to create an image of a Gothic Cathedral with detailed features and a surrounding park.
  • 🌸 **Seamless Textures Prompt** asked for a vintage floral wallpaper with hand-drawn flowers in pastel colors.
  • ☕ **Logo Design Prompt** was for a gourmet coffee shop featuring a steaming coffee cup with coffee beans.
  • 📊 **Comparison and Personal Preference** is emphasized as there might not always be a clear winner, and preference plays a role.
  • 📚 **DALL-E 2** also generated images for the same prompts, showcasing the improvements in the newer models.

Q & A

  • Which three image generation models were compared in the video?

    -The video compared DALL-E 3, Stable Diffusion XL, and Midjourney version 6.

  • How can one access DALL-E 3 for image generation?

    -DALL-E 3 is accessible through the Plus plan within ChatGPT.

  • What is the cost for using Stable Diffusion XL's image generator?

    -To use Stable Diffusion XL, you need to purchase credits, which are relatively cheap, offering about 5,000 images for every $10.

  • How much does the basic subscription plan for Midjourney cost?

    -The basic subscription plan for Midjourney costs $10 per month, which includes about 200 image generations.

  • What are the five categories tested in the video to compare the image generators?

    -The five categories tested were cartoon images, photorealistic humans, architecture, seamless patterns, and logos.

  • What was the prompt given to the models for generating a cartoon image?

    -The prompt was to depict an underwater cartoon scene with a cheerful octopus wearing a pirate hat, surrounded by treasure chests, colorful coral reefs, and playful fish, with a translucent shimmering effect on the water.

  • Which model, according to the video, best adhered to the prompt for the cartoon image?

    -The video suggests that DALL-E 3 adhered best to the cartoon prompt: its image included the treasure chest, the shimmering effect, and a colorful octopus with a pirate logo.

  • What was the prompt for generating a photorealistic human image?

    -The prompt was to generate a photorealistic image of a street performer, a middle-aged black male playing a saxophone on a busy city sidewalk, with an urban street setting, pedestrians, tall buildings, and streetlights.

  • Which model generated the image that the video described as the most beautiful AI-generated image ever?

    -Midjourney version 6 generated the image that was described as the most beautiful AI-generated image ever in the video.

  • What was the prompt for creating an image of an elaborate Gothic Cathedral complex?

    -The prompt was to create an image of an elaborate Gothic Cathedral complex with detailed flying buttresses, pointed arches, and stained glass windows, surrounded by a small park with ancient trees, a cobblestone plaza, and statues from the medieval period.

  • How did the models perform in generating seamless textures as per the video?

    -The video suggests that Midjourney version 6 performed best on the seamless texture prompt. Midjourney also has a --tile feature that the other models lack, but it was not used, to keep the comparison fair.

  • What was the final category tested in the video, and what was the prompt for the image generation?

    -The final category tested was logos. The prompt was to illustrate a logo for a gourmet coffee shop featuring a steaming coffee cup with coffee beans, with a cozy and inviting feel, and a color scheme including warm tones like brown, cream, and red.
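
The pricing figures quoted in the Q&A above can be turned into a rough per-image cost. The following is a minimal sketch that uses only the approximate numbers stated in the video (about 5,000 images per $10 of Stable Diffusion XL credits, and about 200 image generations on the $10 Midjourney basic plan). The names `PLANS` and `cost_per_image` are hypothetical, and DALL-E 3 is left out because no price is quoted for it here.

```python
# Approximate figures as quoted in the video; real pricing may differ.
PLANS = {
    "Stable Diffusion XL (credits)": (10.0, 5000),  # (dollars, images)
    "Midjourney basic plan": (10.0, 200),
}


def cost_per_image(dollars: float, images: int) -> float:
    """Return the average cost in dollars of one generated image."""
    return dollars / images


if __name__ == "__main__":
    for name, (dollars, images) in PLANS.items():
        print(f"{name}: ${cost_per_image(dollars, images):.3f} per image")
```

On these figures, Stable Diffusion XL comes out at a fraction of a cent per image and Midjourney at a few cents per image, though the two numbers are estimates from the video rather than official rates.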

Outlines

00:00

🎨 Image Generation Comparison: Introduction and Cartoon Category

The video introduces a comparison between three major image generation models: DALL-E 3, Stable Diffusion XL, and Midjourney version 6. These models are tested across five categories, starting with cartoon images. The script describes the accessibility and cost of each model, and outlines the process of generating images based on prompts. The first category prompt is 'underwater adventure,' and the video presents three generated images, encouraging viewers to guess which model produced each before revealing the answers.

05:01

🎭 Photorealistic Human Category and Architectural Category

The second paragraph discusses the photorealistic human category, with a prompt to generate an image of a street performer playing a saxophone. The video showcases three images, each with a different style, and asks viewers to guess the model before revealing that the first image was from DALL-E 3, the second from Stable Diffusion XL, and the third from Midjourney version 6. The paragraph then transitions to the architectural category, with a prompt to create an image of a Gothic Cathedral complex. Three different styles of images are presented, and it is revealed that DALL-E 3 produced an isometric view, Midjourney version 6 a photograph-style image, and Stable Diffusion XL a painting-style image.

10:01

🌿 Seamless Textures and Business Logo Design

The third paragraph focuses on the seamless textures category with a prompt to create a vintage floral wallpaper. Three images are shown, each with varying degrees of hand-drawn style and seamlessness. The video then reveals that the third image, which was thought to be from Midjourney version 6, was actually from DALL-E 3, and discusses the seamless nature of each image. The final category is the design of a business logo for a gourmet coffee shop, with a prompt for a cozy and inviting feel with warm tones. Three logo designs are presented, and the video concludes by revealing which model generated each logo, noting the improvements from DALL-E 2 to DALL-E 3 and Midjourney version 6, and inviting viewer feedback for future video content.

Keywords

Image Generation

Image generation refers to the process of creating visual content from textual descriptions using artificial intelligence. In the video, image generation is the core theme as it compares the outputs of three different AI models: DALL-E 3, Midjourney v6, and Stable Diffusion XL. The process is showcased through various categories such as cartoon images, photorealistic humans, architecture, seamless patterns, and logos.

DALL-E 3

DALL-E 3 is an advanced AI model developed by OpenAI that is capable of generating high-quality images from textual prompts. It is one of the 'big three' models compared in the video, demonstrating its ability to interpret prompts and generate images across different categories, from cartoon scenes to architectural designs.

Stable Diffusion XL

Stable Diffusion XL is a newer model from the Stable Diffusion family, known for its ability to generate images. It is highlighted as the newest model with access provided through an API or the beta.dream studio/generate platform. The video evaluates its performance against the other models based on the quality and style of the images it produces.

Midjourney v6

Midjourney v6 is the latest model from Midjourney, which is accessible through a Discord subscription. It is noted for its photorealistic image generation capabilities and is compared with DALL-E 3 and Stable Diffusion XL in the video to determine its strengths and weaknesses in various image categories.

Discord

Discord is a communication platform where Midjourney v6 is accessible. Users need to subscribe to a plan and then access the Midjourney bot within a Discord server to start generating images. It serves as the interface for interacting with the Midjourney v6 model in the context of the video.

Prompt

A prompt in the context of AI image generation is a textual description that guides the AI model to create a specific image. The video script mentions that prompts were used to test the AI models' abilities across different categories, such as 'underwater adventure' for a cartoon image or 'photorealistic image of a street performer' for a realistic human depiction.

Cartoon Image

A cartoon image is a simplified, non-realistic representation of subjects, often with exaggerated features. In the video, the AI models are tasked with generating cartoon images based on prompts, showcasing their ability to interpret and create stylized visuals.

Photorealistic

Photorealistic refers to images that closely resemble photographs, with a high level of detail and realism. The video tests the AI models' capability to generate photorealistic images of human subjects, evaluating how well each model can replicate the nuances of real-life imagery.

Architecture

Architecture, in the context of the video, refers to the category of images depicting buildings or structures in a detailed and accurate manner. The AI models are prompted to generate images of a Gothic Cathedral, which includes elements like flying buttresses, pointed arches, and stained glass windows.

Seamless Patterns

Seamless patterns are repeating designs that can be tiled without visible breaks or gaps, creating a continuous look. The video challenges the AI models to generate a vintage floral wallpaper pattern that is seamless, testing their ability to create designs that can be repeated without interruption.
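
The definition above suggests a simple numeric check: tile the image and see whether its wrap-around edges jump more than neighbouring pixels do inside it. The following is a minimal sketch under stated assumptions, not the method used in the video (where seamlessness was judged by eye). The functions `seam_score` and `tile` and the two synthetic test images are hypothetical, and images are represented as plain lists of grayscale rows to keep the example dependency-free.

```python
import math


def _mean(values):
    values = list(values)
    return sum(values) / len(values)


def seam_score(img):
    """Ratio of the wrap-around edge jump to the typical interior jump.

    img is a list of rows of floats (at least 2x2). When the image is
    tiled, its last column meets its first column and its last row meets
    its first row. A ratio near 1 means the edges blend in like any other
    pair of neighbours; a large ratio means a visible seam.
    """
    h, w = len(img), len(img[0])
    wrap = _mean(abs(row[-1] - row[0]) for row in img) + _mean(
        abs(img[-1][x] - img[0][x]) for x in range(w)
    )
    interior = _mean(
        abs(row[x + 1] - row[x]) for row in img for x in range(w - 1)
    ) + _mean(
        abs(img[y + 1][x] - img[y][x]) for y in range(h - 1) for x in range(w)
    )
    if interior == 0:
        return 0.0 if wrap == 0 else math.inf
    return wrap / interior


def tile(img, nx=2, ny=2):
    """Repeat img nx times horizontally and ny times vertically."""
    return [list(row) * nx for _ in range(ny) for row in img]


SIZE = 32
# Repeats exactly every SIZE pixels in both directions, so it tiles cleanly.
periodic = [
    [math.sin(2 * math.pi * x / SIZE) + math.cos(2 * math.pi * y / SIZE)
     for x in range(SIZE)]
    for y in range(SIZE)
]
# Left-to-right ramp: dark edge meets bright edge when tiled.
gradient = [[x / (SIZE - 1) for x in range(SIZE)] for _ in range(SIZE)]

if __name__ == "__main__":
    print("periodic:", round(seam_score(periodic), 2))
    print("gradient:", round(seam_score(gradient), 2))
```

The periodic image scores close to 1, while the ramp scores far higher because its right edge is bright and its left edge is dark. A generated wallpaper would first need to be converted to grayscale rows before this heuristic could be applied to it.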

Logos

Logos are symbolic icons or designs used to represent brands or concepts. In the video, the AI models are prompted to illustrate a logo for a gourmet coffee shop, which should include a steaming coffee cup with coffee beans and evoke a cozy, inviting feel with a warm color scheme.

Highlights

A comparison of image generation results between DALL-E 3, Stable Diffusion XL, and Midjourney v6 is conducted across five different categories.

DALL-E 3 is available on the Plus plan within ChatGPT.

Stable Diffusion XL is the newest model from Stable Diffusion, accessible through their API or beta.dreamstudio.

Midjourney v6 is accessed through Discord after purchasing a subscription plan.

The basic Midjourney plan costs $10 per month for about 200 image generations.

The video tests each image generator on five categories: cartoon images, photorealistic humans, architecture, seamless patterns, and logos.

The first category compared is cartoon images, with an underwater adventure prompt.

Midjourney v6's cartoon image was the most distinctive, adhering to the prompt less strictly than the other models' images.

DALL-E 3's image for the photorealistic human category was criticized for looking too much like a Pixar movie.

Midjourney v6's photorealistic human image was the presenter's favorite, standing out for its quality.

In the architecture category, the Gothic Cathedral prompt resulted in varied styles, with DALL-E 3 producing an isometric view.

Stable Diffusion XL's painting-style image for the Gothic Cathedral lacked visible stained glass windows.

Midjourney v6 was praised for its seamless texture prompt, potentially due to its built-in --tile feature.

DALL-E 3's business logo for a gourmet coffee shop was noted for its close attempt at text, despite errors.

Stable Diffusion XL's logo lacked text but had an appealing design with accurate colors and a cozy feel.

Midjourney v6 did not attempt text in its logo, focusing on a visual representation that fit the aesthetic.

DALL-E 2 was shown for comparison, demonstrating the significant advancements in the newer models.

Viewers are encouraged to comment on which model and image style they prefer and suggest prompts for future tests.