Which is better? Midjourney v6 vs. DALL-E 3 vs. Stable Diffusion XL
TLDR
This video compares the image generation capabilities of three leading AI models (DALL-E 3, Stable Diffusion XL, and Midjourney v6) across five categories: cartoon images, photorealistic humans, architecture, seamless patterns, and logos. The video presents a single prompt for each category and evaluates the outputs based on adherence to the prompt, style, and overall appeal. DALL-E 3 tends to produce more illustrative images, Midjourney v6 leans towards photorealism, and Stable Diffusion XL offers a mix of both styles. The video also includes a fun comparison with an older model, DALL-E 2, highlighting the significant advancements in AI image generation. Viewers are encouraged to guess which model generated each image and share their preferences in the comments.
Takeaways
- 📈 **Midjourney v6, DALL-E 3, and Stable Diffusion XL** are compared in image generation across five categories.
- 💡 **DALL-E 3** is available on the Plus plan within ChatGPT and generates images from text prompts.
- 🔍 **Stable Diffusion XL** is the newest model from Stable Diffusion and can be accessed via their API or beta.dreamstudio.a.
- 💻 **Midjourney v6** is accessed through Discord after purchasing a subscription, with the basic plan starting at $10 per month.
- 🎨 **Image Categories** include cartoon images, photorealistic humans, architecture, seamless patterns, and logos.
- 🐙 **Cartoon Image Prompt** was 'underwater adventure' with a cheerful octopus, treasure chests, and playful fish.
- 🎷 **Photorealistic Human Prompt** described a middle-aged black male playing a saxophone on a busy city sidewalk.
- 🏰 **Architecture Prompt** was to create an image of a Gothic Cathedral with detailed features and a surrounding park.
- 🌸 **Seamless Textures Prompt** asked for a vintage floral wallpaper with hand-drawn flowers in pastel colors.
- ☕ **Logo Design Prompt** was for a gourmet coffee shop featuring a steaming coffee cup with coffee beans.
- 📊 **Comparison and Personal Preference** is emphasized as there might not always be a clear winner, and preference plays a role.
- 📚 **DALL-E 2** also generated images for the same prompts, showcasing the improvements in the newer models.
Q & A
Which three image generation models were compared in the video?
-The video compared DALL-E 3, Stable Diffusion XL, and Midjourney version 6.
How can one access DALL-E 3 for image generation?
-DALL-E 3 is accessible through the Plus plan within ChatGPT.
What is the cost for using Stable Diffusion XL's image generator?
-To use Stable Diffusion XL, you need to purchase credits, which are relatively cheap, offering about 5,000 images for every $10.
How much does the basic subscription plan for Midjourney cost?
-The basic subscription plan for Midjourney costs $10 per month, which includes about 200 image generations.
What are the five categories tested in the video to compare the image generators?
-The five categories tested were cartoon images, photorealistic humans, architecture, seamless patterns, and logos.
What was the prompt given to the models for generating a cartoon image?
-The prompt was to depict an underwater cartoon scene with a cheerful octopus wearing a pirate hat, surrounded by treasure chests, colorful coral reefs, and playful fish, with a translucent shimmering effect on the water.
Which model, according to the video, best adhered to the prompt for the cartoon image?
-The video suggests that DALL-E 3 best adhered to the prompt for the cartoon image, as it included the treasure chest, the shimmering effect, and a colorful octopus with a pirate logo.
What was the prompt for generating a photorealistic human image?
-The prompt was to generate a photorealistic image of a street performer, a middle-aged black male playing a saxophone on a busy city sidewalk, with an urban street setting, pedestrians, tall buildings, and streetlights.
Which model generated the image that the video described as the most beautiful AI-generated image ever?
-Midjourney version 6 generated the image that was described as the most beautiful AI-generated image ever in the video.
What was the prompt for creating an image of an elaborate Gothic Cathedral complex?
-The prompt was to create an image of an elaborate Gothic Cathedral complex with detailed flying buttresses, pointed arches, and stained glass windows, surrounded by a small park with ancient trees, a cobblestone plaza, and statues from the medieval period.
How did the models perform in generating seamless textures as per the video?
-The video suggests that Midjourney version 6 performed best on the seamless texture prompt. Midjourney also has a --tile feature that the other models lack, but it was not used, to keep the comparison fair.
What was the final category tested in the video, and what was the prompt for the image generation?
-The final category tested was logos. The prompt was to illustrate a logo for a gourmet coffee shop featuring a steaming coffee cup with coffee beans, with a cozy and inviting feel, and a color scheme including warm tones like brown, cream, and red.
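The pricing figures quoted in the answers above (about 5,000 images per $10 of Stable Diffusion XL credits, and about 200 image generations on the $10 per month Midjourney basic plan) can be turned into a rough per-image cost. The following is a minimal sketch, not a pricing tool: the plan labels and the `cost_per_image` helper are hypothetical names introduced for illustration, both image counts are approximations from the video, and DALL-E 3 is left out because no price is given for it.

```python
# Approximate figures as quoted in the summary ("about" in both cases).
# The dictionary keys are illustrative labels, not official plan names.
PLANS = {
    "Stable Diffusion XL (credits)": {"price_usd": 10.0, "images": 5000},
    "Midjourney v6 (basic plan)": {"price_usd": 10.0, "images": 200},
}


def cost_per_image(price_usd: float, images: int) -> float:
    """Return the approximate price of one image in US dollars."""
    if images <= 0:
        raise ValueError("images must be positive")
    return price_usd / images


for name, plan in PLANS.items():
    print(f"{name}: ~${cost_per_image(**plan):.3f} per image")
```

On these numbers, a Midjourney image costs roughly 25 times as much as a Stable Diffusion XL image, although the two are not strictly comparable: one is a monthly subscription and the other is a pay-as-you-go credit purchase.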
Outlines
🎨 Image Generation Comparison: Introduction and Cartoon Category
The video introduces a comparison between three major image generation models: DALL-E 3, Stable Diffusion XL, and Midjourney version 6. These models are tested across five categories, starting with cartoon images. The script describes the accessibility and cost of each model, and outlines the process of generating images based on prompts. The first category prompt is 'underwater adventure,' and the video presents three generated images, encouraging viewers to guess which model produced each before revealing the answers.
🎭 Photorealistic Human Category and Architectural Category
The second paragraph discusses the photorealistic human category, with a prompt to generate an image of a street performer playing a saxophone. The video showcases three images, each with a different style, and asks viewers to guess the model before revealing that the first image was DALL-E 3, the second was Stable Diffusion XL, and the third was Midjourney version 6. The paragraph then transitions to the architectural category, with a prompt to create an image of a Gothic Cathedral complex. Three different styles of images are presented, and it's revealed that DALL-E 3 produced an isometric view, Midjourney version 6 a photograph-style image, and Stable Diffusion XL a painting-style image.
🌿 Seamless Textures and Business Logo Design
The third paragraph focuses on the seamless textures category with a prompt to create a vintage floral wallpaper. Three images are shown, each with varying degrees of hand-drawn style and seamlessness. The video then reveals that the third image, which was thought to be Midjourney version 6, was actually DALL-E 3, and discusses the seamless nature of each image. The final category is the design of a business logo for a gourmet coffee shop, with a prompt for a cozy and inviting feel with warm tones. Three logo designs are presented, and the video concludes with the revelation of which model generated each logo, noting the improvements from DALL-E 2 to DALL-E 3 and Midjourney version 6, and inviting viewer feedback for future video content.
Keywords
Image Generation
DALL-E 3
Stable Diffusion XL
Midjourney v6
Discord
Prompt
Cartoon Image
Photorealistic
Architecture
Seamless Patterns
Logos
Highlights
A comparison of image generation results between DALL-E 3, Stable Diffusion XL, and Midjourney v6 is conducted across five different categories.
DALL-E 3 is available on the Plus plan within ChatGPT.
Stable Diffusion XL is the newest model from Stable Diffusion, accessible through their API or beta.dreamstudio.
Midjourney v6 is accessed through Discord after purchasing a subscription plan.
The basic Midjourney plan costs $10 per month for about 200 image generations.
The video tests each image generator on five categories: cartoon images, photorealistic humans, architecture, seamless patterns, and logos.
The first category compared is cartoon images, with an underwater adventure prompt.
Midjourney v6's image was more distinctive but adhered to the prompt less strictly than the other models' images.
DALL-E 3's image for the photorealistic human category was criticized for looking too much like a Pixar movie.
Midjourney v6's photorealistic human image was the presenter's favorite, standing out for its quality.
In the architecture category, the Gothic Cathedral prompt resulted in varied styles, with DALL-E 3 producing an isometric view.
Stable Diffusion XL's painting-style image for the Gothic Cathedral lacked visible stained glass windows.
Midjourney v6 was praised for its result on the seamless texture prompt; its built-in --tile feature was not used, to keep the comparison fair.
DALL-E 3's business logo for a gourmet coffee shop was noted for its close attempt at text, despite errors.
Stable Diffusion XL's logo lacked text but had an appealing design with accurate colors and a cozy feel.
Midjourney v6 did not attempt text in its logo, focusing on a visual representation that fit the aesthetic.
DALL-E 2 was shown for comparison, demonstrating the significant advancements in the newer models.
Viewers are encouraged to comment on which model and image style they prefer and suggest prompts for future tests.