Stable Diffusion 3 Announced! How can you get it?

Sebastian Kamph
24 Feb 2024 · 07:56

Summary

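TLDR: This video covers the release of Stable Diffusion 3 and evaluates how well this latest text-to-image model understands natural-language prompts and renders legible text. Side-by-side comparisons with DALL-E 3 and Midjourney show Stable Diffusion 3 performing strongly at generating images that contain the text requested in the prompt. Overall, the new model shows great potential for understanding complex prompts and generating images with high-quality text, and it is worth the community's attention.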

Takeaways

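  • 🚀 Stability AI has announced Stable Diffusion 3, its latest text-to-image model.
  • 📖 A standout feature is its ability to understand and render text, especially in scenes where the image itself contains text.
  • 🎨 In the video's side-by-side tests, Stable Diffusion 3 integrated text into images better than DALL-E 3 and Midjourney.
  • 🔍 Stability AI's website has an announcement post for Stable Diffusion 3 highlighting greatly improved performance on multi-subject prompts, image quality, and spelling abilities.
  • 🔥 Beyond text, Stable Diffusion 3 also performs well at understanding complex prompts.
  • 👀 In several examples, Stable Diffusion 3 produces clear, correctly spelled text and keeps it legible even in complex scenes.
  • 💡 Stable Diffusion 3 is not yet open to the public, but interested users can join the waitlist.
  • 📚 According to a developer the author spoke with, a white paper on Stable Diffusion 3 will be released in the coming days, after which users will start being invited to the preview.
  • 🖼️ Across several comparison examples, Stable Diffusion 3 generated high-quality images containing the text elements requested in the prompt, demonstrating its ability to combine image and text.
  • 🌟 Stability AI staff have shared Stable Diffusion 3 example images on social media, showing how well it understands and follows complex prompts.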

Q & A

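  • What is Stable Diffusion 3, and what's new about it?

    -Stable Diffusion 3 is the latest text-to-image model from Stability AI. It greatly improves multi-subject prompt understanding, image quality, and spelling, so it follows prompts more closely and renders text more accurately.

  • The video compares it with DALL-E 3 and Midjourney. How did it do?

    -Compared with DALL-E 3, Stable Diffusion 3 did better at generating images containing text. Compared with Midjourney, it was better at accurately understanding and rendering the prompt, although Midjourney may be stronger on aesthetics and cinematic feel.

  • What stage is Stable Diffusion 3 at, and how can ordinary users get it?

    -Stable Diffusion 3 is in early preview and not yet available to the general public. You can, however, sign up for the waitlist on Stability AI's website; access will be opened to users gradually.

  • What Stable Diffusion 3 use cases does the video show?

    -The video shows several examples, including objects with text rendered on them, a computer screen displaying "Welcome", and "SD3" graffiti on a wall, demonstrating how the model handles text and how accurately it understands prompts.

  • How does Stable Diffusion 3 improve on image quality compared with earlier versions?

    -According to the video, Stable Diffusion 3 may not yet be a big leap in image quality, but it has made major progress in understanding and generating accurate text and in handling multi-subject prompts.

  • How can the differences between Stable Diffusion 3 and other models be evaluated?

    -The video feeds the same prompt into different models and compares the outputs. This kind of hands-on comparison is a reasonable way to assess differences, but because the Stable Diffusion 3 images are cherry-picked while the DALL-E 3 and Midjourney ones are not, a full evaluation will have to wait for broader public testing.

  • What kinds of prompts does the model handle well?

    -Based on the examples, Stable Diffusion 3 does particularly well with prompts that specify exact text, integrating that text into the generated image more effectively.

  • What are the development team's expectations for the new model?

    -The video mentions that the team is preparing a white paper detailing the new model's technical details and performance. They have high expectations for Stable Diffusion 3's prompt understanding and text generation.

  • In which areas does the video compare Stable Diffusion 3's performance?

    -The video mainly compares prompt understanding, text generation and integration, and spelling, against DALL-E 3 and Midjourney.

  • Any other thoughts or suggestions on Stable Diffusion 3?

    -Judging from the examples, Stable Diffusion 3 has made substantial progress in prompt understanding and text handling, and it looks promising for complex use cases that involve text. It still needs broader public testing and evaluation, especially of visual quality and overall output. More technical details and example demonstrations are still to come.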

Outlines

00:00

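🆕 Stable Diffusion 3 Announced

This section introduces Stable Diffusion 3's new features and improvements, including better prompt understanding, images with real, legible text, and improved image quality and spelling. The author compares how Stable Diffusion 3, DALL-E 3, and Midjourney handle a text prompt and finds that Stable Diffusion 3 integrates the text into the image well. Stable Diffusion 3 is currently in preview, and you can sign up for the waitlist to try it. The section also reveals that a white paper will be published soon and that access will gradually open to users.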

05:02

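🔎 Stable Diffusion 3 Results

This section shows images generated by Stable Diffusion 3, DALL-E 3, and Midjourney for specific prompts. The author analyzes how the three models differ in text handling and overall visual quality. Stable Diffusion 3 and DALL-E 3 do well at text generation, while Midjourney is stronger aesthetically. The author stresses that these are only first impressions and that a proper, systematic comparison is needed for a full assessment. He also encourages viewers to follow Stability AI team members on social media for more comparison images.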


Highlights

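Stable Diffusion 3 is Stability AI's latest text-to-image model, with greatly improved multi-subject prompt understanding, image quality, and spelling.

In the video's opening comparison, Stable Diffusion 3 renders text much better than DALL-E 3 and Midjourney, integrating the text from the prompt into the image more effectively.

Stable Diffusion 3 is not yet publicly available, but you can sign up for the waitlist. The team will release a white paper soon and then start inviting people to the preview.

Andre shared some example images along with their original prompts, showing Stable Diffusion 3's strong prompt understanding.

For the prompt "a computer screen shows 'Welcome', with graffiti reading 'SD3' on the wall", Stable Diffusion 3 performs well, while DALL-E 3 and Midjourney fall somewhat short on the text details.

For the prompt "an embroidered cloth reading 'good night' with an embroidered tiger", both Stable Diffusion 3 and DALL-E 3 produced correct text, but Midjourney's images looked better visually.

For the prompt "three transparent bottles labeled 1, 2, 3, colored red, blue, and green", Stable Diffusion 3 shows excellent prompt understanding.

For the complex prompt "a red sphere on a blue cube with a green triangle behind them, a cat on the left and a dog on the right", Stable Diffusion 3's performance is impressive.

The video doubts that Stable Diffusion 3 delivers much better image quality at this stage; a fair comparison will have to wait until custom-trained models are available.

Overall, the video considers Stable Diffusion 3 a major breakthrough in prompt understanding and text generation, and its official release is worth looking forward to.

The comparisons in the video are not rigorous tests; they are only meant to give a rough sense of Stable Diffusion 3's capabilities, and a more thorough evaluation is needed.

Beyond text generation, the video keeps its expectations for Stable Diffusion 3's visual quality open until the model can be tested directly.

The video shows examples hand-picked by the Stability AI team; broader testing and evaluation are still needed.

Although the video focuses on Stable Diffusion 3, it acknowledges that it is currently hard to fully judge how it differs from other models.

The video invites viewers to share their views on and expectations for Stable Diffusion 3 in the comments to encourage discussion.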

Transcripts

00:00

Stable Diffusion 3 was just announced by Stability AI. What's the big deal, then? Well, I'll tell you: prompt understanding. Text, like real, proper text. And is there anything else? Well, let's check it out. Oh, and what color is the wind?

00:17

Blue. AI. Let's just start off with a quick comparison here. So here we have a prompt: "epic anime artwork of a wizard atop a mountain at night casting a cosmic spell into the dark sky that says 'Stable Diffusion 3' made out of colorful energy", and the example here is Stable Diffusion 3. This is obviously cherry-picked, so we only have one image to go from, but I took the same prompt and put it into DALL-E 3, which is the middle one here, and Midjourney, which is the one to the right here, and I didn't cherry-pick these at all. I just took the first four images out of both DALL-E and Midjourney. I also did some comparisons with SDXL, but honestly we don't even need to look at that, because we're not getting any text at all. The images look fine; that's not the issue.

01:11

But for this example it's all about the text, and in the Stable Diffusion 3 one here we actually get some pretty good-looking text. Now, the A and the B have kind of merged together, but it's fine; you can see it actually says "Stable Diffusion 3". In the DALL-E example here in the middle, they're kind of cool, but we are not getting any text recognition at all. Now, DALL-E is amazing for prompt understanding, and most of the time it's pretty good at text, but not in this example. We're going to look at some examples later where DALL-E shines a little bit more. And in the right example here, the Midjourney one, the text is, I mean, you can see what it says, and for one of the images here it actually is spelled correctly. Now, in three of them it is not, but it's very, very close.
02:02

However, the text in the Midjourney one isn't really getting the style of the prompt, so it's not really casting a cosmic spell into the sky that says "Stable Diffusion 3". In the Stable Diffusion example, it actually becomes a part of the image. I'm going to check some more comparisons in a bit.

02:20

Now, if you go to the Stability AI site, they have a news post basically saying: "Stable Diffusion 3: announcing, in early preview, our most capable text-to-image model with greatly improved performance in multi-subject prompts, image quality, and spelling abilities."
02:38

What that basically means is that it's going to be able to understand your prompts much, much better and be able to get text in there. Is it going to be much, much better image quality? I don't believe so at this time, but we'll need to compare once custom-trained models are out there.

02:55

Now, looking at the examples here, which are obviously cherry-picked, you can see the text is, well, pretty good. So we have the text "Go big or go home" next to this apple here; here we have "Stable Diffusion 3" inside this paper, newspaper clip, magazine clip, whatever; and here we actually have text on two different parts: you have "Go" on the sign here and "Dream On" on the bus. And if you look closely, it actually says "Stable Diffusion" on the side of the bus here. It's not super clear, but it looks like it's spelled correctly: it looks like one i, two f's, and one s there. So that's pretty cool so far.

03:33

Now, this isn't available for you to use yet. However, you can sign up for the waitlist, and you do that by clicking this little thingamajig here, which will get you to this sign-up form. Sign up here, submit, and you'll be on the waitlist. Now, I talked to a developer about this, and we will be seeing a white paper in the coming days; after that, they're going to start inviting people to the preview. I know some YouTubers have already said that they've officially gotten a confirmation that they've got in. I haven't pinged Emad about that yet. Some of us are actually focusing particularly on Stable Diffusion.

04:14

But in general, looking at these images, we can't say much, because these are examples here without really any prompts or anything like that. However, if you search around on the internet a little bit, you can actually find that on Twitter some of the people at Stability AI, in this case Andre, who works with media at Stability, have posted images with the prompts. So in this one here: "photo of a 1990s desktop computer on a work desk; on the computer screen it says 'Welcome'; on the wall in the background we see beautiful graffiti with the text 'SD3' very large on the wall."
04:56

So in this, again cherry-picked, example, it's very good. Now, if you compare this to, for example, DALL-E, which is this one here, and I'm going to pull up a Midjourney one, which is this one to the right here, we can see that in the DALL-E one we're getting "Welcome" on the screen. Looks very good, well, fairly good. We are getting the SD text on the wall behind here; however, it doesn't say "SD3". It's an S here, this one says "SDP3", and the other one I can't really read at all. The same prompt in Midjourney gives you "Welcome"; you get an "SD3" on the screen in three of the examples, and you get an "SD3" behind here in some of them. This one says "S D3" or "SDI 3", so, you know, it's somewhat getting it, but not fully. Comparing the prompt understanding, just apart from the text, I'd say they're currently on an OK level, because we're comparing random results to a cherry-picked result, so we'll have to do a proper comparison once we can start generating ourselves. So this is just a rough estimate.

06:11

Now, next up here we have a prompt: "Resting on the kitchen table is an embroidered cloth with the text 'good night' and an embroidered baby tiger. Next to the cloth there is a lit candle. The lighting is dim and dramatic." You can see that for both Stable Diffusion 3 and DALL-E you're getting good text here, so there's good prompt recognition regarding the text. You can see that it says "good night", and for two of the images it actually looks pretty good. For Midjourney, we are losing the text in most of the images; however, we are getting a more cinematic vibe. So just from a visually or aesthetically appealing sense, that image looks, well, a little more beautiful. However, from a prompt perspective, both Stable Diffusion 3 here and DALL-E kind of win in that regard.

07:08

If you want to keep browsing, there are more images on Twitter: check out Emad, check out Andre. Here's an example with three transparent glass bottles and a wooden table. It's actually understanding that the left one should be red, the middle one blue, and the green one here is on the right, and they're numbered 1, 2, 3. So that's pretty cool. I would love to know what you think in the comments below. But there is more stuff if you just keep checking Twitter. Here in this image we have "a photo of a red sphere on top of a blue cube. Behind them is a green triangle, on the right is a dog, on the left is a cat", and that is tremendous prompt understanding. Really good, if I do say so myself. Thanks for watching, see you.


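Related tags

Artificial intelligence · Cutting-edge tech · Text-to-image · Model comparison · Visual quality · Prompt understanding · Competitive landscape · Future trends · Video sharing · Product experience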