No Priors Ep.61 | OpenAI's Sora Leaders Aditya Ramesh, Tim Brooks and Bill Peebles

No Priors Podcast
25 Apr 2024 · 31:24

Summary

TLDR In this episode of the "No Priors" podcast, Aditya, Tim, and Bill from OpenAI's Sora team discuss Sora, a new generative video model. Sora takes a text prompt and generates high-definition, visually coherent clips up to one minute long. They discuss the possibility that large video models like this become world simulators, and how applying the scalable Transformers architecture to the video domain charts a path toward artificial general intelligence (AGI). They also talk about how Sora could make creative content easier to produce and bring new paradigms for entertainment, education, and communication. Finally, the conversation covers the safety of video models and the team's efforts to minimize the risks of fabricated video and misinformation.

Takeaways

  • 🚀 Sora is a new generation of video model that can generate high-definition, visually coherent video clips up to one minute long from a text prompt.
  • 🤖 The team behind Sora believes it plays an important role on the path to artificial general intelligence (AGI), since it shows that extremely complex environments and worlds can be modeled within the weights of a neural network.
  • 🌐 In the future, Sora could act as a world simulator containing people, animals, and objects that humans can interact with.
  • 🎨 By giving artists and creators access and gathering their feedback, the team is learning how to make Sora most useful as a tool and how to introduce it safely.
  • 📈 Sora's capabilities are expected to improve with additional compute and data, enabling better simulation and longer video generation.
  • 🧩 Sora is built on the Transformer architecture, which makes it scalable and lets it learn the complex relationships in video data.
  • 📚 Sora introduces the concept of SpaceTime patches, treating video as 3D cubes so it can handle many kinds of visual data, much as language models handle diverse text.
  • 🔍 Sora learns 3D information directly from video, and future versions are expected to improve object permanence and detailed object interactions to deepen its understanding of the physical world.
  • 🌟 Sora's release is a GPT-1-like moment for video models, and future versions are expected to contribute even more to creativity and to AGI.
  • 🤔 The team is paying close attention to safety issues such as fabricated video and misinformation while exploring how to make the most of the technology's benefits.
  • ✨ As Sora evolves, it may make predictions more precise than a human's world model, while exploring forms of intelligence different from human intelligence.

Q & A

  • What are the distinguishing features of Sora, the new generative video model?

    -Sora is a generative video model built on a diffusion Transformer architecture that takes a text prompt and generates high-definition, visually coherent clips up to one minute long. It shows the potential to model extremely complex environments and worlds entirely within the weights of a neural network.

  • Why does the team believe Sora is on the path to AGI (artificial general intelligence)?

    -To generate truly realistic video, a model has to learn how people interact and think, as well as how animals and all kinds of objects behave. Because Sora is learning to model such complex environments, the team sees it as being on the critical pathway to AGI.

  • Is there a roadmap or timeline for making Sora broadly available?

    -There are currently no immediate product plans or timelines. Instead, the team is giving access to artists and red teamers and using their feedback to think through Sora's impact on the world and how it can be useful to people.

  • What safety concerns come with deploying Sora?

    -Deployment raises new safety issues such as deepfakes, spoofs, and the spread of misinformation. It also requires working out how responsibility is shared among the companies deploying the technology, social media companies, and users.

  • What expectations are there for Sora's future evolution?

    -Sora is expected to capture more complex, long-term physical interactions much more accurately. Because it learns 3D information and a deeper understanding of the human world, it is also expected to contribute to more intelligent, more general AI models.

  • How does the team think about Sora's creativity and visual appeal?

    -Sora's language understanding lets users steer what it generates, but no particular aesthetic (Aditya's, for example) is deeply embedded in the model. In the future, it is expected that the model can be customized to an individual's personal sense of aesthetics.

  • How does Sora tokenize its data?

    -Sora represents data using SpaceTime patches, a concept that can represent image and video data however it exists. As a result, Sora is not limited to 720p video: it can generate vertical and widescreen videos and anything between roughly a 1:2 and a 2:1 aspect ratio.

  • How did Sora's architecture evolve?

    -Rather than extending an image generator into a video generator, the team started from the question of how to generate a full minute of HD video, and worked toward a model that breaks data down in a simple, scalable way.

  • How will Sora influence the direction of future AI research?

    -Sora learns from video data, picking up concepts such as 3D and object permanence and gaining a deeper knowledge of the human world. This is expected to make AI models more intelligent and more general, with impact well beyond video generation.

  • What can we expect from future updates to Sora?

    -Future updates are expected to improve the accuracy of complex, long-term physical interactions, and to better understand and adapt to an individual's aesthetics and style.

  • How does the team think about social responsibility and safety as Sora is deployed?

    -Deployment requires addressing risks such as misinformation and the spread of fabricated media. Companies deploying Sora bear responsibility for the technology, and social media companies and users also need to share in that responsibility.

Outlines

00:00

🎉 Introducing Sora and the path to AGI

In this episode, Aditya, Tim, and Bill from OpenAI's Sora team join to talk about Sora, a new generative video model. Sora takes a text prompt and generates high-definition, visually coherent clips up to one minute long. By applying the scalable Transformers architecture to the video domain, Sora shows how large video models could become world simulators, and the team believes it sits on an important pathway toward AGI (artificial general intelligence).

05:01

🤖 Applications and creativity

The conversation turns to Sora's future uses. It is expected to find its way into short films and other media genres, but entirely new creative uses are also likely to emerge. The group also discusses how Sora might contribute to future applications such as robotics and physics-engine-style simulation. Tim explains the diffusion Transformer, the core technology behind Sora, and shares expectations for how it will evolve.

10:02

🧩 Architecture and tokenization

This segment covers Sora's architecture and tokenization. Using the concept of SpaceTime patches, video is treated as 3D cubes that are fed to a Transformer model, which makes video generation far more flexible. This lets Sora generate many types of visual content without being tied to a particular resolution or video length. The team also touches on the infrastructure and systems that had to be built to develop Sora.

15:03

🌟 Aesthetics and the future of creativity

The guests discuss Sora's visual appeal. Sora is designed so that users can steer what it generates through language understanding. In the future, artists may be able to upload their portfolios so the model understands a design firm's jargon and distinctive aesthetic. They also discuss how Sora might be applied to education and communication, and anticipate a new entertainment paradigm.

20:05

🚧 Safety and remaining challenges

The discussion turns to safety and the potential for fakes. Risks such as misinformation need to be handled carefully. Remaining technical challenges include object permanence and complex physical interactions: today, Sora cannot always handle complex object-to-object interactions accurately, leaving room for improvement.

25:06

🌐 Research roadmap and broader significance

The team discusses Sora's research roadmap and its impact on AI as a whole. Sora can learn 3D information from video data and may come to understand human perception and interaction with the world, an important ingredient in more intelligent AI models. They also discuss whether future versions of Sora could surpass the human world model.

30:08

📈 Scalability and expectations

The conversation closes with Sora's scalability and how it will evolve as compute increases. Sora focuses on the simple task of predicting data, which scales effectively. Its release is compared to the arrival of the GPT models, and the technology is expected to improve very quickly. The episode ends with the future of creativity that Sora opens up, the road to AGI, and the safety considerations that come with it.

Keywords

💡Open AI Sora

Open AI Sora is a new general-purpose video model that takes a text prompt and generates high-definition, visually coherent video clips up to one minute long. The team sees text-to-video models like it as playing an important role on the path toward AGI (artificial general intelligence).

💡World simulators

A world simulator is an advanced video model that models complex real-world environments and can simulate how people interact and think and how animals and objects behave. Sora is expected to eventually enable this kind of simulation, which is part of why it is seen as important on the path to AGI.

💡Scalable Transformers architecture

The scalable Transformers architecture is a neural network design built to learn from large amounts of data and capture the complex, challenging relationships within video. Sora applies this architecture, so training it with more compute and more data yields better results.

💡AGI (Artificial General Intelligence)

Artificial general intelligence (AGI) refers to AI that, unlike systems specialized for a single task, can perform a wide range of intellectual tasks the way a human can, learning and adapting on its own. Sora's development is positioned on the path to AGI: its ability to model complex environments is considered an important step toward realizing it.

💡Diffusion Transformer

A diffusion Transformer is a video generation model that starts from noise and removes it iteratively until a video sample emerges. The method is built on the Transformer architecture, so adding more compute yields higher-fidelity video generation.
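
As a rough illustration of this iterative denoising loop, here is a minimal sketch in Python. It is not OpenAI's implementation: the `denoiser` callable, the step count, and the toy tensor shape are hypothetical placeholders standing in for the trained diffusion Transformer and its noise schedule.

```python
import numpy as np

def generate_video(denoiser, shape=(16, 32, 32, 3), steps=50, rng=None):
    """Toy diffusion sampling: start from pure noise and repeatedly ask a
    trained denoiser to strip a little noise away until a sample remains."""
    rng = rng or np.random.default_rng(0)
    x = rng.standard_normal(shape)       # a "video" of pure noise (T, H, W, C)
    for step in reversed(range(steps)):  # walk the noise level down to zero
        t = step / steps                 # normalized noise level for this step
        x = denoiser(x, t)               # the model removes a bit of noise
    return x                             # the finished sample

# Usage with a dummy denoiser that just shrinks the noise at every step:
video = generate_video(lambda x, t: 0.9 * x, steps=10)
print(video.shape)  # (16, 32, 32, 3): frames, height, width, channels
```

The scaling point made in the episode is that the denoiser itself is a Transformer, so making it larger and training it on more data improves every step of this loop.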

💡SpaceTime patches

SpaceTime patches treat a video as 3D cube-shaped blocks of data: the video is viewed as a stack of frames, and spatiotemporal cubes are carved out of that stack to serve as tokens. This gives Sora the flexibility to generate many kinds of video data regardless of resolution or length.
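
Below is a minimal sketch of the patch idea, assuming a raw pixel tensor; Sora's actual tokenizer (including any latent-space compression) and its patch sizes are not spelled out in the episode, so the function name and sizes here are illustrative only.

```python
import numpy as np

def spacetime_patches(video, pt=2, ph=16, pw=16):
    """Cut a video of shape (T, H, W, C) into 3D "SpaceTime" cubes, each
    spanning `pt` frames and a `ph` x `pw` window, flattened into one token."""
    T, H, W, C = video.shape
    T, H, W = T - T % pt, H - H % ph, W - W % pw   # truncate to whole cubes
    v = video[:T, :H, :W]
    v = v.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    v = v.transpose(0, 2, 4, 1, 3, 5, 6)           # gather each cube together
    return v.reshape(-1, pt * ph * pw * C)         # one row per SpaceTime patch

# A landscape clip and a vertical clip both become plain token sequences:
print(spacetime_patches(np.zeros((16, 240, 320, 3))).shape)  # (2400, 1536)
print(spacetime_patches(np.zeros((16, 320, 240, 3))).shape)  # (2400, 1536)
```

Because clips of different shapes reduce to the same kind of token sequence, one Transformer can train on all of them, which is the breadth the team describes.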

💡Aesthetic

Aesthetic refers to visual beauty and the appeal of a design. Sora's language understanding lets users steer the aesthetic of the content it generates, and in the future the model may be customizable to an individual's personal aesthetic.

💡Misinformation

Misinformation is the spread of false or misleading information. As video models like Sora become widespread, fabricated videos could become a source of misinformation, so the development team treats safety and misinformation risk as priorities and is working on appropriate mitigations.

💡Object permanence

Object permanence is the concept that an object continues to exist even when it leaves the field of view. The current version of Sora struggles to handle complex interactions between objects accurately, but future models are expected to represent these interactions more faithfully.

💡Physical interactions

Physical interactions are the phenomena that arise when objects touch or exert force on one another. Sora can reliably generate simple interactions such as people walking, but more complex object-to-object interactions still leave room for improvement. In the future it is expected to accurately reproduce detailed physical interactions, such as kicking a ball and having it fly.

Highlights

OpenAI Sora is a new generative video model capable of creating high-definition, visually coherent clips up to a minute long from text prompts.

Sora raises the possibility of large video models acting as world simulators by applying scalable Transformers architecture to the video domain.

The team behind Sora believes that models like Sora are on a critical pathway to achieving AGI (Artificial General Intelligence).

Sora's potential future applications include world simulation where users can interact with complex environments modeled within a neural network.

OpenAI is currently providing access to Sora to a small group of artists and red teamers to gather feedback and assess the impact of the technology.

Feedback from artists suggests a need for more control and the potential for extending the model's capabilities beyond text input.

The team is inspired by the creative ways artists are using Sora to tell stories and generate compelling content.

Sora's technology is based on a diffusion Transformer, which generates data by starting from noise and iteratively removing it.

The Transformer architecture allows Sora to scale and improve as more compute power and data are applied to training the model.

Sora introduces the concept of 'SpaceTime patches', enabling the model to represent and learn from data in various video resolutions and lengths.

The team is focused on fundamental research and improvements to Sora rather than specific downstream applications like digital avatars.

Safety considerations are a priority, especially regarding the potential for misinformation and the need for responsible deployment of the technology.

Sora's development is seen as analogous to the early stages of GPT models, with expectations of rapid improvement and increased capabilities.

The team is excited about the creative potential of Sora and its future role in entertainment, education, and communication.

Sora's ability to learn about the world from visual data is expected to contribute to more intelligent AI models that better understand and interact with the world.

The team aims to make Sora more accessible by addressing computational costs and safety concerns to democratize the technology's use.

Sora's current limitations include the accuracy of complex, long-term physical interactions, which the team anticipates will improve with further development.

The public may misunderstand the potential and development trajectory of video models like Sora, which the team compares to the early stages of language model development.

Transcripts

play00:02

[Music]

play00:06

hi listeners welcome to another episode

play00:08

of no priors today we're excited to be

play00:10

talking to the team behind open AI Sora

play00:13

which is a new generative video model

play00:16

that can take a text prompt and return a

play00:18

clip that is high definition visually

play00:20

coherent and up to a minute long Sora

play00:23

also raised the question of whether

play00:25

these large video models are World

play00:26

simulators and applied the scalable

play00:28

Transformers architecture to the video

play00:31

domain we're here with the team behind

play00:33

it Aditya Ramesh Tim Brooks and Bill Peebles

play00:37

welcome to no priors guys thanks so much

play00:39

for having us to start off why don't we

play00:41

just ask each of you to introduce

play00:42

yourselves so our listeners know uh who

play00:44

we're talking to did you mind starting

play00:46

us off sure I'm Aditya I lead the Sora

play00:48

team together with Tim and Bill hi I'm

play00:51

Tim I also lead the Sora team I'm Bill

play00:54

also lead the Sora team simple enough um

play00:56

maybe we can just start with you know

play00:58

the open AI mission is AGI right um

play01:01

greater intelligence is text to video

play01:04

like on path to that mission how'd you

play01:05

end up working on this yeah we

play01:07

absolutely believe models like Sora are

play01:09

really on the critical Pathway to AGI we

play01:11

think one sample that illustrates this

play01:13

kind of nicely is a scene with a bunch

play01:15

of people walking through Tokyo during

play01:16

the winter and in that scene there's so

play01:18

much complexity so you have a camera

play01:20

which is flying through the scene

play01:22

there's lots of people which are

play01:23

interacting with one another they're

play01:25

talking they're holding hands they're

play01:26

people selling items at nearby stalls

play01:29

and we really think this sample

play01:30

illustrates how Sora is on a pathway

play01:32

towards being able to model extremely

play01:34

complex environments and worlds all

play01:37

within the weights of a neural network

play01:38

and looking forward you know in order to

play01:40

generate truly realistic video you have

play01:42

to have learned some model of how people

play01:44

work how they interact with others how

play01:46

they think ultimately and not only

play01:47

people also animals and really any kind

play01:49

of object you want to model and so

play01:51

looking forward as we continue to scale

play01:53

up models like Sora we think we're going

play01:55

to be able to build these like World

play01:57

simulators where essentially you know

play01:59

any body can interact with them I as a

play02:01

human can have my own simulator running

play02:02

and I can go and like give a human in

play02:04

that simulator work to go do and they

play02:06

can come back with it after they're done

play02:08

and we think this is a pathway to AGI

play02:10

which is just going to happen as we

play02:12

scale up Sora in the future it's been

play02:14

said that we're still far away despite

play02:17

massive demand for a consumer product

play02:19

like what uh is is that on the road map

play02:21

what do you have to work on before you

play02:23

you have broader access to Sora Tim you

play02:26

want to talk about sure yeah so we

play02:28

really want to engaged with people

play02:30

outside of OpenAI and thinking about how

play02:33

Sora will impact the world how it will

play02:35

be useful to people and so we don't

play02:37

currently have immediate plans or even a

play02:40

timeline for creating a product but what

play02:42

we are doing is we're giving access to

play02:44

Sora to a small group of artists as well

play02:47

as to Red teamers to start learning

play02:48

about what impact Sora will have and so

play02:51

we're getting feedback from artists

play02:52

about how we can make it most useful as

play02:54

a tool for them as well as feedback from

play02:56

Red teamers um about how we can make

play02:59

this safe how we could introduce it

play03:00

to the public and this is going to set

play03:02

our road map for our future research and

play03:04

inform if we do in the future end up

play03:07

coming up with the product or not um

play03:09

exactly what timelines that would have

play03:11

did can you tell us about some of the

play03:12

feedback you've gotten yeah so we have

play03:15

given access to Sora to like a small

play03:17

handful of artists and creators just to

play03:19

get early feedback um in general I think

play03:23

a big thing is just control ability so

play03:25

right now the model really only accepts

play03:27

text as input and while that's it's

play03:29

useful it's still pretty constraining in

play03:31

terms of being able to uh specify like

play03:34

precise descriptions of what you want so

play03:36

we're thinking about like you know how

play03:39

to extend the capabilities of the model

play03:41

potentially in the future so that you

play03:43

can supply inputs other than just text

play03:45

do you all have a favorite thing that

play03:46

you've seen artists or others use it for

play03:49

or a favorite video or something that

play03:50

you found really inspiring I know that

play03:52

when it launched a lot of people were

play03:53

really stricken by just how beautiful

play03:55

some of the images were how striking how

play03:57

you'd see the shadow of a cat in a pool

play03:59

of water things like that but as just

play04:01

curious what what you've seen sort of

play04:02

emerge as people more and more people

play04:04

have started using it yeah it's been

play04:05

really amazing to see what the artists

play04:08

do with the model because we have our

play04:10

own ideas of some things to try but then

play04:13

people who for their profession are

play04:15

making creative content are like so

play04:17

creatively brilliant and do such amazing

play04:20

things so shy kids had this really cool

play04:22

video that they made this short story uh

play04:25

Airhead with um this character that has

play04:29

a balloon and they really like made this

play04:31

story and there it

play04:34

was really cool to see a way that Sora

play04:38

can unlock and make this story easier

play04:40

for them to tell and I think there it's

play04:42

even less about like a particular clip

play04:44

or video that Sora made and more about

play04:46

this story that the these artists want

play04:49

to tell and are able to share and that

play04:50

Sora can help enable that so that is

play04:52

really amazing to see you you mentioned

play04:54

the Tokyo scene others my personal

play04:57

favorite sample that we've created is uh

play05:00

the bling Zoo so I posted this on my

play05:02

Twitter uh the day we launched Sora and

play05:05

it's essentially a multi-shot scene of a

play05:08

zoo in New York which is also a jewelry

play05:10

store and so you see like saber-tooth

play05:12

tigers kind of like decked out with

play05:13

bling it was very surreal yeah yeah and

play05:16

so I love those kinds of samples because

play05:19

as someone who you know loves to

play05:21

generate creative content but doesn't

play05:22

really have the skills to do it it's

play05:24

like so easy to go play with this model

play05:26

and to just fire off a bunch of ideas

play05:28

and uh get something that's compelling

play05:30

like the time it took to actually

play05:31

generate that in terms of iterating on

play05:33

prompts was you know really like less

play05:35

than an hour to like get something I

play05:37

really loved um so I had so much fun

play05:39

just playing with the model to get

play05:40

something like that out of it and it's

play05:41

great to see the artists are also

play05:43

enjoying using the models and getting

play05:45

great content from that what do you

play05:46

think is a timeline to broader use of

play05:48

these sorts of models for short films or

play05:50

other things because if you look at for

play05:51

example the evolution of Pixar they

play05:53

really started making these Pixar shorts

play05:55

and then a subset of them turned into

play05:56

these longer format movies and um a lot

play06:00

of it had to do with how well could they

play06:01

actually World model even little things

play06:03

like the movement of hair or things like

play06:05

that and so it's been interesting to

play06:07

watch the evolution of that prior

play06:08

generation of technology which I now

play06:10

think is 30 years old or something like

play06:11

that do you have a prediction on when

play06:13

we'll start to see actual content either

play06:15

from Sora or from other models that will

play06:17

be professionally produced and sort of

play06:19

part of the broader media genre that's a

play06:22

good question I I don't have a

play06:24

prediction on the exact timeline but but

play06:26

one thing related to this I'm really

play06:28

interested in is what things other than

play06:31

like traditional films people might use

play06:32

this for I do think that yeah maybe over

play06:35

the next couple years we'll see people

play06:36

starting to make like more and more

play06:39

films but I think people will also find

play06:41

completely new ways to use these models

play06:43

that are just different from the current

play06:45

media that we're used to because it's a

play06:47

very different Paradigm when you can

play06:50

tell these models kind of what you want

play06:52

them to see and they can respond in a

play06:54

way and maybe there are just like new

play06:56

modes of interacting with content that

play06:59

like really creative artists will come

play07:00

up with so I'm actually like most

play07:02

excited for what totally new things

play07:05

people will be doing that's just

play07:06

different from what we currently have it's

play07:08

really interesting because one of the

play07:09

things you mentioned earlier this is

play07:10

also a way to do World modeling and I

play07:12

think at it you've been at open AI for

play07:13

something like five years and so you've

play07:15

seen a lot of the evolution of models in

play07:17

the company and what you've worked on

play07:18

and I remember going to the office

play07:20

really early on and it was initially

play07:21

things like robotic arms and it was

play07:23

self-playing Games and Things or

play07:24

selfplay for games and things like that

play07:26

um as you think about the capabilities

play07:28

of this world simulation model do you

play07:31

think it'll become a physics engine for

play07:33

simulation where people are you know

play07:35

actually simulating like wind tunnels is

play07:37

it a basis for robotics and uh is there

play07:39

is it something else I'm just sort of

play07:40

curious where are some of these other

play07:42

futured forward applications that could

play07:44

emerge yeah I I totally think that

play07:46

carrying out simulations in the video

play07:48

model is is something that we're going

play07:50

to be able to do um in the future at

play07:52

some point um Bill actually has a lot of

play07:54

thoughts about uh this sort of thing so

play07:56

maybe you can yeah I mean I think you

play07:58

hit the nail on the head

play08:00

applications like robotics um you know

play08:02

there's so much you learn from video

play08:04

which you don't necessarily get from

play08:05

other modalities which companies like

play08:07

OpenAI have invested a lot in the past

play08:09

like language you know like the minutia

play08:11

of like how arms and Joints move through

play08:13

space you know again getting back to

play08:14

that scene in Tokyo how those legs are

play08:16

moving and how they're making contact

play08:17

with the ground in a physically accurate

play08:19

way so you learned so much about the

play08:21

physical world uh just from training on

play08:23

raw video that we really believe that

play08:24

it's going to be essential for uh things

play08:26

like physical embodiment moving forward

play08:29

and talking more about uh the model

play08:32

itself there are a bunch of really

play08:33

interesting Innovations here right so

play08:36

not to put you on the spot Tim but can

play08:37

you uh describe for a broad technical

play08:40

audience what a diffusion Transformer is

play08:42

totally so Sora Builds on Research from

play08:46

both the DALL·E models and the GPT models

play08:49

at OpenAI and diffusion is a process

play08:53

that creates uh data in our case videos

play08:57

by starting from noise and iteratively

play08:59

removing noise many times until

play09:01

eventually you've removed so much noise

play09:03

that it just creates a sample and so

play09:05

that is our process for generating the

play09:07

videos we start from a video of noise

play09:09

and we remove it incrementally but then

play09:12

architecturally it's really important

play09:14

that our models are scalable and that

play09:16

they can learn from a lot of data and

play09:18

learn these really complex and

play09:20

challenging relationships and videos and

play09:22

so we use an architecture that is

play09:24

similar to the GPT models and that's

play09:27

called a Transformer and so diffusion

play09:29

Transformers combining these two

play09:31

concepts and the Transformer

play09:34

architecture allows us to scale these

play09:36

models and as we put more compute and

play09:38

more data into training them they get

play09:40

better and better and we even released a

play09:44

technical report on Sora and we show the

play09:46

results that you get from the same

play09:48

prompt when you use a smaller amount of

play09:50

compute an intermediate amount of

play09:52

compute and more compute and by using

play09:54

this method as you use more and more

play09:56

compute the results get better and

play09:57

better and we strongly believe this trend will

play10:00

continue so that by using this really

play10:02

simple methodology we'll be able to

play10:04

continue improving these models by

play10:06

adding more compute adding more data and

play10:08

they will be able to do all these

play10:10

amazing things we've been talking about

play10:11

having better simulation in longer term

play10:15

Generations Bill uh can we characterize

play10:18

at all what the scaling laws for this

play10:19

type of model look like yet good

play10:21

question so as Tim alluded to you know

play10:25

one of the benefits of using

play10:26

Transformers is that you inherit all of

play10:28

their great properties that we've seen

play10:29

in other domains like language um so you

play10:31

absolutely can begin to come up with

play10:33

scaling laws for video as opposed to

play10:35

language and this is something that you

play10:37

know we're actively looking at in our

play10:39

team and you know not only constructing

play10:40

them but figuring out ways to make them

play10:42

better so you know if I use the same

play10:44

amount of training compute can I get an

play10:46

even better loss uh without

play10:47

fundamentally increasing the amount of

play10:48

compute needed so these are a lot of the

play10:50

questions that we tackle day-to-day on

play10:52

the research team to make Sora and

play10:53

future models as good as

play10:55

possible one of the like questions about

play10:58

applying you know trans forers in this

play10:59

domain is um like tokenization right uh

play11:03

and so by the way I don't know who came

play11:04

up with this name but like latent

play11:06

SpaceTime patches is like a great sci-fi

play11:08

name here can you explain like what that

play11:11

is and like why why it is relevant here

play11:13

because you know the ability to do

play11:15

minute long Generation Um and get to uh

play11:19

like Visual and temporal coherence is

play11:22

really amazing I don't think we came up

play11:24

with it like as a name so much as like a

play11:27

descriptive thing of exactly what like

play11:28

that's what we call yeah even better

play11:31

though so one of the critical successes

play11:33

for the llm Paradigm has been this

play11:35

notion of tokens so if you look at the

play11:37

internet there's all kinds of Text data

play11:39

on it there's books there's code there's

play11:41

math and what's beautiful about language

play11:43

models is that they have this singular

play11:45

notion of a token which enables them to

play11:47

be trained on this vast swath of like

play11:49

very diverse data there's really no

play11:52

analog for prior visual generative

play11:54

models so you know what was very

play11:55

standard in the past before Sora is that

play11:57

you would train say an image generative

play11:59

model or a video generative model on

play12:01

just like 256x 256 resolution images or

play12:04

256 x 256 video that's exactly like 4

play12:07

seconds long and this is very limiting

play12:09

because it limits the types of data you

play12:11

can use you have to throw away so much

play12:13

of you know uh the visual data that

play12:15

exists on the internet and that limits

play12:17

like the generalist capabilities of the

play12:18

model so with Sora we introduced this

play12:20

notion of SpaceTime patches where you

play12:23

can essentially just represent data

play12:25

however it exists in an image and a

play12:27

really long video and like a a tall

play12:28

vertical video by just taking out cubes

play12:31

so you can essentially imagine right a

play12:32

video is just like a stack a vertical

play12:33

stack of uh individual images and so you

play12:36

can just take these like 3D cubes out of

play12:38

it and that is our notion of a token

play12:40

when we ultimately feed it into the

play12:41

Transformer and the result of this is

play12:43

that Sora you know can do a lot more

play12:45

than just generate say like 720p video

play12:48

um at for some like fixed duration right

play12:50

you can generate vertical videos

play12:51

widescreen videos you can do anything uh

play12:53

between like 1:2 aspect ratio to 2:

play12:55

one it can generate images it's an image

play12:57

generation model and so this is really

play12:59

the first generative model of visual

play13:01

content uh that has breadth in a way

play13:04

that language models have breadth so

play13:06

that was really why we pursued this

play13:08

direction I feels just as important on

play13:09

the like input and training side right

play13:12

in in terms of being able to take in

play13:13

different types of video absolutely and

play13:15

so a huge part of this project uh was

play13:18

really developing the infrastructure and

play13:21

systems needed to be able to work with

play13:22

this vast data um in a way that hasn't

play13:25

been needed for previous image or video

play13:27

generation systems a lot of the models

play13:30

before Sora that were working on video

play13:33

were really looking at extending image

play13:34

generation models and so there was a lot

play13:37

of great work on image generation and

play13:41

what many people have been doing is

play13:43

taking an image generator and extending

play13:45

it a bit instead of doing one image you

play13:47

can do a few seconds but what was really

play13:49

important for Sora and was really this

play13:52

difference in architecture was instead

play13:54

of starting from an image generator and

play13:56

trying to add on video we started from

play13:59

scratch and we started with the question

play14:01

of how are we going to do a minute of HD

play14:04

footage and that was our goal and when

play14:06

you have that goal we knew that we

play14:08

couldn't just extend an image generator

play14:11

we knew that in order to am of HD

play14:13

footage we needed something that was

play14:14

scalable that broke down data into a

play14:16

really simple way so that we could use

play14:18

scalable models so I think that really

play14:20

was the architectural Evolution from

play14:23

image generators to what led us to Sora

play14:25

that's a really interesting framework

play14:26

because it feels like it could be

play14:27

applied to all sorts of other areas

play14:28

where people aren't currently applying

play14:30

end to end deep learning yeah I think

play14:33

that's right and it it makes sense

play14:34

because in the shortest term right we

play14:36

weren't the first to come out with a

play14:37

video generator a a lot of people and

play14:39

and a lot of people have done impressive

play14:41

work on video generation but we were

play14:43

like okay we'd rather pick a point

play14:46

further in the future and just you know

play14:49

work for a year on that um and there is

play14:53

this pressure to do things fast because

play14:55

AI is so fast and the fastest thing to

play14:57

do is oh let's take what's working

play14:59

now and let's kind of like add on

play15:00

something to it and that probably is as

play15:03

you're saying more General than just

play15:04

image to video but other things but

play15:06

sometimes it takes taking a step back

play15:08

and saying like what what will the

play15:11

solution to this look like in three

play15:12

years let's start building that MH yeah

play15:14

it seems like a very similar transition

play15:15

happened in self-driving recently where

play15:17

where people went from bespoke Edge case

play15:20

sort of predictions and heuristics and

play15:21

a little bit of DL to like end to end deep

play15:23

learning yeah in some of the new models

play15:24

so it's it's very exciting to see it

play15:26

applied to video one of the Striking

play15:27

things about Sora is just the visual

play15:29

aesthetic of it and I'm a little bit

play15:31

curious how did you go about either uh

play15:33

tuning or crafting that aesthetic

play15:36

because I know that in some of the more

play15:38

traditional um image gen models uh you

play15:41

both have feedback that helps impact

play15:44

evolution of aesthetic over time but in

play15:45

some cases people are literally tuning

play15:47

the models and so I'm a little bit

play15:48

curious how you thought about it in the

play15:49

context of Sora yeah well to be honest

play15:52

we didn't spend a ton of effort on it

play15:54

for Sora the world is just beautiful

play15:56

yeah oh this is a great answer yeah

play15:59

I I think that's maybe the honest answer

play16:01

to most of it I think sora's language

play16:04

understanding definitely allows the user

play16:05

to steer it uh in a way that would be

play16:08

more difficult with like other models so

play16:10

you can provide a lot of like hints and

play16:11

visual cues that will sort of steer the

play16:13

model toward the type of generations

play16:15

that you want but it's not like the Aditya

play16:17

aesthetic is like deeply embedded yeah

play16:20

not yet um but I think moving to the

play16:22

future you know I I Feel Like the Model

play16:25

is kind of empowering people to sort of

play16:28

um uh get it to grok your personal

play16:30

sense of aesthetic is going to be

play16:32

something that uh a lot of people will

play16:34

look forward to uh many of the artists

play16:36

and creators that we talked to they'd

play16:38

love to just like upload their whole

play16:40

portfolio of assets to the model and be

play16:42

able to draw up on like a large body of

play16:44

work when they're writing captions and

play16:46

have the model understand like the

play16:48

jargon of their design firm accumulated

play16:50

over many decades and so on um so I

play16:52

think personalization and and uh how

play16:57

that will kind of work together with

play16:59

Aesthetics is going to be a cool thing

play17:01

to explore later on I think to the point

play17:03

um Tim was making about just like a you

play17:05

know new applications Beyond traditional

play17:07

entertainment I work and I travel and

play17:09

have young kids and so I don't know if

play17:12

this is like something to be judged for

play17:13

or not but one of the things I do today

play17:15

is um generate what amount to like short

play17:18

audio books with voice cloning um DALL·E

play17:21

images and you know stories in the style

play17:24

of like the Magic Treehouse or whatever

play17:26

in around some topic that either

play17:29

I'm interested in like ah you know hang

play17:31

out with Roman Emperor X right or um

play17:34

something the the girls my kids are

play17:35

interested in but this is

play17:37

computationally expensive and hard and

play17:38

not quite possible but I imagine there's

play17:41

some version of like desktop Pixar for

play17:43

everyone which is like you know I think

play17:45

kids are going to find this first but

play17:46

I'm going to narrate a story and have

play17:48

like magical visuals happen in real time

play17:51

I think that's a very different

play17:52

entertainment Paradigm than we have now

play17:55

totally I mean are we going to get it I

play17:57

yeah I think we're headed there and a

play17:59

different entertainment Paradigm and

play18:01

also a different educational Paradigm

play18:04

and a communication Paradigm

play18:06

entertainment's a big part of that but I

play18:08

think there are actually many potential

play18:10

applications once this really

play18:12

understands our world and so much of our

play18:16

world and how we experience it is Visual

play18:19

and something really cool about these

play18:20

models is that they're starting to

play18:21

better understand our world and what we

play18:24

live in and the things that we do and we

play18:26

can potentially use them to entertain us

play18:29

but also to educate us and like

play18:30

sometimes if I'm trying to learn

play18:33

something the best thing would be if I

play18:34

could get a custom tailored educational

play18:36

video to explain it to me or if I'm

play18:38

trying to communicate something to

play18:39

someone you know maybe the best

play18:41

communication I could do is make a video

play18:43

to explain my point so I think that

play18:46

entertainment but also kind of a much

play18:48

broader set of potential things that

play18:50

video models could be useful for that

play18:52

makes sense I mean that resonates in

play18:54

that I think if you ask people under

play18:55

some certain age cut off that they'd say

play18:58

the the biggest driver of educational

play18:59

world is YouTube today right Better or

play19:01

Worse yeah have you all tried applying

play19:04

this to things like digital avatars I

play19:05

mean there's companies like Synthesia HeyGen

play19:07

Etc they're doing interesting things in

play19:09

this area but having a

play19:11

true um uh something that really

play19:13

encapsulates a person in a very deep and

play19:15

Rich way uh seems kind of fascinating as

play19:18

one potential adaptive approach to

play19:20

this I'm just sort of curious if you've

play19:22

tried anything along those lines yet or

play19:25

if if it's not really applicable giving

play19:26

then it's more of like text to video

play19:27

prompts so we haven't we've really

play19:30

focused on just the core technology

play19:32

behind it so far so we haven't focused

play19:34

that much on for that matter particular

play19:37

applications including the idea of

play19:38

avatars which makes a lot of sense and I

play19:40

think that would be very cool to try I

play19:42

think where we are in the trajectory of

play19:44

Sora right now is like this is the gpt1

play19:48

of these this new paradigm of visual

play19:51

models and that we're really looking at

play19:53

the fundamental Research into making

play19:55

these way better making it a way better

play19:57

Engine That Could power all these

play19:58

different things so that's so our focus

play20:00

is just on this fundamental development

play20:02

of the technology right now maybe more

play20:04

so than specific Downstream yeah one of

play20:07

the reasons I ask about the Avatar stuff

play20:09

as well is it starts to open questions

play20:10

around safety and so I was a little bit

play20:12

curious you know how you all thought

play20:13

about um safety in the context of video

play20:15

models and the potential for deepfakes or

play20:17

spoofs or things like that yeah I can

play20:19

speak a little bit to that it's

play20:21

definitely a pretty complex topic I

play20:23

think a lot of the safety mitigations

play20:25

could probably be ported over from DALL·E

play20:27

3 um for example the way we handle like

play20:30

racy images or gory images things like

play20:32

that um there's definitely going to be

play20:34

new safety issues to worry about for

play20:37

example

play20:38

misinformation um or for example like do

play20:41

we allow users to generate images that

play20:43

have offensive words on them and I think

play20:46

one key thing to figure out here is like

play20:48

how much responsibility uh do the

play20:50

companies deploying this technology bear

play20:53

uh how much should social media

play20:55

companies do for example to inform users

play20:58

that content they're seeing uh may not

play21:00

be from a trusted source and how much

play21:03

responsibility does the user bear for

play21:05

you know using this technology to create

play21:07

something in the first place um so I

play21:09

Think It's Tricky and we need to think

play21:11

hard about these issues to sort of uh

play21:14

reach a position that that we think is

play21:17

is going to be best for people that

play21:18

makes sense it's also there's a lot of

play21:19

precedent like people used to use

play21:20

Photoshop to manipulate images and then

play21:22

publish them yeah and make claims and

play21:24

it's not like uh people said that

play21:26

therefore the maker of Photoshop is

play21:27

liable for somebody abusing technology

play21:29

so it seems like there's a lot of

play21:31

precedent in terms of how you can think

play21:32

about some of these things as well yeah

play21:33

totally like we want to release

play21:34

something that people feel like they

play21:36

really have the freedom to express

play21:37

themselves and do what they want to do

play21:39

um but at the same time sometimes that's

play21:41

at odds with uh you know doing something

play21:45

that is responsible and sort of

play21:47

gradually um releasing the technology in

play21:49

a way that people can get used to it I

play21:51

guess a question follow you maybe

play21:53

starting with Tim is like and if you can

play21:54

share this great if not understood but

play21:56

uh what is the thing you're most excited

play21:57

about in terms of the your product road

play21:59

map or where you're heading or some of

play22:00

the capabilities that you're working on

play22:01

next yeah um great question I'm really

play22:05

excited about the things that people

play22:06

will create with this I think there are

play22:08

so many brilliant creative people with

play22:11

ideas of things that they want to make

play22:13

and sometimes being able to make that is

play22:15

really hard because it requires

play22:16

resources or tools or things that you

play22:18

don't have access to and there's the

play22:20

potential for this technology to enable

play22:23

so many people with brilliant creative

play22:25

ideas to make things and I'm really

play22:28

excited for what awesome things they're

play22:30

going to make and that this technology

play22:31

will help them make Bill maybe one one

play22:34

question for you would just be if this

play22:35

is um as you just mentioned like the

play22:38

gpt1 uh we have a long way to go

play22:41

uh this isn't something that the general

play22:43

public has an opportunity to experiment

play22:45

with yet can you sort of characterize

play22:46

what the limitations are or the gaps are

play22:49

that you want to work on besides the

play22:50

obvious around like length right yeah so

play22:53

I think in terms of making this

play22:54

something that's more widely available

play22:56

um you know there's a lot of

play22:59

serving kind of considerations that have

play23:00

to go in there so a big one here is

play23:02

making it cheap enough for people to use

play23:05

so we've said you know in the past that

play23:08

in terms of generating videos it it

play23:09

depends a lot on the exact parameters of

play23:11

you know like the resolution and the

play23:13

duration of the video You're creating uh

play23:15

but you know it's not instant and you

play23:16

have to wait at least like a few minutes

play23:18

uh for like these really long videos

play23:20

that we're generating and so we're

play23:22

actively working on threads here to make

play23:24

that cheaper in order to democratize

play23:26

this uh more broadly uh I think there's

play23:29

a lot of considerations as Aditya was

play23:31

alluding to on the safety side as well

play23:33

um so in order for this to really become

play23:35

more broadly accessible we need to you

play23:37

know make sure that especially in an

play23:39

election year we're being really careful

play23:40

with the potential for misinformation

play23:42

and any surrounding risks we're actively

play23:44

working on addressing these threads

play23:45

today that's a big part of our research

play23:47

road map what about just core um like uh

play23:50

for lack of a better term like quality

play23:51

issues yeah Are there specific things

play23:54

like if it's object permanence or

play23:55

certain types of interactions you're

play23:57

thinking through yeah so as we look you

play23:59

know forward to you know like the gpt2

play24:01

or gpt3 moment uh I think we're really

play24:03

excited for very complex long-term

play24:06

physical interactions to become uh much

play24:08

more accurate so to give a concrete

play24:10

example of where Sora falls short today

play24:11

you know if I have a video of someone

play24:13

like playing soccer and they're kicking

play24:15

around a ball at some point you know

play24:16

that Ball's probably going to like

play24:17

vaporize and maybe come back um so it

play24:19

can do certain kinds of simpler

play24:21

interactions pretty reliably you know

play24:23

things like people walking for example

play24:25

um but these types of more detailed

play24:27

object to object interactions are definitely

play24:30

uh you know still a feature that's in

play24:31

the oven and we think it's going to get

play24:32

a lot better with scale but that's

play24:34

something to look forward to moving

play24:35

forward there's one sample that I think

play24:37

is like a glimpse of the future I mean sure

play24:39

there many but there's one I've seen uh

play24:42

which is um you know a man taking a bite

play24:44

of a burger and the bite being in the

play24:46

burger in terms of like keeping state

play24:48

which is very cool yeah we are really

play24:50

excited about that one also there's

play24:51

another one where uh it's like a woman

play24:53

like painting with watercolors on a

play24:54

canvas and it actually leaves a trail so

play24:57

there's like glimmers of you know kind

play24:58

of capability in the current model as

play24:59

you said and we think it's going to get

play25:01

much better in the future is there

play25:03

anything you can say about how um the

play25:05

work you've done with Sora uh sort of

play25:08

affects the broader research road map

play25:10

yeah so I think something here is

play25:13

about the knowledge that Sora ends up

play25:16

learning about the world just from

play25:17

seeing all this visual data it

play25:19

understands 3D which is one cool thing

play25:22

because we haven't trained it to we

play25:24

didn't explicitly bake 3D information

play25:26

into it whatsoever we just trained it on

play25:29

video data and it learned about 3D

play25:31

because 3D exists in those videos and it

play25:33

learned that when you take a bite out of

play25:34

a hamburger that you leave a bite mark

play25:36

so it's learning so much about our world

play25:40

and when we interact with the world so

play25:43

much of it is visual so much of what we

play25:45

see and learn throughout our lives is

play25:47

visual information so we really think

play25:49

that just in terms of intelligence in

play25:52

terms of leading toward AI models that

play25:55

are more intelligent that better

play25:57

understand the world like we do this

play25:58

will actually be really important for

play25:59

them to have this grounding of like hey

play26:01

this is the world that we live in

play26:03

there's so much complexity in it there's

play26:05

so much about how people interact how

play26:08

things happen how events in the past end

play26:11

up impacting events in the future that

play26:13

this will actually lead to just much

play26:15

more intelligent AI models more broadly

play26:18

than even generating videos it's almost

play26:20

like you invented like the future visual

play26:21

cortex plus some part of the uh

play26:24

reasoning parts of the brain or

play26:26

something sort of simultaneously yeah

play26:28

and that's a cool comparison because a

play26:31

lot of the intelligence that humans have

play26:32

is actually about world modeling right

play26:34

all the time when we're thinking about

play26:37

how we're going to do things we're

play26:38

playing out scenarios in our head we

play26:40

have dreams where we're playing out

play26:41

scenarios in a head we're thinking in

play26:43

advance of doing things if I did this

play26:45

this thing would happen if I did this

play26:46

other thing what would happen right so

play26:47

we have a world model and building Sora

play26:51

as a world model is very similar to a

play26:54

big part of the intelligence that humans

play26:56

have um how do you guys think about the

play26:59

uh sort of analogy to humans as having a

play27:02

very approximate World model versus

play27:04

something that is um as accurate as like

play27:06

let's say a uh a physics engine in the

play27:09

traditional sense right because if I you

play27:12

know hold an apple and I drop it I

play27:13

expect to fall at a certain rate but

play27:15

most humans do not think of that as

play27:16

articulating a path with a speed as a

play27:19

calculation um do you think that sort of

play27:21

learning is like parallel in um large

play27:24

models I think it's a a really

play27:26

interesting observation

play27:29

I think how we think about things is

play27:30

that it's almost like a deficiency you

play27:32

know in humans that it's not so high

play27:33

fidelity so you know the fact that we

play27:36

actually can't do very accurate

play27:37

long-term prediction when you get down

play27:39

to a really narrow set of physics um

play27:43

it's something that we can improve upon

play27:44

with some of these systems and so we're

play27:46

optimistic that Sora will you know

play27:48

supersede that kind of capability and

play27:50

will you know in the long run enable it

play27:52

to be more intelligent one day than

play27:54

humans as World models um but it is you

play27:57

know certainly existence proof that it's

play27:59

not necessary for other types of

play28:01

intelligence regardless of that it's

play28:03

still something that Sora and and models

play28:06

in the future will be able to improve

play28:07

upon okay so it's very clear that the

play28:09

trajectory prediction for like throwing

play28:11

a football is going to be better than

play28:13

the next next versions of these models

play28:15

than mine let's say if I could add

play28:18

something to that this relates to the

play28:21

Paradigm of scale and uh the bitter

play28:24

lesson a bit about how we want methods

play28:26

that as you increase compute get get

play28:28

better and better and something that

play28:29

works really well in this Paradigm is

play28:32

doing the simple but challenging task of

play28:36

just predicting data and you can try

play28:40

coming up with more complicated tasks

play28:42

for example something that doesn't use

play28:44

video explicitly but is maybe in some

play28:46

like space that simulates approximate

play28:48

things or something but all this

play28:50

complexity actually isn't beneficial

play28:53

when it comes to the scaling laws of how

play28:55

methods improve as you increase scale

play28:57

and what works really well as you

play28:58

increase scale is just predict data and

play29:01

that's what we do with text we just

play29:03

predict text and that's exactly what

play29:05

we're doing with visual data with Sora

play29:07

which is we're not making some

play29:09

complicated trying to figure out some

play29:11

new thing to optimize we're saying hey

play29:13

the best way to learn intelligence in a

play29:15

scalable manner yeah is to just predict

play29:18

data that makes sense in relating to

play29:19

what you said Bill like predictions will

play29:21

just get much better with no necessary

play29:23

limit that approximates that's right

play29:25

humans right yeah is there is there

play29:27

anything uh you feel like the general

play29:30

public misunderstands about video models

play29:32

or about Sora or you want them to

play29:34

know I think

play29:36

maybe the biggest update to people with

play29:38

the release of Sora is that internally

play29:41

we've always made an analogy as Bill and

play29:43

Tim said between Sora and GPT models in

play29:47

that um you know when gpt1 and gpt2 came

play29:51

out it started to become increasingly

play29:52

clear um to some people that simply

play29:55

scaling up these models would give them

play29:56

amazing capabilities

play29:58

and it wasn't clear right away if like

play30:00

oh would scaling up next token prediction

play30:03

result in a language model that's

play30:04

helpful for writing code um to us like

play30:07

it's felt pretty clear that applying the

play30:10

same methodology to video models is also

play30:12

going to result in really amazing

play30:14

capabilities um and I think Sora 1 is

play30:17

kind of an existence proof that there's

play30:19

one point on the scaling curve now and

play30:20

we're very excited for what this is

play30:22

going to lead to yeah amazing well I I

play30:25

don't know why it's such a surprise to

play30:26

everybody but the bitter lesson wins again

play30:28

yeah yeah I would just say that as both

play30:31

Tim and Aditya were alluding to we really

play30:33

do feel like this is the gpt1 moment and

play30:35

these models are going to get a lot

play30:37

better very quickly and we're really

play30:40

excited both for the incredible benefits

play30:42

we think this is going to bring to the

play30:43

creative world what the implications are

play30:45

long-term for AGI um and at the same

play30:48

time we're trying to be very mindful

play30:49

about the safety considerations and

play30:50

building a robust stack now to to make

play30:52

sure that society's actually going to

play30:54

get the benefits of this while

play30:55

mitigating the downsides uh but it's

play30:57

exciting times and we're looking forward

play30:59

to what future models are going to be

play31:00

capable of yeah congrats on such an

play31:02

amazing amazing

play31:04

release find us on Twitter at no priors

play31:07

pod subscribe to our YouTube channel if

play31:09

you want to see our faces follow the

play31:11

show on Apple podcasts Spotify or

play31:13

wherever you listen that way you get a

play31:15

new episode every week and sign up for

play31:17

emails or find transcripts for every

play31:18

episode at no-priors.com

Related Tags
OpenAI, Sora, video model, text prompt, AGI, creativity, interview, AI technology, world simulator, scalable Transformer