WE MUST ADD STRUCTURE TO DEEP LEARNING BECAUSE...

Machine Learning Street Talk
1 Apr 2024109:10

Summary

TLDR本次对话深入探讨了深度学习和人工智能领域中的抽象概念,特别是如何利用范畴论来改进神经网络的结构和理解。嘉宾们讨论了从具体的算法实现到抽象的数学结构的转变,以及这种转变如何帮助我们更好地推理和理解复杂系统。范畴论提供了一种强大的工具,使我们能够超越具体的编程语言和计算范式,进入一个更高层次的思考和理论构建。

Takeaways

  • 🤖 讨论了如何通过抽象和理论框架来提高深度学习模型的可解释性和效率。
  • 🧠 强调了神经网络与数据和算法之间的关系,以及如何通过类别理论来建立这种联系。
  • 🔍 探讨了如何使用类别理论来设计特定领域的编程语言,以改善机器学习系统的构建和理解。
  • 📈 讨论了如何通过类别理论来处理神经网络中的数据表示和算法结构。
  • 🔧 强调了抽象的重要性,以及如何通过抽象来简化和理解复杂系统。
  • 🧬 通过类比生物进化中的“canalization”来解释抽象的概念和其在数学和深度学习中的应用。
  • 🌐 讨论了如何将复杂的计算问题转化为更易于管理和推理的抽象表示。
  • 📊 强调了类别理论在解决数学问题中的实用性,尤其是在处理结构和关系方面。
  • 🔑 讨论了如何使用类别理论来识别和利用数学结构中的模式和对偶性。
  • 🎯 强调了目标导向和基于代理的智能之间的区别,以及如何通过类别理论来理解和构建智能系统。
  • 💡 通过类别理论的视角,探讨了如何构建更高效、更可解释的神经网络架构。

Q & A

  • 乔治离开特斯拉的原因是什么?

    -乔治离开特斯拉是因为他不认同特斯拉提出的解决自动驾驶问题的方法,他认为特斯拉试图通过更多的数据和计算来解决所谓的'旧金山问题',并不是正确的方法。

  • 什么是'旧金山问题'?

    -'旧金山问题'是指特斯拉自动驾驶系统在旧金山特定的复杂交通环境中的表现问题,需要特别的解决方案来优化。

  • 乔治对于自动驾驶系统的看法有何不同?

    -乔治认为特斯拉的方法,即通过大量数据和计算来训练神经网络,类似于试图记忆无限的东西,他认为这并不是一个可行的解决方案。

  • 特斯拉是如何尝试学习世界的?

    -特斯拉通过大量的训练数据,尝试构建一个能够记住世界上所有可能的感知状态,并在所有情况下产生正确感知信号的函数。

  • 什么是'Operation Vacation'?

    -'Operation Vacation'是特斯拉内部的一个项目,目的是让自动驾驶系统的功能收敛并记住无限,也就是达到一个所谓的奇点时刻。

  • 保罗对于神经网络的看法是什么?

    -保罗认为神经网络是有限的机器,它们在有限的计算和内存约束下运行。虽然理论上可能构建一个能够记忆无限的神经网络,但在特斯拉车辆的计算限制下,这是不可能实现的。

  • 保罗加入的公司的主要目标是什么?

    -保罗加入的公司的主要目标是进行研究导向的创业,证明可以在没有规模扩大的情况下做很多事情,挑战传统的以规模为基础的解决方案。

  • 什么是几何深度学习(GDL)?

    -几何深度学习(GDL)是一种深度学习方法,它试图超越传统的深度学习模型,通过考虑数据的几何结构来提高学习效率和性能。

  • 什么是范畴论(Category Theory)?

    -范畴论是一种数学理论,它研究不同数学结构之间的共同性和映射关系。范畴论提供了一种抽象的语言和工具,可以用来描述和理解复杂的数学对象和它们之间的关系。

  • 为什么范畴论对于计算机科学和软件工程很重要?

    -范畴论为计算机科学和软件工程提供了一种强大的抽象工具,可以帮助理解和设计复杂的系统。通过范畴论,我们可以更好地理解数据类型、函数组合和算法结构,从而提高软件的可靠性和效率。

Outlines

00:00

🤖 人工智能与自动驾驶的思考

讨论了人工智能在自动驾驶领域的应用,特别是特斯拉的自动驾驶系统。提到了乔治对于特斯拉处理旧金山自动驾驶问题的方法的质疑,他认为特斯拉的方法是通过大量数据和计算来让系统记住世界,而不是找到正确的解决方案。乔治认为这种方法不可行,并提出了反对规模化的观点。

05:03

📈 深度学习和几何深度学习的局限性

探讨了深度学习,特别是几何深度学习的局限性。讨论了DeepMind团队试图通过放弃可逆性和可组合性的限制来推广几何深度学习。同时,讨论了神经网络作为对象和参数化映射的实例,以及如何将它们与代数结构和范畴理论联系起来。

10:07

🔢 类型理论与范畴理论的结合

讨论了类型理论与范畴理论的结合,以及如何使用这些理论来更好地理解软件和算法。提出了通过范畴理论来设计特定领域的编程语言,以便更好地理解和推理复杂系统。同时,讨论了如何使用范畴理论来解决软件工程中的分支和类型问题。

15:08

💡 类别理论对计算机科学的影响

探讨了范畴理论对计算机科学和软件工程的影响。讨论了范畴理论如何帮助我们设计更好的编程语言和类型系统,以及如何通过抽象和理解复杂结构来提高软件工程的效率。同时,提出了通过范畴理论来解决深度学习和量子计算中的一些挑战。

20:09

🌐 构建机器学习领域的特定语言

讨论了如何使用范畴理论来构建机器学习领域的特定语言。提出了通过类型驱动的机器学习和深度学习来提高模型的可解释性和效率。同时,讨论了如何通过形式化的方法来理解和验证神经网络的行为。

25:12

🧠 神经网络与有限状态自动机的比较

讨论了神经网络与有限状态自动机(FSA)和图灵机的比较。指出了神经网络在处理递归和记忆方面的能力限制,并提出了通过范畴理论来设计能够进行更高级别计算的神经网络架构。同时,讨论了如何通过结构化的方法来解决软件工程中的复杂性问题。

Mindmap

Keywords

💡神经网络

神经网络是一种模仿人脑神经元结构的计算模型,用于识别模式和处理复杂的数据。在视频中,讨论了神经网络在自动驾驶和深度学习中的应用,以及如何通过更高效的算法和结构来提升其性能。

💡深度学习

深度学习是机器学习的一个分支,通过构建深层的神经网络来学习数据的高层次特征。视频中提到深度学习在处理大量数据和复杂问题时的潜力,以及如何通过创新的方法来提高其效率和效果。

💡自动驾驶

自动驾驶是指利用各种传感器和先进的计算机系统来控制车辆的行驶,无需人工操作。视频中讨论了自动驾驶技术面临的挑战,特别是如何处理旧金山等复杂城市环境中的特殊情况。

💡风险投资

风险投资是指投资者向初创企业或具有高增长潜力的项目提供资金支持,以期在未来获得较高的回报。视频中提到了乔治为了证明自己的理论和方法,寻找风险投资来支持他的研究和公司发展。

💡计算约束

计算约束指的是在有限的计算资源下进行问题求解的限制条件。在视频中,提到了特斯拉的神经网络模型在计算约束下可能无法实现其预期的功能,需要寻找新的方法来优化和改进。

💡函数

在数学和计算机科学中,函数是指一个将一个或多个输入映射到特定输出的数学关系或计算过程。视频中提到了函数在神经网络中的应用,尤其是在处理数据和构建模型时的重要性。

💡数据标签

数据标签是指与原始数据一起提供的信息,用于指导机器学习模型正确理解和处理数据。在视频中,提到了特斯拉通过收集数据并为其添加标签来训练神经网络,以便模型能够更好地学习和预测。

💡类别理论

类别理论是数学中的一个抽象领域,它研究不同数学结构之间的关系和映射。视频中提到类别理论在设计神经网络架构和理解复杂计算过程中的潜在应用。

💡对称性

对称性是指一个对象或系统在某种变换下保持不变的性质。在视频中,对称性被用来描述和分析神经网络的结构和行为,尤其是在处理几何深度学习(GDL)时的重要性。

💡泛型

泛型是一种编程和数学中的概念,指的是不依赖于具体类型或值的通用定义或结构。视频中提到了泛型在构建神经网络和深度学习模型中的应用,特别是在提高代码复用性和灵活性方面的作用。

Highlights

讨论了深度学习和神经网络的结构化方法,以及如何利用范畴论来改进和理解这些结构。

提到了如何通过范畴论来构建特定领域的语言,以便更好地理解和设计机器学习模型。

探讨了神经网络的局限性,特别是在处理递归和抽象方面的能力。

讨论了如何通过抽象和范畴论来简化和理解复杂的数学结构和算法。

强调了抽象的重要性,以及如何通过抽象来提高问题解决的效率和清晰度。

提到了如何利用范畴论来设计下一代神经网络软件,从而提高软件工程的可解释性和可靠性。

探讨了如何通过范畴论来理解和改进深度学习中的几何先验。

讨论了如何将范畴论应用于神经网络的训练过程中,以提高模型的泛化能力和性能。

强调了在神经网络中实现类似于图灵机的计算能力的可能性和挑战。

提到了如何通过范畴论来构建具有更高级结构和操作的神经网络架构。

探讨了如何利用范畴论来改进和加速机器学习领域的研究和发展。

讨论了如何通过范畴论来理解和解决机器学习中的算法对齐问题。

强调了在设计神经网络时考虑结构和语义的重要性,以及如何通过范畴论来实现这一点。

提到了如何利用范畴论来创建可以解释和验证的神经网络模型,从而提高软件工程的质量。

探讨了如何通过范畴论来理解和改进深度学习中的模块化和组合性。

讨论了如何利用范畴论来设计能够处理无限数据集和复杂任务的神经网络。

Transcripts

play00:00

So, Paul, we're, um. Where did you come from?

play00:02

Are you are you nearby? Tim. Uh, for the last two weeks,

play00:06

I've been in London. Oh, yeah? Tell Keith the story.

play00:10

So, um, they raised some seed capital about 18 months ago,

play00:14

and then they set a date in February to do so. Yeah.

play00:17

So what the deal is, I think sometime in 2022, George and

play00:21

then co-founder and his co-founder, Taylor, they were, you know,

play00:26

George quit Tesla because he thought that the way that they were proposing

play00:31

to fix autopilot, in particular, to fix the

play00:36

"San Francisco problem" for autopilot, was just way more data for San

play00:41

Francisco specifically and just, you know, throwing compute at the

play00:45

wall. And so it was like this. So George's thought was like,

play00:47

this can't be the right way to do it. So he was like,

play00:50

I'm not exactly sure how to do this, but I want to take a bet.

play00:53

I want to take a position against scale.

play00:56

It kind of feels like they're trying to memorize infinity. That's correct.

play00:59

So, um, so would you say that your experience there informed

play01:03

some of this skepticism? Yeah. I mean, I think if you look at the AI

play01:07

day presentations that effectively go into deep technical detail about how

play01:12

Tesla learns the world or Andre's talk at NeurIPS two years ago.

play01:18

Um, you know, Tesla very much is trying to come up with a function

play01:21

that memorizes the world, you know, basically take all of

play01:26

the possible perceptual states that exist in the realm of

play01:30

driving and produce the correct perceptual signals that result

play01:34

from that under all circumstances. And they're pouring massive amounts

play01:38

of training data and making that happen, running, you know, tests in

play01:42

shadow mode to collect triggers that then feed more labels back to the

play01:47

labeling team to run that data loop. And, you know, Operation Vacation,

play01:52

which has been publicly discussed, is effectively that singularity

play01:55

moment where the function converges and has memorized infinity.

play01:59

And, you know, as you know, I've kind of talked to you about so far, I do

play02:03

not believe that that's possible. You know, obviously,

play02:06

neural networks are finite machines. They operate under finite compute

play02:09

constraints with finite memory. It may be possible to build a neural

play02:12

network that could indeed memorize infinity, given an infinite number

play02:16

of layers and infinite number of, you know, uh, fully connected

play02:20

neurons, um, infinite compute and memory to execute that model.

play02:25

But given the compute constraints in a Tesla vehicle, uh, I don't

play02:29

think that's possible to achieve. And so then he shopped around for

play02:33

a year or so to try and find a venture capitalist to fund a sort

play02:37

of research oriented venture to, you know, make a case for you can

play02:42

do a lot of things without scale. And then I came on to the

play02:45

company about six months ago, and within about a month and a half,

play02:48

we were like, okay, we don't know what we'll be doing in six months,

play02:51

but we need to be selling it to to the investors in six months.

play02:55

And so we finished the paper on February 1st or really February 2nd

play03:01

for the North American time zone, because it's, you know,

play03:04

a February 1st anywhere on Earth kind of thing.

play03:06

And then a week and a half later, we were meeting with, uh,

play03:12

the venture capitalists to get money to do this whole thing.

play03:15

Yeah, quite a whirlwind. I mean, yes, I'm.

play03:19

I'm just thankful that, uh, you know, George created this,

play03:23

and it exists because. Because I think, um, I take the

play03:27

same line conceptually, which is the answer isn't just, you know,

play03:31

more of the same or the same. You know, we need to, uh, we need to

play03:35

go back to some cleverness and, uh, and invent some new things and, uh,

play03:41

to really, to really make, you know, order of magnitude improvements.

play03:45

You should see this setup. We've got you on a big 55 inch

play03:48

plasma screen. So it's as if you're in the room

play03:50

with us. I'm sorry about that. Like, you know, that's too much.

play03:54

Dagger is what you're saying. I would have shaved or, like,

play03:58

you know, um, like, brush my hair or something.

play04:01

Well, you did, you did pat it down in the back. The headset. Yeah.

play04:05

It's like this headset hair you get, you know. Oh, yeah, a headset.

play04:08

Uh, so I'm actually going. I'm worried about that.

play04:11

Like, what's the compression going to do to my ear? Cartilage.

play04:15

Like over time, all these unknown, unknown cybernetic effects.

play04:20

A very soft version of cauliflower ear for podcasters. Right? Right.

play04:25

I guess something you could be proud of, right?

play04:27

Just like the MMA guys are proud of the cauliflower ear.

play04:31

I've had this headset on for so many hours.

play04:35

10,000 hours of Skyrim alone, I have achieved true mastery.

play04:42

Yes. Okay, well we're in. We are, we're in.

play04:46

So, um, where do we start? I mean, maybe, um, Paul,

play04:51

because Keith hasn't watched the recording of the other day.

play04:53

So maybe if you just frame up what's going on here.

play04:56

And in particular, I think we want to have the Turing

play04:58

machine discussion, um, with you, with, with with Keith on the call.

play05:02

Um, so, yeah, I mean, just just frame up the, the the, you know,

play05:06

what's this all about? Okay. What the hell is this all about?

play05:09

Um, depends on which side of the collaboration that results in

play05:16

the paper you are on. If you come from the DeepMind side

play05:20

on this, you are looking for a generation generalization, excuse me,

play05:24

of geometric deep learning, right. There are some obvious limitations

play05:28

to geometric deep learning, which is that the kind of

play05:32

transformations with which you can, or relative to which you can ask for

play05:37

a neural network to be equivariant or invertible and always composable,

play05:43

and so on their side, they're thinking, okay, we need to generalize

play05:47

away from invertibility first, and then generalize away from the

play05:50

presumption that all transformations can be composed, right?

play05:53

Because not all computations can be composed.

play05:55

So if you want to use something like GDL to align neural network

play06:00

architecture to algorithmic challenges, you need to figure out a

play06:04

way such that you can structure. You can align it to these not

play06:09

necessarily invertible, not necessarily composable steps

play06:12

of computation. Right. If you come from it,

play06:15

from where I come from, it, it is oh, I'm interested in

play06:20

neural networks as a particular. Instance of morphisms in a two

play06:28

category of objects, parametric maps and reparametrizations.

play06:32

And I'm interested in the kind of structure that they're interested in.

play06:36

But thinking about it, as you know, algebras for monads or coalgebras for

play06:41

co monads, stuff like this. Right. So this sort of like not I'm not

play06:45

going to say like higher level of abstraction, but much more

play06:47

sort of two categorical flavor, whereas they're really thinking in

play06:50

terms of sort of representation theory and a sort of generalized

play06:53

version of representation theory. Bruno and I, you know,

play06:56

Bruno first. Right? I learned what I know about

play06:58

neural networks through reading Bruno's thesis.

play07:00

And that collaboration has been really instructive.

play07:02

But, you know, we've really been thinking about this as, okay,

play07:06

there's a two categorical story here. We want to do two categorical

play07:10

universal algebra. And this gets even a gets you

play07:13

way more abstract, which makes it more fun to think about.

play07:15

But it also makes a lot of things tractable that are otherwise far

play07:18

too complicated to reason about. I think just just as a very tiny

play07:22

kind of kind of question. I think it'd be good.

play07:25

I think most of our, our listeners can, can definitely understand that

play07:29

some computations are not invertible. Yeah. You know that that that's easy.

play07:34

I mean, you know, you you add two numbers together.

play07:36

It's like you don't know what the original two numbers are.

play07:38

They could be a bunch of possibilities. It's not invertible.

play07:42

Um, but can you give us an example and some intuition behind

play07:46

computations that are not composable? Oh, yeah.

play07:49

So this is this is sort of like what in some sense this is what

play07:51

type is for. Right. Consider some like computation that

play07:55

requires that the input be a tree cannot be at least not naively

play08:00

applied to something that is lists. Right?

play08:03

You need to do a conversion between lists and trees before

play08:06

you can apply a computation that takes in trees to it, right?

play08:10

So that would be an example of a non non composable computations. Right.

play08:14

Stuff where it was like well the types of the input and output

play08:17

simply don't line up. Well. Yeah. And isn't isn't another example.

play08:21

And correct me if I'm wrong, but I think another example is

play08:23

something like like branching is not at least unless you're in a

play08:28

non-deterministic paradigm. Like branching is not composable,

play08:32

right? Like if I have an if statement

play08:33

that kind of says, you know, if it's one thing, do this,

play08:36

if it's something else, do that, then that that poses some issues

play08:39

for composition, for function composition. Right?

play08:42

I wouldn't say that. Uh, the reason I wouldn't say that is

play08:47

because the function if this do this, if this, do this right,

play08:51

I can still compose stuff afterwards. It depends on if my if do if this has

play08:57

a different type, say then if this do that then you might not be able to

play09:02

compose like stuff together naively. But I wouldn't say that

play09:06

conditionals and branching are really what the issue is.

play09:09

It really has to do with input and output types. Right.

play09:14

Well, I think I think the concern there, it is a type one

play09:17

kind of at least from. And let me first state up front.

play09:21

Uh, I'm super interested in category theory. I know very little about it.

play09:26

I only have, you know, I've tried to do some tutorials online,

play09:30

and so a lot of my knowledge is pretty far out of date.

play09:33

But I think like the issue with composability of branching at least

play09:36

like say in the Haskell, you know, kind of functional programing

play09:40

world is that, um, if even if both the results are of type T, the,

play09:46

the outcome of that branch is, is possibly a union of it's, it's,

play09:51

it's an optional or a maybe of those of those types like it may

play09:55

be one value or another value. Yeah. So it kind of creates this.

play09:58

Is that right. Well I mean so but the thing is like

play10:02

maybe something is still just one type. Right. So I really. Right.

play10:06

But it's not t but it's not type T it's a no. Exactly.

play10:09

You have to you would have higher order type. Yeah.

play10:12

Yeah it's a it's another type. Exactly.

play10:14

And so you can't simply plug maybe T into like you can't simply apply

play10:19

something that requires, you know, it takes input of type T to something

play10:23

whose output was maybe t. Right. You need to do like, oh,

play10:27

I have to check is like if is it, is this just t or is this like

play10:31

you know else or whatever? I can't remember what the Haskell

play10:33

syntax is right for the exception. So that's the problem with branches

play10:36

is as soon as you throw in branches into code, you start to you start

play10:40

to change the type into these, you know, composite. Okay.

play10:43

So I'll yeah, in that sense I will I will buy that in that composition.

play10:48

You have to be much more careful about type.

play10:49

So I think it's still a story about input and output types and

play10:53

which things you can compose. But the problem is naive composition.

play10:56

If you start doing branching of things, you realize, oh,

play10:58

I've actually changed what type I'm dealing with. Yeah.

play11:01

And so just just back to my own like personal story a second,

play11:05

because I have a question like I'd like to ask you, which is,

play11:08

uh, a long time ago, you know, I learned about the power of algebra.

play11:13

So when I first took an abstract algebra course, I was like, wow,

play11:17

this is really cool. You know, algebra is so much more

play11:20

right than I thought it was. Um, then sometime later,

play11:23

I found out just the the real beauty of of type theory.

play11:28

And this was all within the computer science perspective.

play11:31

So algebraic data type theory types, you know, I just I fell in love

play11:36

right with types. Yeah. Okay. And so then along comes somehow I

play11:39

hear about category theory okay. And I'm like, oh yeah,

play11:42

this seems to be you know, if I go learn some about category

play11:46

theory, it's going to help me understand software better.

play11:49

And, you know, algebraic type theory and everything.

play11:51

But what I sort of quickly found and again, this could be quite naive.

play11:55

And this is why I want to ask you is that.

play11:57

It seems to me like type theory is, you know, it's it's this higher

play12:01

order thing where, for example, you might ask, you might look for

play12:04

similarities between algebras. You know, like here's the

play12:08

algebra of Boolean algebra, here's the algebra of partial orders.

play12:12

And, and they're actually maybe the same algebra or they have a

play12:15

lot in common. And so there's this beautiful

play12:17

sort of high order. Um, uh, pattern finding and

play12:22

relationships that, that allows you, you might be able to prove

play12:25

something in one algebra and it gives you the tools to map it,

play12:29

like into a different algebra. But I found like it didn't really

play12:32

have a lot of relevance to me just for my everyday software engineering.

play12:36

And so I started to kind of lose, you know, the interest in it,

play12:39

like tell me where I'm going wrong. Like tell me, you know,

play12:43

in what ways category theory can can help someone who's more of a

play12:49

practitioner, a computer scientist, a software engineer, you know,

play12:53

really, um, like, understand better what they're doing or,

play12:58

or apply new insights, you know, to the kind of work that we would do.

play13:02

So unfortunately, I am not the best person to answer that question,

play13:07

but I'll give it a good shot anyway. Right.

play13:10

You know, so I come to this whole story from pure mathematics,

play13:15

and I discover type systems in the context of homotopy type theory,

play13:18

where it's like what there's a programing language that

play13:21

corresponds to space. Okay. Right. So my, my my introduction to

play13:25

this stuff, like I come to like I actually like, you know,

play13:28

way back in high school or whatever I learned scheme and stuff like that.

play13:31

But that was, you know, longer ago than I would like to

play13:33

admit and had that hadn't been, like terribly relevant to my

play13:36

intellectual life in recent years. But, um, so I would say the problem

play13:41

with trying to justify category theory in terms of functional

play13:47

programing alone is that. You've already learned most of the

play13:52

category theory that's going to be directly relevant by doing the

play13:55

functional programing itself, right? The point is, you don't get

play13:59

Haskell without category theory. But once you learned Haskell,

play14:03

you've learned a lot about a specific category, and that's the

play14:06

only one in which you're actually doing all that much work.

play14:09

So the abstractions of category theory don't necessarily give you

play14:14

all of that all that much. Right? That's sort of my naive

play14:17

perspective on like, you know, it's like, oh, category.

play14:20

You know, I remember the first day I came into the symbolic office when we

play14:24

were next door to Khosla Ventures. And I come in and, you know,

play14:27

like one person's there, you know, and they're like, oh,

play14:30

you're the new category theorist. Uh, I hear that's useless.

play14:35

And, you know, my retort is like, you like functional programing,

play14:39

right? And they're like, yeah. And I'm like,

play14:41

that wouldn't exist in its current formulation without category theory.

play14:45

It's like, yeah, but I know it through all of these other means.

play14:47

It's like, yes, you've you've grokked the essence of the category theory

play14:50

that is applied to functional programing most of the time already.

play14:54

I do think that once you get into things like dependent types and sort

play15:00

of more sophisticated type systems, the category theory side,

play15:04

you need more sophistication to make really good sense of it.

play15:07

You can have sort of a pretty good, um, elementary in the sense of like,

play15:11

I think about elements or I think about term's way of thinking about

play15:14

things like dependent type systems. But, um, I found, you know,

play15:19

of course, this is how I came to this stuff.

play15:21

So of course I think this is like the right perspective.

play15:23

But I think that category theory gives you permission to have a

play15:27

sort of more geometric interpretation of this stuff that

play15:30

makes it easier to understand and easier to reason about. Right.

play15:34

You know, one thing is like the real strength of category theory

play15:37

is not like working in any particular category, right?

play15:40

The real strength of category theory is, hey, this gives me this

play15:43

principled language in theory, for abstracting things such that I

play15:46

can remove all the extraneous detail from a particular problem and study

play15:51

just the bare bones of that thing. Get an excellent description, an

play15:54

excellent understanding, and find it in lots of different places. Right.

play15:58

So it's the idea, it's sort of, you know, like I want a good library.

play16:00

I do it once and then I apply it in lots of different contexts.

play16:03

And I'm absolutely not. One of the category theory is,

play16:06

is useless people. I think I'm in the category of I

play16:09

think there's probably a lot of really useful stuff in there.

play16:12

I just either I don't like I definitely don't understand it and

play16:15

or it hasn't been discovered yet. Um, you know, for,

play16:19

for my particular context. But let me give you an example of

play16:21

where, yeah, my intuition tells me there's probably some really useful

play16:26

insights from category theory to be explored. So, um, long ago. Okay.

play16:31

Uh, Alexander Stepanov, a really brilliant, you know, uh,

play16:36

computer scientist, was trying to, to do things within type systems

play16:41

that were just not possible in the current languages.

play16:45

So one thing he wanted to be able to do, for example, was to have a type

play16:49

system rich enough that it could express properties of an algorithm.

play16:53

So maybe you have a bunch of different Sword algorithms in

play16:56

your library. And he wanted the kind of complexity

play16:59

constraints of those Sword algorithms to be part of the type system. Yes.

play17:03

So that somebody writing a generic piece of code, you know,

play17:07

calling a sort algorithm, and there were certain properties

play17:10

required there that the type system could make the best decision,

play17:13

like it could say, well, in this particular case,

play17:16

you should use this Sword algorithm. And it was able to compute all this

play17:20

like from the type system. Yeah. Um, he eventually discovered C plus

play17:25

plus which just by accident. Okay. Just by accident,

play17:28

it had a template metaprogramming capability that wasn't by by design.

play17:34

It just kind of accidentally ended up being a Turing complete, you know,

play17:37

type system or whatever, where he was able to implement some of his ideas

play17:42

with really ugly code, by the way. So I'm not justifying, you know,

play17:46

the quality of it. It just he just was able to and

play17:49

he was able to do, you know, lots of brilliant things which

play17:51

led to like ideas like, you know, traits and categories and their

play17:56

sense like in the kind of, and, you know, algorithms with,

play17:59

with very rich type systems, template metaprogramming.

play18:02

I feel like we've hacked our way to that, like by accident,

play18:06

kind of from the, from the computer science side,

play18:09

from the programing side of things. And the category theory could

play18:13

offer a new lens to view that, um, and provide insights for designing

play18:20

like the next generation of, of languages and, you know,

play18:23

advanced type systems. Um, is there any sense to that

play18:27

or or or I'm off the. No, I think that that's so one of the

play18:33

things that like, I'm not working on this kind of stuff right now,

play18:36

but I have spent a lot of time thinking about and is essentially

play18:43

starting with a particular category and then trying to define a domain

play18:48

specific language for that, or like an internal language of that,

play18:52

such that it becomes easy to reason about the stuff in that

play18:57

category in a way that's perhaps a little bit more intuitive than.

play19:01

Going towards, like the high structuralism of

play19:03

purely category theoretic arguments. Right. An example. So I love that.

play19:07

Let's let's pause here for one second. So I love this idea.

play19:10

So category theory can help one design domain specific languages.

play19:14

Yes. No. This has been like become a huge

play19:16

deal. Like as you know, I only know of

play19:19

it from like the homotopy type theory community and the kinds of

play19:22

projects that that has spawned. But as an instance of this,

play19:26

you know, so there's this paper, I think, by Mike Schulman called

play19:31

a called A Practical Type theory for symmetric monoidal categories.

play19:35

And what he does in this paper is say, okay, there are a bunch of type

play19:40

systems that are interpreted into symmetric monoidal categories, but no

play19:44

one uses them for proving things about symmetric monoidal categories.

play19:48

And he asked the question why? And he essentially, you know,

play19:52

nails down a couple of points that all of the existing type theories

play19:55

that are interpreted into symmetric monoidal categories fail to satisfy.

play19:59

And like the one that I found the most compelling is like they

play20:02

don't have this sets with elements flavor to them. Right.

play20:06

And what's prerequisite for sets with elements.

play20:09

And that feeling is, well, you need tuples of terms to be

play20:12

in tuples of types. And that needs to be symmetric on

play20:15

both sides of judgment. Right. And so he says, okay,

play20:19

I'm going to design a new type system that has that property

play20:22

with my intended semantics in a symmetric monoidal categories.

play20:28

So the idea is you can design a programing like you say,

play20:31

you can like you can have like the semantics that you want this thing to

play20:34

have and then design the programing language to fit that right.

play20:38

And I think this is an incredibly powerful, uh.

play20:42

Powerful trick that I do believe it's, you know, it's taking longer

play20:45

than people thought it would, but that's okay, you know,

play20:48

because it's hard. But the idea is,

play20:51

you know what you can select. Kuhn has this line about, you know,

play20:55

the only reason science works is because you can forget how you

play20:58

got there and just start from new foundations each time. Right?

play21:02

It's kind of like a less snarky analog of the, you know, science

play21:06

proceeds and funerals kind of thing. It's like, you don't want you don't

play21:09

exactly you don't want to have to recapitulate all of mathematical

play21:13

history to work with something. And we're finding out that

play21:16

there's all of these, like, complicated mathematical structures

play21:19

that we really want to reason about, and we don't want to have to

play21:21

build them from scratch. Instead, we would like to start with

play21:24

something like a programing language, or a logic that just deals with all

play21:27

the complexity of that naturally, by forcing you to be careful

play21:30

about syntax. Yeah, right. And this is the kind.

play21:33

And so this is exactly this whole study of like domain specific

play21:36

languages for categories or better in like in math or language.

play21:40

This is like internal language hypotheses or internal language

play21:43

statements like, I want the internal language of this

play21:45

flavor of category or the internal language of this flavor of category.

play21:49

And so maybe like so this is you know, you can ask the question is

play21:53

like, why the hell are we getting to why, why, why now? Right.

play21:58

There's a there's a good deal of why now, right?

play21:59

I've just said it's like, hey, we've just had a successful funding

play22:01

round. We've got $31 million. We're assembling like the world's

play22:05

largest industrial category theory and machine learning team.

play22:08

The immediate response to this like, aren't you the only like, uh,

play22:12

category theory and machine learning team in the world?

play22:14

It's like, no, there's probably like a couple of them.

play22:16

There's like Petar's team at DeepMind, that kind of stuff.

play22:20

So we're not the only people in the world doing this,

play22:22

but we'll probably over the next like six months, become the biggest.

play22:25

And the question is like, how the hell do you get money for this?

play22:28

And like, one of the reasons is, you know, if you look in the history

play22:31

of computation, there have been people banging the drum of like

play22:34

purely functional perspectives and category theory as like the

play22:38

right foundations for computing. But you haven't needed it before,

play22:41

right? You've been able to hack your

play22:43

way with other paradigms. But deep learning and quantum

play22:48

both pose problems to the existing paradigms.

play22:51

It's just like it suddenly becomes too hard to make progress without

play22:55

categorically informed treatments of those computational substrates.

play23:01

Right. And so let me let me ask you about

play23:03

this because I and and and open I haven't read the paper yet.

play23:07

Um, so apologies for that. But I think if I'm going to take

play23:10

a wild stab, based on what you've everything you've said so far about

play23:14

where you might be going from this perspective, is it the case that,

play23:19

um, category theory will inform? The construction of domain specific

play23:26

languages for machine learning. That is definitely one outcome

play23:32

that I foresee. Okay. And so and so and my kind of

play23:36

let's say, uh, lay more lay understanding of that.

play23:41

It would be like, say, as a practitioner, I would now have a

play23:44

language that I could use to maybe even, uh, you know, define, um,

play23:50

modules, machine learning modules or constructs compositions of,

play23:55

you know, neural networks in a way that I can actually reason about

play23:58

as a human being, because there will be a new set of concepts,

play24:01

a new language that I can program in, if you will, which will ensure the

play24:06

semantics and structure that that I need to to do something useful. Yes.

play24:10

No. Right. That is exactly right. One of the outcomes of this

play24:14

research program program, we hope, will be something we

play24:17

can call like type driven ML or type driven deep learning. Yeah.

play24:22

I mean on on that we had a very interesting discussion just

play24:24

before we hit the record button. And there's a real problem now

play24:28

with ML engineering, but also just large scale ML systems like

play24:32

Facebook and its inscrutability. And this is this is a real problem,

play24:37

right. Because we tend to just blame

play24:40

the system. And it it gives gainful employment to

play24:44

lots of engineers because we lie. We say that these things reason

play24:48

that they that they operate in a in a rational way and they don't.

play24:51

And when they go wrong, we just blame the system.

play24:53

We just say, oh, you know, it's just a complicated neural

play24:56

network that it's like alchemy both in how we build the systems and

play25:00

how we explain the failure modes. And what you're talking about is

play25:04

a future towards not only having a formal way of understanding

play25:08

and verifying programs, but just being on a more solid

play25:11

footing for how we build systems. Yeah, no, precisely like this.

play25:16

Like our aim is very much to refound deep learning in such a

play25:22

way that you can actually do. Careful reasoning about it, right?

play25:29

That you can actually do theorem proving about the behavior of

play25:33

these neural systems. Right. Whereas currently we're hacking

play25:37

everything together and we, you know, we use benchmarks and

play25:41

heuristics and all of this stuff, like we can do better. Right.

play25:45

We know that we can do better with classical programing.

play25:47

So why can't we do better with neural networks?

play25:51

You know, lots of people say, oh, you simply can't do that.

play25:54

Maybe software engineering has to be like neuroscience.

play25:57

And I think it's like, well, there's definitely going to be

play25:59

something like that there. But a lot of it is formally

play26:03

structured so that the key will be to events that which is formally

play26:07

structured, and pull as much out of that as we can and make that

play26:11

explicit so that we know the bounds over which this thing is doing.

play26:16

Some like inscrutable curve fitting, whereas the other in other places

play26:20

it's doing this very formal stuff. So I mean, that that point is

play26:23

really interesting that, um, people have said to me that they

play26:26

think the future of software engineering is like neuroscience.

play26:30

And what I mean by that is we don't know how the brain works.

play26:33

We stick probes in and we try and kind of like, you know,

play26:36

figure out what's going on a bit like the blind men and the elephant

play26:39

by having all of these different views on this inscrutable mountain.

play26:43

And, um, it's just a little bit weird, isn't it? Right.

play26:46

So, you know, we're building these multi-agent systems and they

play26:49

have all these complex dynamics, and we try to apply engineering

play26:52

techniques by doing monitoring and alerting and having these, like,

play26:55

weird thresholds and just fixing problems as, as, as they show up.

play27:00

And a lot of people just say, well, this is the way things are, this is

play27:03

the way software is going to go. And as I understand software

play27:07

engineering, because Keith and I have have written a whole bunch of complex

play27:10

software recently, and part of the reason for having abstraction is

play27:15

to create a cognitive interface. So it's not necessarily about

play27:19

how the software runs because it gets compiled into like,

play27:21

you know, into low level code, native code on, on the machine.

play27:24

But it's about creating an abstraction to help us understand

play27:29

and communicate what's going on and also do like, you know,

play27:31

forms of verification later. Yeah. Well, and that abstraction is a

play27:36

domain specific language. And so every, every programmer,

play27:40

when they're, you know, when they're creating whatever, just, you know.

play27:44

A bunch of functions, a bunch of modules,

play27:46

a bunch of classes, like, you know, whether you know it or not, you're

play27:50

designing a domain specific language. And and we kind of just do it

play27:54

intuitively without much like, sort of, uh, math helping us.

play27:58

And I think the point is, like, if I understand both of you correctly,

play28:01

machine learning and quantum, you know, quantum computing, it's just

play28:04

beyond our capability to do that. Like, you know,

play28:06

kind of just intuitively, we need we need math like it's

play28:10

it's it's just too hard to hack it together anymore. Right? Right. Yeah.

play28:17

Yeah. So I think that's beautiful. I mean, and I, uh, in fact,

play28:21

there's a, there's a very old, um, programing book from like the 1970s.

play28:26

Um, I think it's called like programing languages or designing

play28:28

programing languages or something, which tries to make this point too.

play28:32

And it makes it makes two interesting points.

play28:34

It says, you know, when you're writing a program,

play28:36

whether or not you know it, you're creating a domain specific language.

play28:40

And so knowing something about programing, language design is

play28:44

helpful even to just writing code. So you said there were two or at

play28:49

least one thing you hope comes out of this project.

play28:52

Um, Paul is, is, uh, um, machinery to create domain specific

play28:57

languages for, for machine learning. What are the other things that

play29:02

you hope comes out of it? The other things that I hope

play29:05

come out of it are. Completely novel architectures. Yeah.

play29:11

So completely novel novel architectures that can do kinds

play29:14

of things that we currently don't know how to do. Right.

play29:16

So one thing that you might want to do is like, well, you know,

play29:19

so I'm in industry now. I'm not an academic anymore. Right?

play29:22

Suddenly I have to figure out how to eventually generate revenue

play29:26

with this foundational research. Say, what's one of the like?

play29:29

So consider Microsoft for example. Right.

play29:32

And this is, you know, a story that we've been we've been telling for

play29:34

some time is consider Microsoft and all of the different places that

play29:38

they are deploying AI systems or LMS specifically throughout their whole,

play29:44

you know, panoply of offerings. The only one that's actually making

play29:47

any money is copilot. Right. But copilot is not actually very

play29:51

good, right? LMS are not actually very good

play29:55

at generating code. Right. You know, they'll they'll like

play29:58

regurgitate some snippets, but like a staggering portion of

play30:01

the time they're wrong. The fact that we're used to this

play30:04

is because like when humans generate snippets,

play30:06

snippets of code for a staggering proportion of time, they're wrong.

play30:09

So we're not offended by this, right? But so if you could make something

play30:14

better such that if you say, have an LM that you can interact

play30:18

with in natural language, but that actually emulates a type system.

play30:22

Right, you will end up with an LM capable

play30:24

of doing higher order processing, higher order organization.

play30:28

You'll note that I'm intentionally shying away from using the word

play30:31

reasoning, because I sort of take exception to this language.

play30:34

I understand it to be the lingua franca of the practice,

play30:37

but I don't really like the language of reasoning, but I'll accept it.

play30:40

Exception to it as well. Are we? Very much do.

play30:42

But can we can we just linger on this?

play30:44

But this is a fascinating point. Now, um,

play30:46

there are a lot of people out there drinking the Kool-Aid right now.

play30:49

So there was the Codex model, which is what is built into VS code,

play30:53

and that's garbage. And GPT four turbo is pretty

play30:56

damn good. Anthropic opus is pretty damn good.

play30:59

There's this thing called Devin, which came out recently,

play31:02

and people are saying, oh my God, there's no limit.

play31:04

You know, we're going to start, um, getting this thing to generate code,

play31:07

and then we're going to fine tune the base model on the

play31:09

outputs of the code. And it's going to be a bit like

play31:12

evolution. It's just going to Complexify

play31:13

and Complexify. And we're going to memorize all

play31:16

of the types of reasoning and all of the types of things we're

play31:18

ever going to see. And I think that a lot of the

play31:20

people who are drinking the Kool-Aid about this just don't

play31:23

do serious software engineering. And that's because, like,

play31:26

for example, I was playing with opus three last night.

play31:29

I got it to generate an animation of a bunch of mathematical, you know,

play31:33

graphics, and it worked really well. I thought, bloody hell,

play31:37

it just worked. First time I then said, um, you know,

play31:39

can you modify it a bit and can you add this thing very quickly?

play31:42

It diverged and it just it turned into garbage and it was useless.

play31:46

And that's because it's almost like we're wrestling on the

play31:50

boundary of chaos. Right? So the complexity of software

play31:54

blows up so fast it is possible to do something simple.

play31:57

But as soon as you, you know, you modify it and you change it

play32:00

and you change it, you're in no man's land very, very quickly.

play32:03

And I think part of the reason of having this structured approach,

play32:06

you know, this abstraction approach is to wrestle with that complexity.

play32:11

So you're making the argument that if you're doing something like a

play32:15

large language model does now, but in the abstraction space

play32:18

rather than the low level syntax space that we do now, then we

play32:22

can get further than we are now. That is that is precisely the claim.

play32:25

There's a I think it's a relatively recent paper by I

play32:29

think the team was led by Tanya. Tanya Ringer or Talia Ringer.

play32:34

Excuse me. I can't remember which university

play32:36

it's based out of, but, uh, it's a great paper, and it

play32:41

demonstrates pretty conclusively that transformers just don't learn

play32:45

recursion. Right? Yeah. Of course. There you go. And so, like.

play32:50

Why would you ever think that by scaling an architecture that is

play32:55

likely to have fundamentally architectural, architecturally,

play33:00

like by construction limits on the kinds of things it can even

play33:03

figure out? Yeah, right. Why would you ask a system that,

play33:07

you know, can't do recursion, which we know to be the essence

play33:10

of programing, at least in some sense, right.

play33:13

Why would you presume that that model is ever going to be capable of doing

play33:17

the kind of software engineering that we actually need to do in the world?

play33:21

I mean, I think it's almost even worse than that because they they can

play33:24

kind of mimic recursion to a fixed depth, you know, because they have a

play33:28

fixed amount of computation. Right. But even worse than that,

play33:31

the real problem is they can't create abstractions because creating

play33:35

abstractions is it's abduction. Yeah. And I think they probably do

play33:39

create abstractions. Right. They've got like latent

play33:42

representations. But the problem is like the

play33:43

abstractions are not good. They're not principled.

play33:46

Oh, but I would argue that we create abstractions and we we embed them

play33:51

in our language and, and then they get memorized by, by a language

play33:54

model and they can be reused. But what we do when we design

play33:57

software, we, we, we solve this intractable problem.

play34:01

It's very mysterious how this happens.

play34:03

We select an abstraction from an infinite set, an infinite set.

play34:07

And then we use that to to represent a concept.

play34:10

And language models do not do that. No. Yeah.

play34:14

I mean, on the, uh, the recursion thing is really interesting because

play34:16

we were having a conversation the other day about, um, so there's,

play34:20

there's this really exciting technology you're working on.

play34:23

And as we were just saying, it's a way of building a semantically

play34:27

principled, um, way to design the future of neural network software.

play34:32

So you're rather than dealing with these inscrutable neural networks,

play34:37

we're actually dealing with compositional structure, with clear

play34:40

semantics from, from a high level. But the problem is,

play34:42

as I understand it, the the end result is still a neural network.

play34:47

And a neural networks are finite state automata. Right?

play34:50

So they they don't they don't do recursion.

play34:52

And then there's this notion that came up because we were talking about

play34:55

this over dinner, and there is such a thing as a recursive neural network,

play34:59

but a recursive neural network is still a finite state automata.

play35:03

The difference is it can we can modify backprop to recurse over

play35:09

a structured input. So for example, we can put trees

play35:11

into these things. Yes. So they can do structural recursion.

play35:15

Exactly, exactly. So so um we can because a

play35:19

recursive algorithm has a has a kind of stopping condition or a

play35:22

base case if you like. So we could put a tree in and then

play35:25

the algorithm can recurse that tree. And what it's doing every time is

play35:29

it's just embedding, um, everything. You know, you start with the leaf

play35:33

node and then you go up and you go up until you get to the top.

play35:35

And it's it's just incrementally embedding that datum into the

play35:40

neural network. But the neural network itself is

play35:42

still doing a fixed amount of computation for that input. Yeah.

play35:47

So so I guess the question is what what we're really excited about.

play35:51

Peter's been working on this for years is algorithmically

play35:54

aligning neural networks. And even GDL more broadly is about

play35:58

constraining neural networks so that they generalize better within

play36:01

distribution on, on geometric priors. And but the thing is, they still

play36:07

only generalize within distribution. So they, they,

play36:11

they don't act like Turing machines, which is to say they can't they

play36:14

can't address an expandable memory potential. Infinity.

play36:17

They can't, you know, they can't just go to Costco and get

play36:20

some more memory when they need it. They do a fixed amount of

play36:24

computation per datum. Well, so this is you know,

play36:27

this particular question is outside of my domain of exact expertise.

play36:31

So I want to be like careful about saying, you know,

play36:34

anyone who's like says, oh, this person said something wrong.

play36:36

It's like, yeah, I'm not claiming to be an expert in this particular

play36:38

domain, but this sounds like a story. It's like they can do coalgebras,

play36:43

however, right. And coalgebras are things where,

play36:45

yeah, you can just like keep popping something off, you know,

play36:49

ad infinitum. You can just like, query it.

play36:50

Hey, give me this part of this, give me this part of this.

play36:52

And there's no stop to how many times I could ask for that, right?

play36:55

So I can do coalgebraic stuff too. But my sense is that maybe this

play37:02

is some sort of more definitional confusion than it is really a

play37:08

statement of the kinds of things that neural architectures can do.

play37:13

Yes, I think I can. I can probably help clarify it

play37:16

because for better or worse, we've been having a lot of

play37:20

debates in our discord, you know, recently about this topic.

play37:23

And so I think I think maybe we'll test it out here today I've

play37:27

learned how to communicate this, you know, a little bit better.

play37:30

Like to try and explain the the pragmatic, you know,

play37:34

problems with it. Um, and so if we think about,

play37:38

first of all, just, just to set up something like very clearly, um.

play37:43

If we all understand what a finite state machine is, just imagine

play37:46

it as a graph like it can. It sits there and it can get input.

play37:50

And by the way, it can get an unbounded amount of input.

play37:53

You know, like for example, a soda machine.

play37:56

Yeah, you can keep feeding it instructions as long as you want.

play37:59

You can keep pressing the button and putting in quarters in it,

play38:02

like, you know, all day long. And, you know, uh, stuff will

play38:05

continue to happen forever. The point is that it has a fixed

play38:09

number of states that are defined like it build time when it was

play38:12

constructed, and it just navigates between these states, you know,

play38:15

always kind of responding, responding to the input.

play38:18

That's a finite state automata, a Turing machine is and I say nothing

play38:24

more than but but by the way, guys, this was a genius.

play38:27

You know, discovery like back in the day, a Turing machine can be

play38:31

boiled down to nothing more than a finite state machine that has

play38:37

an expandable read write memory. Or another way to think about it

play38:41

is it can have two stacks. It can push a value onto one

play38:45

stack or another stack. It can pop a value off any one of

play38:48

those stacks and push it down. So it has this expandable read write

play38:52

memory that's that's unbounded. Yeah. Um, that machine alone,

play38:57

that simple machine alone, in fact, is powerful enough to perform all

play39:02

computations that all what are called effective computations.

play39:07

So any mechanical thing you can construct, any, you know,

play39:10

physical thing that you devise that has like a digital nature and kind of

play39:14

do kind of stuff, you know, is no more powerful than this machine.

play39:18

And that includes, by the way, quantum computing.

play39:21

So quantum computers are still just Turing machines.

play39:25

They can do certain problems faster, but they can't fundamentally

play39:28

solve any problem that a Turing machine couldn't solve. Okay.

play39:32

So getting to neural networks like it should be immediately

play39:36

obvious to anybody that that a neural network has that finite

play39:40

state machine component. That's what the neural network is.

play39:43

And now you ask, can I hook that up to read write memory. Right.

play39:48

And have that that neural network do something useful?

play39:51

The problem we run into is training, training the neural networks to

play39:57

do something useful with unbounded read write memory.

play40:02

Like we don't usually do that. So all recursive neural networks

play40:06

that are trained today, they have a fixed window ahead of time.

play40:11

That's known at training time where they do like operations on whatever a

play40:16

context of 100,000, or they they sit there and operate on, um, you know,

play40:22

100 windows or something like that. And then you take the answer

play40:26

that you get at that point, like you've got an output signal.

play40:29

Whenever people try to train them to do unbounded calculations,

play40:34

like, for example, maybe I'm going to have an RNN and

play40:37

one of my whenever one of my bits lights up in the output, then I'm

play40:41

going to treat that as as halting. And I'm going to take the answer

play40:45

at that point okay. So I'm going to run the Turing

play40:47

machine during training. As soon as bit number 26 lights up,

play40:51

I'm going to treat that as the halting bit.

play40:53

And then I'm going to take the answer.

play40:55

That could be a Turing machine. The problem is when they try to

play40:58

train machines like that, we just can't do it.

play41:02

Um, and so if you think about the space of all possible programs like.

play41:08

The space of all possible programs are Turing the Turing machines. Okay.

play41:13

And we're doing a certain search. So today we've got the capability

play41:17

to construct all kinds of neural networks and use stochastic

play41:20

gradient descent or whatever. We're searching that space of all

play41:24

possible programs, and we're finding one that does something useful.

play41:28

What we're finding is that that space that it searches is a very small

play41:33

subspace of all possible programs. And it's really those that are

play41:38

that are also reachable by finite state automata.

play41:42

So we're not getting out of that boundary up into context free

play41:47

grammars, context sensitive grammars, recursively enumerable languages.

play41:52

So there's this huge other space that we just don't know how to

play41:56

effectively search efficiently with machine driven techniques yet. Yeah.

play42:04

I mean, I will confess I could be wrong, but I don't think that that is

play42:10

I think that's sort of an accident of particular definitions and the

play42:14

way that we've been structuring neural architectures. Right.

play42:18

And what I mean by aspects of definitions is like, yeah,

play42:20

if you pretend that your thing is of like fixed depth, but if you

play42:24

allow feedback into the system, then suddenly you can get I do think

play42:29

you could get access to that, that, you know, higher order computation.

play42:34

Well, I think it's actually just an accident of our of our,

play42:38

our optimization procedures. It's really it's an accident.

play42:43

Not so much of like the architecture is, is fine.

play42:46

Like you said, people do have, you know, RNNs that have feedback

play42:50

and, and all this sort of thing. But it's an accident of the way

play42:52

we train them. So when we when we go to train them,

play42:56

nobody currently trains in the way that I just said.

play42:59

We're like, when I do a training pass and I go to propagate my gradients,

play43:03

I never sit there and run the neural net, the RNN until a particular bit

play43:10

lights up and then take the answer. I run it for a fixed number of

play43:15

steps and take the answer. Yeah, well, because the problem is if

play43:18

you if you try to run it until a bit lights up, it may never light up.

play43:23

Well, no, you can halt. Uh oh. Okay. So I see, see the concern.

play43:27

Well, so what if you consider that? Hmm. Yeah.

play43:30

I mean, I don't think this is a real problem.

play43:34

I think this is a question of misconceptions.

play43:36

Maybe some that I have or maybe some that are popular in the the ML

play43:41

community, to which I, you know, I'm like a quasi expert in like,

play43:46

CT world and, you know, like I'm learning, you know,

play43:49

using CT to, you know. Speed up my process of, you know,

play43:57

developing expertise in ML. But I don't think that.

play44:03

This is a permanent problem, like in the sense of like,

play44:08

just like something about neural computation makes this impossible.

play44:11

I don't think that's the case. Right? Right.

play44:14

I think in particular, you could do this thing where it's like, oh,

play44:17

if this bit lights up, it stops. I do think that that's possible.

play44:21

Right. Well, so and in particular,

play44:23

I mean, and I agree with you like it is possible.

play44:26

And so there are attempts at this like the differentiable

play44:28

neural computers. Right. Or some configurations of,

play44:32

of neural Turing machines. But the it seems like the universe

play44:36

is conspiring against us. Because the funny thing that happens

play44:40

is when you generalize those neural networks and try to train them in

play44:45

this way, it just becomes so much more difficult to train them.

play44:50

I mean, maybe we should discuss, um, the so what question.

play44:53

So the the thing that Petr has been driving at is, you know,

play44:58

this algorithmic alignment of neural networks and the hypothesis

play45:02

is there are certain fundamental forms of reasoning where you do

play45:08

require this kind of, um, you know, quantification over potentially

play45:13

infinite sets, um, just computing the nth digit of pi or just sorting,

play45:18

um, uh, you know, like a set of numbers to, to arbitrary length.

play45:22

Now, the the incredible discovery of neural networks is so many

play45:26

things in the world, you can essentially interpolate within

play45:31

distribution and it just works. Now I'm a big believer in this

play45:36

system two idea that, you know, there's there's a lot of gnarly

play45:39

perception type stuff. And then there's this system to

play45:41

where we do need to do this kind of symbolic reasoning.

play45:45

Is there a space where we need to have Turing computation for AI?

play45:51

Even if you could solve all interesting problems, you know,

play45:56

to humanity with, with a, with a big enough Turing machine,

play46:00

you know, like you, you construct one the size of the sun and, you know,

play46:05

it's made out of neutronium and like, whatever else and like,

play46:08

it can finally do everything we want. You can do it with a lot less energy,

play46:12

a lot less matter, you know, if you can, if you can use, uh,

play46:19

programs that that can utilize some unbounded, you know,

play46:24

input sensitive amount of memory. So I think like that's again like

play46:30

it's about this program search thing, which is, um, the best solution

play46:35

in the sense of like ones that are practical, energy efficient,

play46:38

whatever many problems, the best solution lies in this broader space

play46:44

of kind of Turing complete problems. It's certainly been the case for

play46:48

human programing, like when, you know, people could just

play46:51

write programs that were fsas, like, we could have done that.

play46:54

We never needed to develop, you know, von Neumann machines and kind of

play46:58

like what's sitting on your desk. But we did because they were

play47:01

extremely, you know, useful for solving real problems.

play47:04

And and I agree with Paul that there's no theoretical limitation why

play47:10

you can't train a neural network, you know, to to search the space

play47:16

of Turing programs. It's just we haven't figured out

play47:18

how to do that yet. And there's some tricks that

play47:20

we're missing. And I'm just saying all the current

play47:23

architectures do not search that space. They search a small subspace.

play47:28

And there are many interesting problems outside of that space. Yeah.

play47:33

Can I frame it a slightly different way, which is there.

play47:35

There are two directions here. One direction is what Francois

play47:39

Chollet says. And you've just been alluding to

play47:41

Keith, which is that we build a kind of AI, which is almost

play47:45

purely a discrete program search. And the the other school of

play47:49

thought is we do some kind of neurosymbolic combination.

play47:52

And Josh Tenenbaum has been doing a lot of interesting work here with,

play47:56

you know, Bayesian program search and Dreamcoder and so on,

play47:58

and even things like fun Search and Q star and Tree of Thought

play48:02

and stuff like that is a step in this direction, which is to say,

play48:06

we start with this interpolative continuous vector space,

play48:11

which has been trained on a whole bunch of stuff, and then it's

play48:14

still traversable in that space. So if we do some kind of search

play48:17

controller on the top, I mean, of course it's not going to be

play48:20

searching the space of Turing machines, but it will give us a

play48:22

margin on the top, which will find all sorts of interesting things.

play48:26

So just doing this Q star algorithm where we're in an LM

play48:29

and we search a whole bunch of trajectories and we have some kind

play48:33

of cost function or whatever. And that margin might give us a

play48:36

dramatic improvement. But I still think that there's

play48:41

there's a whole space that we're missing that is not traversable

play48:45

through neural networks. Let me just say as well that what's

play48:50

happening right now is the real intelligence of of these GPT models

play48:55

is the collective search process. It's the collective intelligence.

play48:58

So we're all using it and we are just putting different inputs into it.

play49:03

So we're a bit like Tree of Thought. We're a bit like Q star.

play49:06

All of these people all over the world are just finding these

play49:08

interesting prompts, and they're kind of traversing the the input space,

play49:12

and they form like a layer of intelligence on top of the LMS.

play49:17

And I guess the question we're asking is maybe the first step would be

play49:20

just to replace that human search process with something automatic.

play49:24

Well, I mean, I'm just going to jump in real quick and then let Paul go,

play49:27

which is which is a two things. Again, there's no in my mind

play49:31

there's no theoretical reason why. So again,

play49:34

a Turing machine is a finite state automata with a read write memory.

play49:38

There's no reason that finite state automata can't be a neural network.

play49:42

It's just that we haven't yet figured out how to train neural

play49:46

networks that function in that way, at least not for general purpose,

play49:50

efficiently, etc. and I think like the other, as to whether or not

play49:55

like which direction is ultimately going to be the more successful.

play49:58

Pragmatically, I don't know. It's a big mystery to me,

play50:00

but it seems like, you know, the efforts of, you know, Paul and,

play50:04

you know, and George's company. Is working towards that route,

play50:08

which is like, give us, give us a domain specific language.

play50:12

That's like, in a sense, uh, many generations higher than Dreamcoder.

play50:18

It's like it's like a domain specific language that lets people program,

play50:24

um, these, these complicated, you know, modules and machine

play50:27

learning more effectively. And I see no reason why that

play50:30

wouldn't also empower, um, automated, you know,

play50:34

programing programing as well. Yeah. Broadly, I like that story. Yeah.

play50:39

Like I, I like I think I've been confused when this question has come

play50:44

up because I thought it was the case. I thought it was the claim that,

play50:47

like, we somehow can't do this and I don't know, like I might be wrong

play50:53

again, but I don't at my current level of understanding see any

play50:56

like fundamental obstruction to developing neural networks that are,

play51:02

you know, Turing complete or whatever the magical language,

play51:04

magical words are for it. Right? I don't actually see any like,

play51:08

essential obstruction. We may not have figured out how

play51:11

to do it yet, but I don't see why that's impossible.

play51:15

Well, they have to be a neural network with a read write memory.

play51:18

Yeah. And once once they have that. So they're augmented, you know,

play51:22

neural networks. And then once we figure out how

play51:24

to train them. Bikes kind of open season on on

play51:30

every computable problem. Yeah. I mean, I think we can agree that

play51:33

it's probably possible in principle and currently not in practice.

play51:38

And it is a very, very difficult problem.

play51:41

I mean, we can we can all agree on that. Yeah.

play51:43

Um, well it's currently it's definitely currently not not,

play51:46

not nobody's doing it, you know, that I know of currently.

play51:49

I think it is really difficult. I think there's some magic tricks

play51:53

that we haven't found yet. And they may lie in this direction

play51:56

of, you know, starting to apply category theory to, to, to design,

play52:02

um, domain specific languages that, that have the semantics that we need.

play52:08

Yeah. I mean, another thing was when I,

play52:10

when I first read your paper, I assumed that there was some

play52:13

kind of I guess I visualized it as a kind of compiler.

play52:17

So now it's, it's a bit like, imagine we had something like

play52:20

Jax and we did to Jax what TypeScript did to JavaScript.

play52:24

So, you know, we add typing to it, we add compositional semantics

play52:27

and so on. And what we've done is we've now

play52:30

created this whole like interface basically to

play52:34

understand how to build the next generation of neural networks.

play52:37

And then the system is a bit like a compiler.

play52:40

So it will compile down to neural network modules that have,

play52:44

you know, GDL constraints or algorithmic constraints and so on.

play52:47

And in the future, maybe even symbolic networks

play52:50

that it would just it would just automatically do all of this stuff.

play52:54

Because what we want to get to is we need to get away from the alchemy,

play52:57

right? We need we need this compiler to kind

play53:00

of do the hard work for us so that, you know, just all of these problems

play53:05

that we're seeing at the moment just are a thing of the past.

play53:08

I have a I have a I have just a quick question because like,

play53:11

is it is one proper way to think about this, that. We are.

play53:17

We are pushing a lot of a lot of the work.

play53:20

We're pushing a lot of the necessary semantic work to syntax where it can,

play53:26

where it can then be automated. Like that seems to be really

play53:28

like the story of mathematics, too, as a whole.

play53:31

It's like creating new syntaxes and new sets of rules,

play53:35

new domain specific languages that effectively systematize and

play53:40

allow to be automated. You know, the necessary reasoning.

play53:45

Yeah, I think that the way that the mathematician might phrase this, and

play53:49

I think this is what Andrew actually provided a very nice description of,

play53:54

is, you know, the history of mathematics has taught us that.

play54:02

Abstracting a lot of the specificity away and making your problem

play54:09

statement more and more general. Well,

play54:11

you might think would make it harder, actually makes it easier, right?

play54:16

Because the point is, if there's a lot going on,

play54:18

I don't know what to grab on to. But if I reduce the things that

play54:21

I'm allowed to grab onto, I can think more clearly about them.

play54:25

And the elegance allows me to reason carefully about them.

play54:28

And here I do mean reason, right? In the sense of coming up with

play54:31

reasons for things right. Why does this do this?

play54:33

How does this work? Those kinds of things. Right.

play54:36

So you abstract away the sort of like essentially analytic detail

play54:39

to this, to a problem, and then suddenly it becomes tractable.

play54:42

It becomes conceivable. I would say that what really

play54:45

category theory does is it abstracts the idea of a semantics.

play54:50

Right. That's really what it is. Right, is because you say, oh,

play54:53

what is what is a category? It is a universe of some

play54:56

particular kind of thing. You know, your objects are the the

play55:00

instances of that kind of thing, and then your morphisms are the

play55:03

computations that go from one instance of that kind of thing

play55:06

to another instance of that kind of thing. Right.

play55:08

So it is really a theory of abstract semantics.

play55:12

Just a quick question on that. Like you're you're aren't aren't

play55:15

you allowed to have more than one type of object in a category or. No.

play55:19

Or does that have a different name. Well, so any any so what do you

play55:24

mean kind. What what does one mean? I mean, can I have a more morphism

play55:28

that transfers an object of type T to an object of type S and

play55:32

then a different morphism. Yeah. But they are all objects of the

play55:35

same category right. Different types can be category

play55:38

but different types. Exactly. So the point is like the notions

play55:41

of object and type are not are not exact.

play55:44

They're not exactly the same thing. Right.

play55:46

You know, for example, if you're thinking in terms of Haskell, which,

play55:49

which is like the most likely way that someone like listening to

play55:52

this or watching this like this is going to be the most familiar thing

play55:55

is there's this category, Hask, which is whose objects are all of the

play56:01

possible types that can be formed with Haskell, and whose morphisms

play56:06

are all of the possible function, all of the possible well-typed

play56:09

functions. Okay. Right. So is there is there a particular

play56:14

name for a category with objects of more than one type?

play56:19

Well, it's type category or something.

play56:21

I mean, I'm sure that I think it's the kind of thing that once

play56:25

you stare at it long enough, the question ceases to make sense.

play56:28

I think I think it's that kind of thing. Right.

play56:32

Whereas it's sort of like it's it's that it's trying to like, mix these

play56:35

metaphors a little bit too rigidly. That introduces that I would say is

play56:39

like any given category is like. The universe of a particular

play56:44

kind of thing. Right? And then if you want to talk

play56:47

about categories where you've got multiple different kinds of things,

play56:50

those you're actually talking about working in the category of

play56:54

categories and talking about functors and natural transformations

play56:58

between functors. Right. You're talking about relations

play57:01

between this, the the universe of things of this kind and the

play57:05

universe of things of this kind. And, you know, translations from

play57:08

the universe of things of this kind into things of this kind.

play57:11

So I'd say for any given category, you've got one kind of thing.

play57:14

But the point is, that doesn't mean that category theory,

play57:17

at any given time, is only about one kind of thing.

play57:19

It's actually about relational, structural relationships between

play57:23

the universes of different kinds of things all at once.

play57:27

I'm going to have to go back to those those tutorials, by the way,

play57:29

there there are these online tutorials done by, uh, Brendan Fong

play57:33

and some others that those are the ones I was trying to go through.

play57:37

Yeah. Uh, no, those are great. And and it's fun because, um.

play57:41

You know, it's really enjoyable when you come across a topic

play57:45

that that blows your mind. And the way, like I, I kind of

play57:49

quantify a mind blowing thing is, is if I understand it the day that I

play57:53

learn it and then somehow the next day totally no longer understand it.

play57:59

It's kind of like it's kind of mind blowing.

play58:01

Um, so here's a here's just kind of a funny question for you.

play58:04

And answer this any way you want, but, uh, which do you think is,

play58:07

uh, more mind blowing math? Um, category theory or number theory?

play58:13

Oh. Category theory. Um, the thing is, number theory.

play58:18

I don't understand it well enough to have my mind blown. Right?

play58:21

I don't know what the weird results of it. This this better.

play58:25

I don't know what the the weird results coming from number

play58:28

theory mean. I have no way to interpret them.

play58:30

It's like what? There's a number that does that,

play58:32

right? This particular number field is like,

play58:34

you know, has some weird, you know, abstract property

play58:36

relative to another one. I don't I don't even know what

play58:38

those things mean. Right. You know,

play58:40

because I'm not a number theorist. I haven't spent forever thinking

play58:42

about this. But, you know, I have at this point

play58:44

spent, you know, like at least ten years thinking about categories.

play58:48

And so I've found it to be a lot more mind blowing, because to

play58:51

have your mind blown, you have to understand what it means. Right?

play58:56

And I do feel like at least whether it is my own sensibility

play59:00

or essential to the discipline, I won't say, but I kind of think

play59:03

that category theory is more mind blowing. Otherwise I'd be.

play59:06

Otherwise I'd be a number theorist. But, um, you know, no, I like that.

play59:10

In order to have your mind blown, you have to first understand.

play59:13

So that can be, uh, blown away. Yeah, because otherwise it's

play59:15

just complete mystery. And it's like,

play59:17

mystery is not interesting, right? UN mystery that has the flavor of.

play59:21

I can understand this. That's interesting.

play59:24

Otherwise it's just like, huh. That's weird.

play59:26

And I move on with my day. Got it right.

play59:29

It's got to it's got to trigger the sense.

play59:30

It's like, oh, this something weird happened and I can make sense of it.

play59:34

I can make sense of it. I haven't made sense of it,

play59:36

but I can, right? I have to have that taste.

play59:39

Otherwise it's boring. That makes sense. That makes sense.

play59:43

So, um, one other thing, um, that that confused me a little bit

play59:46

about the category paper is, um. You were talking about being able to,

play59:51

for example, um, create an analytical pathway, if you like, between,

play59:57

um, let's say a list or a tree. And I, I'm understanding the

play60:02

work as being about, um, data rather than algorithms.

play60:06

And I think the way I understood it was, again,

play60:10

I was kind of brought up on Peter's work with algorithmic alignment.

play60:13

So he's thinking about, um, you know, getting neural networks to behave as

play60:18

if they are doing sorting or doing like Dijkstra or something like that.

play60:23

And I was thinking about, you know, this, this could become a kind of

play60:26

system to toolkit where I need to have these algorithmic primitives

play60:30

and I can compose them together. So do I understand correctly at

play60:32

the moment that it's almost about, um, like a types of data rather

play60:37

than types of algorithm? I don't know if those are

play60:41

actually that different. Right. And the reason why is if I have a

play60:44

well-typed function from one data type to another one, that's both

play60:50

about algorithm and data, right? That talks about what you're

play60:53

allowed to do with data is exactly what the algorithm does. Right.

play60:57

So the point is that I don't think that that distinction is meaningful.

play61:01

Interesting. Hmm'hmm. Yeah. Can I, I mean, I think I,

play61:07

I sort of, uh. I have a lot of sympathy to that

play61:10

point of view, because this goes back to, um,

play61:13

when I when I first learned about, you know, algebraic data type

play61:18

theory and its application to, to programing languages,

play61:22

I started to really realize, yeah, you know, data and, uh,

play61:27

I don't know what the right word is, but but data and type data and set

play61:32

of transformations allowed on that data are, are almost intertwined

play61:37

into the. You know, like a saint. Like one isn't really useful

play61:41

without the other. It's kind of a weird duality or

play61:44

something. I'm not sure how to phrase it.

play61:45

Yeah, no, I mean, like, I might, you know, hazarded some sort of like,

play61:49

you know, dialectical fixed point is like one informs the other

play61:52

and like the notion of type and well-typed function reinforce

play61:56

and define each other. Right. It's a very much a functionalist

play61:59

kind of kind of thing. Right? You know,

play62:01

type theory is the core of like, you know, a syntactic treatment of

play62:05

a synthetic mathematics, right? At least that's, you know,

play62:07

the weird, you know, not fuzzy, but like rarefied and

play62:12

unbelievably abstract world that, you know, through which I

play62:15

discovered type systems. Right. But yeah, maybe, maybe another

play62:19

way to think about this. Maybe this is related to, um,

play62:22

get your opinion on this, Paul. That may help Tim.

play62:24

For us to think about is, you know, for many, let's suppose you have

play62:29

and I'm just going to say class, I'm not trying to be like object

play62:32

oriented or something. But suppose you have a type,

play62:34

you know, that's that's interesting. Like whatever complex numbers,

play62:37

you know, you're you're doing a lot of programing and complex

play62:39

numbers are useful. Yet there are many representations

play62:43

of that complex number. You know,

play62:45

you could like literally store it as a piece of like literal data.

play62:49

That was an angle and a magnitude or two, you know, x,

play62:53

y coordinates or whatever. There's many possible data

play62:57

representations of it, and it's just irrelevant.

play63:00

It's like you choose one based on like, you know, whatever works,

play63:03

whatever's efficient, whatever. I feel like, you know,

play63:06

whereas all the, all the important and interesting stuff is, is and

play63:10

is at a higher level than that. I would hazard that I would be

play63:15

uncomfortable with the statement that it's irrelevant. Right.

play63:18

And in fact, so here's one of the like. So you know.

play63:22

Maybe this is not what happens if you're like in graduate school

play63:26

for mathematics, in places where category theory doesn't have this

play63:29

sort of like, you know, somewhat deprecated history using deprecated

play63:33

in this kind of like a slang way. I was like, oh,

play63:34

this is kind of like not approved kind of mathematics, right?

play63:38

You know, North American math, like specifically American Academy.

play63:41

Yeah, exactly. It's heretical. Or like, people don't like it so

play63:43

much. Like why? It's like, well, because it doesn't

play63:46

feel like you're doing anything right. Why do you do mathematics?

play63:49

What is the pleasure? It's like, I want to feel like

play63:51

I'm doing the computations. And category theory has this

play63:53

nasty habit of telling you that you didn't do anything right.

play63:56

But it's like that scene in good Will hunting where you know, they're up.

play64:00

There's some circles and lines drawn on the board,

play64:03

and he goes up and erases one and draws a line somewhere else,

play64:06

and they're like, oh, you know, it's like, you know, well, what?

play64:10

Yeah, I mean, the thing that it's sort of about category theory is

play64:14

actually, you know, Grothendieck had this statement that was like,

play64:19

if it's not obvious, you're not ready to work on the problem yet.

play64:21

So and, you know, he had this whole, you know, principle of like theory

play64:25

building for problem solving. And one of the aspects of theory

play64:28

building for problem solving is you develop a vast library of equivalent

play64:33

or at least related representations of similar or the same concept,

play64:37

and you solve a problem by finding the one in which it's obvious. Right?

play64:43

It's sort of it's a shamanic mathematics.

play64:45

It's a mathematics about understanding something and then not

play64:48

really having to do that much. Right. The point is, think about this in the

play64:52

right way and it becomes trivial. This is or not trivial,

play64:55

or at least like clear, right? You know, this is very much like

play64:59

the task of science is to carve nature at the joints. Right.

play65:02

Like that's the thing is like, well, then you have a bunch of different

play65:05

ways you could carve this up, find the one in which it's easy.

play65:10

Right. So that makes sense. So I mean, like thinking about

play65:13

all the different possible representations of complex

play65:16

numbers as a data representation. It's still again informative of

play65:21

the algebra that you're, you know. Yeah. Exactly.

play65:23

Which what what are you going to do with it. Intricately linked. Yeah.

play65:26

Like what are you going to do with it that determines which representation

play65:29

you should use. Yeah. Yeah. So you're telling me the scheme

play65:32

people were right all along. That code is data. It's just.

play65:35

Oh, absolutely. No, they were completely right.

play65:39

Yeah. No, I. What was it? Uh, 10th grade learning scheme.

play65:43

Um, yeah. Uh, that definitely informed my

play65:48

sensibilities a lot more than I realized.

play65:50

You know, because I went away from doing, like, math and science

play65:52

for quite a while before coming back to it as an undergrad.

play65:55

And as I've gotten deeper into mathematics, I was like, oh,

play65:59

I really did drink this. Like functional programing kool

play66:02

kool aid of this, like deeply pure 1970s variety a long time ago.

play66:07

And whether I understood that that's what I had done, I absolutely had.

play66:10

But I do think that they were right. Like the difference between data

play66:13

and code is, is like there isn't one, right?

play66:15

Actually, like the difference between data and code pops up in

play66:19

neural networks a lot, right? You know, people like thinking

play66:22

like data is like, you know, training data is like what is like

play66:25

trading data is the goddamn program. The neural network is the compiler,

play66:29

right? You are compiling training data into

play66:31

the final state, which is going to be the function that you're

play66:33

actually going to use, right? This is why I think eventually all of

play66:37

the copyright lawsuits against the major AI companies will go through

play66:41

because they are using copyrighted data to, to program their machines.

play66:46

That's exactly what they're doing. It's not learning as bullshit.

play66:49

It's it's it's programing. Right? It's a weird programing that works

play66:54

by way of like differential, like sort of, you know, like a slightly

play66:57

mutant differential geometry. But it is just programing, right?

play67:01

You are literally programing your models with data. Right.

play67:05

So any distinction between data and code is nonsense.

play67:09

That is very interesting. Some great soundbites from this

play67:11

conversation, Tim. Uh, for me, for me, it was kind of

play67:18

a compound of little different, you know, things that I was taught.

play67:22

Like when I learned that you could exponentiate matrices. Yes.

play67:26

You know, I was like, wow, this is weird. Uh, yeah.

play67:29

You know, when, uh, when I thank God for power.

play67:33

Yeah, yeah. Oh, yeah. Yeah. Well, that's another cool thing

play67:36

about power series is when when you finally learn y, you know,

play67:40

e to the I pi equals minus one. Yeah. And you look at the power like

play67:43

that's that's kind of mind blowing. And then, uh, and then when I,

play67:47

when I saw how to construct, I think it was Peano arithmetic from

play67:50

this kind of algebraic type theory, you know, perspective and things like

play67:55

that just all adds up to the same. Really fascinating.

play67:59

Yeah, I don't know fact about the universe or fact about logic

play68:03

or something. Yeah. Um,

play68:06

so we've been using some terminology like syntax and semantics.

play68:10

Can you, can we just like, zoom out a little bit?

play68:12

What do those terms mean and how are you using those terms in

play68:15

this context. Okay. So what. I mean to say is okay, syntax.

play68:22

Formal expressions that you do not assume have any meaning,

play68:26

but instead you just have rules that allow you to combine that thing.

play68:31

So purely formal constructions, usually in terms of symbols,

play68:34

whether it's graphical or text, that sort of irrelevant.

play68:38

But the point is it's purely formal manipulations of symbols. Right.

play68:41

And semantics is when those are interpreted as meaningful. Right.

play68:45

And the kind of two category theory that I've been talking about is

play68:50

like an abstract semantics, right? Whereas this category with like a

play68:56

single object in these endomorphisms, this group,

play68:59

this like classifying groupoid of a group thing that is the.

play69:05

It's sort of like that's the categorical avatar in this case of

play69:08

like the syntax of being able to write g dot x and g prime dot g dot

play69:13

x, that kind of thing. Okay, okay. So, um, you know,

play69:16

from a linguistics point of view, you know, like people understand

play69:19

semantics to me, to be the meaning, um, the aboutness of, of a word.

play69:24

So, for example, I could say Wagga and dog and it would point

play69:28

to the same cognitive category, which is a dog.

play69:31

Um, but in, in this framework, just help me understand exactly

play69:35

like what is the semantics? That is a good question, right?

play69:40

Because you're asking about semantics in this very

play69:43

particular way about as like, what is this thing about? And. Well.

play69:49

The syntax allows you to write down constructions or operations

play69:54

which are manifest in the semantic category. Right?

play69:58

So the category is the world in which those in which that syntax

play70:02

is interpreted as, be it things like elements, or be it things

play70:06

like functions or be it things like transformations. Yes. Right.

play70:10

So this so the category that you're so there's this semantic category.

play70:15

And then the fact that they're also encoding like syntax as a category

play70:18

itself that ends up, you know, opening up another sort of can of

play70:21

conceptual or werm sort of breaking down like the easy distinction

play70:24

between syntax and semantics. But the point is the story that

play70:28

they were telling, and this is sort of instructive.

play70:30

And this is sort of very much what Andrew says is, okay,

play70:32

so you start with syntax. You can write that down into the

play70:36

universal category that has that syntax interpreted as morphisms.

play70:40

And then you look at functorial semantics of that.

play70:43

So that's the interpretation of this universal category in which those

play70:47

terms have meaning. Yeah. Right. Which means they have no other

play70:51

meaning other than the formal relationships between them that

play70:53

you specified in your syntax. And then you take a functor out

play70:57

of that thing into something else and that interprets them.

play71:00

So that's, this is, that's this notion of functorial semantics.

play71:03

The story that I'm talking about is I'm not even worried about the ever

play71:06

writing down any individual terms. I'm just interested in sort of this

play71:11

two dimensional graphical picture that allows you to talk about what

play71:14

kind of thing algebra is at all. Right.

play71:17

So for the story that I was interested in this and the way I

play71:20

come to this is we don't, don't even really ever study the elements.

play71:25

Like I want to get away from that. Right.

play71:26

Because the elementary perspective, while it's useful and helpful for

play71:31

like the analytic construction of these neural networks, it's not

play71:35

that good for proving properties of those of those networks. Right?

play71:39

You know, it's pretty easy to like you write an

play71:42

expression that has 5 to 10 terms. Suddenly it's very confusing.

play71:45

Whereas you draw an element like a diagram that has just like 5 or 6,

play71:48

you know, 1 or 2 dimensional pieces, and you can derive formal properties

play71:51

about it's a little bit easier. Okay, okay.

play71:54

I mean, because we're saying earlier that, um, we could use the example

play71:57

of group theory and I could look at all of the, like, symmetry

play72:01

preserving transformations of a triangle and the aboutness, you know,

play72:04

the semantics is the triangle. Yeah. Is the transformation the.

play72:07

Exactly. It's about the triangle. It's about the actual

play72:09

transformations of the triangle. And then the syntax is just these

play72:13

things as elements of the abstract symmetry group that that makes sense.

play72:17

So when I was discussing this with Andrew, he said that the

play72:22

reason this separation is useful is because if you think about it,

play72:26

it's like you're talking about you're talking about two, two worlds.

play72:29

And he was arguing that we want to or we have less representational

play72:34

ambiguity in the world of semantics. Yes. So there's a kind of asymmetry.

play72:39

So for example, you know, using this, this um group

play72:41

analogy of the triangle, we have these high precision,

play72:45

you know, we can we can actually draw a grid of all of the, um, you know,

play72:49

symmetry preserving transformations. And now we've got a very high

play72:53

resolution way of understanding that triangle.

play72:57

But I guess the question is, though, is it is it really true to say that

play73:02

there is less representational ambiguity in the semantics?

play73:07

I guess, like the question I'm asking here is, um, are these

play73:11

semantics ordained by God or are they purely a human construct?

play73:15

I mean, does semantics just mean something to humans?

play73:19

I wouldn't say that they are ordained by God, but I would say that what you

play73:23

can do is like you can have like the initial semantics of something,

play73:27

which is some. Right? So if a category is some like

play73:31

abstract universe in which stuff does, in which things like a universe

play73:35

of a kind of thing and then morphisms between that kind of thing. Right.

play73:39

You can talk about like the initial semantics of something. Right?

play73:42

And this means the one that that like the semantics that has only the

play73:47

properties that are necessitated by the syntax and no other. Um, right.

play73:52

And that is genuinely necessarily less ambiguous than any particular

play73:57

syntactic presentation of something. Right.

play74:00

So I'm not going to be able to come up with two distinct

play74:03

presentations of a group. But the point is, I can write

play74:06

down things like groups by way of generators and relations. Right.

play74:10

But there may be and usually are, many different ways I could write

play74:14

down the same algebraic structure with different syntaxes.

play74:18

However, there is only one semantics for that thing. Um, cool.

play74:23

Now, um, so so you've written this paper, right?

play74:27

If you were to give me a one minute elevator pitch of the paper,

play74:31

what would it be? Title of the paper is categorical

play74:34

Deep Learning an algebraic Theory of Architectures.

play74:38

Uh, one thing that, like the. You know, hardcore category theorist

play74:43

amongst us might take objection to is that the words algebraic theory

play74:47

actually have a formal meaning in category theory, and that's not

play74:52

actually exactly what we mean, right? An algebraic theory to a category

play74:56

theorist is like a linear theory. It's a it's one of these syntactic

play75:00

categories of like the allowed manipulations, stuff like that.

play75:03

Whereas what we mean is specifically architecture as being the semantics

play75:07

of the sort of these universal algebraic notions. Um, so.

play75:13

That's the title of the paper. Gotten distracted by this statement

play75:16

that we said algebraic theories. Maybe that's not the best name,

play75:19

but it's a pretty good name. Why is it a good name?

play75:21

Because the point is that we want to say that all of the things

play75:23

that have been studied in GDL and significantly more examples,

play75:26

all of them are in fact instances of algebraic structures,

play75:31

be they structure preserving maps, i.e. morphisms between algebras

play75:34

for monads or structure maps of algebras for monads or the

play75:38

appropriate dual constructions. Say these are sort of these like

play75:41

universal structures, and then we're interpreting them

play75:44

into a two category whose objects are vector spaces, whose morphisms

play75:48

are parametric maps. Right. A good example of the kind of

play75:51

parametric map that pops up in deep learning is a ReLU network

play75:55

is a parametric function, right? Because for any given choice of

play75:59

weights in all of your various layers, right, you get one function,

play76:03

but you don't really want to think about those functions as distinct.

play76:05

You want to sort of group them all together.

play76:07

And so this is why there's the innovation of this notion of the

play76:09

parametric function. And then you want to talk about being

play76:12

able to change those parameters. So that requires the introduction of

play76:15

this notion of reparametrization. Right.

play76:18

So specifically what we develop or we don't develop it in this paper.

play76:21

Right. This para has been around for

play76:23

some time. But what we really do is we say,

play76:27

hey, if we do two dimensional universal algebra valued in these

play76:32

categories of objects, parametric maps and reparametrizations,

play76:35

that allows us to talk at a very high level of abstraction about the

play76:38

kinds of things like the geometric priors of GDL or the structure maps,

play76:43

which are exactly the representations of groups that

play76:47

are like the foundation for GDL. It allows you to talk about it

play76:50

in a far more abstract way that more naturally generalizes to

play76:56

algorithmic constraints, as opposed to merely sort of conservation law

play77:00

type constraints, which is that which groups are really good for.

play77:03

Okay, so let's just talk broadly about abstraction. Right.

play77:07

So what is the purpose of abstraction.

play77:09

And in this particular case give us an example of a powerful

play77:12

abstraction that would, you know, change the way that we reason

play77:15

about neural networks. Okay. What is the purpose of abstraction

play77:19

from a mathematician's standpoint if you want to be utilitarian about it?

play77:24

Uh, the purpose is. Do I want to climb a mountain with a

play77:30

lot of incredibly specific gear, or do I want to learn to be a

play77:35

mountain goat? Right. That's really what the purpose

play77:39

of abstraction is. The purpose of abstraction is

play77:41

getting away from all of these specifically adapted tools and say,

play77:46

okay, these are the tools that I'm going to allow myself to use for it.

play77:50

Suddenly, the problem is more general.

play77:52

And like you might think that that's going to make it harder,

play77:54

but it doesn't. It just teaches, you know, don't hold

play77:57

on to the like, individual unique specificities of each instance.

play78:01

Those are distractions. Those are not the structural

play78:03

properties. Right. Those are the those are those are the

play78:06

specifics of specific instances. You want to get away from those

play78:08

because those are distractions from how you solve the problem

play78:12

in a way that's, you know, actually comprehensible. Okay, okay.

play78:15

I mean, I think there's an element of there's a sea of complexity

play78:18

that we need to tackle. But, um, the Mountain goats,

play78:22

quite an interesting example because I would say that's

play78:24

actually an example of specificity rather than generalization.

play78:27

But but then we get to Canalization. Yeah. So that that's fair. Right.

play78:31

This is a question of mixed metaphors, right.

play78:33

You know, and getting getting excited about

play78:36

one and getting stuck in it. Right. You know, so this is a train of

play78:38

thought that gets a little bit too far down in one direction.

play78:40

But what's what's the point? The point of abstraction is to get

play78:43

away from all of this extra baggage that you might think is providing

play78:48

you information, but isn't right. You might think that like the

play78:51

easiest way to solve a problem about the natural numbers or about

play78:55

the integers would be to like, know a lot about what how two

play78:59

works or how three works. Turns out, it's way easier to develop

play79:04

the abstract notion of a ring and and develop this like massive library

play79:08

of theorems about rings. Right? And it's actually by abstracting

play79:12

away any of the details that forces you to think in a principled way.

play79:16

Yeah. So I get all of that. And duality, for example, is,

play79:20

is is a wonderful feature of category theory because you can prove

play79:23

something in one domain and because you have all these categories set up,

play79:27

you have you have the morphisms and so on, then that propagates.

play79:30

And I guess this was something that we used to do manually in

play79:32

scientific theories. You know, we would have to kind

play79:34

of prove duality on a case by case basis. Yeah.

play79:37

So the and this is a sense where like category theory allows you to do

play79:41

it once and then interpret it into various contexts and see it as an

play79:46

instance of the same thing. Right. But coming back to the mountain

play79:49

goat example then. So because because that's a

play79:51

weird juxtaposition. It is because it's both

play79:54

specificity and generalization. The reason I said canalization

play79:58

is because, you know, we we know how evolution works, you know,

play80:00

and how how living things work. There's this canalization.

play80:03

So like, you know, this, this goat, it might seem like it's the,

play80:07

you know, the perfectly, um, well adapted and specified,

play80:11

you know, physical form to, to work in this situation.

play80:14

But actually, if you, um, you know, we're talking about compositionality.

play80:18

If you decompose it into its component parts, those parts are,

play80:21

you know, they represent the ultimate form of generality.

play80:25

Yeah, I would buy that. Okay. But but then one thing I think

play80:28

about here is, um, just how how fundamental, how universal these,

play80:33

um, general components are because, you know, we want to build a

play80:37

structured picture of the world. Yeah,

play80:39

out of building blocks which appear universal and probably they're not.

play80:45

Maybe some of them are universal, maybe some of them are not universal.

play80:48

But, you know, we want to have the Lego set for the universe.

play80:51

No, we definitely want to have the Lego set for the universe.

play80:54

I don't know what the Lego set for the universe is,

play80:55

and I doubt that there's only one. But what the last, you know, 60,

play81:00

maybe 80 years of mathematics have told us now is that category

play81:05

theory is a damn good one, right? It's a principled theory of

play81:10

abstraction, right? It says, okay,

play81:12

how do you what is a good abstraction for a notion of structure?

play81:16

And what are the principles by which you should reason about it? Right.

play81:20

And it becomes this task of, hey, you know, the category theorist sees

play81:23

a problem in some particular domain and says, what are the tools that I

play81:26

know that kind of map onto this? How can I it tells you which parts

play81:31

of that might actually matter and which parts of it might not. Right.

play81:34

So you develop a categorical abstraction about it,

play81:36

and then suddenly it becomes often easier to think about and it becomes

play81:40

easier to prove things about. Yes. And I think that the miracle of human

play81:43

cognition in particular, as we were alluding to with Keith earlier,

play81:46

is, is this I think abduction is the same as categorization.

play81:49

So I think, you know, we we select these categories,

play81:53

we create, maybe we select them, maybe we create them.

play81:56

I mean, that's an interesting philosophical thing.

play81:57

But but we have we have the ability to do that.

play82:00

And we're kind of we're carving up the world.

play82:03

And it's our way of overcoming the sea of complexity in the universe.

play82:07

I would buy that. That's that's a a good story is

play82:11

you don't, you know, it's sort of like, you know,

play82:12

what happens in neural networks is like, do you want trillions

play82:15

and trillions of parameters that allow you to recreate exactly this?

play82:19

Or do you want a much lower dimensional space of explanations

play82:24

from which you can rederive and constantly be checking against

play82:27

what you're experiencing? Yes, and this is the story of

play82:30

parsimony in general. So on the one hand, you could have an

play82:33

inscrutable large language model, which is just a sea of neurons.

play82:37

On the other hand, maybe there is some kind of

play82:40

decomposition of language model. Maybe just forget the language model.

play82:43

Maybe there is a decomposition of reality and no one knows how complex

play82:47

that thing will be, right? You know? Maybe it's possible to really

play82:50

decompose it down into like a thousand types or something

play82:53

like that, but that must be a better way to go than to have

play82:57

the inscrutable language model. I definitely think so.

play83:00

And that's definitely the bet we're playing with.

play83:02

Symbolica okay, now there's the question of what

play83:06

we mean by reasoning as well. So I think like we intuitively agree,

play83:10

I think that in order to reason, we have to be working on this kind of

play83:14

decomposed, parsimonious view of, of, of the universe and the

play83:20

there's a relationship between reasoning and generalization.

play83:23

So type two generalization, as we understand how I,

play83:27

how I reason about chess, for example, is we have this discrete

play83:30

categorical model of the chess board. And reasoning is kind of like taking

play83:35

trajectories and doing various forms of deduction and inference

play83:39

and so on in that discrete set. So that's reasoning.

play83:42

Um, is reasoning more complicated than that?

play83:44

I mean, maybe what do you think reasoning is?

play83:46

My definition for what reasoning is, is heavily informed by what I know

play83:52

from the work of Sperber and Mercier. Right?

play83:55

I am really influenced by the idea that what reasoning is, is the

play83:59

coming up with reasons for things. Reasoning is about explanations,

play84:04

right? So this is one of the reasons why

play84:06

I sort of bristle against the the sort of common ML use of reasoning.

play84:10

Right, is I'm not entirely comfortable with the term because I

play84:13

think it's a category mistake. Right? I think reasoning is coming up with

play84:17

explanations for things. Right. Deduction is really what we're

play84:21

talking about. Or induction, right? Things like that.

play84:24

That's mostly what we're talking about.

play84:25

But and so we say inductive reasoning and deductive reasoning.

play84:28

And I know that that's sort of popular language.

play84:31

But I've sort of fixated on this perhaps idiosyncratic use of the

play84:34

word reasoning that has sort of informed how I think about a lot

play84:38

of the stuff in the last couple of years. Interesting. Yeah.

play84:41

So, so, so, um, there is a school of thought, as you say,

play84:43

that reasoning is deduction, induction and abduction.

play84:46

Neural networks definitely do not do deduction.

play84:49

I mean, if they do do it, it's only because they've

play84:51

memorized instances of it. Um, they possibly do do a form of

play84:55

induction and they definitely do not do any abduction, which is,

play84:58

you know, like generating reasonable explanations for things.

play85:01

Or if they do do, it's only because they've

play85:02

memorized human instances of it. Does it matter if because this

play85:07

is kind of like, you know, what Bostrom says about, um, intelligence

play85:10

and goals are orthogonal. And what he's saying is that,

play85:13

um, goals are universal. And this is a really common

play85:16

thing in symbolic AI. And, you know, people think that

play85:20

intentionality means that we have goals and these goals are fixed.

play85:25

So the reason you behave the way you are is because you had a goal.

play85:28

And that goal is like a universal thing. It was always there.

play85:31

And that guides your behavior. Another school of thought is

play85:34

that intentionality is as if, which means like you have agency.

play85:38

If, um, you could construct an explanation

play85:42

that explained your behavior about yourself or other people,

play85:44

but is that a good explanation? If it's just a model rather than

play85:48

being what you actually did? Um, it's a really good explanation

play85:53

if you can make that thing formal, right? If you can say, okay.

play86:00

Here is my sort of purely syntactic description, say,

play86:04

of what that reasoning is. And then I can have a formal

play86:06

interpretation of that and say, actually,

play86:08

this did work like that, right? It's certainly not going to be

play86:11

the only explanation. But the point is it is a good one.

play86:15

Right. So that's that's the kind of thing

play86:17

that I find most interesting. And what but what makes a good

play86:21

explanation is, is it just predictability or is

play86:24

it intelligibility or is it something else?

play86:25

I think intelligibility is hugely important.

play86:29

An inscrutable reason is like having no reason at all. It's exactly so.

play86:34

So we intuitively agree that the reason why language models are it's

play86:40

not so much that because there is obviously a reason why they do

play86:45

things, but we don't understand what the reason is, right?

play86:47

And we intuit that the reasoning is not robust because it.

play86:52

First of all, if you just look at the activation

play86:53

pattern of a neural network, it's like you perturb the input by a

play86:57

tiny amount and it's almost like it activates completely differently.

play87:01

So so we think that the reason it does stuff is super, super brittle,

play87:05

whereas we think that the reason we do things is robust and maybe

play87:09

we're just being human chauvinists or whatever, but but we kind of

play87:11

think that that kind of reasoning has an elevated status. Yeah.

play87:16

I mean, I think it I think it does, but I don't think it's elevated

play87:20

in the sense that neural networks can't do it.

play87:22

I think it's that existing architectures don't.

play87:24

So what would we need to do to a neural network to make it more

play87:27

robust in that way? Um, this is in some sense what

play87:30

the paper is about. Right? So the paper is about how you

play87:35

might talk about algebraically structured neural networks.

play87:39

So is this almost like when we deliberately canalize a neural

play87:43

network? We are I mean, obviously this is

play87:46

this is all on a spectrum. But by reducing this kind of chaotic,

play87:50

you know, random trajectories, random activations and so on,

play87:54

by by creating a robust activation pattern, which is perhaps more,

play87:59

if not completely intelligible, then it's going towards reasoning.

play88:03

Yeah, I think so, yeah. So we should do some category theory

play88:07

101, because I'm sure that the, the audience know nothing about

play88:11

category theory on, on average. So in your paper you actually had

play88:14

a really beautiful introduction or a primer to category theory

play88:18

where we kind of like go through the concepts one by one.

play88:20

Can can we do some of that now? Yeah. I'm glad you think so.

play88:24

Um, so one story of where you get to categories is the one that Peter

play88:30

will have told you, you know, 20 minutes ago in this in this video.

play88:34

Right. And he talks about it as okay. So you start with groups and then

play88:39

first thing you do is that you relax the requirement that all

play88:44

transformations are invertible, that they can be undone.

play88:47

And then the next thing that you do is you relax the criterion that all

play88:53

transformations can be composed. And that does indeed get you to

play88:58

a notion of category. Right. Because a category, at least a small

play89:01

one, is a set of objects together with a set of morphisms that have a

play89:05

composability criterion on them, which is at the end of one arrow,

play89:09

has to be the beginning of the next one, and they have a composition

play89:12

law that is associative, meaning if I've got three of them, if I compose

play89:15

the first two and then the third, or if I compose the first one,

play89:18

and then the composition of the of the of the second and the third,

play89:21

I get exactly the same thing. Okay. So there's this associativity axiom.

play89:26

And there's also an identity axiom, right? Yes.

play89:28

There is always an identity endomorphism of every object okay

play89:33

okay. So let's just be clear here. So a category it's like a set of

play89:38

objects and morphisms. Yeah it is a set of objects.

play89:41

And then it is a parameterized family of sets of morphisms.

play89:47

And it's parameterized by pairs of objects one being the source,

play89:51

the other one being the target. Or in different language,

play89:53

the domain and the codomain. Okay. So I mean,

play89:56

let's just sketch out an example so that in the context of like GDL,

play89:59

what would that be? Oh okay. Let's go do something easier.

play90:02

Let's look at the category of sets. Right. Right. So I've got.

play90:07

My objects are sets. My morphisms are functions of sets.

play90:12

Not all functions of sets are composable, because I can't apply

play90:15

a function that takes in input from a set of three things to a

play90:19

set that has two things, right? So that's where the composability

play90:22

criterion comes in okay. Okay. So let's talk about morphing

play90:26

morphisms. How do they work. So they're just a this is this is

play90:29

the magic of category theory is like the sense of how they work is.

play90:33

Well, they compose in an associative way and there's nothing else to them.

play90:36

You've abstracted away any of the specificity of those things being

play90:40

a computation or doing something. They're simply there. Right.

play90:45

And you're reasoning about or here I go using reasoning again.

play90:49

But the point is you're thinking and like doing inference reasoning.

play90:53

Fine. I'll just use, say, reasoning. You're reasoning about how you can

play90:56

compose them and what properties you know, those compositions have. Okay.

play91:00

So um, in the, in the GDL, um, show. Yeah, there were these,

play91:06

these groups and, you know, there could be the group of triangles.

play91:09

Um, but we're also talking about, you know, things like, um,

play91:11

so three or things like, you know, ones that preserve rotations and

play91:16

angles or there is ones that preserve like translational

play91:20

equivariance and so on. So um, and these groups needed to

play91:24

have um, like four axioms that, that, that they, um, uh, you know,

play91:28

conformed to. Yeah. Right. So groups are sets together with

play91:33

a binary operation and a specified element such that,

play91:39

you know, that specified element, the unit of the group operation acts

play91:44

as the identity transformation. If you put it in the first argument

play91:46

or in the second argument and that that multiplication is associative.

play91:51

Okay. So so these morphisms, it's a

play91:54

kind of similar concept. Right. So it's a transformation between

play91:57

two things. But there has to be some kind of

play91:59

structure preserving characteristic. Well. This is.

play92:02

The funny thing is, what you're doing is you're abstracting away

play92:05

the structure that it preserves. And simply talking about what

play92:08

kinds of things can you do with structure preserving maps?

play92:11

It's like, I don't know, but I can compose them.

play92:14

And so the idea is, hey, let's just talk about what we can know about

play92:19

mathematics, just in terms of the notion of composable morphisms,

play92:23

of composable functions. We don't look inside the

play92:26

functions anymore. We specifically abdicate that

play92:29

position. We say, okay,

play92:30

I'm going to reason about how I can compose stuff and nothing

play92:32

else and see how far I can get. And what's absolutely demented is

play92:37

how far you can get with just that. Right?

play92:40

You might think that, no, I really need to see what's

play92:42

inside these things. It's like for a lot of properties of

play92:44

mathematical structures you don't write is if you abstract that away,

play92:48

suddenly the same claims can be made and proved in various

play92:51

different contexts. So, I mean, so just to really

play92:54

spell it out, when you say you can compose these morphisms together

play92:57

because you were talking a little bit before about it almost being,

play93:00

um, an interface, a bit of Legos, a good example, you know, so,

play93:03

so there's, there's, um, a square peg, there's a square hole.

play93:07

These two things go together. So for them to be composable,

play93:10

it just means that if there's a compatibility between them,

play93:12

you can kind of, you know, plug them together in any configuration you

play93:15

want or not any configuration. Right. There's usually only compatible.

play93:18

Yeah, exactly. There's,

play93:18

there's like the way that the ways that you are allowed to write.

play93:21

So suppose I have one function f. It goes from a set x to a set y.

play93:25

And I've got another function g or yeah that goes from y to z. Right.

play93:30

I'm allowed to compose first f, then g.

play93:33

I'm not allowed to compose first g then f. Um okay.

play93:37

So so we've done, we've done the basic category thing.

play93:40

Um, let's go to the next level of abstraction.

play93:42

Let's talk about things like, you know, um, uh, you know,

play93:45

functors and endofunctors. Okay. So what is a functor. Right.

play93:49

So let's first we've said a category is, you know, the class

play93:53

or a collection or whatever. Usually it's a set,

play93:55

a set of objects together with this doubly parameterized family of

play93:59

morphisms that you can compose well. There's that's a mathematical

play94:04

structure in and of itself. So I can ask what are the structure

play94:07

preserving maps between categories. These are the things that are called

play94:09

functors. What does a functor do? A functor comprises an assignment of

play94:15

the objects of the first category to the objects of the second category,

play94:18

and the morphisms of the first category to the morphisms of

play94:20

category in a way that preserves composition. Okay.

play94:25

So again, let's bring this to life, because we were talking earlier

play94:27

with Keith about, um, you could, for example, transform a list into

play94:32

a tree or something like that. So it seems like the objects are

play94:35

very different, but you're still transforming between

play94:39

them in a way that, you know, maintains these properties.

play94:42

Yeah. So lists and trees. So let's talk about, you know, lists

play94:48

valued in a particular set. Right. So what do I mean by lists

play94:51

valued in a particular set. I can think about the monoid of

play94:54

lists valued in a set X. I can also talk about binary

play94:58

trees valued in a set X. Right. And it's still a set function

play95:02

between them. But it might preserve some portion

play95:06

of the of the monoid structure of lists and of the algebra for

play95:11

a monad structure of trees. Um, so let's let's move up one

play95:15

level of abstraction. So let's go to two categories.

play95:18

How does that work okay. So. We've talked about categories,

play95:23

and now we've introduced this notion of functor as structure preserving

play95:26

maps between between categories. Natural transformations suddenly

play95:30

pop up. What the hell are they? Well, first off,

play95:33

they're essentially the reason why category theory is invented, right?

play95:36

This is sort of like, you know, I think if you ask McLean and the

play95:38

other guy whose name for some reason I can't remember Eilenberg excuse me.

play95:42

If you ask Sammy Eilenberg and Saunders MacLane like,

play95:45

why did you invent category theory? The answer is, well,

play95:47

we needed enough to explain what natural transformations were.

play95:50

Right? And so what happens? So suppose I have two functors.

play95:53

I've got some category C over here and I've got some category

play95:56

D over here. What a natural transformation is is a

play96:00

compatible family of deformations. It's a deformation from one

play96:04

functor into another one. Right. So if I've got C over here and D

play96:07

over here right. And I've got one functor F and

play96:11

I've got another functor D, a natural transformation is a two

play96:15

dimensional morphism that goes from one functor to the other one.

play96:18

Data wise what does it comprise? It is for every object of the

play96:23

source category. It is a morphism in the target

play96:27

category that goes from the image of that object under the first

play96:30

functor to the image of the object under the second functor,

play96:34

and it satisfies this criterion called Naturality, which is that

play96:38

this deformation is compatible with. The morphisms of C interpreted by the

play96:46

functor F and the functor G, right? So these things called naturality

play96:49

squares, if you, you know, they're sort of, you know, say,

play96:52

suppose I had some function f from A to b in the category C,

play96:57

and I had two functors capital f and capital G. Right.

play97:01

The natural transformation has components which are going to be

play97:03

maps of the form f of a to g of a, and those are supposed to be

play97:08

compatible in the sense that a particular square commutes.

play97:11

And what is that square? Well, you have f of f goes from f of

play97:15

a to f of b, you have g of f goes from g of a to g of B, and then

play97:20

you've got these two vertical things. This one is the component of this

play97:23

natural transformation at the object A this one is the component of the

play97:27

object, the component of the natural transformation at this object.

play97:30

B and the criterion is that this diagram commutes.

play97:33

What does that mean? That means that if I compose this

play97:36

thing, and this thing that is equal as morphisms to the composition

play97:41

of this one and this one, let's talk about monads.

play97:45

So what are monads? Okay. So what monads are is a gorgeous

play97:52

abstraction for things like group actions.

play97:56

Or excuse me, not for things like group actions, but for the

play97:59

notion of group action. Right? So monads are things that have

play98:03

algebras. Let's look at a group action for

play98:07

a moment. So suppose I wanted a group action

play98:10

on an object x. Monads are hard. The kind of thing that, like I used

play98:15

to find them really confusing. And now I still struggle to

play98:17

explain them. But it's the kind of thing that

play98:19

you're like, well, you've accepted this long enough.

play98:22

It's like what's there's this notion. It's like there is no figuring out.

play98:24

There is only getting over. Yes. Yeah, yeah,

play98:27

it's very much that kind of thing. I mean,

play98:29

but part of part of the motivation here is Andrew in particular.

play98:33

Um, his lens of analysis, as I understand it is,

play98:36

is monads that, that that's, that's his main view on this work.

play98:41

I would actually say so it was in writing the paper.

play98:45

Bruno and I came to the DeepMind team with, hey,

play98:49

let's talk more about monads. Right. They were thinking in terms of let's

play98:53

talk about functorial semantics of syntactic categories. Right.

play98:58

And we said, well, fortunately, if you've got a syntactic category

play99:01

and you're looking at all of the things which are models of that

play99:04

syntax, there is a monad whose algebras are exactly the same thing

play99:09

as models of that syntax. Right. What the monad does is it purely

play99:14

semantically bundles up all of the operations. Right.

play99:18

Here's a good example of a monad. List blank. What does list blank do?

play99:23

It takes a set X to the set of all lists valued in x.

play99:28

What is the extra structure of this monad?

play99:30

Because what I've just described is just an Endofunctor.

play99:32

I haven't described the extra structure of a monad.

play99:35

A monad has a unit and it also has a multiplication, right?

play99:38

What is the multiplication of this endofunctor?

play99:41

Well, what if I take list of list of x?

play99:45

So what are the elements of the set? List of list of x?

play99:48

Those are lists of lists whose entries are elements of the set X.

play99:53

What's the multiplication? I can erase the inside parentheses.

play99:59

I can turn any list of lists into a list by forgetting the

play100:02

next nested parentheses. That's the multiplication of the

play100:05

list monad. What is the unit of the list monad?

play100:08

The unit allows you to interpret things of type x as things of type

play100:12

list x, right? How do I do that? I take the element x and I

play100:16

associate to it the list that just has x in it. Right?

play100:21

And the laws of a monad are compatibility between these two

play100:25

notions, between the multiplication and and the unit.

play100:28

What it means to be an algebra for a monad, right, is to have a

play100:33

structure map that goes from, you know, the monad operating on

play100:36

your object down to your object. What this does is this interprets.

play100:41

Like this formally interpreted. Right.

play100:43

So, you know, the monad is this abstract, this purely semantic

play100:46

representation of the syntax. Right. So I think about like if I've got

play100:50

some monad t, I think about what t is, that's all the terms that I'm

play100:55

allowed to write with x and all the syntax that defines the monad t.

play101:00

And then I want to interpret that back end to x because that's

play101:03

what it means to have like the structure I mean blah blah blah.

play101:07

This is tautological. That's what it means to have the

play101:09

structure of an algebra for T, right. What does that mean in the case

play101:12

of say, this list monad? Well, the algebras for the list monad

play101:14

are these things called monoids, where monoids just have a binary

play101:17

operation that has a unit you don't have invertibility.

play101:21

How does that make sense? Okay, let's consider an instance

play101:25

of a monoid with which we are all familiar natural numbers.

play101:30

Okay, so I can take a list of natural numbers, and I can

play101:34

interpret it as a natural number. What's the right way to do that?

play101:38

Well, I add them. Right. This is the structure map of the

play101:43

natural numbers as a as a monoid. What if there isn't a natural way to,

play101:49

you know, let's say turn a list into a single number?

play101:54

I mean, for example, you could multiply them. Like, why add them?

play101:57

That's just one of the structures. There are lots of different ways

play102:00

in which a particular set can be endowed with the extra structure

play102:05

of being an algebra. For lists, there are lots of

play102:09

different monoids on the same set. Oh, I see,

play102:13

I see so so you would have one per operation or whatever. Exactly.

play102:17

That's the point, is each like abstract conception of operation

play102:20

gives rise to a different monad. Yeah.

play102:22

Whose algebras are all of the different structures that can be

play102:26

written in terms of that operation. Okay.

play102:28

But the magic of the theory of the monad is it gets you away from

play102:31

having to write those things down syntactically, and instead you can

play102:34

just look at the two dimensional, two dimensional semantics of this. Yeah.

play102:39

I mean, so it's almost like a kind of, um, templating, um, function.

play102:43

I mean, it's a, it's a pattern language for algebra.

play102:46

Yeah. It's exactly, exactly. And I mean, this this brings me

play102:50

to Haskell, for example. So Haskell has a few, like, um,

play102:52

monads built into it. Yeah. How does it work in Haskell?

play102:55

Oh, okay. So these are some of my favorite

play102:58

monads, right. So I am not a Haskell expert,

play103:00

so hopefully none of the FP zealots in the audience will,

play103:05

uh, crucify me in the comments for getting this wrong.

play103:08

But so in Haskell, a lot of the time, what do you do?

play103:12

You start with an endofunctor F yeah, right.

play103:15

And then you want to talk about the monad generated by F, right.

play103:20

The free monad on F. Well how do you do this.

play103:23

It's like well you sort of like. Infinitely compose F with itself,

play103:27

right? Why does that give you a monad?

play103:30

It's like well, because the. You know, take omega,

play103:34

i.e. the size of the natural, like the the infinity which is

play103:37

the size of the natural numbers. Do f omega times and then do it

play103:42

one more time. That's the same thing as just

play103:44

doing f omega times. So that means f to the omega

play103:47

composed with F to the Omega is just F to the omega. Right.

play103:51

So that's the multiplication of a monad.

play103:53

So you can go from the structure of an endofunctor to the structure of a

play103:56

monad in this relatively elementary way by believing in infinities.

play104:00

Yeah, yeah. So? So what else? What else do we need to cover on

play104:04

category three for the purposes of our paper? Right.

play104:07

What do you really need to know? Right.

play104:09

That's the question you've asked. You need to know what a category is.

play104:12

You need to know what a functor is. You need to know what a natural

play104:14

transformation is. You need to know what an endo

play104:17

functor is, an endofunctor. That's one of the easiest ones,

play104:19

right? An Endofunctor is just a functor

play104:22

from one category back to itself. You need to know what a monad is.

play104:25

A monad is not is an endofunctor, but it's also more.

play104:29

It's an Endofunctor together with some extra data.

play104:31

That extra data is a natural transformation from.

play104:38

The composition of that endofunctor with itself down to

play104:41

just one copy of that endofunctor. And it's a natural transformation

play104:46

from the identity endofunctor into that Endofunctor that

play104:50

satisfies some properties. Right. And these are the axioms of a monad.

play104:53

Okay. And why do we need endofunctors?

play104:55

I mean, it might seem like a naive question, but why do we need to

play104:59

have a way of transforming something back to itself?

play105:01

This is what constructors of a type are. Right. So how do I build lists?

play105:08

I have a notion of what the empty list is, and I have a notion of how

play105:10

to append something to it. Yeah. So the endofunctor that begets lists

play105:15

one plus blank. Like the one. So. So if I've got a map from one

play105:23

Co-product x down to x, well, what am I doing?

play105:26

I'm picking out a special element in the set X and I'm picking out.

play105:31

An automobile, not an automorphism. An endomorphism of the set X I'm

play105:35

picking out zero and I'm picking out successor.

play105:38

So I said that was lists is really like making them like interpret

play105:41

the natural numbers. Yeah. Right. But it's that kind of thing.

play105:43

So the point is the endofunctors are interpretations of abstract terms.

play105:48

Like. So that's exactly why, like,

play105:50

monads are particularly nicely structured. Yeah.

play105:53

You know, notions of abstract terms that where the algebras

play105:57

for monads interpret those terms into the original object.

play106:00

And where can people at home learn about this in more detail?

play106:04

Um, I don't know, where the hell did I learn about monads?

play106:08

Uh, I think I learned about monads in honestly, I learned about

play106:13

monads from watching people give a bunch of talks about monads and

play106:16

eventually believing in them. If you want to actually read

play106:19

something about it, you know. Sure. Probably the best thing to do is

play106:23

go on YouTube and look up monad and find, you know,

play106:26

find someone not from the FP world to tell you what a monad is.

play106:29

I'm sure like some of the, you know, the casters videos will do a great

play106:32

job of telling you what that is. Um, likewise, I'm sure there's some

play106:36

of like the Phong and Spivak videos from their MIT course years ago.

play106:40

That'll tell you what it is. Uh, it's I'm sure it's covered in

play106:44

seven sketches, and it's definitely covered in all of the sort of pure

play106:49

math oriented texts about, uh, category theory, like, you know,

play106:53

the classic, you know, categories for the working mathematician, um,

play106:58

or Steve Out.is book, which I learned a lot of category theory from.

play107:01

I finally understood the Yoneda lemma by reading that book a

play107:04

couple of times. Um. So it's going to be it's going

play107:09

to be in lots of places, essentially just just pick one

play107:13

and fight with it long enough and eventually you'll get it. Yes.

play107:16

Now, you mentioned Livia theory earlier. Yes. Okay.

play107:20

Formally, a Livia theory is a small category which is complete under

play107:26

the formation of finite products. Okay, that's not a terribly useful,

play107:32

um, description of it. Right? It's the kind of it's.

play107:35

Excuse me to say that it's not a terribly useful description of

play107:37

it is completely wrong, but it's the kind of.

play107:42

You know, there's like the description of things.

play107:43

That's exactly right. Once you know what it is. Yes.

play107:46

But if you're going to. But it's not the right way to

play107:48

learn about it, what is. So why do you want it to be

play107:50

closed under finite products? Well, because finite products

play107:54

are exactly what allow you to phrase functions that are going

play107:59

from tuples of things to other tuples of things. Right?

play108:02

So say I want to encode the laws of a monoid.

play108:06

Well, there's a binary operation for a monoid.

play108:08

So to encode that I need a map from x times x down to x. Right.

play108:13

So what a linear theory is right? Say the linear theory.

play108:16

I'm not going to do it specifically because I'm fucking

play108:18

up if I try and do this right now. But what a linear theory is, is.

play108:23

It is a category which encodes. The compatibilities and

play108:31

relationships between a bunch of n to m array operations. Right.

play108:37

So it is like it is a syntactic category that has no structure except

play108:43

specified relationships between a bunch of different end to end area

play108:47

operations. Mhm. Poon is an honor. Yeah. Thank you so much.

play108:52

Yeah, that feels pretty good.

Rate This

5.0 / 5 (0 votes)

Related Tags
深度学习神经网络代数理论范畴论人工智能技术创新学术讨论科技前沿模型优化计算理论
Do you need a summary in English?