# WE MUST ADD STRUCTURE TO DEEP LEARNING BECAUSE...

### Summary

TL;DR: This conversation takes a deep dive into abstraction in deep learning and AI, in particular how category theory can improve the structure and understanding of neural networks. The guests discuss the shift from concrete algorithmic implementations to abstract mathematical structures, and how that shift helps us reason about and understand complex systems. Category theory provides a powerful toolkit that lets us move beyond specific programming languages and computational paradigms into a higher level of thinking and theory building.

### Takeaways

- 🤖 How abstraction and theoretical frameworks can improve the interpretability and efficiency of deep learning models.
- 🧠 The relationship between neural networks, data, and algorithms, and how category theory can make that connection precise.
- 🔍 Using category theory to design domain-specific programming languages that improve how machine learning systems are built and understood.
- 📈 How category theory can handle data representations and algorithmic structure in neural networks.
- 🔧 The importance of abstraction, and how it simplifies and clarifies complex systems.
- 🧬 An analogy to "canalization" in biological evolution to explain abstraction and its role in mathematics and deep learning.
- 🌐 Turning complex computational problems into abstract representations that are easier to manage and reason about.
- 📊 The practical value of category theory for solving mathematical problems, especially those involving structure and relationships.
- 🔑 Using category theory to identify and exploit patterns and dualities in mathematical structures.
- 🎯 The distinction between goal-directed and agent-based intelligence, and how category theory can help us understand and build intelligent systems.
- 💡 Through a category-theoretic lens, how to build more efficient and more interpretable neural network architectures.

### Q & A

### Why did George leave Tesla?

- George left Tesla because he did not agree with the approach Tesla proposed for solving autonomous driving. He felt Tesla was trying to solve the so-called 'San Francisco problem' with more data and more compute, which he believed was not the right approach.

### What is the 'San Francisco problem'?

- The 'San Francisco problem' refers to the autopilot system's performance issues in San Francisco's especially complex traffic environment, which call for a dedicated solution.

### How does George's view of autonomous driving differ?

- George believed that Tesla's approach of training neural networks on massive data and compute is akin to trying to memorize infinity, and he did not consider it a viable solution.

### How does Tesla try to learn the world?

- With enormous amounts of training data, Tesla tries to build a function that memorizes every possible perceptual state in the world and produces the correct perceptual signals under all circumstances.

### What is 'Operation Vacation'?

- 'Operation Vacation' is an internal Tesla project aimed at the moment the autopilot function converges and has memorized infinity, a so-called singularity moment.

### What is Paul's view of neural networks?

- Paul regards neural networks as finite machines that operate under finite compute and memory constraints. While it might be theoretically possible to build a neural network that memorizes infinity, it is not achievable within the compute limits of a Tesla vehicle.

### What is the main goal of the company Paul joined?

- The company's main goal is a research-oriented venture: to show that a great deal can be done without scale, challenging conventional scale-based solutions.

### What is geometric deep learning (GDL)?

- Geometric deep learning (GDL) is an approach that tries to go beyond conventional deep learning models by taking the geometric structure of data into account to improve learning efficiency and performance.

### What is category theory?

- Category theory is a branch of mathematics that studies the commonalities of, and mappings between, different mathematical structures. It provides an abstract language and toolkit for describing and understanding complex mathematical objects and their relationships.

### Why is category theory important for computer science and software engineering?

- Category theory gives computer science and software engineering a powerful abstraction tool for understanding and designing complex systems. Through it, we can better understand data types, function composition, and algorithmic structure, improving software reliability and efficiency.

### Outlines

### 🤖 Thinking about AI and autonomous driving

Discusses AI in autonomous driving, in particular Tesla's autopilot system. Covers George's skepticism about Tesla's approach to the San Francisco autonomy problem: he saw Tesla's method as using massive data and compute to make the system memorize the world rather than finding the right solution. George considered this unworkable and took a position against scale.

### 📈 The limits of deep learning and geometric deep learning

Explores the limitations of deep learning, particularly geometric deep learning. Discusses the DeepMind team's attempt to generalize GDL by relaxing the requirements of invertibility and composability. Also discusses neural networks as instances of objects and parametric maps, and how to connect them to algebraic structures and category theory.

### 🔢 Combining type theory and category theory

Discusses combining type theory with category theory to better understand software and algorithms. Proposes using category theory to design domain-specific programming languages for reasoning about complex systems, and discusses how category theory addresses branching and typing issues in software engineering.

### 💡 Category theory's impact on computer science

Explores category theory's influence on computer science and software engineering: how it helps us design better programming languages and type systems, and how abstraction and an understanding of complex structures can make software engineering more effective. Also raises category-theoretic approaches to challenges in deep learning and quantum computing.

### 🌐 Building domain-specific languages for machine learning

Discusses using category theory to build domain-specific languages for machine learning. Proposes type-driven machine learning and deep learning to improve model interpretability and efficiency, and discusses formal methods for understanding and verifying the behavior of neural networks.

### 🧠 Comparing neural networks with finite-state automata

Compares neural networks with finite-state automata (FSAs) and Turing machines. Points out neural networks' limits in recursion and memory, and proposes category-theoretic designs for architectures capable of higher-level computation. Also discusses structured approaches to the complexity problems of software engineering.

### Keywords

### 💡 Neural networks

### 💡 Deep learning

### 💡 Autonomous driving

### 💡 Venture capital

### 💡 Compute constraints

### 💡 Functions

### 💡 Data labeling

### 💡 Category theory

### 💡 Symmetry

### 💡 Generics

### Highlights

Discussed structured approaches to deep learning and neural networks, and how category theory can be used to improve and understand these structures.

Mentioned using category theory to build domain-specific languages for better understanding and designing machine learning models.

Explored the limitations of neural networks, especially in handling recursion and abstraction.

Discussed using abstraction and category theory to simplify and understand complex mathematical structures and algorithms.

Emphasized the importance of abstraction and how it improves the efficiency and clarity of problem solving.

Mentioned using category theory to design the next generation of neural network software, improving the interpretability and reliability of software engineering.

Explored category-theoretic ways to understand and improve geometric priors in deep learning.

Discussed applying category theory to the training of neural networks to improve generalization and performance.

Emphasized the possibility of, and challenges in, achieving Turing-machine-like computation in neural networks.

Mentioned using category theory to build neural network architectures with higher-level structure and operations.

Explored using category theory to improve and accelerate research and development in machine learning.

Discussed using category theory to understand and address algorithmic alignment in machine learning.

Emphasized the importance of structure and semantics in neural network design, and how category theory makes this possible.

Mentioned using category theory to create explainable and verifiable neural network models, raising the quality of software engineering.

Explored category-theoretic approaches to modularity and compositionality in deep learning.

Discussed using category theory to design neural networks that can handle unbounded datasets and complex tasks.

### Transcripts

So, Paul, we're, um. Where did you come from?

Are you are you nearby? Tim. Uh, for the last two weeks,

I've been in London. Oh, yeah? Tell Keith the story.

So, um, they raised some seed capital about 18 months ago,

and then they set a date in February to do so. Yeah.

So what the deal is, I think sometime in 2022, George and

his co-founder, Taylor, they were, you know,

George quit Tesla because he thought that the way that they were proposing

to fix autopilot, in particular, to fix the

"San Francisco problem" for autopilot, was just way more data for San

Francisco specifically and just, you know, throwing compute at the

wall. And so it was like this. So George's thought was like,

this can't be the right way to do it. So he was like,

I'm not exactly sure how to do this, but I want to take a bet.

I want to take a position against scale.

It kind of feels like they're trying to memorize infinity. That's correct.

So, um, so would you say that your experience there informed

some of this skepticism? Yeah. I mean, I think if you look at the AI

day presentations that effectively go into deep technical detail about how

Tesla learns the world, or Andrej's talk at NeurIPS two years ago.

Um, you know, Tesla very much is trying to come up with a function

that memorizes the world, you know, basically take all of

the possible perceptual states that exist in the realm of

driving and produce the correct perceptual signals that result

from that under all circumstances. And they're pouring massive amounts

of training data and making that happen, running, you know, tests in

shadow mode to collect triggers that then feed more labels back to the

labeling team to run that data loop. And, you know, Operation Vacation,

which has been publicly discussed, is effectively that singularity

moment where the function converges and has memorized infinity.

And, you know, as you know, I've kind of talked to you about so far, I do

not believe that that's possible. You know, obviously,

neural networks are finite machines. They operate under finite compute

constraints with finite memory. It may be possible to build a neural

network that could indeed memorize infinity, given an infinite number

of layers and infinite number of, you know, uh, fully connected

neurons, um, infinite compute and memory to execute that model.

But given the compute constraints in a Tesla vehicle, uh, I don't

think that's possible to achieve. And so then he shopped around for

a year or so to try and find a venture capitalist to fund a sort

of research oriented venture to, you know, make a case for you can

do a lot of things without scale. And then I came on to the

company about six months ago, and within about a month and a half,

we were like, okay, we don't know what we'll be doing in six months,

but we need to be selling it to to the investors in six months.

And so we finished the paper on February 1st or really February 2nd

for the North American time zone, because it's, you know,

a February 1st anywhere on Earth kind of thing.

And then a week and a half later, we were meeting with, uh,

the venture capitalists to get money to do this whole thing.

Yeah, quite a whirlwind. I mean, yes, I'm.

I'm just thankful that, uh, you know, George created this,

and it exists because. Because I think, um, I take the

same line conceptually, which is the answer isn't just, you know,

more of the same. You know, we need to, uh, we need to

go back to some cleverness and, uh, and invent some new things and, uh,

to really, to really make, you know, order of magnitude improvements.

You should see this setup. We've got you on a big 55 inch

plasma screen. So it's as if you're in the room

with us. I'm sorry about that. Like, you know, that's too much.

Dagger is what you're saying. I would have shaved or, like,

you know, um, like, brush my hair or something.

Well, you did, you did pat it down in the back. The headset. Yeah.

It's like this headset hair you get, you know. Oh, yeah, a headset.

Uh, so I'm actually going. I'm worried about that.

Like, what's the compression going to do to my ear cartilage?

Like over time, all these unknown, unknown cybernetic effects.

A very soft version of cauliflower ear for podcasters. Right? Right.

I guess something you could be proud of, right?

Just like the MMA guys are proud of the cauliflower ear.

I've had this headset on for so many hours.

10,000 hours of Skyrim alone, I have achieved true mastery.

Yes. Okay, well we're in. We are, we're in.

So, um, where do we start? I mean, maybe, um, Paul,

because Keith hasn't watched the recording of the other day.

So maybe if you just frame up what's going on here.

And in particular, I think we want to have the Turing

machine discussion, um, with you, with, with with Keith on the call.

Um, so, yeah, I mean, just just frame up the, the the, you know,

what's this all about? Okay. What the hell is this all about?

Um, depends on which side of the collaboration that results in

the paper you are on. If you come from the DeepMind side

on this, you are looking for a generalization

of geometric deep learning, right. There are some obvious limitations

to geometric deep learning, which is that the kind of

transformations relative to which you can ask for

a neural network to be equivariant are invertible and always composable.

So on their side, they're thinking, okay, we need to generalize

away from invertibility first, and then generalize away from the

presumption that all transformations can be composed, right?

Because not all computations can be composed.

So if you want to use something like GDL to align neural network

architecture to algorithmic challenges, you need to figure out a

way such that you can structure. You can align it to these not

necessarily invertible, not necessarily composable steps

of computation. Right. If you come from it,

from where I come from, it is, oh, I'm interested in

neural networks as a particular instance of morphisms in a

2-category of objects, parametric maps, and reparametrizations.

And I'm interested in the kind of structure that they're interested in.

But thinking about it, as you know, algebras for monads or coalgebras for

comonads, stuff like this. Right. So this sort of like not I'm not

going to say like higher level of abstraction, but much more

sort of 2-categorical flavor, whereas they're really thinking in

terms of sort of representation theory and a sort of generalized

version of representation theory. Bruno and I, you know,

Bruno first. Right? I learned what I know about

neural networks through reading Bruno's thesis.

And that collaboration has been really instructive.

But, you know, we've really been thinking about this as, okay,

there's a 2-categorical story here. We want to do 2-categorical

universal algebra. And this gets you

way more abstract, which makes it more fun to think about.

But it also makes a lot of things tractable that are otherwise far

too complicated to reason about. I think just just as a very tiny

kind of kind of question. I think it'd be good.

I think most of our, our listeners can, can definitely understand that

some computations are not invertible. Yeah. You know that that that's easy.

I mean, you know, you you add two numbers together.

It's like you don't know what the original two numbers are.

They could be a bunch of possibilities. It's not invertible.

Um, but can you give us an example and some intuition behind

computations that are not composable? Oh, yeah.

So this is sort of, in some sense, what

types are for. Right. Consider some computation that

requires that the input be a tree: it cannot be, at least not naively,

applied to something that is a list. Right?

You need to do a conversion between lists and trees before

you can apply a computation that takes in trees to it, right?

So that would be an example of non-composable computations. Right.

Stuff where it was like well the types of the input and output

simply don't line up. Well. Yeah. And isn't isn't another example.

And correct me if I'm wrong, but I think another example is

something like like branching is not at least unless you're in a

non-deterministic paradigm. Like branching is not composable,

right? Like if I have an if statement

that kind of says, you know, if it's one thing, do this,

if it's something else, do that, then that that poses some issues

for composition, for function composition. Right?

I wouldn't say that. Uh, the reason I wouldn't say that is

because with the function 'if this, do this; if that, do that', right,

I can still compose stuff afterwards. It depends: if my 'if this, do this' has

a different type than my 'if that, do that', then you might not be able to

compose stuff together naively. But I wouldn't say that

conditionals and branching are really what the issue is.

It really has to do with input and output types. Right.

Well, I think the concern there, it is a type one,

kind of, at least from. And let me first state up front.

Uh, I'm super interested in category theory. I know very little about it.

I only have, you know, I've tried to do some tutorials online,

and so a lot of my knowledge is pretty far out of date.

But I think like the issue with composability of branching at least

like, say, in the Haskell, you know, kind of functional programming

world is that, um, even if both of the results are of type T,

the outcome of that branch is possibly a union; it's

an Optional or a Maybe of those types. Like, it may

be one value or another value. Yeah. So it kind of creates this.

Is that right? Well, I mean, but the thing is, like,

Maybe of something is still just one type. Right. So I really. Right.

But it's not type T. It's a. No. Exactly.

You would have a higher-order type. Yeah.

Yeah, it's another type. Exactly.

And so you can't simply plug Maybe T in; like, you can't simply apply

something that requires, you know, takes input of type T, to something

whose output was Maybe T. Right. You need to do like, oh,

I have to check: is this Just t, or is this, like,

you know, the else case or whatever? I can't remember what the Haskell

syntax is right for the exception. So that's the problem with branches

is, as soon as you throw branches into code, you start

to change the type into these, you know, composites. Okay.

So, yeah, in that sense I will buy that: in composition,

you have to be much more careful about type.

So I think it's still a story about input and output types and

which things you can compose. But the problem is naive composition.

If you start doing branching of things, you realize, oh,

I've actually changed what type I'm dealing with. Yeah.
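
The Maybe/Optional point above can be made concrete. A minimal Python sketch (the names are mine, not from the conversation): a branch with one arm that returns a value and one that returns nothing yields an Optional-typed result, and composing it with a plain function then needs an explicit unwrapping step, much like pattern-matching on Haskell's `Maybe`.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
U = TypeVar("U")

def safe_div(a: float, b: float) -> Optional[float]:
    # Branching changes the output type: one arm yields float, the other None,
    # so the result is Optional[float], not float.
    return a / b if b != 0 else None

def double(x: float) -> float:
    return 2 * x

# Naive composition double(safe_div(1, 0)) would crash: the types don't line up.
# A 'bind'-style combinator handles the Optional layer explicitly, the way
# pattern-matching handles Maybe in Haskell.
def bind(x: Optional[T], f: Callable[[T], U]) -> Optional[U]:
    return None if x is None else f(x)

print(bind(safe_div(8, 2), double))  # 8.0
print(bind(safe_div(8, 0), double))  # None
```

The point is that `bind` is the extra plumbing you must write once the branch has changed the type; composition is no longer just function application.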

And so just just back to my own like personal story a second,

because I have a question like I'd like to ask you, which is,

uh, a long time ago, you know, I learned about the power of algebra.

So when I first took an abstract algebra course, I was like, wow,

this is really cool. You know, algebra is so much more

right than I thought it was. Um, then sometime later,

I found out just the the real beauty of of type theory.

And this was all within the computer science perspective.

So algebraic data types, type theory, you know, I just fell in love

with types, right. Yeah. Okay. And so then, somehow, I

hear about category theory okay. And I'm like, oh yeah,

this seems to be you know, if I go learn some about category

theory, it's going to help me understand software better.

And, you know, algebraic type theory and everything.

But what I sort of quickly found and again, this could be quite naive.

And this is why I want to ask you is that.

It seems to me like category theory is, you know, this higher

order thing where, for example, you might ask, you might look for

similarities between algebras. You know, like here's the

algebra of Boolean algebra, here's the algebra of partial orders.

And, and they're actually maybe the same algebra or they have a

lot in common. And so there's this beautiful

sort of higher-order, um, pattern finding and

relationships that, that allows you, you might be able to prove

something in one algebra and it gives you the tools to map it,

like into a different algebra. But I found like it didn't really

have a lot of relevance to me just for my everyday software engineering.

And so I started to kind of lose, you know, the interest in it,

like tell me where I'm going wrong. Like tell me, you know,

in what ways category theory can can help someone who's more of a

practitioner, a computer scientist, a software engineer, you know,

really, um, like, understand better what they're doing or,

or apply new insights, you know, to the kind of work that we would do.

So unfortunately, I am not the best person to answer that question,

but I'll give it a good shot anyway. Right.

You know, so I come to this whole story from pure mathematics,

and I discover type systems in the context of homotopy type theory,

where it's like, what, there's a programming language that

corresponds to space? Okay. Right. So my introduction to

this stuff, like I come to like I actually like, you know,

way back in high school or whatever I learned Scheme and stuff like that.

But that was, you know, longer ago than I would like to

admit, and that hadn't been, like, terribly relevant to my

intellectual life in recent years. But, um, so I would say the problem

with trying to justify category theory in terms of functional

programming alone is that you've already learned most of the

category theory that's going to be directly relevant by doing the

functional programming itself, right? The point is, you don't get

Haskell without category theory. But once you learned Haskell,

you've learned a lot about a specific category, and that's the

only one in which you're actually doing all that much work.

So the abstractions of category theory don't necessarily give you

all of that all that much. Right? That's sort of my naive

perspective on like, you know, it's like, oh, category.

You know, I remember the first day I came into the Symbolica office when we

were next door to Khosla Ventures. And I come in and, you know,

like one person's there, you know, and they're like, oh,

you're the new category theorist. Uh, I hear that's useless.

And, you know, my retort is like, you like functional programming,

right? And they're like, yeah. And I'm like,

that wouldn't exist in its current formulation without category theory.

It's like, yeah, but I know it through all of these other means.

It's like, yes, you've you've grokked the essence of the category theory

that is applied to functional programming most of the time already.

I do think that once you get into things like dependent types and sort

of more sophisticated type systems, the category theory side,

you need more sophistication to make really good sense of it.

You can have sort of a pretty good, um, elementary in the sense of like,

I-think-about-elements or I-think-about-terms way of thinking about

things like dependent type systems. But, um, I found, you know,

of course, this is how I came to this stuff.

So of course I think this is like the right perspective.

But I think that category theory gives you permission to have a

sort of more geometric interpretation of this stuff that

makes it easier to understand and easier to reason about. Right.

You know, one thing is like the real strength of category theory

is not like working in any particular category, right?

The real strength of category theory is, hey, this gives me this

principled language in theory, for abstracting things such that I

can remove all the extraneous detail from a particular problem and study

just the bare bones of that thing. Get an excellent description, an

excellent understanding, and find it in lots of different places. Right.

So it's the idea, it's sort of, you know, like I want a good library.

I do it once and then I apply it in lots of different contexts.
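
Paul's "do it once and then apply it in lots of different contexts" point is easy to illustrate. A hypothetical Python sketch: a single fold written against the bare monoid interface, an associative binary operation with a unit, works unchanged for numbers, strings, and lists.

```python
from functools import reduce
from typing import Callable, TypeVar

M = TypeVar("M")

def mconcat(op: Callable[[M, M], M], unit: M, xs: list[M]) -> M:
    # Defined once against the bare monoid structure: an associative
    # operation `op` with identity element `unit`.
    return reduce(op, xs, unit)

# The same abstraction instantiated in three different contexts:
print(mconcat(lambda a, b: a + b, 0, [1, 2, 3]))       # 6
print(mconcat(lambda a, b: a + b, "", ["ab", "cd"]))   # abcd
print(mconcat(lambda a, b: a + b, [], [[1], [2, 3]]))  # [1, 2, 3]
```

The categorical habit is exactly this: strip a problem to its bare structure, solve it there, and reuse the solution wherever that structure reappears.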

And I'm absolutely not one of the 'category theory

is useless' people. I think I'm in the category of, I

think there's probably a lot of really useful stuff in there.

I just, either I definitely don't understand it,

and/or it hasn't been discovered yet. Um, you know, for,

for my particular context. But let me give you an example of

where, yeah, my intuition tells me there's probably some really useful

insights from category theory to be explored. So, um, long ago. Okay.

Uh, Alexander Stepanov, a really brilliant, you know, uh,

computer scientist, was trying to, to do things within type systems

that were just not possible in the current languages.

So one thing he wanted to be able to do, for example, was to have a type

system rich enough that it could express properties of an algorithm.

So maybe you have a bunch of different sort algorithms in

your library. And he wanted the kind of complexity

constraints of those sort algorithms to be part of the type system. Yes.

So that somebody writing a generic piece of code, you know,

calling a sort algorithm, and there were certain properties

required there that the type system could make the best decision,

like it could say, well, in this particular case,

you should use this sort algorithm. And it was able to compute all this

like from the type system. Yeah. Um, he eventually discovered C++,

which, just by accident. Okay. Just by accident,

it had a template metaprogramming capability that wasn't by design.

It just kind of accidentally ended up being a Turing complete, you know,

type system or whatever, where he was able to implement some of his ideas

with really ugly code, by the way. So I'm not justifying, you know,

the quality of it. It just he just was able to and

he was able to do, you know, lots of brilliant things which

led to ideas like, you know, traits and categories in their

sense, and, you know, algorithms with,

with very rich type systems, template metaprogramming.
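
Stepanov's idea can be sketched, very loosely, in Python rather than C++ templates (all names here are hypothetical, not from his libraries): each sort algorithm declares its properties as data, and a generic front end picks an implementation from the declared constraints, the sort of dispatch a rich enough type system could perform at compile time.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class SortTraits:
    name: str
    stable: bool
    in_place: bool
    worst_case: str  # e.g. "n log n", "n^2"

# Hypothetical registry of algorithms paired with their declared properties.
REGISTRY: list[tuple[SortTraits, Callable[[list], list]]] = [
    (SortTraits("timsort", stable=True, in_place=False, worst_case="n log n"),
     lambda xs: sorted(xs)),
    (SortTraits("heapsort-like", stable=False, in_place=True, worst_case="n log n"),
     lambda xs: sorted(xs)),  # stand-in body; the traits are the point
]

def best_sort(*, need_stable: bool) -> Callable[[list], list]:
    # Here the "type system" is runtime data, but the dispatch logic is the
    # same: pick an implementation whose traits satisfy the constraints.
    for traits, impl in REGISTRY:
        if not need_stable or traits.stable:
            return impl
    raise LookupError("no algorithm satisfies the constraints")

print(best_sort(need_stable=True)([3, 1, 2]))  # [1, 2, 3]
```

In C++ this selection happens at compile time via templates; the sketch only shows the shape of the idea, properties of algorithms expressed as data the language can dispatch on.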

I feel like we've hacked our way to that, like by accident,

kind of from the, from the computer science side,

from the programing side of things. And the category theory could

offer a new lens to view that, um, and provide insights for designing

like the next generation of, of languages and, you know,

advanced type systems. Um, is there any sense to that

or I'm off the. No, I think that that's, so one of the

things that like, I'm not working on this kind of stuff right now,

but I have spent a lot of time thinking about and is essentially

starting with a particular category and then trying to define a domain

specific language for that, or like an internal language of that,

such that it becomes easy to reason about the stuff in that

category in a way that's perhaps a little bit more intuitive than

going towards, like, the high structuralism of

purely category theoretic arguments. Right. An example. So I love that.

Let's let's pause here for one second. So I love this idea.

So category theory can help one design domain specific languages.

Yes. No, this has like become a huge

deal. Like as you know, I only know of

it from like the homotopy type theory community and the kinds of

projects that that has spawned. But as an instance of this,

you know, so there's this paper, I think, by Mike Shulman called

A Practical Type Theory for Symmetric Monoidal Categories.

And what he does in this paper is say, okay, there are a bunch of type

systems that are interpreted into symmetric monoidal categories, but no

one uses them for proving things about symmetric monoidal categories.

And he asked the question why? And he essentially, you know,

nails down a couple of points that all of the existing type theories

that are interpreted into symmetric monoidal categories fail to satisfy.

And like the one that I found the most compelling is like they

don't have this sets with elements flavor to them. Right.

And what's prerequisite for that sets-with-elements

feeling is, well, you need tuples of terms to be

in tuples of types. And that needs to be symmetric on

both sides of a judgment. Right. And so he says, okay,

I'm going to design a new type system that has that property

with my intended semantics in symmetric monoidal categories.

So the idea is you can design a programming language: like you say,

you can have the semantics that you want this thing to

have, and then design the programming language to fit that, right?
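
That "pick the semantics first, then design the language to fit" move can be sketched, far more modestly than Shulman's type theory, as a tiny embedded DSL in Python (all names hypothetical): morphisms carry their domain and codomain as tuples of wire types, and sequential and parallel (tensor) composition are only defined when the boundaries line up, so ill-typed diagrams simply cannot be written.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Box:
    # A morphism in a cartoon symmetric monoidal category: typed wires in/out.
    dom: tuple[str, ...]
    cod: tuple[str, ...]
    fn: Callable

    def __rshift__(self, other: "Box") -> "Box":
        # Sequential composition: only defined when the boundary types match.
        if self.cod != other.dom:
            raise TypeError(f"cannot compose: {self.cod} != {other.dom}")
        return Box(self.dom, other.cod, lambda *xs: other.fn(*self.fn(*xs)))

    def __matmul__(self, other: "Box") -> "Box":
        # Parallel (tensor) composition: boundaries concatenate.
        n = len(self.dom)
        return Box(self.dom + other.dom, self.cod + other.cod,
                   lambda *xs: self.fn(*xs[:n]) + other.fn(*xs[n:]))

add = Box(("num", "num"), ("num",), lambda a, b: (a + b,))
show = Box(("num",), ("str",), lambda a: (str(a),))

print((add >> show).fn(2, 3))  # ('5',)
# add >> add raises TypeError: ('num',) != ('num', 'num')
```

The syntax does the policing: any wiring you can express is semantically well-formed, which is the whole point of fitting the language to the intended category.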

And I think this is an incredibly powerful, uh.

Powerful trick that I do believe it's, you know, it's taking longer

than people thought it would, but that's okay, you know,

because it's hard. But the idea is,

you know. Kuhn has this line about, you know,

the only reason science works is because you can forget how you

got there and just start from new foundations each time. Right?

It's kind of like a less snarky analog of the, you know, science

proceeds one funeral at a time kind of thing. It's like, you don't want, you don't,

exactly, you don't want to have to recapitulate all of mathematical

history to work with something. And we're finding out that

there's all of these, like, complicated mathematical structures

that we really want to reason about, and we don't want to have to

build them from scratch. Instead, we would like to start with

something like a programing language, or a logic that just deals with all

the complexity of that naturally, by forcing you to be careful

about syntax. Yeah, right. And this is the kind,

and so this is exactly, this whole study of, like, domain-specific

languages for categories, or better, in math language,

this is like internal language hypotheses or internal language

statements: I want the internal language of this

flavor of category, or the internal language of that flavor of category.

And so maybe, so this is, you know, you can ask the question,

like, why the hell are we getting to, why, why now? Right.

There's a there's a good deal of why now, right?

I've just said it's like, hey, we've just had a successful funding

round. We've got $31 million. We're assembling like the world's

largest industrial category theory and machine learning team.

The immediate response to this like, aren't you the only like, uh,

category theory and machine learning team in the world?

It's like, no, there's probably like a couple of them.

There's like Petar's team at DeepMind, that kind of stuff.

So we're not the only people in the world doing this,

but we'll probably over the next like six months, become the biggest.

And the question is like, how the hell do you get money for this?

And like, one of the reasons is, you know, if you look in the history

of computation, there have been people banging the drum of like

purely functional perspectives and category theory as like the

right foundations for computing. But you haven't needed it before,

right? You've been able to hack your

way with other paradigms. But deep learning and quantum

both pose problems to the existing paradigms.

It's just like it suddenly becomes too hard to make progress without

categorically informed treatments of those computational substrates.

Right. And so let me let me ask you about

this because, and I'll be open, I haven't read the paper yet.

Um, so apologies for that. But I think if I'm going to take

a wild stab, based on everything you've said so far about

where you might be going from this perspective, is it the case that,

um, category theory will inform the construction of domain-specific

languages for machine learning? That is definitely one outcome

that I foresee. Okay. And so my kind of,

let's say, uh, more lay understanding of that.

It would be like, say, as a practitioner, I would now have a

language that I could use to maybe even, uh, you know, define, um,

modules, machine learning modules or constructs compositions of,

you know, neural networks in a way that I can actually reason about

as a human being, because there will be a new set of concepts,

a new language that I can program in, if you will, which will ensure the

semantics and structure that that I need to to do something useful. Yes.

No. Right. That is exactly right. One of the outcomes of this

research program, we hope, will be something we

can call like type driven ML or type driven deep learning. Yeah.
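
A toy of what "type-driven deep learning" might look like, a sketch under my own assumptions rather than anything from the paper: each layer is a parametric map that records its input and output dimensions, composition checks shapes when the model is built, and, in the spirit of the parametric-maps picture from earlier, the composite's parameters are the pair of the two layers' parameters.

```python
from typing import Callable, List

Vec = List[float]

class ParaMap:
    """A parametric map with shape types: vectors of length in_dim -> out_dim."""
    def __init__(self, in_dim: int, out_dim: int, params, fn: Callable):
        self.in_dim, self.out_dim, self.params, self.fn = in_dim, out_dim, params, fn

    def __call__(self, x: Vec) -> Vec:
        assert len(x) == self.in_dim, "input shape mismatch"
        return self.fn(self.params, x)

    def then(self, other: "ParaMap") -> "ParaMap":
        # Composition is checked when the model is *built*, not when data flows:
        # a shape mismatch is a construction-time type error.
        if self.out_dim != other.in_dim:
            raise TypeError(f"shape mismatch: {self.out_dim} -> {other.in_dim}")
        # In the parametric-maps picture, the composite's parameters are the pair.
        return ParaMap(self.in_dim, other.out_dim, (self.params, other.params),
                       lambda _, x: other(self(x)))

def linear(in_dim: int, out_dim: int) -> ParaMap:
    # Weights fixed for the sketch; training is out of scope here.
    w = [[0.1] * in_dim for _ in range(out_dim)]
    return ParaMap(in_dim, out_dim, w,
                   lambda p, x: [sum(wij * xj for wij, xj in zip(row, x)) for row in p])

model = linear(4, 3).then(linear(3, 2))
print(len(model([1.0] * 4)))  # 2
# linear(4, 3).then(linear(4, 2)) raises TypeError before any data flows.
```

The payoff is the one discussed in the conversation: an ill-shaped architecture is rejected when you write it down, rather than failing somewhere deep inside a training run.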

I mean on on that we had a very interesting discussion just

before we hit the record button. And there's a real problem now

with ML engineering, but also just large scale ML systems like

Facebook and its inscrutability. And this is this is a real problem,

right. Because we tend to just blame

the system. And it gives gainful employment to

lots of engineers because we lie. We say that these things reason,

that they operate in a rational way, and they don't.

And when they go wrong, we just blame the system.

We just say, oh, you know, it's just a complicated neural

network. It's like alchemy, both in how we build the systems and

how we explain the failure modes. And what you're talking about is

a future towards not only having a formal way of understanding

and verifying programs, but just being on a more solid

footing for how we build systems. Yeah, no, precisely like this.

Like our aim is very much to refound deep learning in such a

way that you can actually do careful reasoning about it, right?

That you can actually do theorem proving about the behavior of

these neural systems. Right. Whereas currently we're hacking

everything together and we, you know, we use benchmarks and

heuristics and all of this stuff, like we can do better. Right.

We know that we can do better with classical programing.

So why can't we do better with neural networks?

You know, lots of people say, oh, you simply can't do that.

Maybe software engineering has to be like neuroscience.

And I think it's like, well, there's definitely going to be

something like that there. But a lot of it is formally

structured, so the key will be to identify that which is formally

structured, and pull as much out of that as we can and make that

explicit, so that we know the bounds over which this thing is doing

some, like, inscrutable curve fitting, whereas in other places

it's doing this very formal stuff. So I mean, that that point is

really interesting that, um, people have said to me that they

think the future of software engineering is like neuroscience.

And what I mean by that is we don't know how the brain works.

We stick probes in and we try and kind of like, you know,

figure out what's going on a bit like the blind men and the elephant

by having all of these different views on this inscrutable mountain.

And, um, it's just a little bit weird, isn't it? Right.

So, you know, we're building these multi-agent systems and they

have all these complex dynamics, and we try to apply engineering

techniques by doing monitoring and alerting and having these, like,

weird thresholds and just fixing problems as, as, as they show up.

And a lot of people just say, well, this is the way things are, this is

the way software is going to go. And as I understand software

engineering, because Keith and I have have written a whole bunch of complex

software recently, and part of the reason for having abstraction is

to create a cognitive interface. So it's not necessarily about

how the software runs because it gets compiled into like,

you know, into low level code, native code on, on the machine.

But it's about creating an abstraction to help us understand

and communicate what's going on and also do like, you know,

forms of verification later. Yeah. Well, and that abstraction is a

domain specific language. And so every, every programmer,

when they're, you know, when they're creating whatever, just, you know,

A bunch of functions, a bunch of modules,

a bunch of classes, like, you know, whether you know it or not, you're

designing a domain specific language. And and we kind of just do it

intuitively without much like, sort of, uh, math helping us.

And I think the point is, like, if I understand both of you correctly,

machine learning and quantum, you know, quantum computing, it's just

beyond our capability to do that. Like, you know,

kind of just intuitively, we need we need math like it's

it's it's just too hard to hack it together anymore. Right? Right. Yeah.

Yeah. So I think that's beautiful. I mean, and I, uh, in fact,

there's a, there's a very old, um, programming book from like the 1970s.

Um, I think it's called like Programming Languages or Designing

Programming Languages or something, which tries to make this point too.

And it makes it makes two interesting points.

It says, you know, when you're writing a program,

whether or not you know it, you're creating a domain specific language.

And so knowing something about programming language design is

helpful even to just writing code. So you said there were two or at

least one thing you hope comes out of this project.

Um, Paul, is, uh, machinery to create domain specific

languages for machine learning. What are the other things that

you hope comes out of it? The other things that I hope

come out of it are. Completely novel architectures. Yeah.

So completely novel architectures that can do kinds

of things that we currently don't know how to do. Right.

So one thing that you might want to do is like, well, you know,

so I'm in industry now. I'm not an academic anymore. Right?

Suddenly I have to figure out how to eventually generate revenue

with this foundational research. Say, what's one of the like?

So consider Microsoft for example. Right.

And this is, you know, a story that we've been telling for

some time is consider Microsoft and all of the different places that

they are deploying AI systems, or LLMs specifically, throughout their whole,

you know, panoply of offerings. The only one that's actually making

any money is Copilot. Right. But Copilot is not actually very

good, right? LLMs are not actually very good

at generating code. Right. You know, they'll like

regurgitate some snippets, but a staggering portion of

the time they're wrong. The fact that we're used to this

is because when humans generate

snippets of code, a staggering proportion of the time they're wrong.

So we're not offended by this, right? But so if you could make something

better such that if you, say, have an LLM that you can interact

with in natural language, but that actually emulates a type system.

Right, you will end up with an LLM capable

of doing higher order processing, higher order organization.

You'll note that I'm intentionally shying away from using the word

reasoning, because I sort of take exception to this language.

I understand it to be the lingua franca of the practice,

but I don't really like the language of reasoning, but I'll accept it.

I take exception to it as well. We very much do.

But can we can we just linger on this?

But this is a fascinating point. Now, um,

there are a lot of people out there drinking the Kool-Aid right now.

So there was the Codex model, which is what is built into VS code,

and that's garbage. And GPT-4 Turbo is pretty

damn good. Anthropic's Opus is pretty damn good.

There's this thing called Devin, which came out recently,

and people are saying, oh my God, there's no limit.

You know, we're going to start, um, getting this thing to generate code,

and then we're going to fine tune the base model on the

outputs of the code. And it's going to be a bit like

evolution. It's just going to Complexify

and Complexify. And we're going to memorize all

of the types of reasoning and all of the types of things we're

ever going to see. And I think that a lot of the

people who are drinking the Kool-Aid about this just don't

do serious software engineering. And that's because, like,

for example, I was playing with Opus 3 last night.

I got it to generate an animation of a bunch of mathematical, you know,

graphics, and it worked really well. I thought, bloody hell,

it just worked first time. I then said, um, you know,

can you modify it a bit and can you add this thing very quickly?

It diverged and it just it turned into garbage and it was useless.

And that's because it's almost like we're wrestling on the

boundary of chaos. Right? So the complexity of software

blows up so fast it is possible to do something simple.

But as soon as you, you know, you modify it and you change it

and you change it, you're in no man's land very, very quickly.

And I think part of the reason for having this structured approach,

you know, this abstraction approach is to wrestle with that complexity.

So you're making the argument that if you're doing something like a

large language model does now, but in the abstraction space

rather than the low level syntax space that we do now, then we

can get further than we are now. That is that is precisely the claim.

There's a I think it's a relatively recent paper by I

think the team was led by Talia Ringer.

Excuse me. I can't remember which university

it's based out of, but, uh, it's a great paper, and it

demonstrates pretty conclusively that transformers just don't learn

recursion. Right? Yeah. Of course. There you go. And so, like.

Why would you ever think that by scaling an architecture that is

likely to have fundamental, architectural,

by-construction limits on the kinds of things it can even

figure out? Yeah, right. Why would you ask a system that,

you know, can't do recursion, which we know to be the essence

of programming, at least in some sense, right.

Why would you presume that that model is ever going to be capable of doing

the kind of software engineering that we actually need to do in the world?

I mean, I think it's almost even worse than that, because they can

kind of mimic recursion to a fixed depth, you know, because they have a

fixed amount of computation. Right. But even worse than that,

the real problem is they can't create abstractions because creating

abstractions is it's abduction. Yeah. And I think they probably do

create abstractions. Right. They've got like latent

representations. But the problem is like the

abstractions are not good. They're not principled.

Oh, but I would argue that we create abstractions and we embed them

in our language and, and then they get memorized by, by a language

model and they can be reused. But what we do when we design

software, we solve this intractable problem.

It's very mysterious how this happens.

We select an abstraction from an infinite set, an infinite set.

And then we use that to to represent a concept.

And language models do not do that. No. Yeah.

I mean, on the, uh, the recursion thing is really interesting because

we were having a conversation the other day about, um, so there's,

there's this really exciting technology you're working on.

And as we were just saying, it's a way of building a semantically

principled, um, way to design the future of neural network software.

So you're rather than dealing with these inscrutable neural networks,

we're actually dealing with compositional structure, with clear

semantics from, from a high level. But the problem is,

as I understand it, the end result is still a neural network.

And neural networks are finite state automata. Right?

So they don't do recursion.

And then there's this notion that came up because we were talking about

this over dinner, and there is such a thing as a recursive neural network,

but a recursive neural network is still a finite state automaton.

The difference is it can we can modify backprop to recurse over

a structured input. So for example, we can put trees

into these things. Yes. So they can do structural recursion.

Exactly, exactly. So, um, we can, because a

recursive algorithm has a kind of stopping condition or a

base case if you like. So we could put a tree in and then

the algorithm can recurse that tree. And what it's doing every time is

it's just embedding, um, everything. You know, you start with the leaf

node and then you go up and you go up until you get to the top.

And it's it's just incrementally embedding that datum into the

neural network. But the neural network itself is

still doing a fixed amount of computation for that input. Yeah.
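The tree-embedding procedure just described is structural recursion: one fixed-size combine step is reused at every node, so the network does a bounded amount of computation per input, set by the tree's size. A minimal sketch, assuming illustrative weights and a toy leaf encoder (none of this is from a specific library):

```python
import numpy as np

# A toy "recursive" (tree-structured) network: one fixed combine step
# is reused at every internal node, folding the tree bottom-up into
# a single embedding. Computation is finite: one step per node.

DIM = 4
rng = np.random.default_rng(0)
W_left = rng.normal(size=(DIM, DIM))
W_right = rng.normal(size=(DIM, DIM))

def embed_leaf(value):
    # Hypothetical leaf encoder: a one-hot embedding of a small int.
    v = np.zeros(DIM)
    v[value % DIM] = 1.0
    return v

def embed_tree(tree):
    # tree is either an int (leaf) or a (left, right) pair.
    if isinstance(tree, int):
        return embed_leaf(tree)  # the recursion's base case
    left, right = tree
    return np.tanh(W_left @ embed_tree(left) + W_right @ embed_tree(right))

vec = embed_tree(((1, 2), (3, (0, 1))))
print(vec.shape)  # (4,) -- one fixed-size embedding regardless of tree shape
```

The datum is incrementally embedded exactly as described: leaves first, then upward to the root, with a fixed amount of work per node.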

So so I guess the question is what what we're really excited about.

Peter's been working on this for years is algorithmically

aligning neural networks. And even GDL more broadly is about

constraining neural networks so that they generalize better within

distribution on, on geometric priors. And but the thing is, they still

only generalize within distribution. So they

don't act like Turing machines, which is to say they

can't address an expandable memory, a potential infinity.

They can't, you know, they can't just go to Costco and get

some more memory when they need it. They do a fixed amount of

computation per datum. Well, so this is you know,

this particular question is outside of my domain of exact expertise.

So I want to be like careful about saying, you know,

anyone who's like says, oh, this person said something wrong.

It's like, yeah, I'm not claiming to be an expert in this particular

domain, but this sounds like a story. It's like they can do coalgebras,

however, right. And coalgebras are things where,

yeah, you can just like keep popping something off, you know,

ad infinitum. You can just like, query it.

Hey, give me this part of this, give me this part of this.

And there's no stop to how many times I could ask for that, right?

So I can do coalgebraic stuff too. But my sense is that maybe this

is some sort of more definitional confusion than it is really a

statement of the kinds of things that neural architectures can do.
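The coalgebra remark can be illustrated with an unfold: from a state you can keep asking for the next part, ad infinitum, with no built-in stopping condition. A minimal sketch (the `unfold` helper is my naming, not from the conversation):

```python
from itertools import islice

def unfold(step, seed):
    # Coalgebra-style generator: from a state, produce one observation
    # and a successor state, forever. There is no halting condition;
    # the *consumer* decides how many times to query it.
    state = seed
    while True:
        observation, state = step(state)
        yield observation

# Example coalgebra: the stream of natural numbers.
naturals = unfold(lambda n: (n, n + 1), 0)
print(list(islice(naturals, 5)))  # [0, 1, 2, 3, 4]
```

There is no stop to how many times you can pop an observation off; termination is the consumer's choice, not the structure's.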

Yes, I think I can. I can probably help clarify it

because for better or worse, we've been having a lot of

debates in our Discord, you know, recently about this topic.

And so I think maybe we'll test it out here today. I've

learned how to communicate this, you know, a little bit better,

like to try and explain the pragmatic, you know,

problems with it. Um, and so if we think about,

first of all, just to set up something very clearly, um.

If we all understand what a finite state machine is, just imagine

it as a graph like it can. It sits there and it can get input.

And by the way, it can get an unbounded amount of input.

You know, like for example, a soda machine.

Yeah, you can keep feeding it instructions as long as you want.

You can keep pressing the button and putting in quarters in it,

like, you know, all day long. And, you know, uh, stuff will

continue to happen forever. The point is that it has a fixed

number of states that are defined at build time when it was

constructed, and it just navigates between these states, you know,

always kind of responding, responding to the input.

That's a finite state automaton. A Turing machine is, and I say nothing

more than this, but, by the way, guys, this was a genius

discovery, you know, back in the day: a Turing machine can be

boiled down to nothing more than a finite state machine that has

an expandable read-write memory. Or another way to think about it

is it can have two stacks. It can push a value onto one

stack or another stack. It can pop a value off any one of

those stacks and push it down. So it has this expandable read write

memory that's unbounded. Yeah. Um, that machine alone,

that simple machine alone, in fact, is powerful enough to perform all

computations that all what are called effective computations.

So any mechanical thing you can construct, any, you know,

physical thing that you devise that has like a digital nature and kind of

do kind of stuff, you know, is no more powerful than this machine.

And that includes, by the way, quantum computing.

So quantum computers are still just Turing machines.

They can do certain problems faster, but they can't fundamentally

solve any problem that a Turing machine couldn't solve. Okay.
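The two-stack formulation mentioned above is easy to make concrete: two stacks simulate an unbounded tape, with the head sitting at the top of the right stack. A minimal illustrative sketch of just the tape mechanics (not a full machine):

```python
# Two stacks simulate an unbounded tape: `left` holds everything to the
# left of the head (top = nearest cell), `right` holds the head cell and
# everything to its right. Moving the head is a pop from one stack and
# a push onto the other; the tape grows on demand with blanks.

BLANK = "_"

def move_right(left, right):
    left.append(right.pop() if right else BLANK)

def move_left(left, right):
    right.append(left.pop() if left else BLANK)

def write(right, symbol):
    if right:
        right.pop()
    right.append(symbol)

# Tiny demo: write "ab", move back, and read the cell under the head.
left, right = [], []
write(right, "a")
move_right(left, right)
write(right, "b")
move_left(left, right)
print(right[-1])  # head is back over "a"
```

Attach a finite-state controller that decides which of these operations to apply at each step and you have the full machine described here.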

So getting to neural networks like it should be immediately

obvious to anybody that that a neural network has that finite

state machine component. That's what the neural network is.

And now you ask, can I hook that up to read write memory. Right.

And have that that neural network do something useful?

The problem we run into is training, training the neural networks to

do something useful with unbounded read write memory.

Like we don't usually do that. So all recursive neural networks

that are trained today, they have a fixed window ahead of time.

That's known at training time where they do like operations on whatever a

context of 100,000, or they sit there and operate on, um, you know,

100 windows or something like that. And then you take the answer

that you get at that point, like you've got an output signal.

Whenever people try to train them to do unbounded calculations,

like, for example, maybe I'm going to have an RNN and

one of my whenever one of my bits lights up in the output, then I'm

going to treat that as as halting. And I'm going to take the answer

at that point okay. So I'm going to run the Turing

machine during training. As soon as bit number 26 lights up,

I'm going to treat that as the halting bit.

And then I'm going to take the answer.

That could be a Turing machine. The problem is when they try to

train machines like that, we just can't do it.
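The "run until a designated bit lights up" scheme can be sketched as follows. This is a toy with random weights, so a hard step cap is included; as the discussion notes, the halting bit may simply never fire, which is part of why training such machines is hard. All sizes and thresholds are illustrative assumptions:

```python
import numpy as np

# Toy "run the RNN until the halting bit lights up" loop. Weights are
# random here, so a cap on steps is required: the halting unit might
# never cross its threshold, which is exactly the training difficulty
# being discussed.

DIM, HALT_THRESHOLD, MAX_STEPS = 8, 0.9, 50
rng = np.random.default_rng(1)
W = rng.normal(size=(DIM, DIM)) * 0.5
U = rng.normal(size=(DIM, DIM)) * 0.5

def run_until_halt(x):
    h = np.zeros(DIM)
    for step in range(MAX_STEPS):
        h = np.tanh(W @ h + U @ x)
        # Unit 0 plays the role of "bit number 26": once it saturates,
        # treat the rest of the hidden state as the answer.
        if abs(h[0]) > HALT_THRESHOLD:
            return h[1:], step + 1
    return h[1:], MAX_STEPS  # cap reached; the halting bit never fired

answer, steps = run_until_halt(rng.normal(size=DIM))
print(steps)  # an input-dependent amount of computation
```

Standard training instead runs for a fixed number of steps, which is the contrast drawn in the conversation.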

Um, and so if you think about the space of all possible programs like.

The space of all possible programs is the space of Turing machines. Okay.

And we're doing a certain search. So today we've got the capability

to construct all kinds of neural networks and use stochastic

gradient descent or whatever. We're searching that space of all

possible programs, and we're finding one that does something useful.

What we're finding is that that space that it searches is a very small

subspace of all possible programs. And it's really those

that are also reachable by finite state automata.

So we're not getting out of that boundary up into context free

grammars, context sensitive grammars, recursively enumerable languages.

So there's this huge other space that we just don't know how to

effectively search efficiently with machine driven techniques yet. Yeah.

I mean, I will confess I could be wrong, but I don't think that that is

I think that's sort of an accident of particular definitions and the

way that we've been structuring neural architectures. Right.

And what I mean by aspects of definitions is like, yeah,

if you pretend that your thing is of like fixed depth, but if you

allow feedback into the system, then suddenly you can get I do think

you could get access to that, that, you know, higher order computation.

Well, I think it's actually just an accident of our

optimization procedures. It's really an accident,

not so much of the architecture. The architecture is fine.

Like you said, people do have, you know, RNNs that have feedback

and, and all this sort of thing. But it's an accident of the way

we train them. So when we when we go to train them,

nobody currently trains in the way that I just said.

We're like, when I do a training pass and I go to propagate my gradients,

I never sit there and run the neural net, the RNN until a particular bit

lights up and then take the answer. I run it for a fixed number of

steps and take the answer. Yeah, well, because the problem is if

you if you try to run it until a bit lights up, it may never light up.

Well, no, you can halt. Uh oh. Okay. So I see the concern.

Well, so what if you consider that? Hmm. Yeah.

I mean, I don't think this is a real problem.

I think this is a question of misconceptions.

Maybe some that I have or maybe some that are popular in the the ML

community, to which I, you know, I'm like a quasi expert in like,

CT world and, you know, like I'm learning, you know,

using CT to, you know, speed up my process of, you know,

developing expertise in ML. But I don't think that.

This is a permanent problem, like in the sense of like,

just like something about neural computation makes this impossible.

I don't think that's the case. Right? Right.

I think in particular, you could do this thing where it's like, oh,

if this bit lights up, it stops. I do think that that's possible.

Right. Well, so and in particular,

I mean, and I agree with you like it is possible.

And so there are attempts at this like the differentiable

neural computers. Right. Or some configurations of,

of neural Turing machines. But it seems like the universe

is conspiring against us. Because the funny thing that happens

is when you generalize those neural networks and try to train them in

this way, it just becomes so much more difficult to train them.

I mean, maybe we should discuss, um, the so what question.

So the thing that Peter has been driving at is, you know,

this algorithmic alignment of neural networks and the hypothesis

is there are certain fundamental forms of reasoning where you do

require this kind of, um, you know, quantification over potentially

infinite sets, um, just computing the nth digit of pi or just sorting,

um, uh, you know, like a set of numbers to, to arbitrary length.

Now, the the incredible discovery of neural networks is so many

things in the world, you can essentially interpolate within

distribution and it just works. Now I'm a big believer in this

System 2 idea that, you know, there's a lot of gnarly

perception type stuff. And then there's this System 2

where we do need to do this kind of symbolic reasoning.

Is there a space where we need to have Turing computation for AI?

Even if you could solve all interesting problems, you know,

to humanity with, with a, with a big enough Turing machine,

you know, like you, you construct one the size of the sun and, you know,

it's made out of neutronium and like, whatever else and like,

it can finally do everything we want. You can do it with a lot less energy,

a lot less matter, you know, if you can, if you can use, uh,

programs that that can utilize some unbounded, you know,

input sensitive amount of memory. So I think like that's again like

it's about this program search thing, which is, um, the best solution

in the sense of like ones that are practical, energy efficient,

whatever, for many problems, the best solution lies in this broader space

of kind of Turing complete problems. It's certainly been the case for

human programing, like when, you know, people could just

write programs that were FSAs, like, we could have done that.

We never needed to develop, you know, von Neumann machines and kind of

like what's sitting on your desk. But we did because they were

extremely, you know, useful for solving real problems.

And I agree with Paul that there's no theoretical limitation why

you can't train a neural network, you know, to search the space

of Turing programs. It's just we haven't figured out

how to do that yet. And there's some tricks that

we're missing. And I'm just saying all the current

architectures do not search that space. They search a small subspace.

And there are many interesting problems outside of that space. Yeah.

Can I frame it a slightly different way, which is:

there are two directions here. One direction is what Francois

Chollet says. And you've just been alluding to

Keith, which is that we build a kind of AI, which is almost

purely a discrete program search. And the the other school of

thought is we do some kind of neurosymbolic combination.

And Josh Tenenbaum has been doing a lot of interesting work here with,

you know, Bayesian program search and Dreamcoder and so on,

and even things like FunSearch and Q* and Tree of Thought

and stuff like that is a step in this direction, which is to say,

we start with this interpolative continuous vector space,

which has been trained on a whole bunch of stuff, and then it's

still traversable in that space. So if we do some kind of search

controller on the top, I mean, of course it's not going to be

searching the space of Turing machines, but it will give us a

margin on the top, which will find all sorts of interesting things.

So just doing this Q* algorithm where we're in an LLM

and we search a whole bunch of trajectories and we have some kind

of cost function or whatever. And that margin might give us a

dramatic improvement. But I still think that there's

there's a whole space that we're missing that is not traversable

through neural networks. Let me just say as well that what's

happening right now is the real intelligence of these GPT models

is the collective search process. It's the collective intelligence.

So we're all using it and we are just putting different inputs into it.

So we're a bit like Tree of Thought. We're a bit like Q*.

All of these people all over the world are just finding these

interesting prompts, and they're kind of traversing the input space,

and they form like a layer of intelligence on top of the LLMs.

And I guess the question we're asking is maybe the first step would be

just to replace that human search process with something automatic.

Well, I mean, I'm just going to jump in real quick and then let Paul go,

which is two things. Again, in my mind

there's no theoretical reason why. So again,

a Turing machine is a finite state automata with a read write memory.

There's no reason that finite state automata can't be a neural network.

It's just that we haven't yet figured out how to train neural

networks that function in that way, at least not for general purpose,

efficiently, etc. and I think like the other, as to whether or not

like which direction is ultimately going to be the more successful.

Pragmatically, I don't know. It's a big mystery to me,

but it seems like, you know, the efforts of, you know, Paul and,

you know, George's company, are working towards that route,

which is like, give us, give us a domain specific language.

That's like, in a sense, uh, many generations higher than Dreamcoder.

It's like it's like a domain specific language that lets people program,

um, these, these complicated, you know, modules and machine

learning more effectively. And I see no reason why that

wouldn't also empower, um, automated, you know,

programming as well. Yeah. Broadly, I like that story. Yeah.

Like I, I like I think I've been confused when this question has come

up because I thought it was the case. I thought it was the claim that,

like, we somehow can't do this and I don't know, like I might be wrong

again, but I don't at my current level of understanding see any

like fundamental obstruction to developing neural networks that are,

you know, Turing complete or whatever the magical language,

magical words are for it. Right? I don't actually see any like,

essential obstruction. We may not have figured out how

to do it yet, but I don't see why that's impossible.

Well, they have to be a neural network with a read write memory.

Yeah. And once once they have that. So they're augmented, you know,

neural networks. And then once we figure out how

to train them, it's kind of open season on

every computable problem. Yeah. I mean, I think we can agree that

it's probably possible in principle and currently not in practice.

And it is a very, very difficult problem.

I mean, we can we can all agree on that. Yeah.

Um, well, it's definitely currently not being done;

nobody's doing it, you know, that I know of currently.

I think it is really difficult. I think there's some magic tricks

that we haven't found yet. And they may lie in this direction

of, you know, starting to apply category theory to design,

um, domain specific languages that have the semantics that we need.

Yeah. I mean, another thing was, when I

first read your paper, I assumed that there was some

kind of I guess I visualized it as a kind of compiler.

So now it's, it's a bit like, imagine we had something like

JAX and we did to JAX what TypeScript did to JavaScript.

So, you know, we add typing to it, we add compositional semantics

and so on. And what we've done is we've now

created this whole like interface basically to

understand how to build the next generation of neural networks.

And then the system is a bit like a compiler.

So it will compile down to neural network modules that have,

you know, GDL constraints or algorithmic constraints and so on.

And in the future, maybe even symbolic networks

that it would just automatically do all of this stuff.

Because what we want to get to is we need to get away from the alchemy,

right? We need this compiler to kind

of do the hard work for us so that, you know, just all of these problems

that we're seeing at the moment just are a thing of the past.

I have just a quick question, because, like,

is one proper way to think about this that we are

pushing a lot of the work, a lot of the necessary semantic work,

to syntax, where it can then be automated? Like that seems to be really

like the story of mathematics, too, as a whole.

It's like creating new syntaxes and new sets of rules,

new domain specific languages that effectively systematize and

allow to be automated. You know, the necessary reasoning.

Yeah, I think that the way that the mathematician might phrase this, and

I think this is what Andrew actually provided a very nice description of,

is, you know, the history of mathematics has taught us that

abstracting a lot of the specificity away and making your problem

statement more and more general, well,

you might think would make it harder, actually makes it easier, right?

Because the point is, if there's a lot going on,

I don't know what to grab on to. But if I reduce the things that

I'm allowed to grab onto, I can think more clearly about them.

And the elegance allows me to reason carefully about them.

And here I do mean reason, right? In the sense of coming up with

reasons for things right. Why does this do this?

How does this work? Those kinds of things. Right.

So you abstract away the sort of like essentially analytic detail

of a problem, and then suddenly it becomes tractable.

It becomes conceivable. I would say that what really

category theory does is it abstracts the idea of a semantics.

Right. That's really what it is. Right, is because you say, oh,

what is what is a category? It is a universe of some

particular kind of thing. You know, your objects are the the

instances of that kind of thing, and then your morphisms are the

computations that go from one instance of that kind of thing

to another instance of that kind of thing. Right.

So it is really a theory of abstract semantics.
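That reading, a category as a universe of one kind of thing whose morphisms are the computations between instances, can be sketched in a few lines. This is an illustrative toy (the class and function names are mine): objects are Python types, and morphisms are functions that compose only when their types line up:

```python
from dataclasses import dataclass
from typing import Callable

# A toy category whose objects are Python types and whose morphisms
# are functions between them -- a rough analogue of the "types and
# well-typed functions" picture discussed here.

@dataclass
class Morphism:
    src: type
    dst: type
    fn: Callable

    def __call__(self, x):
        return self.fn(x)

def identity(obj: type) -> Morphism:
    # Every object has an identity morphism.
    return Morphism(obj, obj, lambda x: x)

def compose(g: Morphism, f: Morphism) -> Morphism:
    # g after f; only defined when the types line up.
    assert f.dst == g.src, "morphisms not composable"
    return Morphism(f.src, g.dst, lambda x: g(f(x)))

length = Morphism(str, int, len)
is_even = Morphism(int, bool, lambda n: n % 2 == 0)

print(compose(is_even, length)("abcd"))  # True: len("abcd") == 4 is even
```

The category laws (associative composition, identities) are not enforced by this sketch, but the key point survives: a morphism is a computation from one instance of the kind of thing to another.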

Just a quick question on that. Like, aren't

you allowed to have more than one type of object in a category, or no?

Or does that have a different name? Well, so what do you

mean by 'kind'? What does one mean? I mean, can I have a morphism

that transfers an object of type T to an object of type S and

then a different morphism. Yeah. But they are all objects of the

same category, right? Different types can be in the same category,

but different types. Exactly. So the point is like the notions

of object and type are not are not exact.

They're not exactly the same thing. Right.

You know, for example, if you're thinking in terms of Haskell, which,

which is like the most likely way that someone like listening to

this or watching this like this is going to be the most familiar thing

is there's this category, Hask, which is whose objects are all of the

possible types that can be formed with Haskell, and whose morphisms

are all of the possible well-typed

functions. Okay. Right. So is there a particular

name for a category with objects of more than one type?

Well, it's type category or something.

I mean, I'm sure that I think it's the kind of thing that once

you stare at it long enough, the question ceases to make sense.

I think I think it's that kind of thing. Right.

Whereas it's sort of that it's trying to, like, mix these

metaphors a little bit too rigidly. What I would say is,

like, any given category is, like, the universe of a particular

kind of thing. Right? And then if you want to talk

about categories where you've got multiple different kinds of things,

those you're actually talking about working in the category of

categories and talking about functors and natural transformations

between functors. Right. You're talking about relations

between this, the the universe of things of this kind and the

universe of things of this kind. And, you know, translations from

the universe of things of this kind into things of this kind.

So I'd say for any given category, you've got one kind of thing.

But the point is, that doesn't mean that category theory,

at any given time, is only about one kind of thing.

It's actually about relational, structural relationships between

the universes of different kinds of things all at once.

I'm going to have to go back to those those tutorials, by the way,

there are these online tutorials done by, uh, Brendan Fong

and some others that those are the ones I was trying to go through.

Yeah. Uh, no, those are great. And and it's fun because, um.

You know, it's really enjoyable when you come across a topic

that that blows your mind. And the way, like I, I kind of

quantify a mind blowing thing is, if I understand it the day that I

learn it and then somehow the next day totally no longer understand it.

It's kind of like it's kind of mind blowing.

Um, so here's a here's just kind of a funny question for you.

And answer this any way you want, but, uh, which do you think is,

uh, more mind blowing math? Um, category theory or number theory?

Oh. Category theory. Um, the thing is, number theory.

I don't understand it well enough to have my mind blown. Right?

I don't know what, let me say this better,

I don't know what the weird results coming from number

theory mean. I have no way to interpret them.

It's like what? There's a number that does that,

right? This particular number field is like,

you know, has some weird, you know, abstract property

relative to another one. I don't I don't even know what

those things mean. Right. You know,

because I'm not a number theorist. I haven't spent forever thinking

about this. But, you know, I have at this point

spent, you know, like at least ten years thinking about categories.

And so I've found it to be a lot more mind blowing, because to

have your mind blown, you have to understand what it means. Right?

And I do feel like at least whether it is my own sensibility

or essential to the discipline, I won't say, but I kind of think

that category theory is more mind blowing.

Otherwise I'd be a number theorist. But, um, you know, no, I like that.

In order to have your mind blown, you have to first understand.

So that can be, uh, blown away. Yeah, because otherwise it's

just complete mystery. And it's like,

mystery is not interesting, right? A mystery that has the flavor of:

I can understand this. That's interesting.

Otherwise it's just like, huh. That's weird.

And I move on with my day. Got it right.

It's got to it's got to trigger the sense.

It's like, oh, this something weird happened and I can make sense of it.

I can make sense of it. I haven't made sense of it,

but I can, right? I have to have that taste.

Otherwise it's boring. That makes sense. That makes sense.

So, um, one other thing, um, that confused me a little bit

about the category paper is, um. You were talking about being able to,

for example, um, create an analytical pathway, if you like, between,

um, let's say a list or a tree. And I, I'm understanding the

work as being about, um, data rather than algorithms.

And I think the way I understood it was, again,

I was kind of brought up on Petar's work with algorithmic alignment.

So he's thinking about, um, you know, getting neural networks to behave as

if they are doing sorting or doing like Dijkstra or something like that.

And I was thinking about, you know, this could become a kind of

toolkit where I need to have these algorithmic primitives

and I can compose them together. So do I understand correctly at

the moment that it's almost about, um, like types of data rather

than types of algorithm? I don't know if those are

actually that different. Right. And the reason why is if I have a

well-typed function from one data type to another one, that's both

about algorithm and data, right? What you're

allowed to do with data is exactly what the algorithm does. Right.

So the point is that I don't think that distinction is meaningful.
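Paul's point, that a well-typed function is simultaneously a statement about data and about algorithm, can be sketched concretely. A hypothetical illustration in Python (the `LinkedList` type and `fold_list` function are my own names, not from the conversation): the shape of the data type dictates the skeleton of the only sensible algorithm over it.

```python
from dataclasses import dataclass
from typing import Callable, TypeVar, Union

B = TypeVar("B")

# A list is either empty or a head plus a tail: that's the data type...
@dataclass
class Nil:
    pass

@dataclass
class Cons:
    head: int
    tail: "LinkedList"

LinkedList = Union[Nil, Cons]

# ...and its fold is the algorithm shape that the type permits:
# one case per constructor, recursion exactly where the type recurses.
def fold_list(xs: LinkedList, nil: B, cons: Callable[[int, B], B]) -> B:
    if isinstance(xs, Nil):
        return nil
    return cons(xs.head, fold_list(xs.tail, nil, cons))

# Summing a list is just one choice of 'nil' and 'cons':
xs = Cons(1, Cons(2, Cons(3, Nil())))
total = fold_list(xs, 0, lambda head, acc: head + acc)
print(total)  # 6
```

Choosing `nil` and `cons` is choosing the algorithm, but the type already fixed its structure, which is the sense in which the data/algorithm distinction dissolves.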

Interesting. Mm-hmm. Yeah. Can I, I mean, I think I,

I sort of, uh. I have a lot of sympathy to that

point of view, because this goes back to, um,

when I when I first learned about, you know, algebraic data type

theory and its application to, to programing languages,

I started to really realize, yeah, you know, data and, uh,

I don't know what the right word is, but data and type, data and the set

of transformations allowed on that data, are almost intertwined

into the same thing. Like one isn't really useful

without the other. It's kind of a weird duality or

something. I'm not sure how to phrase it.

Yeah, no, I mean, like, I might, you know, hazard some sort of, like,

you know, dialectical fixed point is like one informs the other

and like the notion of type and well-typed function reinforce

and define each other. Right. It's very much a functionalist

kind of thing. Right? You know,

type theory is the core of like, you know, a syntactic treatment of

a synthetic mathematics, right? At least that's, you know,

the weird, you know, not fuzzy, but like rarefied and

unbelievably abstract world that, you know, through which I

discovered type systems. Right. But yeah, maybe, maybe another

way to think about this. Maybe this is related to, um,

get your opinion on this, Paul. That may help Tim.

For us to think about is, you know: let's suppose you have,

and I'm just going to say "class," I'm not trying to be, like, object

oriented or something. But suppose you have a type,

you know, that's interesting. Like, whatever, complex numbers:

you know, you're doing a lot of programming and complex

numbers are useful. Yet there are many representations

of that complex number. You know,

you could literally store it as a piece of literal data

that was an angle and a magnitude, or two, you know, x,

y coordinates or whatever. There's many possible data

representations of it, and it's just irrelevant.
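The two storage choices just mentioned can be written out. A minimal sketch in Python (the class names `Cartesian` and `Polar` are mine, purely illustrative): both represent the same abstract complex number, both implement the same multiplication, and they agree on the answer.

```python
import math
from dataclasses import dataclass

@dataclass
class Cartesian:
    """A complex number stored as x, y coordinates."""
    x: float
    y: float

    def mul(self, other: "Cartesian") -> "Cartesian":
        return Cartesian(self.x * other.x - self.y * other.y,
                         self.x * other.y + self.y * other.x)

@dataclass
class Polar:
    """The same abstract number, stored as a magnitude and an angle."""
    r: float
    theta: float

    def mul(self, other: "Polar") -> "Polar":
        # In this representation, multiplication is trivial.
        return Polar(self.r * other.r, self.theta + other.theta)

# i * i in both representations:
c = Cartesian(0.0, 1.0).mul(Cartesian(0.0, 1.0))
w = Polar(1.0, math.pi / 2).mul(Polar(1.0, math.pi / 2))

print(c.x, c.y)                  # -1.0 0.0
print(w.r * math.cos(w.theta))   # -1.0
```

Multiplication costs four multiplies and two adds in one representation and is trivial in the other, which is exactly the sense in which "what you are going to do with it" picks the representation.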

It's like you choose one based on like, you know, whatever works,

whatever's efficient, whatever. I feel like, you know,

whereas all the important and interesting stuff

is at a higher level than that. I would hazard that I would be

uncomfortable with the statement that it's irrelevant. Right.

And in fact, here's one of the, like... so, you know,

maybe this is not what happens if you're in graduate school

for mathematics, in places where category theory doesn't have this

sort of like, you know, somewhat deprecated history, using "deprecated"

in this kind of like a slang way. I was like, oh,

this is kind of like not approved kind of mathematics, right?

You know, North American math, like, specifically the American academy.

Yeah, exactly. It's heretical. Or like, people don't like it so

much. Like why? It's like, well, because it doesn't

feel like you're doing anything, right? Why do you do mathematics?

What is the pleasure? It's like, I want to feel like

I'm doing the computations. And category theory has this

nasty habit of telling you that you didn't do anything, right?

But it's like that scene in Good Will Hunting where, you know,

there's some circles and lines drawn on the board,

and he goes up and erases one and draws a line somewhere else,

and they're like, oh, you know, it's like, you know, well, what?

Yeah, I mean, the thing that's sort of true about category theory is,

actually, you know, Grothendieck had this statement that was like,

if it's not obvious, you're not ready to work on the problem yet.

So and, you know, he had this whole, you know, principle of like theory

building for problem solving. And one of the aspects of theory

building for problem solving is you develop a vast library of equivalent

or at least related representations of similar or the same concept,

and you solve a problem by finding the one in which it's obvious. Right?

It's sort of a shamanic mathematics.

It's a mathematics about understanding something and then not

really having to do that much. Right. The point is, think about this in the

right way and it becomes trivial. Or not trivial,

but at least clear, right? You know, this is very much like

the task of science is to carve nature at the joints. Right.

Like that's the thing is like, well, then you have a bunch of different

ways you could carve this up, find the one in which it's easy.

Right. So that makes sense. So I mean, like thinking about

all the different possible representations of complex

numbers as data representations, it's still informative of

the algebra that you're, you know... Yeah. Exactly.

Which... what you are going to do with it. Intricately linked. Yeah.

Like, what you are going to do with it determines which representation

you should use. Yeah. Yeah. So you're telling me the Scheme

people were right all along. That code is data. It's just.

Oh, absolutely. No, they were completely right.

Yeah. No, I... what was it? Uh, 10th grade, learning Scheme.

Um, yeah. Uh, that definitely informed my

sensibilities a lot more than I realized.

You know, because I went away from doing, like, math and science

for quite a while before coming back to it as an undergrad.

And as I've gotten deeper into mathematics, I was like, oh,

I really did drink this, like, functional programming

Kool-Aid, of this deeply pure 1970s variety, a long time ago.

And whether I understood that that's what I had done, I absolutely had.

But I do think that they were right. Like, the difference between data

and code: there isn't one, right?

Actually, the difference between data and code pops up in

neural networks a lot, right? You know, people think of

training data as, like, just data. Training data is the goddamn

program. The neural network is the compiler,

right? You are compiling training data into

the final state, which is going to be the function that you're

actually going to use, right? This is why I think eventually all of

the copyright lawsuits against the major AI companies will go through

because they are using copyrighted data to, to program their machines.

That's exactly what they're doing. It's not learning, that's bullshit.

It's programming. Right? It's a weird programming that works

by way of, like, differential... like sort of, you know, a slightly

mutant differential geometry. But it is just programming, right?

You are literally programming your models with data. Right.

So any distinction between data and code is nonsense.
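The "training data is the program, the trainer is the compiler" picture can be sketched with a deliberately tiny stand-in. A hypothetical toy in Python (ordinary least squares rather than an actual neural network; the name `compile_from_data` is mine): the fitting procedure "compiles" example pairs into an executable function.

```python
def compile_from_data(pairs):
    """'Compile' (x, y) examples into a linear function by least squares.
    The data plays the role of source code, the fitting procedure plays
    the role of the compiler, and the returned closure is the program."""
    n = len(pairs)
    sx = sum(x for x, _ in pairs)
    sy = sum(y for _, y in pairs)
    sxx = sum(x * x for x, _ in pairs)
    sxy = sum(x * y for x, y in pairs)
    w = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # slope
    b = (sy - w * sx) / n                          # intercept
    return lambda x: w * x + b

# "Program" the model with data sampled from f(x) = 2x + 1:
f = compile_from_data([(0, 1), (1, 3), (2, 5)])
print(f(10))  # 21.0
```

Swapping the data swaps the compiled function, with no change to the "compiler" at all.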

That is very interesting. Some great soundbites from this

conversation, Tim. Uh, for me, it was kind of

a compound of, you know, different little things that I was taught.

Like when I learned that you could exponentiate matrices. Yes.

You know, I was like, wow, this is weird. Uh, yeah.
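Matrix exponentiation drops straight out of the power series for e^x, since the series only needs addition, multiplication, and division by scalars. A quick illustrative sketch in Python (truncating the series by hand, with 2x2 matrices as nested lists; not from the conversation):

```python
def mat_mul(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_exp(m, terms=30):
    """e^M via the truncated power series: the sum of M^n / n!."""
    total = [[0.0, 0.0], [0.0, 0.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]  # M^0 / 0! is the identity
    for n in range(terms):
        total = [[total[i][j] + term[i][j] for j in range(2)]
                 for i in range(2)]
        # Next term: (M^n / n!) times M, divided by (n + 1).
        term = [[v / (n + 1) for v in row] for row in mat_mul(term, m)]
    return total

# exp of the zero matrix is the identity:
print(mat_exp([[0.0, 0.0], [0.0, 0.0]]))  # [[1.0, 0.0], [0.0, 1.0]]

# For the nilpotent [[0, 1], [0, 0]] the series stops after two terms:
print(mat_exp([[0.0, 1.0], [0.0, 0.0]]))  # [[1.0, 1.0], [0.0, 1.0]]
```

The same series, read with complex arguments, is also where e to the i pi equals minus one comes from, which is the connection drawn just below.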

You know, when, uh... thank God for power series.

Yeah, yeah. Oh, yeah. Yeah. Well, that's another cool thing

about power series is when you finally learn why, you know,

e to the i pi equals minus one. Yeah. And you look at the power series,

like, that's kind of mind-blowing. And then, uh, when I,

when I saw how to construct, I think it was Peano arithmetic from

this kind of algebraic type theory, you know, perspective and things like

that just all adds up to the same really fascinating,

I don't know, fact about the universe or fact about logic

or something. Yeah. Um,

so we've been using some terminology like syntax and semantics.

Can you, can we just like, zoom out a little bit?

What do those terms mean, and how are you using them in

this context? Okay. So what I mean to say is: okay, syntax.

Formal expressions that you do not assume have any meaning,

but instead you just have rules that allow you to combine those things.

So purely formal constructions, usually in terms of symbols,

whether it's graphical or text, that's sort of irrelevant.

But the point is it's purely formal manipulations of symbols. Right.

And semantics is when those are interpreted as meaningful. Right.

And the kind of 2-category theory that I've been talking about is

like an abstract semantics, right? Whereas this category with a

single object and these endomorphisms, this group,

this classifying groupoid of a group, that is the

categorical avatar, in this case, of

the syntax of being able to write g dot x and g prime dot g dot

x, that kind of thing. Okay, okay. So, um, you know,

from a linguistics point of view, you know, like people understand

semantics to be the meaning, um, the aboutness of a word.

So, for example, I could say Wagga and dog and it would point

to the same cognitive category, which is a dog.

Um, but in, in this framework, just help me understand exactly

like what is the semantics? That is a good question, right?

Because you're asking about semantics in this very

particular way, as in: what is this thing about? And... well.

The syntax allows you to write down constructions or operations

which are manifest in the semantic category. Right?

So the category is the world in which those in which that syntax

is interpreted as, be it things like elements, or be it things

like functions or be it things like transformations. Yes. Right.

So there's this semantic category.

And then the fact that they're also encoding, like, syntax as a category

itself ends up, you know, opening up another sort of

conceptual can of worms, sort of breaking down the easy distinction

between syntax and semantics. But the point is, the story that

they were telling, and this is sort of instructive.

And this is sort of very much what Andrew says is, okay,

so you start with syntax. You can write that down into the

universal category that has that syntax interpreted as morphisms.

And then you look at functorial semantics of that.

So that's the interpretation of this universal category in which those

terms have meaning. Yeah. Right. Which means they have no other

meaning other than the formal relationships between them that

you specified in your syntax. And then you take a functor out

of that thing into something else and that interprets them.

So that's, this is, that's this notion of functorial semantics.

The story that I'm talking about is, I'm not even worried about ever

writing down any individual terms. I'm just interested in sort of this

two dimensional graphical picture that allows you to talk about what

kind of thing algebra is at all. Right.

So for the story that I was interested in, and the way I

come to this, is we don't even really ever study the elements.

Like I want to get away from that. Right.

Because the elementary perspective, while it's useful and helpful for

like the analytic construction of these neural networks, it's not

that good for proving properties of those of those networks. Right?

You know, it's pretty easy: you write an

expression that has 5 to 10 terms, and suddenly it's very confusing.

Whereas you draw a diagram that has just, like, 5 or 6,

you know, 1- or 2-dimensional pieces, and deriving formal properties

about it is a little bit easier. Okay, okay.

I mean, because we were saying earlier that, um, we could use the example

of group theory and I could look at all of the, like, symmetry

preserving transformations of a triangle and the aboutness, you know,

the semantics is the triangle. Yeah. The transformations of the...

Exactly. It's about the triangle. It's about the actual

transformations of the triangle. And then the syntax is just these

things as elements of the abstract symmetry group. That makes sense.
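The triangle example can be made completely explicit. An illustrative sketch in Python (names mine): the "syntax" is words in abstract generators r (rotation) and s (reflection), and the "semantics" interprets each word as an actual permutation of the triangle's vertices. Distinct words can denote the same transformation.

```python
# Semantics: concrete symmetries of a triangle with vertices 0, 1, 2,
# each given as a permutation (a tuple listing the images of 0, 1, 2).
ROT = (1, 2, 0)     # rotate by 120 degrees
REF = (0, 2, 1)     # reflect, fixing vertex 0
IDENT = (0, 1, 2)

def compose(p, q):
    """Apply q first, then p."""
    return tuple(p[q[i]] for i in range(3))

def interpret(word):
    """Send a syntactic word in the generators 'r' and 's' to an
    actual transformation of the triangle (the semantics)."""
    result = IDENT
    for letter in word:
        result = compose({"r": ROT, "s": REF}[letter], result)
    return result

# Two syntactically different words with one and the same semantics:
print(interpret("rrr"))   # (0, 1, 2): three rotations are the identity
print(interpret("ss"))    # (0, 1, 2): two reflections are the identity
```

`interpret` plays the role of the interpretation: it sends formal words to concrete symmetries, and relations like "three rotations equal the identity" hold in the semantics whether or not the syntax spells them out.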

So when I was discussing this with Andrew, he said that the

reason this separation is useful is because if you think about it,

it's like you're talking about two worlds.

And he was arguing that we want to or we have less representational

ambiguity in the world of semantics. Yes. So there's a kind of asymmetry.

So for example, you know, using this, this um group

analogy of the triangle, we have these high precision,

you know, we can actually draw a grid of all of the, um, you know,

symmetry preserving transformations. And now we've got a very high

resolution way of understanding that triangle.

But I guess the question is, though, is it is it really true to say that

there is less representational ambiguity in the semantics?

I guess, like the question I'm asking here is, um, are these

semantics ordained by God or are they purely a human construct?

I mean, does semantics just mean something to humans?

I wouldn't say that they are ordained by God, but I would say that what you

can do is, like, you can have the initial semantics of something.

Right? So if a category is some, like,

abstract universe, a universe

of a kind of thing, and then morphisms between that kind of thing. Right.

You can talk about like the initial semantics of something. Right?

And this means the semantics that has only the

properties that are necessitated by the syntax and no others. Um, right.

And that is genuinely necessarily less ambiguous than any particular

syntactic presentation of something. Right.

So I'm not going to be able to come up with two distinct

initial semantics for a group. But the point is, I can write

down things like groups by way of generators and relations. Right.

But there may be and usually are, many different ways I could write

down the same algebraic structure with different syntaxes.

However, there is only one semantics for that thing. Um, cool.
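The "many syntaxes, one semantics" point can be checked mechanically in a toy case. An illustrative sketch in Python (the setup is mine, not from the conversation): two different generating sets, that is, two different presentations, of the integers mod 6 close up to exactly the same group.

```python
def generate(generators, n=6):
    """Close a set of generators under addition mod n."""
    elements = {0}
    frontier = {0}
    while frontier:
        new = {(e + g) % n for e in elements for g in generators} - elements
        elements |= new
        frontier = new
    return elements

# Two presentations (two generating sets) of the integers mod 6:
from_one = generate({1})
from_two_and_three = generate({2, 3})

print(sorted(from_one))                # [0, 1, 2, 3, 4, 5]
print(from_one == from_two_and_three)  # True
```

Different syntaxes, in the form of different generating sets, and yet the structure they present is one and the same.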

Now, um, so you've written this paper, right?

If you were to give me a one minute elevator pitch of the paper,

what would it be? The title of the paper is Categorical

Deep Learning: An Algebraic Theory of Architectures.

Uh, one thing that, like, the hardcore category theorists

amongst us might take objection to is that the words "algebraic theory"

actually have a formal meaning in category theory, and that's not

actually exactly what we mean, right? An algebraic theory to a category

theorist is like a Lawvere theory. It's one of these syntactic

categories of, like, the allowed manipulations, stuff like that.

Whereas what we mean is specifically architecture as being the semantics

of the sort of these universal algebraic notions. Um, so.

That's the title of the paper. I've gotten distracted by this statement

that we said "algebraic theory." Maybe that's not the best name,

but it's a pretty good name. Why is it a good name?

Because the point is that we want to say that all of the things

that have been studied in GDL and significantly more examples,

all of them are in fact instances of algebraic structures,

be they structure preserving maps, i.e. morphisms between algebras

for monads or structure maps of algebras for monads or the

appropriate dual constructions. Say these are sort of these like

universal structures, and then we're interpreting them

into a two category whose objects are vector spaces, whose morphisms

are parametric maps. Right. A good example of the kind of

parametric map that pops up in deep learning is a ReLU network,

which is a parametric function, right? Because for any given choice of

weights in all of your various layers, right, you get one function,

but you don't really want to think about those functions as distinct.

You want to sort of group them all together.

And so this is why there's the innovation of this notion of the

parametric function. And then you want to talk about being

able to change those parameters. So that requires the introduction of

this notion of reparametrization. Right.
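The parametric-map idea can be sketched in a few lines. A toy version in Python under my own naming, not the paper's formalism: a parametric map is a function of (parameters, input); composing two of them pairs up their parameter spaces; and a reparametrization acts on the parameter side only.

```python
from typing import Callable

# A parametric map A -> B with parameter space P, modeled as a
# plain function of (parameters, input).
ParaMap = Callable

def compose(g: ParaMap, f: ParaMap) -> ParaMap:
    """Compose parametric maps; the composite's parameter space is
    the pair (q, p) of the two parameter spaces."""
    def h(params, a):
        q, p = params
        return g(q, f(p, a))
    return h

def reparametrize(f: ParaMap, r: Callable) -> ParaMap:
    """Change parameters: act on the parameter side only, leaving
    the input/output interface alone."""
    return lambda p, a: f(r(p), a)

# A one-neuron ReLU layer as a parametric map; its parameters are
# a (weight, bias) pair.
relu_layer = lambda p, x: max(0.0, p[0] * x + p[1])

# Fixing the parameters picks out one ordinary function per choice:
print(relu_layer((2.0, -1.0), 3.0))  # 5.0

# Compose two layers, then use a reparametrization to share one
# weight across both (a structural constraint on the parameters):
two_layers = compose(relu_layer, relu_layer)
shared = reparametrize(two_layers, lambda w: ((w, 0.0), (w, 0.0)))
print(shared(2.0, 1.0))              # 4.0
```

Weight sharing, as in the last two lines, is exactly a reparametrization: the underlying map is untouched, and only how its parameters are supplied changes.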

So specifically what we develop, or rather, we don't develop it in this paper.

Right. This Para construction has been around for

some time. But what we really do is we say,

hey, if we do two dimensional universal algebra valued in these

categories of objects, parametric maps and reparametrizations,

that allows us to talk at a very high level of abstraction about the

kinds of things like the geometric priors of GDL or the structure maps,

which are exactly the representations of groups that

are like the foundation for GDL. It allows you to talk about it

in a far more abstract way that more naturally generalizes to

algorithmic constraints, as opposed to merely sort of conservation law

type constraints, which is what groups are really good for.

Okay, so let's just talk broadly about abstraction. Right.

So what is the purpose of abstraction.

And in this particular case give us an example of a powerful

abstraction that would, you know, change the way that we reason

about neural networks. Okay. What is the purpose of abstraction

from a mathematician's standpoint if you want to be utilitarian about it?

Uh, the purpose is: do I want to climb a mountain with a

lot of incredibly specific gear, or do I want to learn to be a

mountain goat? Right. That's really what the purpose

of abstraction is. The purpose of abstraction is

getting away from all of these specifically adapted tools and say,

okay, these are the tools that I'm going to a