WE MUST ADD STRUCTURE TO DEEP LEARNING BECAUSE...
Summary
TLDR: This conversation takes a deep dive into abstraction in deep learning and AI, in particular how category theory can be used to improve the structure and understanding of neural networks. The guests discuss the shift from concrete algorithmic implementations to abstract mathematical structures, and how that shift helps us reason about and understand complex systems. Category theory provides a powerful toolkit that lets us move beyond specific programming languages and computational paradigms into a higher level of thinking and theory building.
Takeaways
- 🤖 Discussed how abstraction and theoretical frameworks can improve the interpretability and efficiency of deep learning models.
- 🧠 Emphasized the relationship between neural networks, data, and algorithms, and how category theory can establish those connections.
- 🔍 Explored using category theory to design domain-specific programming languages that improve how machine learning systems are built and understood.
- 📈 Discussed using category theory to handle data representation and algorithmic structure in neural networks.
- 🔧 Emphasized the importance of abstraction, and how abstraction simplifies and clarifies complex systems.
- 🧬 Explained the concept of abstraction, and its role in mathematics and deep learning, through an analogy with "canalization" in biological evolution.
- 🌐 Discussed turning complex computational problems into abstract representations that are easier to manage and reason about.
- 📊 Emphasized the practical value of category theory in solving mathematical problems, especially those involving structure and relationships.
- 🔑 Discussed using category theory to identify and exploit patterns and dualities in mathematical structures.
- 🎯 Emphasized the distinction between goal-directed and agent-based intelligence, and how category theory can help us understand and build intelligent systems.
- 💡 Explored, through the lens of category theory, how to build more efficient and more interpretable neural network architectures.
Q & A
Why did George leave Tesla?
-George left Tesla because he disagreed with its proposed approach to fixing autonomous driving; he believed that trying to solve the so-called "San Francisco problem" with more data and compute was not the right method.
What is the "San Francisco problem"?
-The "San Francisco problem" refers to the performance issues of Tesla's autopilot system in San Francisco's particularly complex traffic environment, which call for dedicated solutions.
How does George's view of autonomous driving systems differ?
-George believes that Tesla's approach of training neural networks on massive data and compute is akin to trying to memorize infinity, which he does not consider a viable solution.
How is Tesla trying to learn the world?
-Through massive amounts of training data, Tesla is trying to build a function that memorizes every possible perceptual state in the world and produces the correct perceptual signals under all circumstances.
What is "Operation Vacation"?
-"Operation Vacation" is an internal Tesla project aimed at having the autopilot function converge and memorize infinity, reaching a so-called singularity moment.
What is Paul's view of neural networks?
-Paul sees neural networks as finite machines operating under finite compute and memory constraints. While it may in theory be possible to build a neural network that memorizes infinity, it cannot be achieved within the compute limits of a Tesla vehicle.
What is the main goal of the company Paul joined?
-The company's main goal is to run a research-oriented venture, proving that a great deal can be done without scaling up and challenging conventional scale-based solutions.
What is geometric deep learning (GDL)?
-Geometric deep learning (GDL) is a deep learning approach that tries to go beyond traditional models by taking the geometric structure of data into account to improve learning efficiency and performance.
What is category theory?
-Category theory is a mathematical theory that studies the commonalities of, and mappings between, different mathematical structures. It provides an abstract language and toolkit for describing and understanding complex mathematical objects and the relationships between them.
Why is category theory important for computer science and software engineering?
-Category theory gives computer science and software engineering a powerful abstraction tool for understanding and designing complex systems. Through it we can better understand data types, function composition, and algorithmic structure, improving software reliability and efficiency.
Outlines
🤖 Thinking about AI and autonomous driving
Discussed applications of AI to autonomous driving, in particular Tesla's autopilot system. Covered George's doubts about Tesla's approach to the San Francisco autopilot problem: he argued that Tesla was making the system memorize the world with massive data and compute rather than finding the right solution, considered that approach unworkable, and took a position against scale.
📈 Limits of deep learning and geometric deep learning
Explored the limitations of deep learning, especially geometric deep learning. Discussed the DeepMind team's attempt to generalize geometric deep learning by relaxing the requirements of invertibility and composability, as well as neural networks as instances of objects and parametric maps, and how to connect them to algebraic structures and category theory.
🔢 Combining type theory and category theory
Discussed combining type theory with category theory, and how to use these theories to better understand software and algorithms. Proposed designing domain-specific programming languages via category theory in order to understand and reason about complex systems, and discussed using category theory to address branching and typing problems in software engineering.
💡 Category theory's impact on computer science
Explored category theory's influence on computer science and software engineering. Discussed how category theory helps us design better programming languages and type systems, and how abstracting and understanding complex structures improves software engineering efficiency. Also proposed using category theory to tackle challenges in deep learning and quantum computing.
🌐 Building domain-specific languages for machine learning
Discussed using category theory to build domain-specific languages for machine learning. Proposed type-driven machine learning and deep learning to improve model interpretability and efficiency, and discussed formal methods for understanding and verifying the behavior of neural networks.
🧠 Comparing neural networks with finite state automata
Discussed comparing neural networks with finite state automata (FSAs) and Turing machines. Pointed out neural networks' limited abilities in recursion and memory, and proposed using category theory to design architectures capable of higher-level computation. Also discussed structured approaches to managing complexity in software engineering.
Keywords
💡 Neural networks
💡 Deep learning
💡 Autonomous driving
💡 Venture capital
💡 Compute constraints
💡 Functions
💡 Data labeling
💡 Category theory
💡 Symmetry
💡 Generics
Highlights
Discussed structured approaches to deep learning and neural networks, and how category theory can improve and illuminate these structures.
Mentioned building domain-specific languages via category theory to better understand and design machine learning models.
Explored the limitations of neural networks, particularly in handling recursion and abstraction.
Discussed how abstraction and category theory simplify and clarify complex mathematical structures and algorithms.
Emphasized the importance of abstraction, and how abstraction improves the efficiency and clarity of problem solving.
Mentioned using category theory to design the next generation of neural network software, improving the interpretability and reliability of software engineering.
Explored using category theory to understand and improve geometric priors in deep learning.
Discussed applying category theory to the training of neural networks to improve model generalization and performance.
Emphasized the possibility of, and the challenges in, achieving Turing-machine-like computation in neural networks.
Mentioned using category theory to build neural network architectures with higher-level structure and operations.
Explored using category theory to improve and accelerate research and development in machine learning.
Discussed using category theory to understand and solve the algorithmic alignment problem in machine learning.
Emphasized the importance of considering structure and semantics when designing neural networks, and how category theory enables this.
Mentioned using category theory to create explainable and verifiable neural network models, raising the quality of software engineering.
Explored using category theory to understand and improve modularity and compositionality in deep learning.
Discussed using category theory to design neural networks that can handle infinite datasets and complex tasks.
Transcripts
So, Paul, we're, um. Where did you come from?
Are you are you nearby? Tim. Uh, for the last two weeks,
I've been in London. Oh, yeah? Tell Keith the story.
So, um, they raised some seed capital about 18 months ago,
and then they set a date in February to do so. Yeah.
So what the deal is, I think sometime in 2022, George and his then co-founder, Taylor, they were, you know,
George quit Tesla because he thought that the way that they were proposing
to fix autopilot, in particular, to fix the
"San Francisco problem" for autopilot, was just way more data for San
Francisco specifically and just, you know, throwing compute at the
wall. And so it was like this. So George's thought was like,
this can't be the right way to do it. So he was like,
I'm not exactly sure how to do this, but I want to take a bet.
I want to take a position against scale.
It kind of feels like they're trying to memorize infinity. That's correct.
So, um, so would you say that your experience there informed
some of this skepticism? Yeah. I mean, I think if you look at the AI
day presentations that effectively go into deep technical detail about how
Tesla learns the world, or Andrej's talk at NeurIPS two years ago.
Um, you know, Tesla very much is trying to come up with a function
that memorizes the world, you know, basically take all of
the possible perceptual states that exist in the realm of
driving and produce the correct perceptual signals that result
from that under all circumstances. And they're pouring massive amounts
of training data in to make that happen, running, you know, tests in
shadow mode to collect triggers that then feed more labels back to the
labeling team to run that data loop. And, you know, Operation Vacation,
which has been publicly discussed, is effectively that singularity
moment where the function converges and has memorized infinity.
And, you know, as you know, I've kind of talked to you about so far, I do
not believe that that's possible. You know, obviously,
neural networks are finite machines. They operate under finite compute
constraints with finite memory. It may be possible to build a neural
network that could indeed memorize infinity, given an infinite number
of layers and infinite number of, you know, uh, fully connected
neurons, um, infinite compute and memory to execute that model.
But given the compute constraints in a Tesla vehicle, uh, I don't
think that's possible to achieve. And so then he shopped around for
a year or so to try and find a venture capitalist to fund a sort
of research oriented venture to, you know, make a case for you can
do a lot of things without scale. And then I came on to the
company about six months ago, and within about a month and a half,
we were like, okay, we don't know what we'll be doing in six months,
but we need to be selling it to to the investors in six months.
And so we finished the paper on February 1st or really February 2nd
for the North American time zone, because it's, you know,
a February 1st anywhere on Earth kind of thing.
And then a week and a half later, we were meeting with, uh,
the venture capitalists to get money to do this whole thing.
Yeah, quite a whirlwind. I mean, yes, I'm.
I'm just thankful that, uh, you know, George created this,
and it exists because. Because I think, um, I take the
same line conceptually, which is the answer isn't just, you know,
more of the same or the same. You know, we need to, uh, we need to
go back to some cleverness and, uh, and invent some new things and, uh,
to really, to really make, you know, order of magnitude improvements.
You should see this setup. We've got you on a big 55 inch
plasma screen. So it's as if you're in the room
with us. I'm sorry about that. Like, you know, that's too much.
Dagger is what you're saying. I would have shaved or, like,
you know, um, like, brush my hair or something.
Well, you did, you did pat it down in the back. The headset. Yeah.
It's like this headset hair you get, you know. Oh, yeah, a headset.
Uh, so I'm actually going. I'm worried about that.
Like, what's the compression going to do to my ear? Cartilage.
Like over time, all these unknown, unknown cybernetic effects.
A very soft version of cauliflower ear for podcasters. Right? Right.
I guess something you could be proud of, right?
Just like the MMA guys are proud of the cauliflower ear.
I've had this headset on for so many hours.
10,000 hours of Skyrim alone, I have achieved true mastery.
Yes. Okay, well we're in. We are, we're in.
So, um, where do we start? I mean, maybe, um, Paul,
because Keith hasn't watched the recording of the other day.
So maybe if you just frame up what's going on here.
And in particular, I think we want to have the Turing
machine discussion, um, with you, with, with with Keith on the call.
Um, so, yeah, I mean, just just frame up the, the the, you know,
what's this all about? Okay. What the hell is this all about?
Um, depends on which side of the collaboration that results in
the paper you are on. If you come from the DeepMind side
on this, you are looking for a generalization of geometric deep learning, right.
to geometric deep learning, which is that the kind of
transformations with which you can, or relative to which you can ask for
a neural network to be equivariant or invertible and always composable,
and so on their side, they're thinking, okay, we need to generalize
away from invertibility first, and then generalize away from the
presumption that all transformations can be composed, right?
Because not all computations can be composed.
So if you want to use something like GDL to align neural network
architecture to algorithmic challenges, you need to figure out a
way such that you can structure it, you can align it to these not
necessarily invertible, not necessarily composable steps
of computation. Right. If you come from where I come from, it is: oh, I'm interested in neural networks as a particular instance of morphisms in a 2-category of objects, parametric maps, and reparametrizations.
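Paul's description of neural networks as morphisms of parametric maps can be sketched concretely. The following Python toy is my own illustration of the category often written Para, not the paper's formal construction: a map from A to B carries a parameter space along with it, and composition pairs the parameter spaces up, much as stacked layers accumulate weights.

```python
# A toy model of the category "Para": a morphism A -> B is a parameter
# space P together with a map f : P x A -> B. Composition pairs the
# parameter spaces, which is how stacked layers accumulate weights.
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Para:
    params: Any                       # a chosen point of the parameter space
    apply: Callable[[Any, Any], Any]  # f(p, a) -> b

    def __rshift__(self, other: "Para") -> "Para":
        # Composite: parameters are the pair (p, q), and
        # (g after f)((p, q), a) = g(q, f(p, a)).
        return Para(
            params=(self.params, other.params),
            apply=lambda pq, a: other.apply(pq[1], self.apply(pq[0], a)),
        )

# Two "layers" as parametric maps on floats; the weights are just numbers.
scale = Para(params=3.0, apply=lambda w, x: w * x)  # x -> w * x
shift = Para(params=1.0, apply=lambda b, x: x + b)  # x -> x + b

network = scale >> shift                            # x -> 3 * x + 1
print(network.apply(network.params, 2.0))           # 7.0
```

Reparametrization, the 2-categorical part of the story, would act on the `params` component; this sketch only shows the 1-categorical composition.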
And I'm interested in the kind of structure that they're interested in.
But thinking about it, as you know, algebras for monads or coalgebras for
comonads, stuff like this. Right.
going to say like higher level of abstraction, but much more
sort of 2-categorical flavor, whereas they're really thinking in
terms of sort of representation theory and a sort of generalized
version of representation theory. Bruno and I, you know,
Bruno first. Right? I learned what I know about
neural networks through reading Bruno's thesis.
And that collaboration has been really instructive.
But, you know, we've really been thinking about this as, okay,
there's a 2-categorical story here. We want to do 2-categorical universal algebra. And this gets you
way more abstract, which makes it more fun to think about.
But it also makes a lot of things tractable that are otherwise far
too complicated to reason about. I think, just as a very tiny kind of question, I think it'd be good.
I think most of our, our listeners can, can definitely understand that
some computations are not invertible. Yeah. You know that that that's easy.
I mean, you know, you you add two numbers together.
It's like you don't know what the original two numbers are.
They could be a bunch of possibilities. It's not invertible.
Um, but can you give us an example and some intuition behind
computations that are not composable? Oh, yeah.
So this is this is sort of like what in some sense this is what
types are for. Right. Consider some like computation that
requires that the input be a tree cannot be, at least not naively, applied to something that is a list. Right?
You need to do a conversion between lists and trees before
you can apply a computation that takes in trees to it, right?
So that would be an example of non-composable computations. Right.
Stuff where it was like well the types of the input and output
simply don't line up. Well. Yeah. And isn't another example.
And correct me if I'm wrong, but I think another example is
something like like branching is not at least unless you're in a
non-deterministic paradigm. Like branching is not composable,
right? Like if I have an if statement
that kind of says, you know, if it's one thing, do this,
if it's something else, do that, then that that poses some issues
for composition, for function composition. Right?
I wouldn't say that. Uh, the reason I wouldn't say that is
because the function 'if this, do this; if that, do that', right, I can still compose stuff afterwards. It depends: if my 'if this, do this' has a different type, say, than my 'if that, do that', then you might not be able to compose stuff together naively. But I wouldn't say that
conditionals and branching are really what the issue is.
It really has to do with input and output types. Right.
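Paul's tree-versus-list example can be made concrete. A hypothetical Python sketch (the helper names are mine): a computation that needs tree structure cannot be applied to a list until an explicit conversion supplies the missing structure.

```python
# Sketch of a non-composable pair of computations: tree_sum needs tree
# structure, so it cannot be applied to a plain list; list_to_tree is the
# conversion that makes the composite well-typed. (Helper names are mine.)
from dataclasses import dataclass

@dataclass
class Leaf:
    value: int

@dataclass
class Node:
    left: object   # Leaf or Node
    right: object  # Leaf or Node

def tree_sum(t) -> int:
    """A computation that genuinely consumes tree structure."""
    if isinstance(t, Leaf):
        return t.value
    return tree_sum(t.left) + tree_sum(t.right)

def list_to_tree(xs):
    """The conversion that lines the types up."""
    if len(xs) == 1:
        return Leaf(xs[0])
    mid = len(xs) // 2
    return Node(list_to_tree(xs[:mid]), list_to_tree(xs[mid:]))

xs = [1, 2, 3, 4]
# tree_sum(xs) would fail: a list simply isn't a tree.
print(tree_sum(list_to_tree(xs)))  # 10
```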
Well, I think I think the concern there, it is a type one
kind of at least from. And let me first state up front.
Uh, I'm super interested in category theory. I know very little about it.
I only have, you know, I've tried to do some tutorials online,
and so a lot of my knowledge is pretty far out of date.
But I think like the issue with composability of branching at least
like say in the Haskell, you know, kind of functional programming world is that, um, even if both the results are of type T, the outcome of that branch is possibly a union, it's an Optional or a Maybe of those types, like it may be one value or another value. Yeah. So it kind of creates this.
Is that right. Well I mean so but the thing is like
a Maybe of something is still just one type. Right. So, really. Right. But it's not type T. No. Exactly. You would have a higher-order type. Yeah. Yeah, it's another type. Exactly.
And so you can't simply plug Maybe T in. Like, you can't simply apply something that requires, you know, takes input of type T, to something whose output was Maybe T. Right. You need to do like, oh, I have to check: is this just T, or is this the, you know, else case or whatever? I can't remember what the Haskell syntax is for the exception. So that's the problem with branches: as soon as you throw branches into code, you start to change the type into these, you know, composite types. Okay.
So, yeah, in that sense I will buy that: in composition you have to be much more careful about types.
So I think it's still a story about input and output types and
which things you can compose. But the problem is naive composition.
If you start doing branching of things, you realize, oh,
I've actually changed what type I'm dealing with. Yeah.
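The branching discussion can be sketched in Python, with Optional standing in for Haskell's Maybe (my illustration, not anyone's quoted code): a branch that can fail changes the output type from int to Optional[int], so naive composition breaks, and a small bind-style helper restores it.

```python
# Optional stands in for Haskell's Maybe: a branch that can fail changes
# the output type from int to Optional[int], so naive composition breaks.
from typing import Callable, Optional

def safe_div(x: int, y: int) -> Optional[int]:
    # The branch: one arm produces an int, the other None, so the
    # overall output type is Optional[int], not int.
    return None if y == 0 else x // y

def double(x: int) -> int:
    return 2 * x

# double(safe_div(10, 0)) would crash: double expects int, got None.

def bind(mx: Optional[int], f: Callable[[int], int]) -> Optional[int]:
    """The explicit care about types that restores composability."""
    return None if mx is None else f(mx)

print(bind(safe_div(10, 2), double))  # 10
print(bind(safe_div(10, 0), double))  # None
```

This `bind` is exactly the extra plumbing Keith's point demands: once a branch has changed the type, composition has to go through it rather than through ordinary function application.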
And so just just back to my own like personal story a second,
because I have a question like I'd like to ask you, which is,
uh, a long time ago, you know, I learned about the power of algebra.
So when I first took an abstract algebra course, I was like, wow,
this is really cool. You know, algebra is so much more
right than I thought it was. Um, then sometime later,
I found out just the real beauty of type theory.
And this was all within the computer science perspective.
So algebraic data types, type theory, you know, I just fell in love
right with types. Yeah. Okay. And so then along comes somehow I
hear about category theory okay. And I'm like, oh yeah,
this seems to be you know, if I go learn some about category
theory, it's going to help me understand software better.
And, you know, algebraic type theory and everything.
But what I sort of quickly found and again, this could be quite naive.
And this is why I want to ask you is that.
It seems to me like type theory is, you know, it's it's this higher
order thing where, for example, you might ask, you might look for
similarities between algebras. You know, like here's the
algebra of Boolean algebra, here's the algebra of partial orders.
And, and they're actually maybe the same algebra or they have a
lot in common. And so there's this beautiful
sort of high order. Um, uh, pattern finding and
relationships that, that allows you, you might be able to prove
something in one algebra and it gives you the tools to map it,
like into a different algebra. But I found like it didn't really
have a lot of relevance to me just for my everyday software engineering.
And so I started to kind of lose, you know, the interest in it,
like tell me where I'm going wrong. Like tell me, you know,
in what ways category theory can can help someone who's more of a
practitioner, a computer scientist, a software engineer, you know,
really, um, like, understand better what they're doing or,
or apply new insights, you know, to the kind of work that we would do.
So unfortunately, I am not the best person to answer that question,
but I'll give it a good shot anyway. Right.
You know, so I come to this whole story from pure mathematics,
and I discover type systems in the context of homotopy type theory,
where it's like, what, there's a programming language that
corresponds to space. Okay. Right. So my, my my introduction to
this stuff, like I come to like I actually like, you know,
way back in high school or whatever I learned Scheme and stuff like that.
But that was, you know, longer ago than I would like to admit, and hadn't been, like, terribly relevant to my intellectual life in recent years. But, um, so I would say the problem
with trying to justify category theory in terms of functional
programming alone is that you've already learned most of the category theory that's going to be directly relevant by doing the functional programming itself, right? The point is, you don't get
you've learned a lot about a specific category, and that's the
only one in which you're actually doing all that much work.
So the abstractions of category theory don't necessarily give you
all that much. Right? That's sort of my naive
perspective on like, you know, it's like, oh, category.
You know, I remember the first day I came into the Symbolica office when we
were next door to Khosla Ventures. And I come in and, you know,
like one person's there, you know, and they're like, oh,
you're the new category theorist. Uh, I hear that's useless.
And, you know, my retort is like, you like functional programming,
right? And they're like, yeah. And I'm like,
that wouldn't exist in its current formulation without category theory.
It's like, yeah, but I know it through all of these other means.
It's like, yes, you've grokked the essence of the category theory that is applied to functional programming most of the time already.
I do think that once you get into things like dependent types and sort
of more sophisticated type systems, the category theory side,
you need more sophistication to make really good sense of it.
You can have sort of a pretty good, um, elementary in the sense of like,
I think about elements, or I think about terms, way of thinking about
things like dependent type systems. But, um, I found, you know,
of course, this is how I came to this stuff.
So of course I think this is like the right perspective.
But I think that category theory gives you permission to have a
sort of more geometric interpretation of this stuff that
makes it easier to understand and easier to reason about. Right.
You know, one thing is like the real strength of category theory
is not like working in any particular category, right?
The real strength of category theory is, hey, this gives me this
principled language in theory, for abstracting things such that I
can remove all the extraneous detail from a particular problem and study
just the bare bones of that thing. Get an excellent description, an
excellent understanding, and find it in lots of different places. Right.
So it's the idea, it's sort of, you know, like I want a good library.
I do it once and then I apply it in lots of different contexts.
And I'm absolutely not one of the 'category theory is useless' people. I think I'm in the category of: I
think there's probably a lot of really useful stuff in there.
I just either I don't like I definitely don't understand it and
or it hasn't been discovered yet. Um, you know, for,
for my particular context. But let me give you an example of
where, yeah, my intuition tells me there's probably some really useful
insights from category theory to be explored. So, um, long ago. Okay.
Uh, Alexander Stepanov, a really brilliant, you know, uh,
computer scientist, was trying to, to do things within type systems
that were just not possible in the current languages.
So one thing he wanted to be able to do, for example, was to have a type
system rich enough that it could express properties of an algorithm.
So maybe you have a bunch of different sort algorithms in your library. And he wanted the kind of complexity constraints of those sort algorithms to be part of the type system. Yes.
So that somebody writing a generic piece of code, you know,
calling a sort algorithm, and there were certain properties
required there that the type system could make the best decision,
like it could say, well, in this particular case,
you should use this sort algorithm. And it was able to compute all this
like from the type system. Yeah. Um, he eventually discovered C++, which just by accident. Okay. Just by accident, it had a template metaprogramming capability that wasn't by design.
It just kind of accidentally ended up being a Turing complete, you know,
type system or whatever, where he was able to implement some of his ideas
with really ugly code, by the way. So I'm not justifying, you know,
the quality of it. He just was able to, and he was able to do, you know, lots of brilliant things, which led to ideas like, you know, traits and categories in their sense, and, you know, algorithms with very rich type systems, template metaprogramming.
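A hedged sketch of the idea Keith attributes to Stepanov, in Python rather than C++ templates (all names here are illustrative, not Stepanov's): algorithm properties are carried as explicit traits that generic code can query to pick an implementation.

```python
# Illustrative only: sort implementations registered with "traits"
# (stability, complexity), so generic code can state its requirements
# and let the registry, playing the role of a rich type system, choose.
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class SortTraits:
    stable: bool
    complexity: str  # e.g. "n log n"

REGISTRY: list[tuple[SortTraits, Callable[[list], list]]] = [
    (SortTraits(stable=True, complexity="n log n"), lambda xs: sorted(xs)),
    (SortTraits(stable=False, complexity="n log n"),
     lambda xs: sorted(xs, reverse=True)[::-1]),  # stand-in for heapsort
]

def generic_sort(xs: list, *, need_stable: bool) -> list:
    """Generic code states required properties; dispatch picks an impl."""
    for traits, impl in REGISTRY:
        if traits.stable or not need_stable:
            return impl(list(xs))
    raise LookupError("no sort satisfies the required traits")

print(generic_sort([3, 1, 2], need_stable=True))  # [1, 2, 3]
```

In C++ this selection happens at compile time inside the type system; the runtime registry here is only a stand-in for that dispatch.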
I feel like we've hacked our way to that, like by accident,
kind of from the, from the computer science side,
from the programing side of things. And the category theory could
offer a new lens to view that, um, and provide insights for designing
like the next generation of, of languages and, you know,
advanced type systems. Um, is there any sense to that
or I'm off the mark. No, I think that's... So one of the things that, like, I'm not working on this kind of stuff right now, but I have spent a lot of time thinking about, is essentially starting with a particular category and then trying to define a domain
specific language for that, or like an internal language of that,
such that it becomes easy to reason about the stuff in that
category in a way that's perhaps a little bit more intuitive than.
Going towards, like the high structuralism of
purely category theoretic arguments. Right. An example. So I love that.
Let's let's pause here for one second. So I love this idea.
So category theory can help one design domain specific languages.
Yes. No, this has, like, become a huge deal. Like, you know, I only know of
it from like the homotopy type theory community and the kinds of
projects that that has spawned. But as an instance of this,
you know, so there's this paper, I think, by Mike Shulman, called A Practical Type Theory for Symmetric Monoidal Categories.
And what he does in this paper is say, okay, there are a bunch of type
systems that are interpreted into symmetric monoidal categories, but no
one uses them for proving things about symmetric monoidal categories.
And he asked the question why? And he essentially, you know,
nails down a couple of points that all of the existing type theories
that are interpreted into symmetric monoidal categories fail to satisfy.
And like the one that I found the most compelling is like they
don't have this sets with elements flavor to them. Right.
And what's prerequisite for that sets-with-elements feeling is, well, you need tuples of terms to be in tuples of types. And that needs to be symmetric on both sides of a judgment. Right. And so he says, okay,
I'm going to design a new type system that has that property
with my intended semantics in a symmetric monoidal categories.
So the idea is you can design a programming language, like you say: you can have the semantics that you want this thing to have, and then design the programming language to fit that, right?
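As a toy illustration of semantics-first language design (entirely mine; this is not the construction in Shulman's paper): fix the intended interpretation as typed functions, then let the little DSL reject any program that has no interpretation.

```python
# Semantics first, then syntax: each term of this tiny DSL is interpreted
# as a typed function, and composition is only admitted when it has an
# interpretation. (Illustrative; not Shulman's type theory.)
from dataclasses import dataclass
from typing import Callable

@dataclass
class Term:
    src: type      # input type under the intended semantics
    tgt: type      # output type under the intended semantics
    sem: Callable  # the interpretation itself

    def then(self, other: "Term") -> "Term":
        # The syntax mirrors the semantics: no interpretation, no term.
        if self.tgt is not other.src:
            raise TypeError(f"cannot compose {self.tgt} -> {other.src}")
        return Term(self.src, other.tgt, lambda x: other.sem(self.sem(x)))

length = Term(str, int, len)
is_even = Term(int, bool, lambda n: n % 2 == 0)

program = length.then(is_even)   # str -> bool: well-typed, so it exists
print(program.sem("abcd"))       # True
# is_even.then(length) raises TypeError: bool -> str has no meaning here.
```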
And I think this is an incredibly powerful, uh.
Powerful trick that I do believe it's, you know, it's taking longer
than people thought it would, but that's okay, you know,
because it's hard. But the idea is,
you know. Kuhn has this line about, you know,
the only reason science works is because you can forget how you
got there and just start from new foundations each time. Right?
It's kind of like a less snarky analog of the, you know, science proceeds one funeral at a time kind of thing. It's like, exactly, you don't want to have to recapitulate all of mathematical history to work with something. And we're finding out that
there's all of these, like, complicated mathematical structures
that we really want to reason about, and we don't want to have to
build them from scratch. Instead, we would like to start with
something like a programming language, or a logic that just deals with all
the complexity of that naturally, by forcing you to be careful
about syntax. Yeah, right. And this is the kind.
And so this is exactly this whole study of, like, domain-specific languages for categories, or better, in math language: internal language hypotheses, or internal language statements like, I want the internal language of this flavor of category, or the internal language of that flavor of category.
And so maybe, like, so, you know, you can ask the question: why now? Right. There's a good deal of why now, right?
I've just said it's like, hey, we've just had a successful funding
round. We've got $31 million. We're assembling like the world's
largest industrial category theory and machine learning team.
The immediate response to this is like, aren't you the only, like, uh,
category theory and machine learning team in the world?
It's like, no, there's probably like a couple of them.
There's like Petar's team at DeepMind, that kind of stuff.
So we're not the only people in the world doing this,
but we'll probably over the next like six months, become the biggest.
And the question is like, how the hell do you get money for this?
And like, one of the reasons is, you know, if you look in the history
of computation, there have been people banging the drum of like
purely functional perspectives and category theory as like the
right foundations for computing. But you haven't needed it before,
right? You've been able to hack your
way with other paradigms. But deep learning and quantum
both pose problems to the existing paradigms.
It's just like it suddenly becomes too hard to make progress without
categorically informed treatments of those computational substrates.
Right. And so let me ask you about this because, and I'll be open, I haven't read the paper yet.
Um, so apologies for that. But I think if I'm going to take
a wild stab, based on what you've everything you've said so far about
where you might be going from this perspective, is it the case that, um, category theory will inform the construction of domain-specific languages for machine learning? That is definitely one outcome
that I foresee. Okay. And so my kind of, let's say, uh, more lay understanding of that.
It would be like, say, as a practitioner, I would now have a
language that I could use to maybe even, uh, you know, define, um,
modules, machine learning modules, or construct compositions of,
you know, neural networks in a way that I can actually reason about
as a human being, because there will be a new set of concepts,
a new language that I can program in, if you will, which will ensure the
semantics and structure that I need to do something useful. Yes.
No. Right. That is exactly right. One of the outcomes of this
research program, we hope, will be something we
can call like type driven ML or type driven deep learning. Yeah.
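As a hedged sketch of what type-driven deep learning might look like in miniature (my example, not Symbolica's actual design): layer shapes are carried as part of each layer's type, so an ill-formed network is rejected at construction time rather than failing mid-training.

```python
# Illustrative sketch of "type-driven" deep learning (not Symbolica's
# design): each layer carries its input/output dimensions as part of its
# type, so an ill-formed network is rejected before any training run.
from dataclasses import dataclass

@dataclass(frozen=True)
class Layer:
    in_dim: int
    out_dim: int
    name: str

def compose(*layers: Layer) -> Layer:
    """Builds a network, checking that adjacent shapes agree."""
    for a, b in zip(layers, layers[1:]):
        if a.out_dim != b.in_dim:
            raise TypeError(
                f"{a.name} outputs {a.out_dim}, {b.name} expects {b.in_dim}")
    return Layer(layers[0].in_dim, layers[-1].out_dim,
                 " -> ".join(l.name for l in layers))

net = compose(Layer(784, 128, "encoder"), Layer(128, 10, "classifier"))
print(net.in_dim, net.out_dim)  # 784 10
# compose(Layer(784, 128, "a"), Layer(64, 10, "b")) raises TypeError
# before any compute is spent.
```

A real type-driven system would check far richer properties than dimensions, but the shape check already shows the payoff: the error surfaces at composition time, with a message naming the offending layers.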
I mean on on that we had a very interesting discussion just
before we hit the record button. And there's a real problem now
with ML engineering, but also just large scale ML systems like
Facebook and its inscrutability. And this is this is a real problem,
right. Because we tend to just blame
the system. And it gives gainful employment to lots of engineers because we lie. We say that these things reason, that they operate in a rational way, and they don't.
And when they go wrong, we just blame the system.
We just say, oh, you know, it's just a complicated neural
network. It's like alchemy, both in how we build the systems and
how we explain the failure modes. And what you're talking about is
a future towards not only having a formal way of understanding
and verifying programs, but just being on a more solid
footing for how we build systems. Yeah, no, precisely like this.
Like our aim is very much to refound deep learning in such a
way that you can actually do careful reasoning about it, right?
That you can actually do theorem proving about the behavior of
these neural systems. Right. Whereas currently we're hacking
everything together and we, you know, we use benchmarks and
heuristics and all of this stuff, like we can do better. Right.
We know that we can do better with classical programing.
So why can't we do better with neural networks?
You know, lots of people say, oh, you simply can't do that.
Maybe software engineering has to be like neuroscience.
And I think it's like, well, there's definitely going to be
something like that there. But a lot of it is formally structured, so the key will be to identify that which is formally structured, and pull as much out of that as we can and make it explicit, so that we know the bounds within which this thing is doing some inscrutable curve fitting, whereas in other places it's doing this very formal stuff.
really interesting that, um, people have said to me that they
think the future of software engineering is like neuroscience.
And what I mean by that is we don't know how the brain works.
We stick probes in and we try and kind of like, you know,
figure out what's going on a bit like the blind men and the elephant
by having all of these different views on this inscrutable mountain.
And, um, it's just a little bit weird, isn't it? Right.
So, you know, we're building these multi-agent systems and they
have all these complex dynamics, and we try to apply engineering
techniques by doing monitoring and alerting and having these, like,
weird thresholds and just fixing problems as, as, as they show up.
And a lot of people just say, well, this is the way things are, this is
the way software is going to go. And as I understand software
engineering, because Keith and I have written a whole bunch of complex
software recently, and part of the reason for having abstraction is
to create a cognitive interface. So it's not necessarily about
how the software runs because it gets compiled into like,
you know, into low level code, native code on, on the machine.
But it's about creating an abstraction to help us understand
and communicate what's going on and also do like, you know,
forms of verification later. Yeah. Well, and that abstraction is a
domain specific language. And so every, every programmer,
when they're, you know, when they're creating whatever, just, you know.
A bunch of functions, a bunch of modules,
a bunch of classes, like, you know, whether you know it or not, you're
designing a domain specific language. And and we kind of just do it
intuitively without much like, sort of, uh, math helping us.
And I think the point is, like, if I understand both of you correctly,
machine learning and quantum, you know, quantum computing, it's just
beyond our capability to do that. Like, you know,
kind of just intuitively, we need we need math like it's
it's it's just too hard to hack it together anymore. Right? Right. Yeah.
Yeah. So I think that's beautiful. I mean, and I, uh, in fact,
there's a very old, um, programming book from like the 1970s. Um, I think it's called like Programming Languages or Designing Programming Languages or something, which tries to make this point too.
And it makes two interesting points. It says, you know, when you're writing a program, whether or not you know it, you're creating a domain-specific language. And so knowing something about programming language design is helpful even to just writing code.
least one thing you hope comes out of this project.
Um, Paul is, is, uh, um, machinery to create domain specific
languages for, for machine learning. What are the other things that
you hope comes out of it? The other things that I hope
come out of it are. Completely novel architectures. Yeah.
So completely novel novel architectures that can do kinds
of things that we currently don't know how to do. Right.
So one thing that you might want to do is, well, I'm in industry now. I'm not an academic anymore, right? Suddenly I have to figure out how to eventually generate revenue with this foundational research. So consider Microsoft, for example. This is a story that we've been telling for some time: consider Microsoft and all of the different places they are deploying AI systems, or LLMs specifically, throughout their whole panoply of offerings. The only one that's actually making any money is Copilot. Right. But Copilot is not actually very good, right? LLMs are not actually very good at generating code. They'll regurgitate some snippets, but a staggering portion of the time they're wrong. The fact that we're used to this is because when humans generate snippets of code, a staggering proportion of the time they're wrong too.
So we're not offended by this, right? But if you could make something better, such that you, say, have an LLM that you can interact with in natural language, but that actually emulates a type system, you will end up with an LLM capable of doing higher-order processing, higher-order organization. You'll note that I'm intentionally shying away from using the word reasoning, because I take exception to this language. I understand it to be the lingua franca of the practice, and I don't really like the language of reasoning, but I'll accept it. I take exception to it as well. We very much do.
But can we just linger on this? This is a fascinating point. Now, there are a lot of people out there drinking the Kool-Aid right now. There was the Codex model, which is what was built into VS Code, and that's garbage. And GPT-4 Turbo is pretty damn good. Anthropic's Opus is pretty damn good. There's this thing called Devin, which came out recently, and people are saying, oh my God, there's no limit. We're going to start getting this thing to generate code, and then we're going to fine-tune the base model on the outputs of the code, and it's going to be a bit like evolution. It's just going to complexify and complexify, and we're going to memorize all of the types of reasoning and all of the types of things we're ever going to see. And I think a lot of the people who are drinking the Kool-Aid about this just don't do serious software engineering. And that's because,
for example, I was playing with Opus 3 last night. I got it to generate an animation of a bunch of mathematical graphics, and it worked really well. I thought, bloody hell, it just worked, first time. I then said, can you modify it a bit, can you add this thing? Very quickly it diverged, and it just turned into garbage and was useless. And that's because it's almost like we're wrestling on the boundary of chaos, right? The complexity of software blows up so fast. It is possible to do something simple, but as soon as you modify it, and you change it and change it, you're in no man's land very, very quickly. And I think part of the reason for having this structured approach, this abstraction approach, is to wrestle with that complexity.
So you're making the argument that if you do something like what a large language model does now, but in the abstraction space rather than the low-level syntax space, then we can get further than we are now. That is precisely the claim.
There's a relatively recent paper, I think the team was led by Tanya... Talia Ringer, excuse me. I can't remember which university it's based out of, but it's a great paper, and it demonstrates pretty conclusively that transformers just don't learn recursion. Right? Yeah. Of course. There you go. And so, why would you ever think that scaling an architecture that has, by construction, fundamental architectural limits on the kinds of things it can even figure out... Yeah, right. Why would you ask a system that can't do recursion, which we know to be the essence of programming, at least in some sense, right? Why would you presume that that model is ever going to be capable of doing the kind of software engineering that we actually need to do in the world?
I mean, I think it's almost even worse than that, because they can kind of mimic recursion to a fixed depth, since they have a fixed amount of computation. Right. But even worse than that, the real problem is they can't create abstractions, because creating abstractions is abduction. Yeah. And I think they probably do create abstractions, right? They've got latent representations. But the problem is that the abstractions are not good. They're not principled.
Oh, but I would argue that we create abstractions, and we embed them in our language, and then they get memorized by a language model, and they can be reused. But when we design software, we solve this intractable problem. It's very mysterious how this happens. We select an abstraction from an infinite set, an infinite set, and then we use it to represent a concept. And language models do not do that. No. Yeah.
I mean, the recursion thing is really interesting, because we were having a conversation the other day about it. So there's this really exciting technology you're working on, and as we were just saying, it's a way of building a semantically principled way to design the future of neural network software. So rather than dealing with these inscrutable neural networks, we're actually dealing with compositional structure, with clear semantics, from a high level. But the problem, as I understand it, is that the end result is still a neural network. And neural networks are finite state automata, right? So they don't do recursion.
And then there's this notion that came up, because we were talking about this over dinner: there is such a thing as a recursive neural network, but a recursive neural network is still a finite state automaton. The difference is that we can modify backprop to recurse over a structured input. So, for example, we can put trees into these things. Yes. So they can do structural recursion. Exactly, exactly. So, because a recursive algorithm has a kind of stopping condition, or a base case if you like, we can put a tree in, and then the algorithm can recurse over that tree. And what it's doing every time is just embedding everything: you start with the leaf nodes, and then you go up, and you go up until you get to the top, and it's incrementally embedding that datum into the neural network. But the neural network itself is still doing a fixed amount of computation for that input. Yeah.
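As a concrete illustration of that structural recursion, here is a minimal sketch: a fold over a binary tree where every node applies the same fixed-size combine step, standing in for a trained recursive-network cell. The "weights" are arbitrary placeholders, not a real model; the point is that one fixed computation is reused at every node, so total work scales with tree size.

```python
# Minimal sketch of structural recursion over a tree, in the spirit of a
# recursive (tree-structured) neural network. The cell here is a toy:
# a fixed-size linear combination with hand-picked numbers standing in
# for learned weights.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    value: Optional[float] = None      # set for leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def embed_leaf(value: float) -> list[float]:
    # Leaf embedding: a trivial 2-dimensional encoding of the scalar.
    return [value, 1.0]

def combine(l: list[float], r: list[float]) -> list[float]:
    # Internal-node cell: fixed-size map from two child embeddings to one
    # parent embedding (the coefficients are arbitrary placeholders).
    return [0.5 * (l[0] + r[0]), 0.5 * (l[1] + r[1]) + 1.0]

def embed(node: Node) -> list[float]:
    # Structural recursion: base case is a leaf, recursive case folds the
    # children's embeddings upward.
    if node.value is not None:
        return embed_leaf(node.value)
    return combine(embed(node.left), embed(node.right))

tree = Node(left=Node(value=3.0),
            right=Node(left=Node(value=1.0), right=Node(value=2.0)))
print(embed(tree))
```

However deep the tree, the output stays a fixed-size vector, which is exactly the "fixed amount of computation per node" point made above.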
So I guess the question is... what we're really excited about, and Petar's been working on this for years, is algorithmically aligning neural networks. And even GDL more broadly is about constraining neural networks with geometric priors so that they generalize better in distribution. But the thing is, they still only generalize within distribution. So they don't act like Turing machines, which is to say they can't address an expandable memory, a potential infinity. They can't just go to Costco and get some more memory when they need it. They do a fixed amount of computation per datum.
computation per datum. Well, so this is you know,
this particular question is outside of my domain of exact expertise.
So I want to be like careful about saying, you know,
anyone who's like says, oh, this person said something wrong.
It's like, yeah, I'm not claiming to be an expert in this particular
domain, but this sounds like a story. It's like they can do coalgebras,
however, right. And coalgebras are things where,
yeah, you can just like keep popping something off, you know,
ad infinitum. You can just like, query it.
Hey, give me this part of this, give me this part of this.
And there's no stop to how many times I could ask for that, right?
So I can do coalgebraic stuff too. But my sense is that maybe this
is some sort of more definitional confusion than it is really a
statement of the kinds of things that neural architectures can do.
Yes, I think I can probably help clarify it, because for better or worse, we've been having a lot of debates in our Discord recently about this topic. So maybe we'll test it out here today; I've learned how to communicate this a little bit better, to try and explain the pragmatic problems with it. So first of all, just to set something up very clearly: we all understand what a finite state machine is. Just imagine it as a graph. It sits there, and it can get input, and by the way, it can get an unbounded amount of input. For example, a soda machine: you can keep feeding it instructions as long as you want. You can keep pressing the button and putting quarters in it all day long, and stuff will continue to happen forever. The point is that it has a fixed number of states, defined at build time when it was constructed, and it just navigates between those states, always responding to the input.
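The soda-machine picture can be sketched directly: a fixed transition table built once, consuming an arbitrarily long event stream. The states, inputs, and prices here are made up purely for illustration.

```python
# Minimal sketch of the soda-machine finite state machine: a fixed set of
# states decided at build time, consuming an unbounded input stream.
# States and prices are hypothetical.

TRANSITIONS = {
    # (state, input) -> (next_state, output)
    ("idle", "quarter"): ("paid_25", None),
    ("paid_25", "quarter"): ("paid_50", None),
    ("paid_50", "button"): ("idle", "soda"),
    ("idle", "button"): ("idle", None),        # no money: nothing happens
    ("paid_25", "button"): ("paid_25", None),  # not enough money yet
    ("paid_50", "quarter"): ("paid_50", None), # machine keeps the extra
}

def run(events):
    """Feed an arbitrarily long event stream through the machine."""
    state, outputs = "idle", []
    for event in events:
        state, out = TRANSITIONS[(state, event)]
        if out is not None:
            outputs.append(out)
    return state, outputs

# The input stream can be unbounded, but the machine itself never grows:
print(run(["quarter", "quarter", "button"] * 2))
```

No matter how long the event list, the machine only ever occupies one of its three build-time states.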
That's a finite state automaton. A Turing machine, and by the way, guys, this was a genius discovery back in the day, a Turing machine can be boiled down to nothing more than a finite state machine that has an expandable read-write memory. Or another way to think about it is that it can have two stacks: it can push a value onto one stack or the other, and it can pop a value off either of them. So it has this expandable read-write memory that's unbounded. Yeah. That machine alone, that simple machine alone, is in fact powerful enough to perform all of what are called effective computations. So any mechanical thing you can construct, any physical thing you devise that has a digital nature, is no more powerful than this machine. And that includes, by the way, quantum computing. Quantum computers are still just Turing machines. They can do certain problems faster, but they can't fundamentally solve any problem that a Turing machine couldn't solve. Okay.
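The two-stack construction can be made concrete: an unbounded tape is just a left stack and a right stack with the head sitting between them, and the tape grows on demand by pushing blank cells. A minimal sketch, using a toy unary-increment "program" rather than a general controller:

```python
# Minimal sketch: a Turing-style unbounded tape built from two stacks.
# The left stack holds the cells to the left of the head (top = nearest);
# the right stack holds the head cell and everything to its right.
# Growth is unbounded: moving past the end pushes a fresh blank cell.

BLANK = "_"

class TwoStackTape:
    def __init__(self, contents):
        self.left = []  # cells left of the head, top of stack = adjacent
        self.right = list(reversed(contents)) or [BLANK]  # head cell on top

    def read(self):
        return self.right[-1]

    def write(self, symbol):
        self.right[-1] = symbol

    def move_right(self):
        self.left.append(self.right.pop())
        if not self.right:          # extend the tape with a blank cell
            self.right.append(BLANK)

    def contents(self):
        return "".join(self.left + list(reversed(self.right))).strip(BLANK)

# Toy "program": unary increment. Scan right over the 1s, then write a 1.
tape = TwoStackTape("111")
while tape.read() == "1":
    tape.move_right()
tape.write("1")
print(tape.contents())
```

The finite controller here is just the three-line loop; the unboundedness lives entirely in the two Python lists acting as stacks.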
So, getting to neural networks: it should be immediately obvious to anybody that a neural network has that finite state machine component. That's what the neural network is. And now you ask: can I hook that up to read-write memory, and have that neural network do something useful? The problem we run into is training: training the neural networks to do something useful with unbounded read-write memory. We don't usually do that. All recurrent neural networks that are trained today have a window that's fixed ahead of time, known at training time, where they operate on a context of 100,000 tokens, or on 100 windows, or something like that. And then you take the answer that you get at that point; you've got an output signal.
Whenever people try to train them to do unbounded calculations, for example: maybe I'm going to have an RNN, and whenever one of my bits lights up in the output, I'm going to treat that as halting, and take the answer at that point. Okay, so I'm going to run the Turing machine during training: as soon as bit number 26 lights up, I'm going to treat that as the halting bit, and then I'm going to take the answer. That could be a Turing machine. The problem is, when they try to train machines like that, we just can't do it.
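The halting-bit setup described here can be sketched as pure control flow, with a toy recurrent cell whose weights are arbitrary placeholders rather than anything trained. The safety cap matters because an untrained cell's halting bit may simply never light up.

```python
# Sketch of "run until a halting bit lights up": a toy recurrent cell is
# stepped on its own state until a designated output unit crosses a
# threshold, with a safety cap because the unit may never fire.
# The weights are arbitrary placeholders, not a trained model.

import math

def step(state):
    # Toy recurrent update: two hidden units and a "halt" unit computed
    # from the previous hidden state.
    h0, h1, _ = state
    new_h0 = math.tanh(0.5 * h0 + 0.3 * h1)
    new_h1 = math.tanh(0.2 * h0 - 0.4 * h1 + 0.1)
    new_halt = 1.0 / (1.0 + math.exp(-(h0 + h1 - 0.5)))  # sigmoid
    return (new_h0, new_h1, new_halt)

def run_until_halt(state, threshold=0.5, max_steps=1000):
    """Step the cell until the halt unit fires, or give up at max_steps."""
    for t in range(1, max_steps + 1):
        state = step(state)
        if state[2] > threshold:
            return state, t          # halting bit lit up: take the answer
    return state, max_steps          # never halted within the cap

final, steps = run_until_halt((0.9, 0.9, 0.0))
print(steps)    # this start fires the halt unit on the first step
_, steps2 = run_until_halt((0.0, 0.0, 0.0))
print(steps2)   # this start never fires it, so the cap kicks in
```

The second run illustrates the objection raised a little further on: without the cap, that loop would spin forever.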
And so think about the space of all possible programs. The space of all possible programs is the space of Turing machines, okay? And we're doing a certain search. Today we have the capability to construct all kinds of neural networks and use stochastic gradient descent or whatever; we're searching that space of all possible programs, and we're finding ones that do something useful. What we're finding is that the space being searched is a very small subspace of all possible programs, and it's really those that are also reachable by finite state automata. So we're not getting out of that boundary, up into context-free grammars, context-sensitive grammars, recursively enumerable languages. There's this huge other space that we just don't know how to search efficiently with machine-driven techniques yet. Yeah.
I mean, I will confess I could be wrong, but I don't think that's fundamental. I think it's sort of an accident of particular definitions and of the way we've been structuring neural architectures. Right. And what I mean by accidents of definitions is: yes, if you pretend that your thing is of fixed depth. But if you allow feedback into the system, then suddenly, I do think you could get access to that higher-order computation.
Well, I think it's actually just an accident of our optimization procedures. It's really an accident not so much of the architecture; the architecture is fine. Like you said, people do have RNNs with feedback and all that sort of thing. It's an accident of the way we train them. When we go to train them, nobody currently trains in the way I just described. When I do a training pass and go to propagate my gradients, I never sit there and run the RNN until a particular bit lights up and then take the answer. I run it for a fixed number of steps and take the answer. Yeah, well, because the problem is, if you try to run it until a bit lights up, it may never light up.
Well, no, you can halt. Uh oh. Okay, so I see the concern. Well, so what if you consider that? Hmm. Yeah. I mean, I don't think this is a real problem. I think this is a question of misconceptions, maybe some that I have, or maybe some that are popular in the ML community. I'm a quasi-expert in the CT world, and I'm learning, using CT to speed up my process of developing expertise in ML. But I don't think this is a permanent problem, in the sense that something about neural computation makes it impossible. I don't think that's the case. Right? Right. I think in particular you could do this thing where, if this bit lights up, it stops. I do think that's possible.
Right. Well, in particular, I agree with you: it is possible. And there are attempts at this, like the differentiable neural computers, right, or some configurations of neural Turing machines. But it seems like the universe is conspiring against us, because the funny thing that happens is that when you generalize those neural networks and try to train them in this way, it just becomes so much more difficult to train them.
I mean, maybe we should discuss the "so what" question. The thing that Petar has been driving at is this algorithmic alignment of neural networks, and the hypothesis is that there are certain fundamental forms of reasoning where you do require this kind of quantification over potentially infinite sets: computing the nth digit of pi, say, or sorting a set of numbers of arbitrary length. Now, the incredible discovery of neural networks is that for so many things in the world, you can essentially interpolate within distribution and it just works. And I'm a big believer in this System 2 idea: there's a lot of gnarly perception-type stuff, and then there's this System 2 where we do need to do this kind of symbolic reasoning.
Is there a space where we need Turing computation for AI? Even if you could solve all the problems interesting to humanity with a big enough finite machine, you know, you construct one the size of the sun, it's made out of neutronium and whatever else, and it can finally do everything we want, you can do it with a lot less energy and a lot less matter if you can use programs that utilize some unbounded, input-sensitive amount of memory. So I think, again, it's about this program search thing: for many problems, the best solution, in the sense of ones that are practical and energy-efficient, lies in this broader space of Turing-complete programs. It's certainly been the case for human programming. People could have just written programs that were FSAs; we could have done that. We never needed to develop von Neumann machines, like what's sitting on your desk. But we did, because they were extremely useful for solving real problems. And I agree with Paul that there's no theoretical reason why you can't train a neural network to search the space of Turing programs. It's just that we haven't figured out how to do that yet, and there are some tricks we're missing. I'm just saying that all the current architectures do not search that space. They search a small subspace, and there are many interesting problems outside of that space. Yeah.
Can I frame it a slightly different way? There are two directions here. One direction is what Francois Chollet says, and you've just been alluding to it, Keith: we build a kind of AI which is almost purely a discrete program search. The other school of thought is that we do some kind of neurosymbolic combination. Josh Tenenbaum has been doing a lot of interesting work here, with Bayesian program search and DreamCoder and so on, and even things like FunSearch and Q* and Tree of Thoughts are steps in this direction. Which is to say: we start with this interpolative, continuous vector space that has been trained on a whole bunch of stuff, and that space is still traversable. So if we put some kind of search controller on top, of course it's not going to be searching the space of Turing machines, but it will give us a margin on top, which will find all sorts of interesting things. So just doing this Q*-style thing, where we take an LLM and search a whole bunch of trajectories with some kind of cost function or whatever, that margin might give us a dramatic improvement. But I still think there's a whole space that we're missing that is not traversable through neural networks. Let me also say that what's happening right now is that the real intelligence of these GPT models is the collective search process. It's the collective intelligence. We're all using it, and we are just putting different inputs into it. So we're a bit like Tree of Thoughts, we're a bit like Q*: all of these people all over the world are finding these interesting prompts, and they're traversing the input space, and they form a layer of intelligence on top of the LLMs. And I guess the question we're asking is: maybe the first step would be just to replace that human search process with something automatic.
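A toy version of that search-controller idea: best-first search over trajectories, where a hypothetical proposal function stands in for model sampling and a hypothetical cost function scores partial trajectories. Everything here is a stand-in for illustration, not Q* or any published method.

```python
# Toy sketch of a search controller sitting on top of a generative model,
# in the spirit of Tree of Thoughts / Q*-style trajectory search.
# The "model" is a hypothetical proposal function over digit sequences,
# and the cost function scores partial trajectories; both are stand-ins.

import heapq

TARGET = 42

def propose(prefix):
    # Stand-in for model sampling: extend the trajectory a few ways.
    return [prefix + [d] for d in (0, 1, 2, 3, 4)]

def cost(trajectory):
    # Stand-in cost: distance of the running sum from TARGET, plus a
    # small length penalty so shorter solutions are preferred.
    return abs(TARGET - sum(trajectory)) + 0.1 * len(trajectory)

def controller_search(max_expansions=200):
    """Best-first search over trajectories, cheapest frontier node first."""
    frontier = [(cost([]), [])]
    for _ in range(max_expansions):
        c, traj = heapq.heappop(frontier)
        if sum(traj) == TARGET:
            return traj
        for child in propose(traj):
            heapq.heappush(frontier, (cost(child), child))
    return None

solution = controller_search()
print(solution, sum(solution))
```

The controller never looks inside the proposal function; it only steers with the cost signal, which is the "margin on top" being described.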
Well, I'm just going to jump in real quick and then let Paul go, with two things. Again, in my mind there's no theoretical reason why not. A Turing machine is a finite state automaton with a read-write memory, and there's no reason that finite state automaton can't be a neural network. It's just that we haven't yet figured out how to train neural networks that function in that way, at least not for general purposes, efficiently, and so on. And as to which direction is ultimately going to be more successful pragmatically, I don't know; it's a big mystery to me. But it seems like the efforts of Paul and George's company are working towards that route, which is: give us a domain-specific language that's, in a sense, many generations beyond DreamCoder, a domain-specific language that lets people program these complicated machine learning modules more effectively. And I see no reason why that wouldn't also empower automated programming as well. Yeah. Broadly, I like that story. Yeah.
I think I've been confused when this question has come up, because I thought the claim was that we somehow can't do this. And I might be wrong again, but at my current level of understanding I don't see any fundamental obstruction to developing neural networks that are Turing complete, or whatever the magic words for it are. Right? I don't actually see any essential obstruction. We may not have figured out how to do it yet, but I don't see why it's impossible.
Well, they have to be a neural network with a read-write memory. Yeah. And once they have that, they're augmented neural networks. And then, once we figure out how to train them, it's kind of open season on every computable problem. Yeah. I mean, I think we can agree that it's probably possible in principle and currently not possible in practice, and that it is a very, very difficult problem. We can all agree on that. Yeah. Well, currently nobody's doing it, that I know of. I think it is really difficult. I think there are some magic tricks that we haven't found yet, and they may lie in this direction of starting to apply category theory to design domain-specific languages that have the semantics we need.
Yeah. I mean, another thing: when I first read your paper, I assumed there was some kind of... I guess I visualized it as a kind of compiler. So it's a bit like, imagine we had something like JAX, and we did to JAX what TypeScript did to JavaScript. We add typing to it, we add compositional semantics and so on. And what we've done is create this whole interface, basically, for understanding how to build the next generation of neural networks. And then the system is a bit like a compiler: it will compile down to neural network modules that have GDL constraints or algorithmic constraints and so on, and in the future maybe even symbolic networks. It would just automatically do all of this stuff. Because what we want is to get away from the alchemy, right? We need this compiler to do the hard work for us, so that all of these problems we're seeing at the moment become a thing of the past.
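A minimal sketch of what that "TypeScript for JAX" flavor of checking could look like: composition of neural modules that is rejected up front when the types don't line up. The module names and dimensions are hypothetical, and this is not any real library's API.

```python
# Minimal sketch of shape-typed composition for neural modules:
# composition is only allowed when the output type of one module matches
# the input type of the next, so ill-formed networks are rejected before
# anything is compiled or run. All names here are illustrative.

from dataclasses import dataclass

@dataclass(frozen=True)
class Module:
    name: str
    in_dim: int
    out_dim: int

    def __rshift__(self, other: "Module") -> "Module":
        # Typed composition: refuse to build a network that can't typecheck.
        if self.out_dim != other.in_dim:
            raise TypeError(
                f"cannot compose {self.name} (out={self.out_dim}) "
                f"with {other.name} (in={other.in_dim})"
            )
        return Module(f"{self.name} >> {other.name}",
                      self.in_dim, other.out_dim)

encoder = Module("encoder", in_dim=784, out_dim=128)
head = Module("head", in_dim=128, out_dim=10)

model = encoder >> head          # well-typed: 784 -> 10
print(model.in_dim, model.out_dim)

try:
    head >> encoder              # ill-typed: 10 does not match 784
except TypeError as e:
    print("rejected:", e)
```

The error is raised while the network description is being built, which is the "compiler doing the hard work" idea in miniature.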
I have just a quick question, because... is one proper way to think about this that we are pushing a lot of the necessary semantic work into syntax, where it can then be automated? That seems to be the story of mathematics as a whole, too: creating new syntaxes and new sets of rules, new domain-specific languages, that effectively systematize the necessary reasoning and allow it to be automated.
Yeah, I think the way a mathematician might phrase this, and I think Andrew actually provided a very nice description of it, is that the history of mathematics has taught us that abstracting a lot of the specificity away and making your problem statement more and more general, which you might think would make it harder, actually makes it easier. Right? Because the point is, if there's a lot going on, I don't know what to grab on to. But if I reduce the things I'm allowed to grab onto, I can think more clearly about them, and the elegance allows me to reason carefully about them. And here I do mean reason, in the sense of coming up with reasons for things. Why does this do this? How does this work? Those kinds of things. Right. So you abstract away the essentially analytic detail of a problem, and then suddenly it becomes tractable.
It becomes conceivable. I would say that what category theory really does is abstract the idea of a semantics. Right. That's really what it is. Because you say: what is a category? It is a universe of some particular kind of thing. Your objects are the instances of that kind of thing, and your morphisms are the computations that go from one instance of that kind of thing to another. Right. So it is really a theory of abstract semantics.
Just a quick question on that: aren't you allowed to have more than one type of object in a category? Or does that have a different name? Well, what does one mean by "kind"? I mean, can I have a morphism that takes an object of type T to an object of type S, and then a different morphism... Yeah, but they are all objects of the same category, right? Different types can be in the same category. Exactly. So the point is that the notions of object and type are not exactly the same thing. Right. For example, if you're thinking in terms of Haskell, which is probably the most familiar framing for someone listening to or watching this, there's this category, Hask, whose objects are all of the possible types that can be formed in Haskell, and whose morphisms are all of the possible well-typed functions. Okay. Right. So is there a particular name for a category with objects of more than one type? A typed category or something? I mean, I think it's the kind of thing where, once you stare at it long enough, the question ceases to make sense. I think it's that kind of thing. Right.
Whereas trying to mix these metaphors a little too rigidly introduces confusion. What I would say is that any given category is the universe of a particular kind of thing. Right? And if you want to talk about categories where you've got multiple different kinds of things, you're actually talking about working in the category of categories, and talking about functors and natural transformations between functors. Right. You're talking about relations between the universe of things of this kind and the universe of things of that kind, and translations from the universe of things of one kind into things of the other kind. So I'd say that for any given category, you've got one kind of thing. But the point is, that doesn't mean that category theory, at any given time, is only about one kind of thing. It's actually about structural relationships between the universes of different kinds of things, all at once.
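The Hask picture mentioned above, objects as types and morphisms as well-typed functions, can be sketched in a few lines, with composition defined only when the types line up and an identity for each object. This is an illustration of the definitions, not a faithful model of Hask.

```python
# Minimal sketch of a "category of types": objects are Python types,
# morphisms are functions tagged with source and target types, and
# composition is only defined when those types line up.

from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Morphism:
    src: type
    dst: type
    fn: Callable

    def __call__(self, x):
        assert isinstance(x, self.src), f"expected {self.src}"
        y = self.fn(x)
        assert isinstance(y, self.dst), f"expected {self.dst}"
        return y

def compose(g: Morphism, f: Morphism) -> Morphism:
    # g after f is only defined when f's target is g's source,
    # exactly as in any category.
    assert f.dst == g.src, "morphisms not composable"
    return Morphism(f.src, g.dst, lambda x: g(f(x)))

def identity(t: type) -> Morphism:
    # The identity morphism on the object t.
    return Morphism(t, t, lambda x: x)

length = Morphism(str, int, len)                      # str -> int
is_even = Morphism(int, bool, lambda n: n % 2 == 0)   # int -> bool

even_length = compose(is_even, length)                # str -> bool
print(even_length("hello"))
print(even_length("abcd"))
```

Trying to compose the other way round fails the composability check, which is the whole point: the "universe of types" only admits the well-typed pipelines.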
I'm going to have to go back to those tutorials. By the way, there are these online tutorials done by Brendan Fong and some others; those are the ones I was trying to go through. Yeah, those are great. And it's fun, because it's really enjoyable when you come across a topic that blows your mind. The way I kind of quantify a mind-blowing thing is: if I understand it the day I learn it, and then somehow the next day totally no longer understand it, it's kind of mind-blowing.
So here's just kind of a funny question for you, and answer it any way you want: which do you think is more mind-blowing math, category theory or number theory? Oh. Category theory. The thing is, I don't understand number theory well enough to have my mind blown, right? I don't know what the weird results coming from number theory mean. I have no way to interpret them. It's like, what? There's a number that does that? This particular number field has some weird abstract property relative to another one? I don't even know what those things mean, right, because I'm not a number theorist. I haven't spent forever thinking about this. But I have at this point spent at least ten years thinking about categories, and so I've found it to be a lot more mind-blowing, because to have your mind blown, you have to understand what it means. Right? And whether that is my own sensibility or essential to the discipline, I won't say, but I kind of think that category theory is more mind-blowing. Otherwise I'd be a number theorist. But, you know, no, I like that.
In order to have your mind blown, you have to first understand, so that it can be blown away. Yeah, because otherwise it's just complete mystery, and mystery is not interesting, right? A mystery that has the flavor of "I can understand this", that's interesting. Otherwise it's just like, huh, that's weird, and I move on with my day. Got it, right. It's got to trigger the sense of: oh, something weird happened, and I can make sense of it. I haven't made sense of it yet, but I can, right? I have to have that taste, otherwise it's boring. That makes sense. That makes sense.
So, one other thing that confused me a little bit about the category paper: you were talking about being able to, for example, create an analytical pathway, if you like, between, let's say, a list and a tree. And I'm understanding the work as being about data rather than algorithms. The way I understood it was, again, that I was kind of brought up on Petar's work with algorithmic alignment. So he's thinking about getting neural networks to behave as if they are doing sorting, or doing Dijkstra or something like that. And I was thinking this could become a kind of System 2 toolkit, where I have these algorithmic primitives and I can compose them together. So do I understand correctly that at the moment it's almost about types of data rather than types of algorithm? I don't know if those are actually that different, right? And the reason why is that if I have a well-typed function from one data type to another, that's about both algorithm and data, right? What you're allowed to do with data is exactly what the algorithm does. Right. So the point is that I don't think that distinction is meaningful.
Interesting. Hmm'hmm. Yeah. Can I, I mean, I think I,
I sort of, uh. I have a lot of sympathy to that
point of view, because this goes back to, um,
when I when I first learned about, you know, algebraic data type
theory and its application to, to programing languages,
I started to really realize, yeah, you know, data and, uh,
I don't know what the right word is, but data and type, data and the set
of transformations allowed on that data, are almost intertwined
into the same thing. Like one isn't really useful
without the other. It's kind of a weird duality or
something. I'm not sure how to phrase it.
Yeah, no, I mean, like, I might, you know, hazarded some sort of like,
you know, dialectical fixed point is like one informs the other
and like the notion of type and well-typed function reinforce
and define each other. Right. It's a very much a functionalist
kind of kind of thing. Right? You know,
type theory is the core of like, you know, a syntactic treatment of
a synthetic mathematics, right? At least that's, you know,
the weird, you know, not fuzzy, but like rarefied and
unbelievably abstract world that, you know, through which I
discovered type systems. Right. But yeah, maybe, maybe another
way to think about this. Maybe this is related to, um,
get your opinion on this, Paul. That may help Tim.
For us to think about is, you know, for many, let's suppose you have
and I'm just going to say class, I'm not trying to be like object
oriented or something. But suppose you have a type,
you know, that's that's interesting. Like whatever complex numbers,
you know, you're you're doing a lot of programming and complex
numbers are useful. Yet there are many representations
of that complex number. You know,
you could like literally store it as a piece of like literal data.
That was an angle and a magnitude or two, you know, x,
y coordinates or whatever. There's many possible data
representations of it, and it's just irrelevant.
It's like you choose one based on like, you know, whatever works,
whatever's efficient, whatever. I feel like, you know,
whereas all the, all the important and interesting stuff is, is and
is at a higher level than that. I would hazard that I would be
uncomfortable with the statement that it's irrelevant. Right.
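The representation point can be sketched in code (a hypothetical illustration, not from the conversation): two stored forms of a complex number, Cartesian and polar, exposing the same multiplication interface, with the choice between them invisible at the level of the algebra. The class and method names here are my own.

```python
import math

# Two data representations of the same abstract type: complex numbers.
# The algebra (multiplication) is shared; the storage is an implementation detail.

class Cartesian:
    def __init__(self, re, im):
        self.re, self.im = re, im

    def mul(self, other):
        # (a + bi)(c + di) = (ac - bd) + (ad + bc)i
        return Cartesian(self.re * other.re - self.im * other.im,
                         self.re * other.im + self.im * other.re)

    def to_pair(self):
        return (self.re, self.im)

class Polar:
    def __init__(self, r, theta):
        self.r, self.theta = r, theta

    def mul(self, other):
        # Multiplication is trivial in polar form: multiply magnitudes, add angles.
        return Polar(self.r * other.r, self.theta + other.theta)

    def to_pair(self):
        # Convert back to Cartesian coordinates for comparison.
        return (self.r * math.cos(self.theta), self.r * math.sin(self.theta))
```

Both representations compute the same product of (1 + i) and i; which one is "efficient" depends entirely on what you intend to do with it, which is exactly the point being debated here.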
And in fact, so here's one of the like. So you know.
Maybe this is not what happens if you're like in graduate school
for mathematics, in places where category theory doesn't have this
sort of like, you know, somewhat deprecated history using deprecated
in this kind of like a slang way. I was like, oh,
this is kind of like not approved kind of mathematics, right?
You know, North American math, like specifically American Academy.
Yeah, exactly. It's heretical. Or like, people don't like it so
much. Like why? It's like, well, because it doesn't
feel like you're doing anything right. Why do you do mathematics?
What is the pleasure? It's like, I want to feel like
I'm doing the computations. And category theory has this
nasty habit of telling you that you didn't do anything right.
But it's like that scene in Good Will Hunting where you know, they're up.
There's some circles and lines drawn on the board,
and he goes up and erases one and draws a line somewhere else,
and they're like, oh, you know, it's like, you know, well, what?
Yeah, I mean, the thing that it's sort of about category theory is
actually, you know, Grothendieck had this statement that was like,
if it's not obvious, you're not ready to work on the problem yet.
So and, you know, he had this whole, you know, principle of like theory
building for problem solving. And one of the aspects of theory
building for problem solving is you develop a vast library of equivalent
or at least related representations of similar or the same concept,
and you solve a problem by finding the one in which it's obvious. Right?
It's sort of it's a shamanic mathematics.
It's a mathematics about understanding something and then not
really having to do that much. Right. The point is, think about this in the
right way and it becomes trivial. This is or not trivial,
or at least like clear, right? You know, this is very much like
the task of science is to carve nature at the joints. Right.
Like that's the thing is like, well, then you have a bunch of different
ways you could carve this up, find the one in which it's easy.
Right. So that makes sense. So I mean, like thinking about
all the different possible representations of complex
numbers as a data representation. It's still again informative of
the algebra that you're, you know. Yeah. Exactly.
Which what what are you going to do with it. Intricately linked. Yeah.
Like what are you going to do with it that determines which representation
you should use. Yeah. Yeah. So you're telling me the Scheme
people were right all along. That code is data. It's just.
Oh, absolutely. No, they were completely right.
Yeah. No, I. What was it? Uh, 10th grade, learning Scheme.
Um, yeah. Uh, that definitely informed my
sensibilities a lot more than I realized.
You know, because I went away from doing, like, math and science
for quite a while before coming back to it as an undergrad.
And as I've gotten deeper into mathematics, I was like, oh,
I really did drink this. Like functional programming
Kool-Aid of this, like deeply pure 1970s variety a long time ago.
And whether I understood that that's what I had done, I absolutely had.
But I do think that they were right. Like the difference between data
and code is, is like there isn't one, right?
Actually, like the difference between data and code pops up in
neural networks a lot, right? You know, people like thinking
data is like, you know, training data, it's like,
training data is the goddamn program. The neural network is the compiler,
right? You are compiling training data into
the final state, which is going to be the function that you're
actually going to use, right? This is why I think eventually all of
the copyright lawsuits against the major AI companies will go through
because they are using copyrighted data to, to program their machines.
That's exactly what they're doing. It's not learning, that's bullshit.
It's it's it's programming. Right? It's a weird programming that works
by way of like differential, like sort of, you know, like a slightly
mutant differential geometry. But it is just programming, right?
You are literally programming your models with data. Right.
So any distinction between data and code is nonsense.
That is very interesting. Some great soundbites from this
conversation, Tim. Uh, for me, for me, it was kind of
a compound of little different, you know, things that I was taught.
Like when I learned that you could exponentiate matrices. Yes.
You know, I was like, wow, this is weird. Uh, yeah.
You know, when, uh, when I... thank God for power series.
Yeah, yeah. Oh, yeah. Yeah. Well, that's another cool thing
about power series is when you finally learn why, you know,
e to the i pi equals minus one. Yeah. And you look at the power series,
that's that's kind of mind blowing. And then, uh, and then when I,
when I saw how to construct, I think it was Peano arithmetic from
this kind of algebraic type theory, you know, perspective and things like
that just all adds up to the same. Really fascinating.
Yeah, I don't know fact about the universe or fact about logic
or something. Yeah. Um,
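The two facts mentioned here can be checked in a few lines (an illustrative sketch, not from the conversation): Euler's identity falls out of the exponential directly, and exponentiating a matrix is just the same power series applied to matrix powers.

```python
import cmath
import math

# Euler's identity: e^{i*pi} = -1, straight from the power series of exp.
euler = cmath.exp(1j * math.pi)

def mat_exp_2x2(A, terms=30):
    """Exponentiate a 2x2 matrix via the truncated power series sum_k A^k / k!."""
    result = [[1.0, 0.0], [0.0, 1.0]]  # start with the identity (the k = 0 term)
    power = [[1.0, 0.0], [0.0, 1.0]]   # running A^k
    fact = 1.0                          # running k!
    for k in range(1, terms):
        # power = power @ A (2x2 matrix product, written out by hand)
        power = [[power[0][0] * A[0][0] + power[0][1] * A[1][0],
                  power[0][0] * A[0][1] + power[0][1] * A[1][1]],
                 [power[1][0] * A[0][0] + power[1][1] * A[1][0],
                  power[1][0] * A[0][1] + power[1][1] * A[1][1]]]
        fact *= k
        result = [[result[i][j] + power[i][j] / fact for j in range(2)]
                  for i in range(2)]
    return result

# Exponentiating the "infinitesimal rotation" [[0, -t], [t, 0]] yields the
# rotation matrix [[cos t, -sin t], [sin t, cos t]] -- the weird-seeming fact
# that you can exponentiate matrices, made concrete.
t = 1.0
R = mat_exp_2x2([[0.0, -t], [t, 0.0]])
```

This is the same "it all adds up" observation: one power series, interpreted over numbers, complex numbers, or matrices.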
so we've been using some terminology like syntax and semantics.
Can you, can we just like, zoom out a little bit?
What do those terms mean, and how are you using those terms in
this context? Okay. So what I mean to say is, okay: syntax.
Formal expressions that you do not assume have any meaning,
but instead you just have rules that allow you to combine that thing.
So purely formal constructions, usually in terms of symbols,
whether it's graphical or text, that sort of irrelevant.
But the point is it's purely formal manipulations of symbols. Right.
And semantics is when those are interpreted as meaningful. Right.
And the kind of 2-category theory that I've been talking about is
like an abstract semantics, right? Whereas this category with like a
single object in these endomorphisms, this group,
this like classifying groupoid of a group, that thing is the...
It's sort of like that's the categorical avatar in this case of
like the syntax of being able to write g dot x and g prime dot g dot
x, that kind of thing. Okay, okay. So, um, you know,
from a linguistics point of view, you know, like people understand
semantics to me, to be the meaning, um, the aboutness of, of a word.
So, for example, I could say Wagga and dog and it would point
to the same cognitive category, which is a dog.
Um, but in, in this framework, just help me understand exactly
like what is the semantics? That is a good question, right?
Because you're asking about semantics in this very
particular way about as like, what is this thing about? And. Well.
The syntax allows you to write down constructions or operations
which are manifest in the semantic category. Right?
So the category is the world in which those in which that syntax
is interpreted as, be it things like elements, or be it things
like functions or be it things like transformations. Yes. Right.
So this so the category that you're so there's this semantic category.
And then the fact that they're also encoding like syntax as a category
itself that ends up, you know, opening up another sort of can of
conceptual or werm sort of breaking down like the easy distinction
between syntax and semantics. But the point is the story that
they were telling, and this is sort of instructive.
And this is sort of very much what Andrew says is, okay,
so you start with syntax. You can write that down into the
universal category that has that syntax interpreted as morphisms.
And then you look at functorial semantics of that.
So that's the interpretation of this universal category in which those
terms have meaning. Yeah. Right. Which means they have no other
meaning other than the formal relationships between them that
you specified in your syntax. And then you take a functor out
of that thing into something else and that interprets them.
So that's, this is, that's this notion of functorial semantics.
The story that I'm talking about is I'm not even worried about the ever
writing down any individual terms. I'm just interested in sort of this
two dimensional graphical picture that allows you to talk about what
kind of thing algebra is at all. Right.
So for the story that I was interested in this and the way I
come to this is we don't, don't even really ever study the elements.
Like I want to get away from that. Right.
Because the elementary perspective, while it's useful and helpful for
like the analytic construction of these neural networks, it's not
that good for proving properties of those of those networks. Right?
You know, it's pretty easy to like you write an
expression that has 5 to 10 terms. Suddenly it's very confusing.
Whereas if you draw a diagram that has just like 5 or 6,
you know, 1 or 2 dimensional pieces, you can derive formal properties
about it a little bit more easily. Okay, okay.
I mean, because we're saying earlier that, um, we could use the example
of group theory and I could look at all of the, like, symmetry
preserving transformations of a triangle and the aboutness, you know,
the semantics is the triangle. Yeah. Is the transformation the.
Exactly. It's about the triangle. It's about the actual
transformations of the triangle. And then the syntax is just these
things as elements of the abstract symmetry group that that makes sense.
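The triangle example can be made concrete (an illustrative sketch in my own notation, not code from the paper): the syntax is abstract words in two generators, r (rotation) and s (reflection), and the semantics interprets each word as an actual permutation of the triangle's vertices.

```python
# Semantics: symmetries of a triangle as permutations of its vertices 0, 1, 2.
ROTATE = (1, 2, 0)    # rotation by 120 degrees
REFLECT = (0, 2, 1)   # reflection swapping vertices 1 and 2
IDENTITY = (0, 1, 2)

def compose(p, q):
    """Apply q first, then p (composition of permutations)."""
    return tuple(p[q[i]] for i in range(3))

def interpret(word):
    """Functorial semantics in miniature: map a syntactic word in the
    generators 'r' and 's' to the actual transformation it denotes."""
    generators = {'r': ROTATE, 's': REFLECT}
    result = IDENTITY
    for g in word:
        result = compose(generators[g], result)
    return result
```

Distinct pieces of syntax can denote the same semantics: the words "rrr" and "ss" are different formal expressions, but both are interpreted as the identity transformation of the triangle.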
So when I was discussing this with Andrew, he said that the
reason this separation is useful is because if you think about it,
it's like you're talking about you're talking about two, two worlds.
And he was arguing that we want to or we have less representational
ambiguity in the world of semantics. Yes. So there's a kind of asymmetry.
So for example, you know, using this, this um group
analogy of the triangle, we have these high precision,
you know, we can we can actually draw a grid of all of the, um, you know,
symmetry preserving transformations. And now we've got a very high
resolution way of understanding that triangle.
But I guess the question is, though, is it is it really true to say that
there is less representational ambiguity in the semantics?
I guess, like the question I'm asking here is, um, are these
semantics ordained by God or are they purely a human construct?
I mean, does semantics just mean something to humans?
I wouldn't say that they are ordained by God, but I would say that what you
can do is like you can have like the initial semantics of something,
which is some. Right? So if a category is some like
abstract universe in which stuff does, in which things like a universe
of a kind of thing and then morphisms between that kind of thing. Right.
You can talk about like the initial semantics of something. Right?
And this means the one that that like the semantics that has only the
properties that are necessitated by the syntax and no other. Um, right.
And that is genuinely necessarily less ambiguous than any particular
syntactic presentation of something. Right.
So I'm not going to be able to come up with two distinct
presentations of a group. But the point is, I can write
down things like groups by way of generators and relations. Right.
But there may be and usually are, many different ways I could write
down the same algebraic structure with different syntaxes.
However, there is only one semantics for that thing. Um, cool.
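Here is a toy version of "many syntaxes, one semantics" (an assumed illustration, not from the conversation): two different presentations of the cyclic group of order 2, whose operations agree under an evident bijection, so both present the same abstract group.

```python
from itertools import product

# Presentation 1: the set {0, 1} under addition mod 2.
def op_add(a, b):
    return (a + b) % 2

# Presentation 2: the set {1, -1} under ordinary multiplication.
def op_mul(a, b):
    return a * b

# The bijection 0 -> 1, 1 -> -1 carries one operation to the other,
# so the two syntactically different presentations have one semantics.
phi = {0: 1, 1: -1}

def is_isomorphism():
    return all(phi[op_add(a, b)] == op_mul(phi[a], phi[b])
               for a, b in product([0, 1], repeat=2))
```

The presentations look nothing alike on paper, but there is only one group of order 2 that they can be describing.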
Now, um, so so you've written this paper, right?
If you were to give me a one minute elevator pitch of the paper,
what would it be? Title of the paper is categorical
Deep Learning an algebraic Theory of Architectures.
Uh, one thing that, like the. You know, hardcore category theorist
amongst us might take objection to is that the words algebraic theory
actually have a formal meaning in category theory, and that's not
actually exactly what we mean, right? An algebraic theory to a category
theorist is like a linear theory. It's a it's one of these syntactic
categories of like the allowed manipulations, stuff like that.
Whereas what we mean is specifically architecture as being the semantics
of the sort of these universal algebraic notions. Um, so.
That's the title of the paper. I've gotten distracted by this statement
that we said algebraic theories. Maybe that's not the best name,
but it's a pretty good name. Why is it a good name?
Because the point is that we want to say that all of the things
that have been studied in GDL and significantly more examples,
all of them are in fact instances of algebraic structures,
be they structure preserving maps, i.e. morphisms between algebras
for monads or structure maps of algebras for monads or the
appropriate dual constructions. Say these are sort of these like
universal structures, and then we're interpreting them
into a 2-category whose objects are vector spaces, whose morphisms
are parametric maps. Right. A good example of the kind of
parametric map that pops up in deep learning is a ReLU network
is a parametric function, right? Because for any given choice of
weights in all of your various layers, right, you get one function,
but you don't really want to think about those functions as distinct.
You want to sort of group them all together.
And so this is why there's the innovation of this notion of the
parametric function. And then you want to talk about being
able to change those parameters. So that requires the introduction of
this notion of reparametrization. Right.
So specifically what we develop or we don't develop it in this paper.
Right. This 'Para' construction has been around for
some time. But what we really do is we say,
hey, if we do two dimensional universal algebra valued in these
categories of objects, parametric maps and reparametrizations,
that allows us to talk at a very high level of abstraction about the
kinds of things like the geometric priors of GDL or the structure maps,
which are exactly the representations of groups that
are like the foundation for GDL. It allows you to talk about it
in a far more abstract way that more naturally generalizes to
algorithmic constraints, as opposed to merely sort of conservation law
type constraints, which is that which groups are really good for.
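The parametric-map idea sketched here can be written down directly (a minimal sketch under my own naming, not the paper's code): a parametric map is a function of (parameters, input), and composing two parametric maps pairs up their parameter spaces.

```python
# A parametric map X -> Y is modeled as a plain function f(params, x).
# Composing two parametric maps pairs their parameters: the composite of
# f (with parameters p) and g (with parameters q) takes ((p, q), x).

def para_compose(f, g):
    """First apply f, then g; the composite's parameter is a pair."""
    def composite(params, x):
        p, q = params
        return g(q, f(p, x))
    return composite

def relu_layer(params, x):
    """A one-dimensional 'ReLU layer': each choice of (weight, bias) gives
    one concrete function, and they are all grouped into a single
    parametric map rather than treated as distinct functions."""
    w, b = params
    return max(0.0, w * x + b)

# A two-layer ReLU network is just the composite parametric map.
two_layer = para_compose(relu_layer, relu_layer)
y = two_layer(((2.0, 1.0), (1.0, -1.0)), 3.0)
```

Reparametrization, in this picture, is just precomposing on the parameter argument, e.g. feeding the same shared weights into both layers; that is the extra 2-dimensional structure the paper works with.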
Okay, so let's just talk broadly about abstraction. Right.
So what is the purpose of abstraction.
And in this particular case give us an example of a powerful
abstraction that would, you know, change the way that we reason
about neural networks. Okay. What is the purpose of abstraction
from a mathematician's standpoint if you want to be utilitarian about it?
Uh, the purpose is. Do I want to climb a mountain with a
lot of incredibly specific gear, or do I want to learn to be a
mountain goat? Right. That's really what the purpose
of abstraction is. The purpose of abstraction is
getting away from all of these specifically adapted tools and say,
okay, these are the tools that I'm going to allow myself to use for it.
Suddenly, the problem is more general.
And like you might think that that's going to make it harder,
but it doesn't. It just teaches, you know, don't hold
on to the like, individual unique specificities of each instance.
Those are distractions. Those are not the structural
properties. Right. Those are the those are those are the
specifics of specific instances. You want to get away from those
because those are distractions from how you solve the problem
in a way that's, you know, actually comprehensible. Okay, okay.
I mean, I think there's an element of there's a sea of complexity
that we need to tackle. But, um, the Mountain goats,
quite an interesting example because I would say that's
actually an example of specificity rather than generalization.
But but then we get to Canalization. Yeah. So that that's fair. Right.
This is a question of mixed metaphors, right.
You know, and getting getting excited about
one and getting stuck in it. Right. You know, so this is a train of
thought that gets a little bit too far down in one direction.
But what's what's the point? The point of abstraction is to get
away from all of this extra baggage that you might think is providing
you information, but isn't right. You might think that like the
easiest way to solve a problem about the natural numbers or about
the integers would be to like, know a lot about what how two
works or how three works. Turns out, it's way easier to develop
the abstract notion of a ring and and develop this like massive library
of theorems about rings. Right? And it's actually by abstracting
away any of the details that forces you to think in a principled way.
Yeah. So I get all of that. And duality, for example, is,
is is a wonderful feature of category theory because you can prove
something in one domain and because you have all these categories set up,
you have you have the morphisms and so on, then that propagates.
And I guess this was something that we used to do manually in
scientific theories. You know, we would have to kind
of prove duality on a case by case basis. Yeah.
So the and this is a sense where like category theory allows you to do
it once and then interpret it into various contexts and see it as an
instance of the same thing. Right. But coming back to the mountain
goat example then. So because because that's a
weird juxtaposition. It is because it's both
specificity and generalization. The reason I said canalization
is because, you know, we we know how evolution works, you know,
and how how living things work. There's this canalization.
So like, you know, this, this goat, it might seem like it's the,
you know, the perfectly, um, well adapted and specified,
you know, physical form to, to work in this situation.
But actually, if you, um, you know, we're talking about compositionality.
If you decompose it into its component parts, those parts are,
you know, they represent the ultimate form of generality.
Yeah, I would buy that. Okay. But but then one thing I think
about here is, um, just how how fundamental, how universal these,
um, general components are because, you know, we want to build a
structured picture of the world. Yeah,
out of building blocks which appear universal and probably they're not.
Maybe some of them are universal, maybe some of them are not universal.
But, you know, we want to have the Lego set for the universe.
No, we definitely want to have the Lego set for the universe.
I don't know what the Lego set for the universe is,
and I doubt that there's only one. But what the last, you know, 60,
maybe 80 years of mathematics have told us now is that category
theory is a damn good one, right? It's a principled theory of
abstraction, right? It says, okay,
how do you what is a good abstraction for a notion of structure?
And what are the principles by which you should reason about it? Right.
And it becomes this task of, hey, you know, the category theorist sees
a problem in some particular domain and says, what are the tools that I
know that kind of map onto this? And it tells you which parts
of that might actually matter and which parts of it might not. Right.
So you develop a categorical abstraction about it,
and then suddenly it becomes often easier to think about and it becomes
easier to prove things about. Yes. And I think that the miracle of human
cognition in particular, as we were alluding to with Keith earlier,
is, is this I think abduction is the same as categorization.
So I think, you know, we we select these categories,
we create, maybe we select them, maybe we create them.
I mean, that's an interesting philosophical thing.
But but we have we have the ability to do that.
And we're kind of we're carving up the world.
And it's our way of overcoming the sea of complexity in the universe.
I would buy that. That's that's a a good story is
you don't, you know, it's sort of like, you know,
what happens in neural networks is like, do you want trillions
and trillions of parameters that allow you to recreate exactly this?
Or do you want a much lower dimensional space of explanations
from which you can rederive and constantly be checking against
what you're experiencing? Yes, and this is the story of
parsimony in general. So on the one hand, you could have an
inscrutable large language model, which is just a sea of neurons.
On the other hand, maybe there is some kind of
decomposition of the language model.
Maybe there is a decomposition of reality and no one knows how complex
that thing will be, right? You know? Maybe it's possible to really
decompose it down into like a thousand types or something
like that, but that must be a better way to go than to have
the inscrutable language model. I definitely think so.
And that's definitely the bet we're making with
Symbolica. Okay, now there's the question of what
we mean by reasoning as well. So I think like we intuitively agree,
I think that in order to reason, we have to be working on this kind of
decomposed, parsimonious view of, of, of the universe and the
there's a relationship between reasoning and generalization.
So type two generalization, as we understand how I,
how I reason about chess, for example, is we have this discrete
categorical model of the chess board. And reasoning is kind of like taking
trajectories and doing various forms of deduction and inference
and so on in that discrete set. So that's reasoning.
Um, is reasoning more complicated than that?
I mean, maybe what do you think reasoning is?
My definition for what reasoning is, is heavily informed by what I know
from the work of Sperber and Mercier. Right?
I am really influenced by the idea that what reasoning is, is the
coming up with reasons for things. Reasoning is about explanations,
right? So this is one of the reasons why
I sort of bristle against the sort of common ML use of 'reasoning'.
Right, is I'm not entirely comfortable with the term because I
think it's a category mistake. Right? I think reasoning is coming up with
explanations for things. Right. Deduction is really what we're
talking about. Or induction, right? Things like that.
That's mostly what we're talking about.
But and so we say inductive reasoning and deductive reasoning.
And I know that that's sort of popular language.
But I've sort of fixated on this perhaps idiosyncratic use of the
word reasoning that has sort of informed how I think about a lot
of the stuff in the last couple of years. Interesting. Yeah.
So, so, so, um, there is a school of thought, as you say,
that reasoning is deduction, induction and abduction.
Neural networks definitely do not do deduction.
I mean, if they do do it, it's only because they've
memorized instances of it. Um, they possibly do do a form of
induction and they definitely do not do any abduction, which is,
you know, like generating reasonable explanations for things.
Or if they do do, it's only because they've
memorized human instances of it. Does it matter if because this
is kind of like, you know, what Bostrom says about, um, intelligence
and goals are orthogonal. And what he's saying is that,
um, goals are universal. And this is a really common
thing in symbolic AI. And, you know, people think that
intentionality means that we have goals and these goals are fixed.
So the reason you behave the way you are is because you had a goal.
And that goal is like a universal thing. It was always there.
And that guides your behavior. Another school of thought is
that intentionality is 'as if', which means like you have agency
if, um, you could construct an explanation
that explained your behavior about yourself or other people,
but is that a good explanation? If it's just a model rather than
being what you actually did? Um, it's a really good explanation
if you can make that thing formal, right? If you can say, okay.
Here is my sort of purely syntactic description, say,
of what that reasoning is. And then I can have a formal
interpretation of that and say, actually,
this did work like that, right? It's certainly not going to be
the only explanation. But the point is it is a good one.
Right. So that's that's the kind of thing
that I find most interesting. And what but what makes a good
explanation is, is it just predictability or is
it intelligibility or is it something else?
I think intelligibility is hugely important.
An inscrutable reason is like having no reason at all. It's exactly so.
So we intuitively agree that the reason why language models are it's
not so much that because there is obviously a reason why they do
things, but we don't understand what the reason is, right?
And we intuit that the reasoning is not robust because,
first of all, if you just look at the activation
pattern of a neural network, it's like you perturb the input by a
tiny amount and it's almost like it activates completely differently.
So so we think that the reason it does stuff is super, super brittle,
whereas we think that the reason we do things is robust and maybe
we're just being human chauvinists or whatever, but but we kind of
think that that kind of reasoning has an elevated status. Yeah.
I mean, I think it I think it does, but I don't think it's elevated
in the sense that neural networks can't do it.
I think it's that existing architectures don't.
So what would we need to do to a neural network to make it more
robust in that way? Um, this is in some sense what
the paper is about. Right? So the paper is about how you
might talk about algebraically structured neural networks.
So is this almost like when we deliberately canalize a neural
network? We are I mean, obviously this is
this is all on a spectrum. But by reducing this kind of chaotic,
you know, random trajectories, random activations and so on,
by by creating a robust activation pattern, which is perhaps more,
if not completely intelligible, then it's going towards reasoning.
Yeah, I think so, yeah. So we should do some category theory
101, because I'm sure that the, the audience know nothing about
category theory on, on average. So in your paper you actually had
a really beautiful introduction or a primer to category theory
where we kind of like go through the concepts one by one.
Can can we do some of that now? Yeah. I'm glad you think so.
Um, so one story of where you get to categories is the one that Peter
will have told you, you know, 20 minutes ago in this video.
Right. And he talks about it as okay. So you start with groups and then
first thing you do is that you relax the requirement that all
transformations are invertible, that they can be undone.
And then the next thing that you do is you relax the criterion that all
transformations can be composed. And that does indeed get you to
a notion of category. Right. Because a category, at least a small
one, is a set of objects together with a set of morphisms that have a
composability criterion on them, which is at the end of one arrow,
has to be the beginning of the next one, and they have a composition
law that is associative, meaning if I've got three of them, if I compose
the first two and then the third, or if I compose the first one,
and then the composition of the second and the third,
I get exactly the same thing. Okay. So there's this associativity axiom.
And there's also an identity axiom, right? Yes.
There is always an identity endomorphism of every object okay
okay. So let's just be clear here. So a category it's like a set of
objects and morphisms. Yeah it is a set of objects.
And then it is a parameterized family of sets of morphisms.
And it's parameterized by pairs of objects one being the source,
the other one being the target. Or in different language,
the domain and the codomain. Okay. So I mean,
let's just sketch out an example so that in the context of like GDL,
what would that be? Oh okay. Let's go do something easier.
Let's look at the category of sets. Right. Right. So I've got.
My objects are sets. My morphisms are functions of sets.
Not all functions of sets are composable, because I can't apply
a function that takes in input from a set of three things to a
set that has two things, right? So that's where the composability
criterion comes in, okay. Okay. So let's talk about
morphisms. How do they work?
the magic of category theory is like the sense of how they work is.
Well, they compose in an associative way and there's nothing else to them.
You've abstracted away any of the specificity of those things being
a computation or doing something. They're simply there. Right.
And you're reasoning about or here I go using reasoning again.
But the point is you're thinking and like doing inference reasoning.
Fine. I'll just use, say, reasoning. You're reasoning about how you can
compose them and what properties you know, those compositions have. Okay.
So um, in the, in the GDL, um, show. Yeah, there were these,
these groups and, you know, there could be the group of triangles.
Um, but we're also talking about, you know, things like, um,
SO(3), or things like, you know, ones that preserve rotations and
angles or there is ones that preserve like translational
equivariance and so on. So um, and these groups needed to
have, um, like four axioms that they, um, uh, you know,
conformed to. Yeah. Right. So groups are sets together with
a binary operation and a specified element such that,
you know, that specified element, the unit of the group operation acts
as the identity transformation. If you put it in the first argument
or in the second argument and that that multiplication is associative.
Okay. So so these morphisms, it's a
kind of similar concept. Right. So it's a transformation between
two things. But there has to be some kind of
structure preserving characteristic. Well. This is.
The funny thing is, what you're doing is you're abstracting away
the structure that it preserves. And simply talking about what
kinds of things can you do with structure preserving maps?
It's like, I don't know, but I can compose them.
And so the idea is, hey, let's just talk about what we can know about
mathematics, just in terms of the notion of composable morphisms,
of composable functions. We don't look inside the
functions anymore. We specifically abdicate that
position. We say, okay,
I'm going to reason about how I can compose stuff and nothing
else and see how far I can get. And what's absolutely demented is
how far you can get with just that. Right?
You might think that, no, I really need to see what's
inside these things. It's like, for a lot of properties of
mathematical structures, you don't. Right? If you abstract that away,
suddenly the same claims can be made and proved in various
different contexts. So, I mean, so just to really
spell it out, when you say you can compose these morphisms together
because you were talking a little bit before about it almost being
an interface. Lego is a good example: there's a square peg,
there's a square hole, and these two things go together.
So for them to be composable, it just means that if there's a
compatibility between them, you can plug them together, though
not in any configuration you want, right?
There are usually only certain compatible ones. Yeah, exactly.
There are particular ways that you're allowed to compose.
So suppose I have one function f. It goes from a set x to a set y.
And I've got another function g, yeah, that goes from y to z. Right.
I'm allowed to compose first f, then g.
I'm not allowed to compose first g then f. Um okay.
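A minimal sketch of that composition discipline, with hypothetical sets X = strings, Y = integers, Z = booleans:

```python
# A sketch of composition order. Types here are illustrative choices:
# X = str, Y = int, Z = bool.

def f(x: str) -> int:      # f : X -> Y
    return len(x)

def g(y: int) -> bool:     # g : Y -> Z
    return y % 2 == 0

def compose(second, first):
    """'First f, then g' is compose(g, f): apply f, then apply g."""
    return lambda x: second(first(x))

g_after_f = compose(g, f)  # X -> Z: allowed, because the types line up.
# compose(f, g) is not allowed: g produces a bool, but f expects a str.
assert g_after_f("abcd") is True   # len is 4, which is even
```
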
So so we've done, we've done the basic category thing.
Um, let's go to the next level of abstraction.
Let's talk about things like functors and endofunctors.
Okay. So what is a functor? Right.
So let's first we've said a category is, you know, the class
or a collection or whatever. Usually it's a set,
a set of objects together with this doubly parameterized family of
morphisms that you can compose well. There's that's a mathematical
structure in and of itself. So I can ask what are the structure
preserving maps between categories. These are the things that are called
functors. What does a functor do? A functor comprises an assignment of
the objects of the first category to the objects of the second category,
and the morphisms of the first category to the morphisms of the
second category, in a way that preserves composition. Okay.
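As an illustrative sketch (the example is mine, not from the conversation), the list construction is a functor in exactly this sense: on objects it sends a set to the lists valued in it, on morphisms it sends a function to "map", and it preserves composition:

```python
# A sketch of the list functor. On objects it sends a set X to lists
# valued in X; on morphisms it sends f : X -> Y to "map f" : List X -> List Y.

def list_map(f):
    return lambda xs: [f(x) for x in xs]

f = lambda n: n + 1   # a morphism X -> Y
g = lambda n: str(n)  # a morphism Y -> Z

xs = [1, 2, 3]

# Functoriality: mapping the composite equals composing the mapped morphisms.
assert list_map(lambda x: g(f(x)))(xs) == list_map(g)(list_map(f)(xs))

# Identity morphisms are preserved too.
assert list_map(lambda x: x)(xs) == xs
```
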
So again, let's bring this to life, because we were talking earlier
with Keith about, um, you could, for example, transform a list into
a tree or something like that. So it seems like the objects are
very different, but you're still transforming between
them in a way that, you know, maintains these properties.
Yeah. So lists and trees. So let's talk about, you know, lists
valued in a particular set. Right. So what do I mean by lists
valued in a particular set. I can think about the monoid of
lists valued in a set X. I can also talk about binary
trees valued in a set X. Right. And it's still a set function
between them. But it might preserve some portion of the monoid
structure of lists, and of the algebra-for-a-monad structure of trees.
So let's move up one level of abstraction.
Let's go to 2-categories. How does that work? Okay.
So we've talked about categories,
and now we've introduced this notion of functor as structure preserving
maps between between categories. Natural transformations suddenly
pop up. What the hell are they? Well, first off,
they're essentially the reason why category theory is invented, right?
This is sort of like, you know, I think if you ask MacLane and the
other guy, whose name for some reason I can't remember... Eilenberg, excuse me.
If you ask Sammy Eilenberg and Saunders MacLane,
why did you invent category theory? The answer is, well,
we needed enough to explain what natural transformations were.
Right? And so what happens? So suppose I have two functors.
I've got some category C over here and I've got some category
D over here. What a natural transformation is is a
compatible family of deformations. It's a deformation from one
functor into another one. Right. So if I've got C over here and D
over here right. And I've got one functor F and
I've got another functor G, a natural transformation is a two
dimensional morphism that goes from one functor to the other one.
Data wise what does it comprise? It is for every object of the
source category. It is a morphism in the target
category that goes from the image of that object under the first
functor to the image of the object under the second functor,
and it satisfies this criterion called Naturality, which is that
this deformation is compatible with. The morphisms of C interpreted by the
functor F and the functor G, right? So there are these things called
naturality squares. Say, suppose I had some morphism f from a to b
in the category C, and I had two functors, capital F and capital G. Right.
The natural transformation has components which are going to be
maps of the form F of a to G of a, and those are supposed to be
compatible in the sense that a particular square commutes.
And what is that square? Well, you have F of f, which goes from F of a
to F of b; you have G of f, which goes from G of a to G of b; and then
you've got these two vertical things. One is the component of the
natural transformation at the object a, the other is the component of
the natural transformation at the object b.
And the criterion is that this diagram commutes.
What does that mean? That means that if I compose this
thing and this thing, that is equal, as morphisms, to the composition
of this one and this one. Okay, let's talk about monads.
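Before moving on to monads, that naturality square can be spot-checked concretely. In this sketch (my example, with both functors taken to be the list functor) the transformation is list reversal:

```python
# A sketch of a naturality square, using list reversal as a natural
# transformation from the list functor to itself. The component at any
# object X is the same operation: reversing a list valued in X.

def list_map(f):
    return lambda xs: [f(x) for x in xs]

def component(xs):       # the component of the transformation at any object
    return list(reversed(xs))

f = lambda n: n * 10     # some morphism a -> b in the base category
xs = [1, 2, 3]           # an element of F(a); here F = G = the list functor

# The square commutes: across then down equals down then across.
assert component(list_map(f)(xs)) == list_map(f)(component(xs))
```
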
So what are monads? Okay. So what monads are is a gorgeous
abstraction for things like group actions.
Or excuse me, not for things like group actions, but for the
notion of group action. Right? So monads are things that have
algebras. Let's look at a group action for
a moment. So suppose I wanted a group action
on an object x. Monads are hard. They're the kind of thing that
I used to find really confusing.
explain them. But it's the kind of thing that
you're like, well, you've accepted this long enough.
It's like, there's this notion: there is no figuring out.
There is only getting over. Yes. Yeah, yeah,
it's very much that kind of thing. I mean,
but part of the motivation here is Andrew in particular.
His lens of analysis, as I understand it, is monads;
that's his main view on this work.
I would actually say, it was in writing the paper that
Bruno and I came to the DeepMind team with: hey,
let's talk more about monads. Right. They were thinking in terms of let's
talk about functorial semantics of syntactic categories. Right.
And we said, well, fortunately, if you've got a syntactic category
and you're looking at all of the things which are models of that
syntax, there is a monad whose algebras are exactly the same thing
as models of that syntax. Right. What the monad does is it purely
semantically bundles up all of the operations. Right.
Here's a good example of a monad. List blank. What does list blank do?
It takes a set X to the set of all lists valued in x.
What is the extra structure of this monad?
Because what I've just described is just an Endofunctor.
I haven't described the extra structure of a monad.
A monad has a unit and it also has a multiplication, right?
What is the multiplication of this endofunctor?
Well, what if I take list of list of x?
So what are the elements of the set? List of list of x?
Those are lists of lists whose entries are elements of the set X.
What's the multiplication? I can erase the inside parentheses.
I can turn any list of lists into a list by forgetting the
next nested parentheses. That's the multiplication of the
list monad. What is the unit of the list monad?
The unit allows you to interpret things of type x as things of type
list x, right? How do I do that? I take the element x and I
associate to it the list that just has x in it. Right?
And the laws of a monad are compatibilities between these two
notions, between the multiplication and the unit.
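A minimal sketch of the list monad's unit and multiplication as just described, with spot checks of those compatibility laws:

```python
# A sketch of the list monad's structure maps.

def unit(x):
    """Interpret a thing of type x as the list that just has x in it."""
    return [x]

def mult(xss):
    """Erase the inner parentheses: flatten a list of lists into a list."""
    return [x for xs in xss for x in xs]

xs = [1, 2, 3]

# Unit laws: inserting the unit inside or outside, then multiplying,
# gives back the original list.
assert mult([unit(x) for x in xs]) == xs
assert mult(unit(xs)) == xs

# Associativity: for a triply nested list, flattening the inner layer
# first or the outer layer first agrees.
xsss = [[[1], [2, 3]], [[4]]]
assert mult(mult(xsss)) == mult([mult(xss) for xss in xsss])
```
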
What it means to be an algebra for a monad, right, is to have a
structure map that goes from, you know, the monad operating on
your object down to your object. What this does is it interprets.
Like, it formally interprets. Right.
So, you know, the monad is this abstract, this purely semantic
representation of the syntax. Right. So I think about like if I've got
some monad t, I think about what t is, that's all the terms that I'm
allowed to write with x and all the syntax that defines the monad t.
And then I want to interpret that back into x, because that's
what it means to have like the structure I mean blah blah blah.
This is tautological. That's what it means to have the
structure of an algebra for T, right. What does that mean in the case
of say, this list monad? Well, the algebras for the list monad
are these things called monoids, where monoids just have a binary
operation that has a unit; you don't have invertibility.
How does that make sense? Okay, let's consider an instance
of a monoid with which we are all familiar natural numbers.
Okay, so I can take a list of natural numbers, and I can
interpret it as a natural number. What's the right way to do that?
Well, I add them. Right. This is the structure map of the
natural numbers as a monoid. What if there isn't a natural way to,
you know, let's say turn a list into a single number?
I mean, for example, you could multiply them. Like, why add them?
That's just one of the structures. There are lots of different ways
in which a particular set can be endowed with the extra structure
of being an algebra. For lists, there are lots of
different monoids on the same set. Oh, I see,
I see. So you would have one per operation or whatever. Exactly.
That's the point, is each like abstract conception of operation
gives rise to a different monad. Yeah.
Whose algebras are all of the different structures that can be
written in terms of that operation. Okay.
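As a sketch of that point (my example, not from the conversation): the same set, the natural numbers, carries at least two different list-monad algebras, one from addition and one from multiplication, and both satisfy the algebra laws:

```python
# A sketch: two different algebras for the list monad on the same set.

from math import prod

def is_list_algebra(structure_map, samples):
    """Spot-check (not prove) the algebra laws on sample lists: the
    structure map must absorb singleton lists, and it must not matter
    whether we flatten a list of lists first or interpret the inner
    lists first."""
    flatten = lambda xss: [x for xs in xss for x in xs]
    unit_ok = all(structure_map([x]) == x for xs in samples for x in xs)
    assoc_ok = all(structure_map(flatten([xs, ys])) ==
                   structure_map([structure_map(xs), structure_map(ys)])
                   for xs in samples for ys in samples)
    return unit_ok and assoc_ok

samples = [[1, 2], [3, 4, 5], [7]]
assert is_list_algebra(sum, samples)   # the additive monoid on the naturals
assert is_list_algebra(prod, samples)  # the multiplicative monoid, same set
```
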
But the magic of the theory of the monad is it gets you away from
having to write those things down syntactically, and instead you can
just look at the two-dimensional semantics of this. Yeah.
I mean, so it's almost like a kind of templating function.
I mean, it's a pattern language for algebra.
Yeah. It's exactly, exactly. And I mean, this this brings me
to Haskell, for example. So Haskell has a few, like, um,
monads built into it. Yeah. How does it work in Haskell?
Oh, okay. So these are some of my favorite
monads, right. So I am not a Haskell expert,
so hopefully none of the FP zealots in the audience will,
uh, crucify me in the comments for getting this wrong.
But so in Haskell, a lot of the time, what do you do?
You start with an endofunctor F yeah, right.
And then you want to talk about the monad generated by F, right.
The free monad on F. Well how do you do this.
It's like, well, you sort of infinitely compose F with itself,
right? Why does that give you a monad?
Well, take omega, i.e. the infinity which is
the size of the natural numbers. Do F omega times and then do it
one more time: that's the same thing as just
doing F omega times. So that means F to the omega
composed with F to the omega is just F to the omega. Right.
So that's the multiplication of a monad.
So you can go from the structure of an endofunctor to the structure of a
monad in this relatively elementary way by believing in infinities.
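A rough sketch of that free-monad idea (my construction, with a hypothetical endofunctor that pairs a value with a label); the monad multiplication grafts one layer of the construction into another:

```python
# A sketch of the free monad on an endofunctor, in the Haskell style the
# conversation gestures at. The endofunctor here is F(X) = (label, X),
# so the free monad builds label-annotated chains ending in a Pure value.

class Pure:
    """A bare value: zero applications of the endofunctor."""
    def __init__(self, value):
        self.value = value

class Roll:
    """One application of the endofunctor wrapped around the rest."""
    def __init__(self, label, rest):
        self.label, self.rest = label, rest

def mult(tree):
    """The monad multiplication: collapse a Free-of-Free into a Free by
    grafting the inner structure where the outer Pure sits."""
    if isinstance(tree, Pure):
        return tree.value  # the Pure holds an inner Free term
    return Roll(tree.label, mult(tree.rest))

# A Free-of-Free: two layers of the construction, then flattened.
inner = Roll("b", Pure("done"))
outer = Roll("a", Pure(inner))   # a Free whose Pure holds another Free
flattened = mult(outer)
assert flattened.label == "a"
assert flattened.rest.label == "b"
assert flattened.rest.rest.value == "done"
```
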
Yeah, yeah. So? So what else? What else do we need to cover on
category theory for the purposes of our paper? Right.
What do you really need to know? Right.
That's the question you've asked. You need to know what a category is.
You need to know what a functor is. You need to know what a natural
transformation is. You need to know what an endofunctor is.
That's one of the easiest ones, right? An endofunctor is just a
functor from one category back to itself.
You need to know what a monad is.
A monad is an endofunctor, but it's also more.
It's an Endofunctor together with some extra data.
That extra data is a natural transformation from the composition of
that endofunctor with itself down to just one copy of that endofunctor,
and a natural transformation from the identity endofunctor into that
endofunctor, satisfying some properties.
Right. And these are the axioms of a monad.
Okay. And why do we need endofunctors?
I mean, it might seem like a naive question, but why do we need to
have a way of transforming something back to itself?
This is what constructors of a type are. Right. So how do I build lists?
I have a notion of what the empty list is, and I have a notion of how
to append something to it. Yeah. So the endofunctor that begets
lists: one plus blank. So if I've got a map from one coproduct X
down to X, well, what am I doing?
I'm picking out a special element in the set X, and I'm picking out
an endomorphism (not an automorphism) of the set X.
I'm picking out zero and I'm picking out successor.
So I said that was lists, but it's really making you interpret
the natural numbers. Yeah. Right. But it's that kind of thing.
So the point is, the endofunctors are interpretations of abstract
terms. That's exactly why monads are particularly nicely structured
notions of abstract terms, where the algebras for monads interpret
those terms back into the original object.
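A minimal sketch of that picture (my example): an algebra for the endofunctor one-plus-blank is a map 1 + X -> X, i.e. a chosen element plus an endomorphism, and interpreting abstract terms through it recovers the natural numbers:

```python
# A sketch: an algebra for the endofunctor F(X) = 1 + X is a map
# 1 + X -> X, i.e. a choice of where the "1" lands (a special element)
# and of where the "X" part lands (an endomorphism of X).

ZERO = object()  # the single inhabitant of "1", a hypothetical tag

def nat_algebra(v):
    """The natural numbers as an algebra for 1 + X: the special
    element is 0, the endomorphism is successor."""
    return 0 if v is ZERO else v + 1

assert nat_algebra(ZERO) == 0  # picking out zero
assert nat_algebra(4) == 5     # picking out successor

# Interpreting the abstract term succ(succ(zero)) into the algebra:
term = ("succ", ("succ", ("zero",)))

def interpret(t):
    return nat_algebra(ZERO) if t[0] == "zero" else nat_algebra(interpret(t[1]))

assert interpret(term) == 2
```
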
And where can people at home learn about this in more detail?
Um, I don't know, where the hell did I learn about monads?
Uh, I think I learned about monads in honestly, I learned about
monads from watching people give a bunch of talks about monads and
eventually believing in them. If you want to actually read
something about it, you know. Sure. Probably the best thing to do is
go on YouTube and look up monad and find, you know,
find someone not from the FP world to tell you what a monad is.
I'm sure some of the Catsters videos will do a great job of
telling you what that is. Likewise, I'm sure some of the Fong and
Spivak videos from their MIT course years ago will tell you what it is.
I'm sure it's covered in Seven Sketches, and it's definitely covered
in all of the pure-math-oriented texts about category theory,
like the classic Categories for the Working Mathematician,
or Steve Awodey's book, which I learned a lot of category theory from.
I finally understood the Yoneda lemma by reading that book a
couple of times. Um. So it's going to be it's going
to be in lots of places, essentially just just pick one
and fight with it long enough and eventually you'll get it. Yes.
Now, you mentioned Lawvere theories earlier. Yes. Okay.
Formally, a Lawvere theory is a small category which is complete under
the formation of finite products. Okay, that's not a terribly useful
description of it. Right?
Excuse me, to say that it's not a terribly useful description of
it is completely wrong, but it's the kind of...
You know, there's like the description of things.
That's exactly right. Once you know what it is. Yes.
But it's not the right way to learn about it. So what is?
Why do you want it to be
are exactly what allow you to phrase functions that are going
from tuples of things to other tuples of things. Right?
So say I want to encode the laws of a monoid.
Well, there's a binary operation for a monoid.
So to encode that I need a map from x times x down to x. Right.
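That encoding can be sketched concretely (an illustrative model, not from the conversation): strings under concatenation supply a map from X times X down to X satisfying the monoid equations:

```python
# A sketch of a model of the monoid theory: the theory specifies a
# binary operation m : X * X -> X and a unit e : 1 -> X, and a model
# must satisfy the theory's equations. Spot-checked here on strings.

X = ["", "a", "b", "ab"]  # a few sample elements of the carrier set

def m(x, y):              # the binary operation X * X -> X
    return x + y

e = ""                    # the unit, a map from the one-point set

# The equations of the theory, checked on the samples (a spot check):
assert all(m(e, x) == x and m(x, e) == x for x in X)
assert all(m(m(x, y), z) == m(x, m(y, z)) for x in X for y in X for z in X)
```
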
So what a Lawvere theory is, right?
I'm not going to do a specific one, because I'll fuck it up if I
try to do it right now. But what a Lawvere theory is, is a category
which encodes the compatibilities and relationships between a bunch
of n-to-m arity operations. Right.
So it is a syntactic category that has no structure except
specified relationships between a bunch of different n-to-m arity
operations. Mhm. It's been an honor. Yeah. Thank you so much.
Yeah, that feels pretty good.