Miles Cranmer - The Next Great Scientific Theory is Hiding Inside a Neural Network (April 3, 2024)

Simons Foundation
5 Apr 2024 · 55:55

Summary

TL;DR: The transcript discusses the emerging paradigm of interpreting neural networks for physical insights in scientific discovery. It highlights the potential of AI in learning complex models from limited data, exemplified by advances in fluid turbulence and planetary system instability prediction. The speaker emphasizes the importance of translating these models into interpretable mathematical language using symbolic regression. They also introduce the concept of polymathic AI, which involves creating large, flexible neural networks trained on diverse data to serve as foundational models for various scientific tasks, promoting a new approach to building theories in the physical sciences.

Takeaways

  • 🧠 The concept of interpreting neural networks for physical insights represents a new paradigm in scientific exploration.
  • 🌀 Success in using neural networks for scientific insights includes predicting instability in planetary systems and modeling fluid turbulence with high accuracy.
  • 📈 Traditional scientific methods involve building theories from low-dimensional data, while modern AI-driven approaches use high-dimensional data and flexible functions.
  • 🔍 The speaker's motivation is to understand how neural networks achieve accuracy and to use these insights to advance scientific understanding.
  • 🧬 The potential of machine learning in science is highlighted by the ability of neural networks to learn from data and find patterns not previously recognized.
  • 📚 Symbolic regression is a technique used to interpret neural networks by searching for analytic expressions that best fit a data set.
  • 🧬 The use of genetic algorithms in symbolic regression is akin to evolving equations to fit data, providing a bridge between machine learning models and mathematical language.
  • 🌟 Foundation models, like GPT for language, are proposed for science as a way to train on diverse data and then specialize for specific tasks, improving performance.
  • 🔄 The concept of 'polymathic AI' is introduced as a foundation model for science that can incorporate data across disciplines and be fine-tuned for particular problems.
  • 🔍 The importance of simplicity in scientific models is questioned, with the suggestion that what is considered simple may be based on familiarity and utility rather than inherent simplicity.

Q & A

  • What is the main motivation behind interpreting neural networks for physical insights?

    -The main motivation is to extract valuable scientific insights from neural networks, which can potentially advance our understanding of various phenomena and contribute to the development of new theories in the physical sciences.

  • How does the traditional approach to science differ from the new paradigm of using neural networks?

    -The traditional approach involves building theories based on low-dimensional data sets or summary statistics, whereas the new paradigm uses massive neural networks to find patterns and insights in large, complex data sets, and then builds theories around what the neural networks have learned.

  • Can you explain the concept of symbolic regression in the context of interpreting neural networks?

    -Symbolic regression is a machine learning task that aims to find analytic expressions that optimize some objective by searching over all possible expression trees. It is used to build surrogate models of neural networks, translating the model into a mathematical language that is interpretable and familiar to scientists.
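    The surrogate-modeling step can be sketched in a few lines: treat a trained model as a black box, query it on a grid, and search a small library of analytic forms for the one that best reproduces its outputs. Everything below (the hidden quadratic, the three-term candidate library, the least-squares fit) is an invented toy, not the speaker's actual pipeline:

    ```python
    import numpy as np

    # Stand-in for a trained neural network we want to interpret:
    # a black box we can only query. (Here it secretly computes 3x^2 + 1.)
    def black_box(x):
        return 3.0 * x**2 + 1.0

    # Query the model on a grid to build a surrogate dataset.
    xs = np.linspace(-2.0, 2.0, 101)
    ys = black_box(xs)

    # A tiny candidate library of analytic forms, each with a free scale and offset.
    candidates = {
        "a*x + b":     lambda x: x,
        "a*x^2 + b":   lambda x: x**2,
        "a*sin(x) + b": lambda x: np.sin(x),
    }

    best_name, best_err = None, np.inf
    for name, f in candidates.items():
        # Least-squares fit of a, b for this basis function.
        A = np.stack([f(xs), np.ones_like(xs)], axis=1)
        coef, *_ = np.linalg.lstsq(A, ys, rcond=None)
        err = float(np.mean((A @ coef - ys) ** 2))
        if err < best_err:
            best_name, best_err = name, err

    print(best_name)  # the quadratic form reproduces the black box
    ```

    Real symbolic regression searches over expression trees with learned constants rather than a fixed basis list; this only illustrates the query-then-fit workflow.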

  • What is the significance of the universal approximation theorem in relation to neural networks?

    -The universal approximation theorem states that a shallow neural network with a single layer of activations can approximate any continuous 1D function to arbitrary accuracy. This highlights the power of neural networks in modeling complex relationships and functions in data.
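    A minimal numeric sketch of this statement, assuming we hand-place the joints rather than train them: a one-hidden-layer ReLU network whose joints interpolate sin(x), with the worst-case error shrinking as joints are added (the target function and knot placement are illustrative choices, not from the talk):

    ```python
    import numpy as np

    def relu(z):
        return np.maximum(0.0, z)

    def shallow_net(x, joints):
        """Piecewise-linear interpolant of sin on [0, 2*pi], written as a
        one-hidden-layer ReLU network: each hidden neuron contributes one
        slope change (a "joint") at a knot point."""
        knots = np.linspace(0.0, 2.0 * np.pi, joints)
        target = np.sin(knots)
        slopes = np.diff(target) / np.diff(knots)
        out = target[0] + slopes[0] * (x - knots[0])
        for k in range(1, len(slopes)):
            out = out + (slopes[k] - slopes[k - 1]) * relu(x - knots[k])
        return out

    xs = np.linspace(0.0, 2.0 * np.pi, 1000)
    errs = {}
    for joints in (5, 10, 20):
        errs[joints] = float(np.max(np.abs(shallow_net(xs, joints) - np.sin(xs))))
        print(joints, errs[joints])  # max error shrinks as neurons are added
    ```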

  • How do foundation models like GPT differ from traditional machine learning models?

    -Foundation models are trained on massive, diverse datasets and are flexible enough to serve as a basis for a wide range of tasks across different domains. They are first pre-trained on general data and then fine-tuned for specific tasks, whereas traditional models are often trained from scratch for a particular task.
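    The pre-train/fine-tune recipe can be sketched in miniature with linear models (the tasks, dimensions, and step counts below are invented for illustration): pre-training on a data-rich related task leaves the model much closer to the downstream solution than a cold start:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def train(w, X, y, steps=200, lr=0.1):
        """Plain gradient descent on mean-squared error for a linear model."""
        for _ in range(steps):
            grad = 2.0 * X.T @ (X @ w - y) / len(y)
            w = w - lr * grad
        return w

    # "Pre-training" task: lots of data from a related linear law.
    X_pre = rng.normal(size=(500, 3))
    y_pre = X_pre @ np.array([1.0, -2.0, 0.5])

    # Downstream task: only a few samples from a nearby law.
    X_fine = rng.normal(size=(10, 3))
    y_fine = X_fine @ np.array([1.1, -1.9, 0.4])

    w_pre = train(np.zeros(3), X_pre, y_pre)

    # Fine-tuning from the pre-trained weights vs. training from scratch,
    # with a small step budget to mimic limited downstream compute.
    w_scratch = train(np.zeros(3), X_fine, y_fine, steps=5)
    w_tuned   = train(w_pre,       X_fine, y_fine, steps=5)

    X_test = rng.normal(size=(200, 3))
    y_test = X_test @ np.array([1.1, -1.9, 0.4])
    err_scratch = float(np.mean((X_test @ w_scratch - y_test) ** 2))
    err_tuned   = float(np.mean((X_test @ w_tuned   - y_test) ** 2))
    print(err_scratch, err_tuned)  # the pre-trained start should win
    ```

    With the same tiny fine-tuning budget, the pre-trained initialization ends up much closer to the downstream solution than the random/zero start.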

  • What is the role of simplicity in the context of scientific discovery and interpretability?

    -In the context of scientific discovery, simplicity often refers to the ability to describe complex phenomena with minimal assumptions or variables. It aids interpretability by providing clear, understandable explanations for observed data, which can lead to more effective models and theories.

  • How does the concept of pre-training neural networks relate to the development of polymathic AI?

    -Pre-training neural networks on a broad range of data allows them to develop general priors for different types of problems, much like a well-rounded scientist. This approach is central to the development of polymathic AI, which aims to create models that can be fine-tuned for specific tasks across various scientific disciplines.

  • What are the potential challenges in training a foundation model for science, given the diversity of data types in different scientific fields?

    -The main challenge lies in defining a general objective that can be applied to the diverse range of data types in science. The objective needs to be flexible enough to accommodate different data forms, such as sequences in molecular biology or images in astrophysics, while still enabling the model to learn broadly applicable concepts.

  • How does the concept of shared concepts across different physical systems relate to the training of foundation models?

    -Shared concepts like causality and multiscale dynamics are common across various scientific disciplines. By training a foundation model on diverse datasets that encompass these shared concepts, the model can develop a general understanding of these principles, which can then be fine-tuned for specific tasks within particular fields.

  • What are the potential implications of polymathic AI for the future of scientific research?

    -Polymathic AI has the potential to revolutionize scientific research by providing a generalizable foundation model that can quickly adapt to new tasks and problems. This could lead to faster discoveries, more efficient use of computational resources, and the development of new, broadly applicable scientific models.

Outlines

00:00

🤖 Introduction to Neural Network Interpretation

The speaker introduces the concept of interpreting neural networks to gain physical insights, which is seen as a new paradigm in science. The work involves collaboration with many individuals and focuses on extracting insights from neural networks, particularly in the context of fluid turbulence and planetary system instability. The speaker emphasizes the importance of understanding how neural networks achieve accuracy beyond traditional models and the potential for using these insights to advance scientific understanding.

05:03

🧠 Machine Learning Fundamentals and Activation Functions

The speaker delves into the fundamentals of machine learning, starting with linear regression and progressing to the concept of activation functions in neural networks. Activation functions introduce non-linearity, allowing the model to fit complex data. The speaker explains the construction of a shallow neural network and how it can be viewed as a piecewise linear model. The universal approximation theorem is mentioned, highlighting the ability of a neural network to approximate any 1D function to arbitrary accuracy.

10:05

🌀 Deepening Neural Networks and Function Composition

The speaker discusses the progression from shallow to deep neural networks, explaining how additional layers allow for the representation of functions in higher dimensional spaces. The concept of function composition is introduced, likening it to folding a piece of paper to create complex behaviors. The speaker emphasizes the efficiency of neural networks through shared computation and neurons, and how this relates to emulating physical processes and gaining interpretable insights through symbolic regression.
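The paper-folding picture has a checkable consequence: composing piecewise-linear maps multiplies their linear regions. A small sketch using a one-joint "tent" map as a stand-in for a ReLU layer (the map and the region-counting method are illustrative, not from the talk):

```python
import numpy as np

def tent(x):
    """One 'fold': a piecewise-linear map with a single joint at x = 0.5."""
    return 1.0 - np.abs(1.0 - 2.0 * x)

def count_linear_regions(f, lo=0.0, hi=1.0, n=20001):
    """Estimate the number of linear pieces by sampling finite-difference slopes."""
    xs = np.linspace(lo, hi, n)
    slopes = np.diff(f(xs)) / np.diff(xs)
    changes = int(np.sum(np.abs(np.diff(slopes)) > 1e-6))
    return changes + 1

f1 = tent                              # depth 1
f2 = lambda x: tent(tent(x))           # depth 2: one fold doubles the pieces
f3 = lambda x: tent(tent(tent(x)))     # depth 3

r1, r2, r3 = (count_linear_regions(f) for f in (f1, f2, f3))
print(r1, r2, r3)  # regions double with each composition: 2, 4, 8
```

Each extra layer folds the input once more, so the region count grows geometrically with depth while the parameter count grows only linearly — the "shared computation" the talk describes.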

15:06

📝 Symbolic Regression and Surrogate Models

The speaker explains the process of symbolic regression, a machine learning task aimed at finding analytic expressions that fit a dataset. This is done using a genetic algorithm-based search, which evolves equations to fit the data. The speaker discusses the use of symbolic regression to build surrogate models of neural networks, translating the model into a mathematical language. This technique is used to interpret and distill the behavior of neural networks into equations, providing a more interpretable approximation of the original model.
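A miniature version of that genetic-algorithm search (the operator set, mutation rule, and population sizes are all invented here; the speaker's actual tooling is far more sophisticated): evolve random expression trees against data generated by y = x² + x, keeping the better half each generation:

```python
import math
import random

random.seed(0)
OPS = ["+", "-", "*"]

def random_tree(depth=2):
    # Leaf: the variable "x" or a small constant; otherwise an operator node.
    if depth == 0 or random.random() < 0.3:
        return random.choice(["x", float(random.randint(-3, 3))])
    return (random.choice(OPS), random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, x):
    if tree == "x":
        return x
    if isinstance(tree, float):
        return tree
    op, left, right = tree
    a, b = evaluate(left, x), evaluate(right, x)
    return a + b if op == "+" else a - b if op == "-" else a * b

def mutate(tree):
    # Replace a random subtree with a freshly grown one.
    if not isinstance(tree, tuple) or random.random() < 0.3:
        return random_tree(2)
    op, left, right = tree
    if random.random() < 0.5:
        return (op, mutate(left), right)
    return (op, left, mutate(right))

def loss(tree, data):
    total = 0.0
    for x, y in data:
        v = evaluate(tree, x)
        if not math.isfinite(v):
            return math.inf
        d = v - y
        total += d * d
    return total

# Data secretly generated by y = x^2 + x.
data = [(x, x * x + x) for x in (-2.0, -1.0, 0.0, 1.0, 2.0, 3.0)]

pop = [random_tree() for _ in range(50)]
best_start = min(loss(t, data) for t in pop)
for _ in range(200):
    pop.sort(key=lambda t: loss(t, data))   # selection: best individuals first
    pop = pop[:25] + [mutate(random.choice(pop[:25])) for _ in range(25)]

best = min(pop, key=lambda t: loss(t, data))
print(best, loss(best, data))
```

Because the best individual always survives selection, the fitness of the population's best member can only improve over generations; real systems add crossover, constant optimization, and complexity penalties on top of this loop.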

20:07

🌟 Polymathic AI and Foundation Models

The speaker introduces the concept of polymathic AI, focusing on building massive neural networks for science. Foundation models, which are trained on diverse data to learn general concepts, are discussed as a way to improve performance on downstream tasks. The speaker explains how these models can be fine-tuned for specific tasks, drawing parallels with language models. The potential for these models to discover new scientific insights and the importance of pre-training are highlighted, with examples of their application in physics.

25:08

🔍 Interpreting Neural Networks and Simplicity

The speaker discusses the interpretation of neural networks, particularly focusing on the concept of simplicity. The speaker argues that simplicity is based on familiarity and usefulness, and that broadly useful algorithms discovered by polymathic AI models may become familiar and simple over time. The speaker emphasizes the importance of understanding what is interpretable and how pre-training can provide a better starting point for neural networks in scientific applications.

30:10

💡 Final Thoughts and Q&A

The speaker concludes with a discussion on the potential impact of polymathic AI on science and how it might change scientific teaching. The speaker also addresses questions about the scalability of training for foundation models, the potential for these models to discover new physics, and the challenges of symbolic regression with high-dimensional data. The speaker acknowledges that while pre-training has been beneficial in experiments so far, adversarial examples may exist that could hinder training in certain cases.

Keywords

💡Neural Networks

Neural networks are a series of algorithms that attempt to recognize underlying relationships in a set of data through a process that mimics the way the human brain operates. In the context of the video, neural networks are used to interpret and gain physical insights from data, such as learning subgrid models for fluid turbulence or predicting instability in planetary systems.

💡Physical Insight

Physical insight refers to the understanding of the fundamental principles or laws that govern physical phenomena. In the video, the speaker is interested in using neural networks to extract such insights, which can lead to advances in scientific understanding and the development of new theories.

💡Data-Driven Models

Data-driven models are computational models that are developed based on data and statistical analysis, rather than purely on theoretical or physical principles. These models are used to make predictions or identify patterns within large datasets. In the video, the speaker discusses the use of neural networks as data-driven models to emulate physical processes and gain new scientific insights.

💡Symbolic Regression

Symbolic regression is a form of machine learning where the goal is to find an analytical expression that fits a set of data. It is used to create a human-interpretable model from data, which can be expressed in mathematical terms. In the video, the speaker describes using symbolic regression to interpret the behavior of neural networks and translate their learned patterns into mathematical equations.

💡Foundation Models

Foundation models are machine learning models that are trained on massive amounts of general and diverse data to serve as a basis for other models. They are then fine-tuned for specific tasks, often resulting in better performance due to the general priors they have learned. In the video, the speaker discusses the concept of foundation models in the context of scientific discovery and their potential to revolutionize the way science is done.

💡Interpretability

Interpretability in machine learning refers to the ability to understand the reasoning behind a model's predictions or decisions. It is crucial for gaining trust in the model, diagnosing its failures, and leveraging its knowledge. In the video, the speaker emphasizes the importance of interpretability in neural networks to extract meaningful scientific insights and contribute to the advancement of science.

💡Polymathic AI

Polymathic AI refers to a hypothetical type of artificial intelligence that is capable of understanding, learning, and making contributions across multiple disciplines or fields. In the video, the speaker discusses the idea of building such AI by training neural networks on diverse scientific data to create a foundation model that can then be specialized for various scientific tasks.

💡Subgrid Models

Subgrid models are mathematical models used in computational fluid dynamics to represent small-scale processes that occur within the grid cells of a larger simulation. They are essential for accurately modeling turbulence and other complex fluid behaviors. In the video, the speaker mentions subgrid models as an example of how neural networks can be used to improve upon traditional models.

💡Planetary Instability

Planetary instability refers to the condition where the gravitational interactions within a planetary system cause the orbits of the planets to become chaotic and potentially collide or eject each other from the system. In the video, the speaker discusses using neural networks to predict such instabilities, which is a centuries-old problem in the field of chaos theory.

💡Genetic Algorithm

A genetic algorithm is a search heuristic that mimics the process of natural selection to find approximate solutions to optimization and search problems. In the context of the video, genetic algorithms are used for symbolic regression to evolve mathematical expressions that best fit a given dataset.

Highlights

The speaker discusses a new paradigm of interpreting neural networks to gain physical insights, marking a shift in the approach to science.

The motivation stems from examples like using neural networks to learn subgrid models for fluid turbulence with high accuracy, outperforming traditional models.

Another example is predicting instability in planetary systems, a centuries-old problem, where neural networks have shown better accuracy and generalization.

The traditional approach to science involves building theories from low-dimensional data sets, but the speaker suggests using neural networks as a tool to describe the data and its patterns.

Neural networks, especially those trained on massive amounts of data, can discover new things not present in existing theories.

The speaker introduces the concept of using neural networks as a compression tool to extract common patterns from a data set.

The potential of machine learning models to serve as surrogate models for complex physical processes is highlighted.

Symbolic regression is introduced as a method to interpret data-driven models by finding analytic expressions that optimize some objective.

The speaker discusses the use of genetic algorithms for symbolic regression, evolving equations to fit the data set.

Foundation models are presented as a shift in industrial machine learning, which are trained on massive, diverse data and then fine-tuned for specific tasks.

The idea of polymathic AI is introduced, aiming to build massive neural networks for science that incorporate data across many disciplines.

The potential of polymathic AI to discover broadly useful algorithms and new scientific models is emphasized.

The speaker discusses the importance of pre-training models, even on unrelated data, to provide a better starting point than random initialization.

The concept of simplicity is explored, suggesting that what is considered simple is often based on familiarity and usefulness in describing the world.

The speaker argues that polymathic models will be broadly useful and may drive the simplicity of newly discovered concepts or algorithms.

The potential impact of polymathic AI on how science is taught and done is discussed, suggesting a significant shift in the scientific process.

The speaker addresses concerns about whether neural networks are rediscovering known physics or discovering new insights, emphasizing the importance of verification.

The limitations of symbolic regression for high-dimensional problems are acknowledged, and the potential for incorporating more general algorithms is discussed.

The speaker concludes by expressing excitement about the direction of polymathic AI and its potential to revolutionize scientific discovery and understanding.

Transcripts

00:09

So I'm very excited today to talk to you about this idea of interpreting neural networks to get physical insight, which I view as really a new paradigm for doing science. This is work with a huge number of people — I can't mention them all individually, but many of them are here at the Flatiron Institute. I'm going to split this up into two parts. In the first, I'll talk about how we go from a neural network to insights — how we actually get insights out of a neural network. In the second, I'll talk about polymathic AI, which is about building massive neural networks for science.

01:01

My motivation for this line of work is examples like the following. There was a paper led by Kimberly Stachenfeld at DeepMind a couple of years ago on learning fast subgrid models for fluid turbulence. What you see here is the ground truth — some box of a fluid — and the bottom row is the learned subgrid model for the simulation. The really interesting thing is that this model was only trained on 16 simulations, but it actually learned to be more accurate than all traditional subgrid models at that resolution for fluid dynamics. So I think it's really exciting to figure out how the model did that, and what we can learn about science from this neural network.

02:07

Another example is work I did with Dan Tamayo and others on predicting instability in planetary systems. This is a centuries-old problem: you have some compact planetary system and you want to figure out when it goes unstable. People have literally worked on this for centuries — it's a fundamental problem in chaos. But this neural network, trained on maybe 20,000 simulations, is not only more accurate at predicting instability, it also seems to generalize better to different types of systems. So it's really interesting to think about: these neural networks seem to have learned something new — how can we actually use that to advance our own understanding? That's my motivation here.

03:07

The traditional approach to science has been: you have some low-dimensional data set, or some summary statistic, and you build theories to describe that low-dimensional data. You can see this throughout the history of science. Kepler's laws were an empirical fit to data, and then Newton's law of gravitation was required to explain them. Another example is Planck's law — this too was an empirical fit to data, and quantum mechanics, partially motivated by it, was required to explain it. So this is the normal approach to building theories. Of course it's not only this — it involves many other things as well — but I think it's really exciting to think about how we can involve the interpretation of data-driven models in this process, quite generally. That's what I'm going to talk about today.

04:20

I'm going to conjecture that in this era of AI, where we have massive neural networks that seem to outperform all of our traditional models, we might want to consider an approach where we use a neural network as essentially a compression tool — something that pulls apart the common patterns in a data set — and we build theories not to describe the data directly, but to describe the neural network and what the neural network has learned. I think this is an exciting new approach to science in general, and especially the physical sciences.

05:05

The key point is that neural networks trained on massive amounts of data, with very flexible functions, seem to find new things that are not in our existing theory. I showed you the example with turbulence — we can find better subgrid models just from data — and we can also do this with planetary dynamics. So our challenge as scientists, for those problems, is distilling those insights into our language — incorporating them into our theory. I think this is a really exciting way to look at these models.

05:48

So I'm going to break this down a bit. The first thing I'd like to do is go through what machine learning is and how it works, and then talk about how you apply these models to different data sets. Going back to the very fundamentals: linear regression in 1D. I would argue that if you don't yet attach physical meaning to the parameters, this is a kind of machine learning. Here x and y are scalars, the two parameters are scalars, and it's a linear model.

06:30

You go one step beyond that and you get a shallow network. Again this has a 1D input x and a 1D output y, but now we've introduced a layer: we have these linear models — three hidden neurons here — and they pass through a function a, called an activation function. What this does is give the model a way of including some nonlinearity. The activation function most people reach for first is the rectified linear unit, or ReLU. Essentially it says: if the input is less than zero, clamp it to zero; if it's greater than zero, leave it alone. It's a very simple way of adding nonlinearity to the flexible curve I'm going to fit to my data.

07:39

Next, these different activation functions each have a joint at a different point, which depends on the parameters, and I multiply the output of each activation by a number — which might change its direction or its slope. Then I sum them up — superimpose them — and that's the output of one layer in my network. So a shallow network is essentially a piecewise linear model. The joints — the points where it switches from one linear region to another — are determined by the inputs to the first layer's activations. It's basically a piecewise linear model.

08:48

The one cool thing about it is that you can use this piecewise linear model to approximate any 1D function to arbitrary accuracy. If I want to model this function with five joints, I get an approximation like this; with 10 joints, like this; with 20, like that. I can just keep increasing the number of neurons, and that gives me better and better approximations. This is called the universal approximation theorem: with a shallow neural network — just one layer of activations — I can describe any continuous function to arbitrary precision. Now, that alone is not that exciting, because I can do the same with polynomials — the neural network is not the only thing that can. I think the exciting part about neural networks is when you start making them deeper.

09:51

First, what would it look like if we had two inputs? Now the activations are activated along planes, not points. For my input plane, I'm basically chopping it along the zero part, and now I have these 2D planes in space. The next thing I do is scale these and superimpose them, and this gives me a way of representing arbitrary functions in a 2D space rather than just a 1D space — a way of expressing arbitrary continuous functions.

10:46

Now the cool part is when I have two layers. So I have my first neural network and my second neural network. Considered alone, my first network looks like this; my second network, if I cut it out on its own, looks like this. When I compose them together, I get this shared behavior: composing the functions is almost like folding them together, so that I experience the second function in one linear region, then backwards, and then again — you can see the function is mirrored here; it goes back and forth. You can make an analogy to folding a piece of paper: I draw my first neural network on a piece of paper, fold it, draw my second network's function over the first one, and then expand it — and now I have the composed function. The cool part is that I'm sharing computation, because I'm sharing neurons in my neural network. This is going to come up again — it's a theme: neural networks do efficient computation by sharing neurons. And it's useful to think about it this way — folding paper, drawing curves over it, and expanding it.

12:49

Okay, so let's go back to the physics. Neural networks are efficient universal function approximators; you can think of them as a type of data compression, since the same neurons can be used for different calculations in the same network. A common use case in the physical sciences — especially in what I work on — is emulating physical processes. If my simulator is too expensive, or I have real-world data that my simulator is not good at describing, I can build a neural network that emulates it — for example, a neural network that looks at the initial conditions of the system and predicts when it's going to go unstable. That's a good use case. So once I have this trained piecewise linear model that emulates some physical process, how do I go and interpret it? How do I actually get insight out of it?

play14:06

about symbolic regression so this is one

play14:09

of my favorite things so a lot of the

play14:13

interpretability work in uh industry

play14:16

especially like computer vision language

play14:18

there's not really like there's not a

play14:20

good modeling language like if I have a

play14:22

if I have a model that classifies cats

play14:24

and dogs there's not really like there's

play14:27

not a language for

play14:29

describing every possible cat there's

play14:31

not like a mathematical framework for

play14:33

that but in science we do have that we

play14:35

do have um

play14:38

oops we do have a very

play14:42

good

play14:43

uh mathematical

play14:46

framework let me see if this

play14:51

works uh so in science right so we have

play14:54

this you know in science we have this

play14:56

very good understanding of the

play15:00

universe and

play15:02

um we have this language for it we have

play15:05

mathematics which describes the universe

play15:08

very well uh and I think when we want to

play15:12

interpret these datadriven models we

play15:15

should use this language because that

play15:17

will give us results that are

play15:19

interpretable if I have some piece-wise

play15:22

linear model with different you know

play15:24

like millions of parameters it's not

play15:26

it's not really useful for me right I

play15:28

want to I want to express it in the

play15:29

language that I'm familiar with which is

play15:32

uh

play15:34

mathematics um so you can look at like

play15:35

any cheat sheet and it's uh it's a lot

play15:38

of you know simple algebra this is the

play15:41

language of

Symbolic regression is a machine learning task where the objective is to find analytic expressions that optimize some objective. So maybe I want to fit that data set, and what I can do is try different expression trees: this equation corresponds to that tree, and I find different expression trees that match that data. The point of symbolic regression is that I want to find equations that fit the data set, the symbolic form together with the parameters, rather than just optimizing parameters in some fixed model.

The current, state-of-the-art way to do this is a genetic algorithm. It's not really a clever algorithm (I can say that because I work on it); it's pretty close to brute force. Essentially, what you do is treat your equation like a DNA sequence and evolve it: you apply mutations, swapping one operator for another, and maybe you crossbreed expressions, so you take two expressions that are okay and conceptually breed them together to get a new expression, until you fit the data set.
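The mutate-and-crossbreed loop just described can be sketched in a few dozen lines. This is a toy illustration under invented choices (operator set, tree depth, population size), not the actual PySR search; expressions are nested tuples, mutation swaps an operator or subtree, and crossover grafts a subtree from one parent into another.

```python
import random

OPS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}

def evaluate(tree, x):
    # a tree is "x", a constant, or (op, left, right)
    if tree == "x":
        return x
    if isinstance(tree, (int, float)):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))

def random_tree(depth=2):
    if depth == 0 or random.random() < 0.3:
        return random.choice(["x", 1.0, 2.0])
    return (random.choice(list(OPS)), random_tree(depth - 1), random_tree(depth - 1))

def mutate(tree):
    # swap one operator for another, or replace with a fresh subtree
    if isinstance(tree, tuple) and random.random() < 0.5:
        op, left, right = tree
        return (random.choice(list(OPS)), left, right)
    return random_tree(2)

def random_subtree(tree):
    if isinstance(tree, tuple) and random.random() < 0.5:
        return random_subtree(random.choice([tree[1], tree[2]]))
    return tree

def crossover(a, b):
    # "breed" two expressions: graft a random subtree of b into a
    if isinstance(a, tuple):
        op, left, right = a
        return (op, left, random_subtree(b))
    return random_subtree(b)

def loss(tree, xs, ys):
    return sum((evaluate(tree, x) - y) ** 2 for x, y in zip(xs, ys))

# data from a "hidden" expression: y = x^2 + x
xs = [i / 2 for i in range(-6, 7)]
ys = [x * x + x for x in xs]

random.seed(0)
pop = [random_tree() for _ in range(40)]
init_loss = min(loss(t, xs, ys) for t in pop)
for _ in range(60):
    pop.sort(key=lambda t: loss(t, xs, ys))
    parents = pop[:10]                      # keep the fittest (elitism)
    pop = parents \
        + [mutate(random.choice(parents)) for _ in range(15)] \
        + [crossover(random.choice(parents), random.choice(parents)) for _ in range(15)]

best = min(pop, key=lambda t: loss(t, xs, ys))
final_loss = loss(best, xs, ys)
print("best expression:", best)
```

Because the fittest expressions survive each generation, the best loss can only improve over time; real systems like PySR add many refinements (constant optimization, simplification, parallel populations) on top of this basic loop.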

So this is a genetic-algorithm-based search for symbolic regression. The point of it is to find simple models, in our language of mathematics, that describe a given data set.

I've spent a lot of time working on these frameworks: PySR and SymbolicRegression.jl. They work like this: if I have this expression and I want to model that data set, essentially what I'm going to do is search over possible expressions until I find one that gets me closer to this ground-truth expression. You can see it testing different branches in evolutionary space (I'm going to play that again) until it reaches this ground-truth data set. This is pretty close to how it works: you're essentially finding simple expressions that fit some data set accurately. Okay.

So what I'm going to show you how to do: this symbolic regression idea is about finding symbolic models that I can use to describe a data set, and I want to use that to build surrogate models of my neural network. This is a way of translating my model into my language; you could also think of it like a polynomial fit, or a Taylor expansion, in some ways.

The way this works is as follows. I have some neural network that I've trained on my data set; I train it normally and freeze the parameters. Then what I do is record the inputs and outputs: I treat it like a data-generating process, and I check the behavior for this input, this input, and so on. Then I stick those inputs and outputs into PySR, for example, and I find some equation that models that neural network, or maybe a piece of my neural network. So this is building a surrogate model for my neural network that approximates the same behavior. Now, you wouldn't just do this for a standalone neural network; this would typically be part of a larger model, and it would give you a way of interpreting exactly what it's doing for different inputs.

So what I might have is two pieces, two neural networks: maybe I think the first neural network is learning features, or some kind of coordinate transform, and the second one is doing something in that space, using those features for a calculation. Using symbolic regression, in what we call symbolic distillation, I can distill this model into equations. That's the basic idea: I replace the neural networks with my surrogate model, which is now an equation.
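The distillation recipe above (train, freeze, record inputs and outputs, fit a symbolic surrogate) can be sketched as follows. The "frozen network" here is just a stand-in function, and the symbolic search is a tiny brute-force over a grid of candidate forms rather than a real PySR run; both are assumptions for illustration.

```python
def frozen_network(x):
    # stand-in for a trained neural network with frozen parameters
    return 3.0 * x * x + 1.0

# 1. record inputs and outputs: treat the model as a data-generating process
xs = [i / 4 for i in range(-8, 9)]
ys = [frozen_network(x) for x in xs]

# 2. candidate symbolic forms c1 * x^p + c0, over a small grid
candidates = [(p, c1, c0)
              for p in (1, 2, 3)
              for c1 in (1.0, 2.0, 3.0)
              for c0 in (0.0, 1.0, 2.0)]

def mse(params):
    p, c1, c0 = params
    return sum((c1 * x ** p + c0 - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# 3. the surrogate is the candidate that best reproduces the recorded behavior
best = min(candidates, key=mse)
p, c1, c0 = best
print(f"surrogate: {c1}*x^{p} + {c0}")
```

The surrogate recovered here matches the black box exactly because the candidate grid happens to contain the true form; in practice the surrogate is an approximation, which is why it plays the role of a Taylor-expansion-like translation of the network.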

You would typically do this for g as well, and now I have equations that describe my model: an interpretable approximation of my original neural network. Now, the reason you wouldn't want to do this directly on the data is that it's a harder search problem. If you break it into pieces, interpreting pieces of a neural network, it's easier, because you're only searching for 2N expressions rather than N². So it's a bit easier, and you're using the neural network as a way of factorizing the system into different pieces that you then interpret.

We've used this in different papers. This is one led by Pablo Lemos on rediscovering Newton's law of gravity from data. This was a cool paper because we didn't tell it the masses of the bodies in the solar system: it had to simultaneously find the masses of all 30 bodies we gave it, and it also found the law. So we trained this neural network to do this, then we interpreted that neural network, and it gives us Newton's law of gravity.

Now, that's a rediscovery, and of course we already know that, so I think the new discoveries are also cool. These are not my papers; they are other people's papers that I thought were really exciting. This is a recent one by Ben Davis and Zehao Jin, where they discover this new black-hole mass scaling relationship: it relates, I think, the spirality of a galaxy and the velocity with the mass of a black hole. They found this with this technique, which is exciting. And I saw this other cool one recently: they found this cloud-cover model with this technique, using PySR. It gets you to a point where the model is fairly simple and also pretty accurate. But again, the point of this is to find a model that you can understand: it's not a black-box neural network with billions of parameters, it's a simple model that you can get a handle on.

Okay, so that's part one. Now, in part two I want to talk about polymathic AI. This is kind of the complete opposite end: we're going to go from the small models in the first part to the biggest possible models, and I'm also going to talk about the meaning of simplicity, what it actually means.

Over the past few years you may have noticed there's been a shift in industrial machine learning to favor foundation models; ChatGPT is an example of this. A foundation model is a machine learning model that serves as the foundation for other models. These models are trained by taking massive amounts of general, diverse data, training a flexible model on that data, and then fine-tuning it to some specific task. You could think of it as teaching the machine learning model English and French before teaching it to do translation between the two. It often gives you better performance on downstream tasks. For instance, I've heard that ChatGPT is trained on GitHub, and that kind of teaches it to reason a bit better. Basically, these models are trained on massive amounts of data, and they form this idea called a foundation model.

The general idea is: you collect your massive amounts of data, you have this very flexible model, and you train it, for example with self-supervised learning, which is where you mask parts of the data and the model tries to fill them back in; that's a common way to train these. So, for example, GPT-style models are basically trained on the entire internet, and they're trained to predict the next word. That's their only task: you get an input sequence of words, you predict the next one, and you just repeat that for massive amounts of text. Just by doing that, they get really good at general language understanding. Then they are fine-tuned to be a chatbot, essentially: they're given a little bit of extra data on how to talk to someone, be friendly, and so on. And that's much better than training a model just to do that; this is the idea of pre-training models.

Once you have this model, I think the cool part is that these models are trained in a way that gives them general priors for data. Say I have some artwork-generation model: it's trained on different images and it generates different art. I can fine-tune this model on Studio Ghibli artwork, and it doesn't need much training data, because it already knows what a face looks like; it's already seen tons of different faces. So just by fine-tuning it on some small number of examples, it can pick up this task much quicker. That's essentially the idea.
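The pre-train-then-fine-tune advantage can be illustrated with a toy: both runs take the same few gradient steps on the new task, but one starts from weights already shaped by a related task. The tasks and numbers here are invented for illustration; this is a one-parameter linear model, not a real foundation model.

```python
def grad_step(w, data, lr=0.1):
    # one gradient-descent step on mean squared error for the model y = w * x
    g = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * g

def loss(w, data):
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

related_task = [(x, 1.8 * x) for x in (1.0, 2.0, 3.0)]  # big "pre-training" task
new_task     = [(x, 2.0 * x) for x in (1.0, 2.0, 3.0)]  # small downstream task

# "pre-train" on the related task until converged
w_pre = 0.0
for _ in range(100):
    w_pre = grad_step(w_pre, related_task)

# fine-tune vs. train from scratch: the same 3 steps on the new task
w_ft, w_scratch = w_pre, 0.0
for _ in range(3):
    w_ft = grad_step(w_ft, new_task)
    w_scratch = grad_step(w_scratch, new_task)

print("fine-tuned loss:", loss(w_ft, new_task))
print("from-scratch loss:", loss(w_scratch, new_task))
```

Because the pre-trained weight starts ten times closer to the new task's optimum, the fine-tuned run ends with a much smaller loss after the same budget of steps; that is the "already knows what a face looks like" effect in miniature.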

The same thing is true in language. If I start from scratch and train a model just to do English-to-French translation, it's going to struggle; whereas if I teach it English and French first, teach it about the languages, and then specialize it on translation, it's going to do much better.

So this brings us to science. In science we also have this idea of shared concepts, the way different languages share the concept of grammar. You could draw a big circle around many areas of science, and causality is a shared concept. If you zoom in to, say, dynamical systems, multiscale dynamics is shared across many different disciplines; chaos is another shared concept. So maybe, if we train a general model over many, many different data sets, the same way ChatGPT is trained on many different languages and text databases, it will pick up general concepts; and then, when we finally make it specialize to our particular problem, maybe it will find it easier to learn. That's essentially the idea.

You can really see this for particular systems. One example is the reaction-diffusion equation, a type of PDE, and the shallow-water equations, another type of PDE: different fields, different PDEs, but both have waves, wave-like behavior. So maybe, if we train this massive, flexible model on both of these systems, it will learn a general prior for what a wave looks like; and then, if I have some small data set with only a couple of examples, maybe it will immediately identify: oh, that's a wave, I know how to do that.

I kind of feel that in science today, when we train machine learning models from scratch, it's almost like we're taking toddlers and teaching them to do pattern matching on really advanced problems: we have a toddler and we're showing them, this is a spiral galaxy, this is an elliptical galaxy, and it just has to do pattern matching. Whereas a foundation model trained on broad classes of problems is kind of like a general science graduate: it has a prior for how the world works, it has seen many different phenomena before, and so when you finally give it that data set to pick up, it's already seen a lot of those phenomena. That's really why we think this will work well.

Okay, so we created this collaboration last year. It started at the Flatiron Institute, led by Shirley Ho, to build this thing: a foundation model for science. This is across disciplines: we want to build these models to incorporate data across many different disciplines and across institutions, and we're currently working on scaling up these models right now. I think the final goal of the collaboration is that we would release these foundation models open source, so that people could download them and fine-tune them to different tasks. It's really a different paradigm of doing machine learning: rather than the current paradigm, where we take a randomly initialized model, which is like a toddler that doesn't know how the world works, and train that, in this paradigm we have a generalist science model and you start from that. It's a better initialization of the model. That's the pitch of Polymathic.

Okay, so we have results. This year we're scaling up, but last year we had a couple of papers. This is one led by Mike McCabe, called Multiple Physics Pretraining. This paper looked at: if we have a general PDE simulator, a model that learns to essentially run fluid-dynamics simulations, and we train it on many different PDEs, will it do better on new PDEs, or will it do worse? What we found is that a single model is not only able to match single models trained on specific tasks; it can actually outperform them in many cases. So it does seem that if you take a more flexible model and train it on more diverse data, it will do better in a lot of cases. It's not unexpected, because we do see this with language and vision, but I think it's still really cool to see.

I'll skip through some of these. This is the ground-truth data, and this is the reconstruction. Essentially, what the model is doing is predicting the next step: the next velocity, the next density and pressure, and so on; you take that prediction and run it back through the model, and you get this rolled-out simulation. This is a task people work on in machine learning. Essentially, what we found is that most of the time, by using this multiple-physics pretraining, by training on many different PDEs, you do get better performance. The ones on the right side are the multiple-physics-pretrained models; those seem to do better in many cases, and I think it's really because they've seen so many different PDEs that they have a better prior for physics.
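The rollout procedure described above is a simple loop: the emulator predicts the next state, and that prediction is fed back in as the new input. A minimal sketch, where the "emulator" is an exact decay rule standing in for a trained network, purely to show the loop structure:

```python
def one_step(state, decay=0.9):
    # stand-in for a neural emulator's single-step prediction
    return [decay * s for s in state]

def rollout(state, n_steps):
    trajectory = [state]
    for _ in range(n_steps):
        state = one_step(state)  # feed the prediction back into the model
        trajectory.append(state)
    return trajectory

traj = rollout([1.0, 2.0], 5)
print(traj[-1])  # final state after five decay steps
```

With a learned one-step model, small per-step errors compound through this loop, which is why long rollouts are a demanding benchmark for these emulators.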

I'll skip this as well. Okay, here is a funny thing that we observed. During talks like this, one thing we get asked is how similar the PDEs need to be: do they need to be, say, Navier-Stokes with a different parameterization, or can they be completely different physical systems? What we found is really hilarious. The bottom line here is the error of the model over different numbers of training examples. This model was trained on a bunch of different PDEs and then introduced to this new PDE problem, given that amount of data; it does the best, because it already knows some physics. The one at the top is the worst: this is the model trained from scratch, which has never seen anything. This is your toddler: it doesn't know how the physical world works, it was just randomly initialized, and it has to learn physics. The middle models are pre-trained on general video data, a lot of which is cat videos. So even pre-training this model on cat videos actually helps you do much better than this very sophisticated Transformer architecture that has just never seen any data. And we think it's because of shared concepts of spatiotemporal continuity: in videos of cats there is spatiotemporal continuity; the cat does not teleport across the video, unless it's a very fast cat. There are related concepts. That's what we think, but it's really interesting that pre-training on completely unrelated systems still seems to help.

So the takeaway from this is that you should always pre-train your model: even if the physical system is not that related, you still see a benefit. Now, obviously, pre-training on related data helps you more, but anything is basically better than nothing. You could think of it this way: the default initialization for neural networks is garbage. Randomly initializing a neural network is a bad starting point, a bad prior for physics. You should always pre-train your model; that's the takeaway.

Okay, so I want to finish up here with some rhetorical questions. I started the talk with interpretability, how we extract insights from our model, and now we've gone into this regime of very large, very flexible foundation models that seem to learn general principles.

So, okay, my question for you (you don't have to answer, just think it over): do you think 1 + 1 is simple? It's not a trick question: do you think 1 + 1 is simple? I think most people would say yes, 1 + 1 is simple. And if you break down why it's simple, you say: okay, x + y is simple for integers x and y; that's a simple relationship. Okay, why is x + y simple? You break that down, and it's because plus is simple; plus is a simple operator. Okay, why is plus simple? It's a very abstract concept, and we don't necessarily have plus built into our brains.

So I'm going to show this, and it might be controversial, but I think that simplicity is based on familiarity. We are used to plus as a concept, we are used to adding numbers as a concept, and therefore we call it simple. You can go back another step further: the reason we're familiar with addition is that it's useful. Adding numbers is useful for describing the world; I count things. It's useful, living in our universe, to count things and to measure things; addition is really one of the most useful things. That is why we are familiar with it, and, I would argue, that's why we think it's simple.

But we have often argued about simplicity the other way: if it's simple, it's more likely to be useful. I think that is actually not a statement about simplicity; it's a statement that if something is useful for problems like A, B, and C, then it will probably also be useful for another problem. The world is compositional: if I have a model that works for this set of problems, it's probably also going to work for this one. That's the argument I would like to make. So when we interpret these models, I think it's important to keep this in mind, and really probe: what is simple, what is interpretable?

So I think this is really exciting for polymathic AI, because these models that are trained on many, many systems will find broadly useful algorithms: they'll have neurons that share calculations across many different disciplines. You could argue that that is the utility, and maybe we'll discover new kinds of operators, become familiar with those, and start calling those simple. So it's not necessarily that everything we discover in machine learning will be simple; it's that, by definition, the polymathic models will be broadly useful. And if we know they're broadly useful, we might get familiar with them, and that might drive our sense of their simplicity. That's my note on simplicity.

The takeaways here are that I think interpreting a neural network trained on some data set offers new ways of discovering scientific insights from that data, and that I think foundation models like Polymathic AI are a very exciting way of discovering new, broadly applicable scientific models. I'm really excited about this direction, and thank you for listening to me today.

[Applause]

Audience member: Great. So, three questions. One was the [inaudible] running [inaudible], and when it's fully built out, is it going to be free?

Moderator: Please use your seat mic.

Audience member: Yeah. And three: you're pretty young [inaudible].

Cranmer: Okay, so I'll try to compartmentalize those. The first question was the scale of training. This is really an open research question: we don't have the scaling law for science yet. We have scaling laws for language: we know that if you have this many GPUs and this size of data set, this is going to be your performance. We don't have that yet for science, because nobody has built this scale of model, so that's something we're looking at right now: what is the tradeoff of scale, and if I want to train this model on many, many GPUs, is it worth it? That's an open research question. I do think it'll be large: probably on the order of hundreds of GPUs, trained for maybe a couple of months, so it's going to be a very large model; that's assuming the scale of language models. Now, the model is going to be free, definitely. We're all very pro open source, and I think that's really the point: we want to open-source this model so people can download it and use it in science; I think that's really the most exciting part about this. And then I guess the third question you had was about the future, and how it changes how we teach. I guess, are you asking about teaching science or teaching machine learning?

Audience member: Teaching science.

Cranmer: I see. I mean, I don't know; it depends whether it works. If it works, it might very well change how science is taught. I don't know the impact of language models on computational linguistics; I'm assuming they've had a big impact, but I don't know if that has affected the teaching of it yet. If scientific foundation models had a similar impact, I'm sure it would have an effect; I don't know how much. It probably depends on the success of the models.

play45:54

models I I have a question about your

play45:56

foundation models also so in different

play45:59

branches of science the data sets are

play46:00

pretty different in molecular biology or

play46:02

genetics the data sets you know is a

play46:04

sequence of DNA versus astrophysics

play46:06

where it's images of stars so how do you

play46:09

plan to you know use the same model you

play46:11

know for different different form of

play46:13

data sets input data sets uh so you mean

play46:16

how to pose the objective yes so I I

play46:19

think the most I mean the most General

play46:22

objective is self-supervised learning

play46:25

where you basically mask parts of the

play46:27

data and you predict the missing part if

play46:30

you can you know optimize that problem

play46:33

then you can solve tons of different

play46:34

ones you can do uh regression predict

play46:37

parameters or go the other way and

play46:38

predict rollouts of the model um it's a

play46:41

really General problem to mask data and

play46:45

then fill it back in that kind of is a

play46:48

superset of uh many different prediction

play46:51

problems yeah and I think that's why

play46:53

like language models are so broadly

play46:56

useful even though there train just on

play46:58

next word prediction or like B is a

play47:01

masked
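The mask-and-fill objective described above can be sketched in a few lines of plain Python. This is a toy illustration, not anything from an actual training pipeline: hide part of a sequence, let a "model" fill it in, and score only the hidden positions.

```python
import random

def masked_loss(seq, pred_fn, mask_frac=0.25, rng=random.Random(0)):
    """Self-supervised objective: hide a fraction of the sequence and
    score the model only on how well it fills the hidden values back in."""
    n = len(seq)
    masked = sorted(rng.sample(range(1, n - 1), max(1, int(mask_frac * n))))
    visible = {i: v for i, v in enumerate(seq) if i not in set(masked)}
    sq_err = 0.0
    for i in masked:
        pred = pred_fn(visible, i)   # the model sees only the visible entries
        sq_err += (pred - seq[i]) ** 2
    return sq_err / len(masked)

# A stand-in "model": interpolate a hidden value from its nearest visible
# neighbours (a real system would use a neural network here).
def interpolate(visible, i):
    left = max(j for j in visible if j < i)
    right = min(j for j in visible if j > i)
    w = (i - left) / (right - left)
    return (1 - w) * visible[left] + w * visible[right]

seq = [t * 0.5 for t in range(50)]    # a linear signal is easy to in-fill
print(masked_loss(seq, interpolate))  # near zero for a linear sequence
```

Swapping the interpolator for a neural network, and the sequence for simulation snapshots, gives the kind of generic masking objective described in the answer; regression and rollout prediction are then special cases of filling in different hidden parts.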

[Audience] Thanks. Can you hear me all right? So, that was a great talk. I'm Victor. I'm actually a little bit worried, and this is a bit of a question. Whenever you have models like this, you said that you train them on many examples, right? So imagine you have already embedded the laws of physics here somehow, say, the law of gravitation. When you think about discovering new physics, we always have this question of whether we are actually reinventing the wheel, whether the network is really giving us something new, or whether it's giving us something it learned but that is kind of wrong. Sometimes we have the answer and know which one is which, but if you don't have that, say you're trying to discover what dark matter is, which is something I'm working on, how would you know that the network is actually giving you something new, and not just trying to fit this into one of the many parameters that it has?

I see. Okay. So if you want to test the model by letting it rediscover something, then I don't think you should use this; I think you should train a model from scratch, because if you use a pre-trained model it has probably already seen that physics, so it's biased towards it in some ways. So for rediscovering something, I don't think you should use this. For discovering something new, I do think this is more useful. I think a misconception about machine learning in general is that scientists view an uninitialized model, with randomly initialized weights, as a neutral prior. But it's not; it's a very explicit prior, and it happens to be a bad prior. If you train from a randomly initialized model, it's kind of always going to be a worse prior than training from a pre-trained model which has seen many different types of physics; I think we can make that statement. So if you're trying to discover new physics and you train it on some data set, I guess you can always verify that the predictions are accurate; that would be one way to validate it. But I do think the fine-tuning here, taking this model and training it on the task, is very important. In language models that's not as emphasized; people will just take a language model and tweak the prompt to get a better result. For science, the equivalent of the prompt would be important, but I think the fine-tuning is much more important, because our data sets are so different across science.
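The fine-tuning recipe mentioned here, keeping a pre-trained backbone fixed and training a small task-specific head, can be caricatured in plain Python. Everything below (the backbone features, the downstream task) is invented for this sketch and is not from the talk:

```python
# Hypothetical frozen "backbone": pretend these features were learned
# during pretraining on lots of simulation data.
def backbone(x):
    return [1.0, x, x * x]

def mse(w, data):
    return sum((sum(wi * fi for wi, fi in zip(w, backbone(x))) - y) ** 2
               for x, y in data) / len(data)

def finetune_head(data, steps=2000, lr=0.01):
    """Fine-tuning in miniature: the backbone stays frozen; only the
    small linear head on top is trained, by gradient descent, on the
    downstream task's data."""
    w = [0.0, 0.0, 0.0]
    n = len(data)
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for x, y in data:
            f = backbone(x)
            err = sum(wi * fi for wi, fi in zip(w, f)) - y
            for j in range(3):
                grad[j] += 2.0 * err * f[j] / n
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

# A downstream task the pretrained features happen to span: y = 3x^2 - x.
data = [(k / 10.0, 3 * (k / 10.0) ** 2 - k / 10.0) for k in range(-10, 11)]
head = finetune_head(data)
print(mse(head, data))  # ends far below the untrained head's error
```

The point of the sketch is the division of labour: the prior lives in the frozen backbone, and the cheap fine-tuned head adapts it to one science task, which is the step the answer argues matters more than prompting.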

[Audience, partly inaudible] ...that the symbolic [regression] ... the dimensionality of the system, so are you introducing also the fine-tuning and transfer learning as a way ...?

Yeah. So, the symbolic regression: I would consider that it's not used inside the foundation-model part. I think it's interesting to interpret the foundation model and see if there are more general physical frameworks that it comes up with. And symbolic regression is very limited in that it's bad at high-dimensional problems. I think that might be because of the choice of operators; if you can consider high-dimensional operators, you might be a bit better off. Symbolic regression is an active area of research, and I think the biggest hurdle right now is that it's not good at finding very complex symbolic models.
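The evolutionary search behind symbolic regression can be caricatured as "propose candidate expression trees, keep whichever fits the data best". The sketch below is deliberately minimal; real systems such as PySR add mutation, crossover, and annealing, none of which appear here:

```python
import random

rng = random.Random(0)
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}

# An expression tree is either "x", a float constant, or ("op", left, right).
def evaluate(expr, x):
    if expr == "x":
        return x
    if isinstance(expr, float):
        return expr
    op, left, right = expr
    return OPS[op](evaluate(left, x), evaluate(right, x))

def random_expr(depth=0):
    if depth > 2 or rng.random() < 0.3:   # leaf: the variable or a small constant
        return rng.choice(["x", float(rng.randint(-3, 3))])
    op = rng.choice(sorted(OPS))
    return (op, random_expr(depth + 1), random_expr(depth + 1))

def loss(expr, data):
    return sum((evaluate(expr, x) - y) ** 2 for x, y in data)

def search(data, tries=3000):
    """Propose random expression trees; keep the best one seen so far."""
    best = random_expr()
    best_loss = loss(best, data)
    for _ in range(tries):
        cand = random_expr()
        cand_loss = loss(cand, data)
        if cand_loss < best_loss:
            best, best_loss = cand, cand_loss
    return best, best_loss

data = [(float(x), 2.0 * x + 1.0) for x in range(-5, 6)]
best, err = search(data)
print(best, err)  # typically recovers something close to 2*x + 1
```

The dimensionality complaint in the answer is visible even here: with `x` a scalar this search space is already large, and every extra input dimension multiplies the number of leaves the search must get right.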

[Audience, mostly inaudible] ...comp...

So, I guess it depends on the dimensionality of the data. If it's very high-dimensional data, symbolic regression is not good at it, unless you can have some operators that aggregate down to lower-dimensional spaces. I don't know if I'm answering your question or not.

[Audience] Okay, I wanted to ask a little bit about when you were showing the construction of these trees, each generation, with the different operators; I think this is related to general themes of the talk and other questions. Often in doing science, when you're learning it, you're presented with algorithms to solve problems, like, you know, diagonalize a Hamiltonian or something like that. How do you encapsulate that aspect of doing science, the algorithmic side of solving problems?

Right. [Please use your mic.] Yeah, so the question was about how you incorporate not just analytic operators but more general algorithms, like a Hamiltonian operator. In principle, symbolic regression is part of a larger family of algorithms called program synthesis, where the objective is to find a program, you know, code, that describes a given data set. So if you can write your operators into your symbolic-regression approach, and your symbolic-regression approach has that ground-truth model in there somewhere, then I think it's totally possible. It's harder to do; even symbolic regression with scalars is fairly difficult to actually set up as an algorithm. I think it's really an engineering problem, but the conceptual part is totally there for this.
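Program synthesis over domain operators, as described in the answer, can be sketched as brute-force enumeration over a tiny primitive library. The primitive set below (one aggregation that collapses a vector to a scalar, then one scalar post-processing step) is made up for illustration:

```python
import itertools

# Invented primitive library: "mean" is the kind of high-dimensional
# operator mentioned earlier, collapsing a vector input to a scalar.
AGG = {
    "mean":  lambda v: sum(v) / len(v),
    "max":   lambda v: max(v),
    "total": lambda v: sum(v),
}
POST = {
    "identity": lambda s: s,
    "square":   lambda s: s * s,
    "negate":   lambda s: -s,
}

def synthesize(data):
    """Brute-force program synthesis over two-stage programs:
    return the first (aggregate, post) pair that reproduces every example."""
    for agg, post in itertools.product(AGG, POST):
        if all(abs(POST[post](AGG[agg](v)) - y) < 1e-9 for v, y in data):
            return (agg, post)
    return None

# Toy data generated by "square the mean of the inputs".
data = [([1.0, 2.0, 3.0], 4.0), ([0.0, 4.0], 4.0), ([5.0], 25.0)]
print(synthesize(data))  # -> ('mean', 'square')
```

Replacing these toy primitives with domain routines (a diagonalization, a Hamiltonian evolution step) is exactly the move the answer describes: the search machinery stays the same, only the operator library grows, along with the engineering difficulty.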

[Audience] Thanks. Oh, sorry. Okay, this claim that random initial weights are always bad, or that pre-training is always good...

I don't know if they're always bad, but it seems, from our experiments, that we've never seen a case where pre-training on some kind of physical data hurts. The cat video is an example: we thought that would hurt the model, and it didn't, which is a cute, weird result. I'm sure there are cases where some pre-training hurts.

[Audience] Yeah, that's essentially my question. We're aware of adversarial examples: for example, you train on MNIST, add a bit of noise, and it does terribly compared to a human. What do you think adversarial examples look like in science?

Yeah, I don't know what those are, but I'm sure they exist somewhere, where pre-training on certain data types messes with training a bit. We don't know those yet, but it'll be interesting.

[Audience] Do you think it's a pitfall of the approach, though? Because, like, I have a model of the Sun and a model of DNA...

Yeah, I don't know; I guess we'll see. It's hard to know. From language we've seen you can pre-train a language model on video data and it helps the language, which is really weird. But it does seem that if there are shared concepts, and the model is flexible enough, it can transfer them in some ways. So we'll see. Presumably we'll find some adversarial examples there; so far we haven't. We thought the cat was one, but it wasn't; it helped.
