Liquid Neural Networks

MIT CBMM
8 Oct 2021 · 49:30

Summary

TL;DR: The video features a CBMM talk by Daniela Rus, director of CSAIL, and Dr. Ramin Hasani, who introduce the concept of 'liquid neural networks.' These networks, inspired by neuroscience, aim to improve upon traditional deep neural networks by offering more compact, sustainable, and explainable models. Hasani discusses the limitations of current AI systems, which rely heavily on computation and data without fully capturing the causal structure of tasks. He presents a new approach that integrates biological insights into machine learning, resulting in models that are more expressive, robust to perturbations, and capable of extrapolation. The talk also covers the implementation of these models using continuous-time processes and potential applications in real-world robotics and autonomous driving.

Takeaways

  • 📚 Daniela Rus introduced the concept of bridging the natural world with engineering, focusing on intelligence in both biological brains and artificial intelligence (AI).
  • 🤖 Ramin Hasani presented Liquid Neural Networks, inspired by neuroscience, aiming to improve upon deep neural networks in terms of compactness, sustainability, and explainability.
  • 🧠 The natural brain's interaction with the environment and its ability to understand causality were highlighted as areas where AI could benefit from biological insights.
  • 🚗 Attention maps from AI systems were discussed, noting differences in focus when driving decisions are made, with an emphasis on the importance of capturing the true causal structure.
  • 🔬 Hasani's research involved looking at neural circuits and dynamics at the cellular level to understand the building blocks of intelligence.
  • 🌐 Continuous time neural networks (Neural ODEs) were explored for their ability to model sequential behavior and their potential advantages over discrete representations.
  • 🔍 The importance of using numerical ODE solvers for implementing these models and the trade-offs between different solvers in terms of accuracy and memory complexity were discussed.
  • 🤝 The integration of biological principles, such as leaky integrators and conductance-based synapse models, into AI networks to improve representation learning and robustness was emphasized.
  • 📉 The expressivity of different network types was compared, demonstrating that liquid neural networks could produce more complex trajectories, indicating higher expressivity.
  • 🚀 Applications of these networks were shown in real-world scenarios like autonomous driving, where they outperformed traditional deep learning models in terms of parameter efficiency and robustness to perturbations.
  • ⚖️ The potential of liquid neural networks to serve as a bridge between statistical and physical models, offering a more causal and interpretable approach to machine learning, was highlighted.

Q & A

  • Who is the presenter of today's CBMM talk?

    -The presenter of today's CBMM talk is Daniela Rus, the director of CSAIL.

  • What is the main focus of Daniela Rus' research?

    -Daniela Rus' research focuses on bridging the gap between the natural world and engineering, specifically by drawing inspiration from the natural world to create more compact, sustainable, and explainable machine learning models.

  • What is the name of the artificial intelligence algorithm that Ramin Hasani is presenting?

    -Ramin Hasani is presenting Liquid Neural Networks, a class of AI algorithms.

  • How do Liquid Neural Networks differ from traditional deep neural networks?

    -Liquid Neural Networks differ from traditional deep neural networks by incorporating principles from neuroscience, such as continuous dynamics, synaptic release mechanisms, and conductance-based synapse models, leading to more expressive and causally structured models.
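The conductance-based synapse model mentioned here has a common textbook form; the notation below is illustrative rather than quoted from the talk. The presynaptic state gates a conductance, and the resulting current drives the postsynaptic state toward a reversal potential:

```latex
I_{\mathrm{syn}} = g(x_{\mathrm{pre}})\,\bigl(E - x_{\mathrm{post}}\bigr),
\qquad
g(x_{\mathrm{pre}}) = \frac{g_{\max}}{1 + e^{-\gamma\,(x_{\mathrm{pre}} - \mu)}}
```

When the reversal potential E sits above the postsynaptic state the synapse is excitatory, and below it inhibitory, so the sign and strength of the interaction depend on the current state, not only on a fixed weight.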

  • What are the advantages of using continuous time models in machine learning?

    -Continuous time models offer advantages such as a larger space of possible functions, arbitrary computation steps, the ability to model sequential behavior more naturally, and improved expressivity and robustness to perturbations.
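A minimal sketch of the "arbitrary computation steps" advantage, assuming a toy one-layer dynamics function (the network `f`, the weights, and the time grids below are illustrative, not from the talk). Because the hidden state evolves continuously, irregularly sampled observation times need no special handling — the solver's step size simply follows the grid:

```python
import numpy as np

def f(x, t, W, b):
    """Hypothetical one-layer dynamics network parameterizing dx/dt."""
    return np.tanh(W @ x + b)

def euler_solve(x0, t_grid, W, b):
    """Integrate dx/dt = f(x, t) with explicit Euler over an arbitrary,
    possibly irregular, time grid: each step size is just the gap
    between consecutive time stamps."""
    x = np.array(x0, dtype=float)
    for t0, t1 in zip(t_grid[:-1], t_grid[1:]):
        x = x + (t1 - t0) * f(x, t0, W, b)
    return x

rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((4, 4))
b = np.zeros(4)
x0 = rng.standard_normal(4)

# The same model is queried on an irregular grid and on a fine regular grid.
x_irregular = euler_solve(x0, np.array([0.0, 0.05, 0.3, 0.35, 1.0]), W, b)
x_fine = euler_solve(x0, np.linspace(0.0, 1.0, 1000), W, b)
```

A discrete RNN would have to pick one fixed step and impute the missing observations; here both grids approximate the same underlying trajectory.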

  • How do Liquid Neural Networks capture the causal structure of data?

    -Liquid Neural Networks capture the causal structure of data by using dynamical systems that are inspired by biological neural activity, allowing them to understand and predict the outcomes of interventions and to perform better in out-of-distribution scenarios.

  • What is the significance of the unique solution property in the context of Liquid Neural Networks?

    -The unique solution property, derived from the Picard-Lindelöf theorem, ensures that the differential equations describing the network's dynamics have a unique solution under certain conditions, which is crucial for the network's ability to make deterministic predictions and maintain stability.
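The standard statement behind this answer: for the initial value problem

```latex
\dot{x}(t) = f\bigl(x(t), t\bigr), \qquad x(t_0) = x_0 ,
```

if $f$ is continuous in $t$ and Lipschitz in $x$, i.e. $\lVert f(x_1, t) - f(x_2, t) \rVert \le L \,\lVert x_1 - x_2 \rVert$ for some constant $L$, then exactly one solution exists on some interval around $t_0$. Networks built from bounded, Lipschitz activations (such as tanh or sigmoid) satisfy this condition, which is why the dynamics are deterministic given the initial state.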

  • How do Liquid Neural Networks improve upon the limitations of standard neural networks?

    -Liquid Neural Networks improve upon standard neural networks by providing a more expressive representation, better handling of memory and temporal aspects of tasks, enhanced robustness to input noise, and a more interpretable model structure due to their biological inspiration.

  • What are some potential applications of Liquid Neural Networks?

    -Potential applications of Liquid Neural Networks include autonomous driving, robotics, generative modeling, and any task that requires capturing causal relationships, temporal dynamics, or making decisions based on complex data.

  • What are the challenges or limitations associated with implementing Liquid Neural Networks?

    -Challenges or limitations associated with Liquid Neural Networks include potentially longer training and testing times due to the complexity of ODE solvers, the possibility of vanishing gradients for learning long-term dependencies, and the need for careful initialization and parameter tuning.

  • How does the research presented by Ramin Hasani contribute to the broader field of artificial intelligence?

    -The research contributes to the broader field of artificial intelligence by proposing a new class of algorithms that are inspired by neuroscience, which can lead to more efficient, robust, and interpretable AI models. It also opens up new avenues for research in understanding intelligence and developing advanced machine learning frameworks.

Outlines

00:00

🎉 Introduction to CBMM Talk and Daniela Rus

The presenter warmly welcomes the audience to a CBMM talk featuring Daniela Rus, director of CSAIL and a major contributor to robotics. Daniela is recognized for her innovative ideas in robotics and AI, which are often featured in tech news. She is also known for her interest in the problem of the brain, not just AI, and for her role as an advisor to the presenter. Daniela introduces Dr. Ramin Hasani, who will lead the presentation on a new idea that aims to bridge the natural and engineering worlds by creating more compact, sustainable, and explainable machine learning models.

05:01

🧠 Bridging Neuroscience and Machine Learning

Dr. Ramin Hasani begins by expressing his excitement to present 'liquid neural networks,' a class of AI algorithms that integrate neuroscience principles into machine learning. He contrasts brain activity patterns with those of a trained network controlling an autonomous car, highlighting the similarities and fundamental differences. Ramin emphasizes the importance of understanding the causal structure of tasks, the robustness of natural brains, and the efficiency of neural models. He demonstrates a typical statistical machine learning system and discusses the limitations of convolutional neural networks (CNNs) in capturing the true causality behind driving decisions. The talk then shifts towards improving these models by incorporating insights from neuroscience.

10:03

🔬 Liquid Neural Networks and Their Expressiveness

Ramin Hasani explains the concept of liquid neural networks, which are inspired by the nervous systems of small species and operate on continuous dynamics described by differential equations. These networks are shown to be more expressive than traditional deep learning models, capable of handling memory and capturing the true causal structure of data. They are also robust to perturbations and can be used for generative modeling and extrapolation. The presentation includes a detailed discussion on how these networks are created, starting from the interaction of two neurons and the synaptic propagation between them.

15:06

🚗 Implementing Neural ODEs for Autonomous Driving

The talk delves into the practical implementation of neural ODEs, particularly in the context of autonomous driving. Ramin outlines the process of using numerical ODE solvers to implement these models, including the use of explicit Euler solvers and adjoint sensitivity methods for backpropagation. He also discusses the challenges of implementing these models in real-world applications and the need for improvement by drawing inspiration from biological processes, such as the leaky integrator model and conductance-based synapse models.
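The solver trade-off this section describes can be made concrete with a self-contained sketch: a fixed-step explicit Euler unroll versus a simple adaptive scheme based on step doubling. The dynamics function and tolerance below are illustrative (a toy linear ODE with a known solution), not the models from the talk:

```python
import numpy as np

def f(x, t):
    """Toy dynamics with a known solution: dx/dt = -x gives x(t) = x0 * exp(-t)."""
    return -x

def euler_fixed(x0, t0, t1, n_steps):
    """Fixed-step explicit Euler: cheap and memory-light; accuracy is tied
    directly to the number of steps you unroll."""
    x, t = np.array(x0, dtype=float), t0
    h = (t1 - t0) / n_steps
    for _ in range(n_steps):
        x = x + h * f(x, t)
        t += h
    return x

def euler_adaptive(x0, t0, t1, tol=1e-4):
    """Adaptive step sizing via step doubling: compare one full Euler step
    against two half steps to estimate the local error, then shrink or
    grow the step. More work per step, but accuracy is set by `tol`."""
    x, t, h = np.array(x0, dtype=float), t0, (t1 - t0) / 10
    while t1 - t > 1e-9:
        h = min(h, t1 - t)
        full = x + h * f(x, t)
        half = x + (h / 2) * f(x, t)
        two_half = half + (h / 2) * f(half, t + h / 2)
        if np.max(np.abs(two_half - full)) < tol:
            x, t = two_half, t + h   # accept the more accurate result
            h *= 1.5                 # and try a larger step next
        else:
            h *= 0.5                 # reject and retry with a smaller step
    return x

exact = np.exp(-1.0)                 # x(1) for x0 = 1
```

The fixed-step unroll gives a static computation graph (easy to backpropagate through), while the adaptive solver spends extra evaluations only where the dynamics demand them — the accuracy/memory trade-off mentioned above.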

20:09

🤖 Liquid Neural Networks in Robotics and Decision Making

Ramin Hasani discusses the application of liquid neural networks in robotics, particularly in decision-making processes. He shows how these networks, with their dynamic causal structures, can outperform statistical models and physical models in tasks requiring temporal data processing. The talk includes an empirical analysis of the networks' performance in various tasks, including physical dynamics modeling and autonomous driving. Ramin also addresses the limitations of these networks, such as complexity tied to ODE solvers and potential issues with vanishing gradients, and suggests solutions like using gating mechanisms to preserve gradients.
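The liquid time-constant update underlying these networks can be sketched as follows. This follows the fused semi-implicit Euler step described in the LTC literature, but the layer shapes, parameter values, and wiring below are illustrative assumptions, not taken from the talk:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ltc_step(x, I, dt, tau, A, Wx, WI, b):
    """One fused (semi-implicit) Euler step of a liquid time-constant cell:

        dx/dt = -(1/tau + f(x, I)) * x + f(x, I) * A

    The effective time constant 1 / (1/tau + f) depends on the input,
    which is where the 'liquid' name comes from. Treating the x-terms
    implicitly keeps the state provably bounded by max(|x|, |A|)."""
    fx = sigmoid(Wx @ x + WI @ I + b)        # bounded, input-dependent gate
    return (x + dt * fx * A) / (1.0 + dt * (1.0 / tau + fx))

# Illustrative wiring: 8 neurons driven by a 3-dimensional input stream.
rng = np.random.default_rng(1)
Wx = 0.3 * rng.standard_normal((8, 8))
WI = 0.3 * rng.standard_normal((8, 3))
b, A = np.zeros(8), rng.standard_normal(8)
x = np.zeros(8)
for _ in range(100):
    x = ltc_step(x, rng.standard_normal(3), dt=0.1, tau=1.0,
                 A=A, Wx=Wx, WI=WI, b=b)
```

The bounded state is one source of the robustness to input perturbations discussed in this section: no matter how the input stream is perturbed, the activations cannot blow up.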

25:11

🌟 Conclusion and Future Perspectives

In conclusion, Ramin Hasani emphasizes the potential of liquid neural networks, which combine elements of computational neuroscience and machine learning, to perform inference model-free, capture temporal aspects of tasks, and improve decision-making. He suggests that these networks can be composed and connected in various architectures, making them highly versatile. Ramin also highlights the importance of focusing on how brains acquire knowledge to narrow down the vast research space in AI. He encourages further exploration of these networks for complex tasks and mentions the open-source availability of the technology presented.

Keywords

💡Artificial Intelligence

Artificial Intelligence (AI) refers to the simulation of human intelligence in machines that are programmed to think like humans and mimic their actions. In the video, AI is a central theme as the discussion revolves around advancements in AI and the desire to improve upon current methodologies by drawing inspiration from natural intelligence as seen in the human brain.

💡Machine Learning

Machine Learning is a subset of AI that involves the use of statistical methods for computers to learn from data without being explicitly programmed. The video discusses machine learning models, emphasizing the need for more compact, sustainable, and explainable models as opposed to those based on deep neural networks.

💡Neural Networks

Neural Networks are a computational model inspired by the human brain and are used in machine learning applications. The script mentions neural networks in the context of their limitations and the exploration of new ideas to improve their performance, particularly by incorporating principles from neuroscience.

💡Liquid Neural Networks

Liquid Neural Networks, as introduced in the video, are a class of AI algorithms that integrate neuroscience with machine learning in a structured manner. They aim to capture the causal structure of data and are more expressive and robust to perturbations compared to traditional neural networks.

💡Neuroscience

Neuroscience is the scientific study of the nervous system and the brain. The video emphasizes the importance of neuroscience in informing AI development, particularly in understanding how the brain processes information and making decisions, which can inspire more efficient and robust AI models.

💡Causal Structure

Causal Structure refers to the relationships between causes and effects in a system. In the context of the video, it is important for understanding how liquid neural networks can capture the underlying causality of tasks, leading to more reliable and interpretable AI models.

💡Sustainability

In the video, sustainability refers to the goal of creating AI models that are more efficient in their use of computational resources. This is in contrast to deep neural networks, which are often criticized for their high computational and energy costs.

💡Explainability

Explainability in AI refers to the ability to understand, interpret, and explain the decisions made by an AI model. The video discusses the need for AI models that are not only effective but also transparent in their decision-making processes, which is a challenge with complex models like deep neural networks.

💡Deep Neural Networks

Deep Neural Networks are a type of neural network with multiple layers between the input and output layers, enabling the network to learn complex patterns in data. The video script discusses the limitations of these networks, such as their lack of explainability and efficiency, and the desire to move towards more advanced models.

💡Computational Models

Computational Models are mathematical models that use computer simulation to study complex systems or phenomena. In the video, computational models are used to mimic biological processes, such as the behavior of neurons and synapses, to create more advanced AI systems.

💡Autonomous Driving

Autonomous Driving refers to vehicles that are capable of driving themselves without the need for human input. The video discusses the application of AI and machine learning in the context of autonomous driving, highlighting the challenges and potential improvements that liquid neural networks could bring to this field.

Highlights

Daniela Rus introduces a new idea in robotics that aims to bridge the gap between the natural world and engineering.

Dr. Ramin Hasani presents liquid neural networks, inspired by neuroscience, for structured machine learning.

Liquid neural networks demonstrate similarities in activation patterns to natural brain activity.

The research explores the fundamental differences and gaps between intelligence in natural brains and deep learning models.

Natural brains' interaction with the environment and causal understanding is a key area for improving AI.

Brains' robustness and flexibility, especially in perturbations, is highlighted as an aspect to emulate in AI models.

Efficiency in neural models is emphasized, noting that not all parts of a network are always active.

Attention maps from CNNs reveal a learned focus on the sides of the road, not the actual causation for driving decisions.

Adding noise to images significantly impacts the decisions made by conventional CNNs, demonstrating a lack of reliability.

Neuroscience can improve AI by incorporating system-level goals and mechanisms from biological systems.

Liquid neural networks are proposed as more compact, sustainable, and explainable models than deep neural networks.

The expressivity of liquid neural networks is theoretically and empirically evaluated, showing higher trajectory lengths.

LTC networks outperform other models in tasks requiring temporal data processing and have better inference capabilities.

Liquid neural networks are robust to perturbations and can be used for generative modeling and extrapolation.

The research successfully implements liquid neural networks in real-world robotics and autonomous driving.

The attention maps of liquid networks are more focused on the true causal structure of tasks compared to traditional CNNs.

The integration of biological principles into machine learning leads to improved representation learning and model robustness.

The technology and research behind liquid neural networks are open source, available for further exploration and development.

Transcripts

[00:00] PRESENTER: So welcome to today's CBMM talk. It's great, really great, to have Daniela Rus here. She's, of course, the director of CSAIL, a great leader. I think you all know her. And from time to time, she has these great, wonderful, simple, beautiful ideas in robotics, which we read in papers and in the tech news. And she's also a great friend of CBMM and has been a great advisor for me. And she's somebody who really likes the problem of the brain, not just artificial intelligence, although artificial intelligence, of course, is also a great problem.
[00:57] DANIELA RUS: Thank you for this kind introduction. It's really a great pleasure to be here to share some of our ideas with the CBMM community. Today, we will tell you about a new idea we have been pursuing together with Dr. Ramin Hasani, who will present most of the talk. The basic idea we want to describe aims to bring the natural world and the engineering world closer together.

[01:33] Ramin and I are going at this problem in part because we have a general curiosity and desire to understand intelligence, and in part because, when I look at the state of the art in the field of artificial intelligence, I see a lot of advancements. And I see that these advancements are really using decades-old ideas that are enhanced by computation and data. So a natural question is whether this is intelligence. Another question is: are there other ideas? Can we use the natural world to inspire us to think differently? Because I believe that if we don't come up with new ideas, our results are going to become increasingly incremental, because more and more people will be plowing the same field. The field really desperately needs some new ideas.

[02:35] The idea that Ramin will describe today aims to build machine-learned models that are much more compact, much more sustainable, and much more explainable than the models that are based on deep neural networks. So let me just say that much. And now, it is my great pleasure to introduce more formally Dr. Ramin Hasani. Ramin is a postdoc in my group. Prior to joining my group, he was a PhD student at the Technical University in Vienna. And prior to that, he did his master's degree at Politecnico di Milano. So with that, Ramin, please join us and tell us about your vision and results.
[03:29] RAMIN HASANI: So hi, everyone. Thanks, Daniela, for the introduction. And thanks, Professor Poggio. All right, I'm very excited to be here, presenting liquid neural networks, a class of artificial intelligence algorithms that tries to bring a little bit of neuroscience, in a structured way, to machine learning.

[03:53] If you look at neural activity in brains in general: on the left side, you see the brain activity of a mouse, and on the right side, you see one of the networks that we trained end to end, a controller for controlling an autonomous car. We see that the activation patterns maybe, superficially, look very similar. But in principle, there are fundamental differences. There are huge gaps between intelligence as we know it in brains compared to deep models, in particular in representation learning capacities: how natural brains actually approach the organization of the world around them to make use of it, to be able to control it to achieve their goals.

[04:47] So we know that natural brains interact highly with their environments in order to understand their world. By understanding, I mean that they can actually interact with the world and capture causality, the causal structure of the task that they are performing. And this is one of the reasons why natural brains can actually go out of distribution, where statistical machine learning, by definition, will stay IID, right? And this is one area that would be extremely beneficial if we can explore more, and maybe bring some of those insights from natural brains back to artificial intelligence.

[05:31] And at the same time, we know that brains are much more robust and much more flexible with respect to perturbations or environments that they are getting into. And finally, efficiency of the models: a network is not always active, so there is always some part of the network that is taking care of the computations, on demand.

[05:56] So allow me to demonstrate a typical statistical end-to-end machine learning system, where you have inputs that come from cameras, and then you have a deep neural network that takes care of, let's say, the steering angle of a car. In this kind of framework, we are seeing the activity of the network. This network was actually tested on a real car. These are demonstrations from the test set, where the networks are actually deployed in the environment. They have been trained using human data, and they are now deployed.

[06:40] One of the things that we actually looked into is the attention of this network: what kind of representation has been learned? Which pixels are the most important pixels when a driving decision is being made? This CNN actually learned to attend to the sides of the road, where we see lighter regions in this attention map, in order to take driving decisions. And that's not actual causation. When you're driving, you're not just looking around, right? You're looking at the road in front of you. So you want your focus on that perspective. The causal structure here is missing, although the task is being completed by the network.

[07:24] Now, if you add some noise on top of the image, like a little bit of noise, we see that this attention map is not even reliable anymore. Even if this noise is a small Gaussian perturbation, you can see that it has a huge influence on the decisions, and on the consistency of the decisions, that the network makes.

[07:48] So how can we improve this by bringing neuroscience in? Marr and Poggio set up a framework for us: if you want to explain a biological system, you can look at it from a system level and find out what the goals of the system are and what kinds of mechanisms actually get you to those goals; that's the system level. And then you can also take the view of looking into the building blocks of these things, going down and looking into how intelligence emerges from cells. You can go down and use computational models of the precise mechanisms that exist in biology.

[08:31] Having this kind of framework in mind, let me show you an outline, a summary of what this research is about. We looked into the nervous system of a small species, and we got down to the neural circuit level. And even for understanding neural circuits, we actually went down to the neuron and synapse level, to really fundamentally figure out what the building blocks are. And you know that you can go even lower than that and computationally model down to atoms. But there is a level at which you have to satisfy yourself that you don't want to go below, in order to take this model and see what kind of capabilities you can get using the super-advanced machine learning frameworks that were recently developed. So we stopped at a certain level, which I'm going to explain throughout the talk.

[09:34] And we saw that these models are much more expressive than their counterparts in deep learning, although the kind of abstraction that we did is really simple. In terms of how much capacity these networks can generate, they are much more expressive. And I'm going to show you the math behind that, and also the experimental evidence for it.

[09:56] These systems can handle memory; they can handle explicit and implicit memory mechanisms, which I will explain throughout the talk. More importantly, these systems can capture the true causal structure of the data. And that's part of the reason why these systems can actually be helpful in closed-loop, real-world decision-making processes. The systems are robust to perturbations, and we can use them for generative modeling. We can even use them for extrapolation; you can go out of distribution with these types of networks. Because if a process can capture the causal structure of the data, and you can prove that that's the case, then the system is able to go even out of distribution.

[10:50] And with that in mind, we actually try to perform decision making in real-world robotics. We are the Distributed Robotics Lab, and we want to bring these insights into the brains.

[11:02] Now, to show you what kind of change we have made, you can look at this system. On the right-hand side, what you see are the 19 nodes of the system, sparsely connected together. And this is described by the model that we developed. And then you can actually get attention maps that are much more focused on the true causal structure of the task. And this is not just on this task, as we will see throughout the talk.

[11:37] Well, how do you get started creating a model? Let's look at, say, the interaction of two neurons and the synaptic information propagation between the two. Neural dynamics, unlike deep learning systems, are typically given by continuous processes, and they are described by differential equations. Synaptic release is not just a scalar rate; synaptic release can be modeled with much more sophisticated mechanisms. You can really get down to the probability of whether a neurotransmitter is actually going to stick to the receptors of the second neuron. So you can choose how much complexity to put into the process. You can really add nonlinearity to the system. And there is also recurrence in the structure, there is memory, and there is sparsity all over the place in neural circuits.

[12:36] So, having these principles in mind, the goal is to incorporate these small principles that I mentioned into improving representation learning, improving the robustness of machine learning models and statistical models, and, at the same time, improving their interpretability.

[12:55] To get to a common ground between the computational models of neuroscience and machine learning systems, I would like to start by exploring where we have continuous dynamics. Let's start with processes that have recently been brought up in the machine learning community: continuous-time, or continuous-step, models. A continuous-time neural network arises when you take a neural network f, which has a certain number of layers, a certain width, and an activation function of choice, which is a function of its hidden state and its inputs, and which is parameterized by parameters theta. If this neural network f parameterizes the derivative of the hidden state, then you have a continuous-time process. Now, it's going to be a continuous-time neural network.

[13:53] With this representation, you can go from a discrete computational graph, like in the residual networks that we have, where you take a computation step at each layer. Now, if you define your system the way we show it here, the depth dimension of your system becomes continuous.

[14:16] And when you have a continuous-time system, you have a lot of advantages. First of all, the space of possible functions that you can explore and generate is much larger than that of discrete representations. The second advantage is arbitrary computation: you don't need to perform computation at every time step; you can have arbitrary-step-time computation, so your depth becomes variable. It can be an infinitely deep kind of network with one process. And this continuous process would be a natural fit for modeling sequential behavior.
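The definition in this passage can be written compactly. With hidden state x(t), input I(t), and a neural network f with parameters θ:

```latex
\frac{dx(t)}{dt} = f\bigl(x(t),\, I(t),\, t;\ \theta\bigr)
```

The residual-network update $x_{t+1} = x_t + f(x_t;\theta)$ is then recovered as one explicit Euler step of this equation with step size 1, which is what "going from a discrete computational graph to a continuous depth dimension" refers to.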

[15:01] So, compared to the normal recurrent neural networks that you know, where the updated state of the network is given by this discretization: if you have a neural ODE, and basically a more stable version of it with a damping factor, then you can also use it as a recurrent neural network. On the top row, you see the interpolation and extrapolation capability of a recurrent neural network on irregularly sampled data placed around a spiral. The red line in between is the extrapolation of this model, which cannot actually capture the dynamics very well. But on the bottom row, you see that the dynamic process generated by a continuous-time recurrent neural network actually captures those dynamics properly and even extrapolates. So this is nice.

Now, how do we implement these things? Let me go through the details of implementing these types of models. Because they are ODEs, you want to use numerical ODE solvers: you unroll the differential equation, and then you can use any numerical ODE solver. Say we use an explicit Euler solver; then you can construct the forward pass of your network from this unrolled version. The choice of solver defines the complexity of the map. You can use more complex adaptive solvers, with adaptive step sizes, to get a more accurate forward pass.
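A minimal sketch of that unrolling with an explicit Euler solver (again illustrative; a fixed step size `dt` stands in for the solver choice discussed above):

```python
import numpy as np

def f(x, I, theta):
    # the network parameterizing dx/dt (illustrative: one tanh layer)
    W, U, b = theta
    return np.tanh(W @ x + U @ I + b)

def euler_forward(x0, inputs, theta, dt=0.1):
    """Unroll dx/dt = f(x, I; theta) with explicit Euler:
    x_{k+1} = x_k + dt * f(x_k, I_k; theta)."""
    x = x0
    trajectory = [x0]
    for I in inputs:
        x = x + dt * f(x, I, theta)
        trajectory.append(x)
    return np.stack(trajectory)

rng = np.random.default_rng(1)
theta = (0.1 * rng.normal(size=(3, 3)),
         0.1 * rng.normal(size=(3, 2)),
         np.zeros(3))
traj = euler_forward(np.zeros(3), rng.normal(size=(20, 2)), theta)
print(traj.shape)  # initial state plus 20 Euler steps
```

Swapping Euler for a higher-order or adaptive-step method changes only the inner update, which is exactly why the solver choice sets the complexity of the map.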

How do you do the backward pass? You can use the mathematically well-known adjoint sensitivity method. Say you have a loss function and your dynamics are given by a neural ODE. Given the dynamics of your system starting from t0, and given labeled data, you can compute the output trajectory by running the ODE solver forward, and from that trajectory compute a loss. The adjoint method then creates a new state, an auxiliary differential equation, that connects the dynamics of the loss to the state of the system. You run this ODE backward, one step at a time, to get the gradients of the loss with respect to the state of the system, and at the same time you also get the gradients of the loss with respect to the parameters. On the backward pass, the adjoint sensitivity method gives you constant memory complexity, because it forgets the previous states and computes just one step at a time.
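In symbols, following the standard adjoint-sensitivity formulation (the textbook form, stated here for completeness): with dynamics $\dot{x} = f(x, t, \theta)$, loss $L(x(t_1))$, and adjoint state $a(t) = \partial L / \partial x(t)$,

```latex
\frac{da(t)}{dt} = -\,a(t)^{\top}\,\frac{\partial f(x(t), t, \theta)}{\partial x},
\qquad
\frac{dL}{d\theta} = -\int_{t_1}^{t_0} a(t)^{\top}\,\frac{\partial f(x(t), t, \theta)}{\partial \theta}\,dt,
```

both integrated backward from $t_1$ to $t_0$, which is why only the current step needs to be kept in memory.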

You can also train these networks with gradient-based backpropagation through time. You perform one forward pass, then compute the derivatives by the chain rule and update your parameters. This way, you are not treating the solver as a black box: you differentiate through the solver, so the dynamics of the solver become part of your gradient as well, and you need to be careful about that. The memory complexity of this method is really high, but it is much more accurate than the adjoint method used in a vanilla sense. So that is how these models are implemented, forward and backward.

Now we have this neural ODE. We said these are continuous-time processes, and this representation can have spatiotemporal data-processing power; it really has good potential. But we didn't define any biological process there. We didn't take any inspiration from the biological insights I talked about before. And a really funny fact is that when you deploy them in the real world, they are even worse than a simple long short-term memory network. So basically, what's the point? If you define a really fancy equation that cannot even work well in real-world applications, then what are we even doing? So let's improve.

By this improvement, we want to get into biology. I told you that the activity of neurons is described by differential equations. You can model the dynamics of a cell, or of a membrane, as a leaky integrator with these simple linear dynamics. The more important part is the conductance-based synapse model, where a nonlinearity is included in the synapses of the system, not in the neurons. So the interaction between two nodes, between two differential equations, is given by a nonlinearity. This is inspired by the channel-modeling work of Hodgkin and Huxley on ion channels. From the Hodgkin-Huxley differential equations, you can get a steady-state behavior and reduce them to this abstract form; the nonlinearity then looks like a sigmoid activation function. So you can, in principle, bring neural networks, artificial neural networks, inside the representation of a synapse. Now, putting these two systems together, two very simple things that have been around for over a century, you get a dynamical system of the following form.

This dynamical system has certain properties and certain advantages. It is obviously a neural ODE, an ODE-based neural network. It has a component neural network f whose nonlinearity appears both in the coefficient of x(t), the state of your system, and in the state equation itself. So there is a coupling between the state and the time constant of your differential equation. Say I have no recurrent connections, so the x(t) inside f is zero; then f becomes only a function of I, the inputs of the system, and the whole system becomes a linear system. But even for that linear system, the coefficient of x(t) is input-dependent. So if the inputs of the system change, the behavior of the differential equation changes, because that coefficient defines the damping factor of this very simple neural network and very simple dynamical system.
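Putting the leaky-integrator membrane and the conductance-based synapse together gives the liquid time-constant dynamics dx/dt = -[1/tau + f(x, I; theta)] * x + f(x, I; theta) * A. Here is a hedged sketch of one cell update using a fused semi-implicit Euler step (the sigmoid inside f and all sizes are illustrative assumptions, not the talk's exact implementation):

```python
import numpy as np

def f(x, I, theta):
    # synaptic nonlinearity f(x, I; theta); positive, so it acts like a conductance
    W, U, b = theta
    return 1.0 / (1.0 + np.exp(-(W @ x + U @ I + b)))  # sigmoid

def ltc_step(x, I, theta, tau, A, dt=0.05):
    """One fused (semi-implicit Euler) step of
    dx/dt = -(1/tau + f(x, I)) * x + f(x, I) * A.
    The effective time constant 1/(1/tau + f) is input-dependent."""
    fx = f(x, I, theta)
    return (x + dt * fx * A) / (1.0 + dt * (1.0 / tau + fx))

rng = np.random.default_rng(2)
n, m = 4, 2
theta = (0.1 * rng.normal(size=(n, n)),
         0.1 * rng.normal(size=(n, m)),
         np.zeros(n))
tau, A = np.ones(n), rng.normal(size=n)

x = np.zeros(n)
for _ in range(100):           # drive the cell with a constant input
    x = ltc_step(x, np.ones(m), theta, tau, A)
print(x)
```

Note how the denominator couples the state decay to the input through fx: that is the "liquid" time constant, and it also keeps each state component between 0 and its synaptic parameter A.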

Just to show you a block diagram of how this looks: in a standard neural network, the range of possible connections is, say you have two neurons with activation functions, reciprocal connections, feedback, and an external input to the system, each with its own scalar weight. In a liquid network, you have the same kind of structure, but at the same time you have a nonlinearity that controls the interaction of the two differential equations. So the difference is that activations are changed to differential equations, and their interactions are given by a nonlinearity that can itself be a neural network.

In terms of what this represents: say I trained a neural network for autonomous driving from visual data, which I'm showing in the middle. I did that once with a standard neural network that has a constant time constant, and once with a liquid network. On the x-axis is 1 over tau, that is, one over the time constant of the system. On the y-axis is the steering angle of the car. The color shows blue for turning left and yellow for turning right, with straight driving in between. Now we see that a neuron actually learned to associate its timing behavior with the dynamics of the task, without any prior, just from plugging those very simple building blocks together. That's one of the advantages you get from these types of networks.

Another property of these networks is that the state of these systems is stable, and their time constant and behavior are stable. If you define the time constant of the system as the expression that is the coefficient of x(t), the hidden state, then, relaxing to the case of no recurrent connections, so x(t) drops out of f, you can bound the time constant of the system, and these are the bounds you get. So the network cannot go unstable. You can also bound the state of the system. Say a neuron receives many synaptic connections. A, in this representation, is a synaptic parameter, and it is synapse-specific: each synapse connecting to this neuron has its own A, acting like a bias. Then the maximum of the A parameters is the maximum value your state can reach, and the synapse with the smallest A has the least impact on the activity of your differential equation.
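Stated as formulas (restating the LTC paper's bounds from memory, so treat the exact constants as a hedged sketch): for neuron $i$ with membrane time constant $\tau_i$,

```latex
\frac{\tau_i}{1 + \tau_i\, W_i} \;\leq\; \tau_i^{\mathrm{sys}} \;\leq\; \tau_i,
\qquad
\min\big(0,\, A_i^{\min}\big) \;\leq\; x_i(t) \;\leq\; \max\big(0,\, A_i^{\max}\big),
```

where $W_i$ bounds the incoming synaptic nonlinearity and $A_i^{\min}, A_i^{\max}$ are the smallest and largest $A$ parameters of the synapses feeding neuron $i$.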

We can also show that this biologically inspired system is a universal approximator. You can use standard function-approximation methods to prove that this expression can approximate any given dynamics with arbitrary precision, given enough cells. But to truly find out how expressive a neural network is from a theoretical standpoint, we want to get down to a more fine-grained expression. There are other measures of the expressivity of neural networks that we can use, for example trajectory length. Imagine I have a circular trajectory and I input it to a deep neural network; I'm just defining what this trajectory-length measure is. The network is parameterized, and we can observe that at every layer of the network this trajectory gets deformed and more complex, and the length of the trajectory grows, in fact exponentially. You can measure the length of this trajectory with an arc-length measure, and you can find a lower bound for the expressivity of the neural network: given its depth, you can measure the expressivity of a network in terms of the properties of its synaptic parameterization, the width of the network, and the depth of the network. We used this expressivity measure because it draws a boundary between shallow networks and deep networks: the deeper you get, the more expressive you can get based on this measure.
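The trajectory-length measure itself is easy to sketch (this toy version, with random tanh layers and all sizes chosen arbitrarily, is only meant to illustrate the definition, not the talk's experiments):

```python
import numpy as np

def trajectory_length(points):
    """Arc length of a trajectory given as an (n_points, dim) array:
    the sum of Euclidean distances between consecutive points."""
    return float(np.sum(np.linalg.norm(np.diff(points, axis=0), axis=1)))

# a unit circle sampled at 200 points, used as the input trajectory
t = np.linspace(0.0, 2.0 * np.pi, 200)
circle = np.stack([np.cos(t), np.sin(t)], axis=1)

# push the trajectory through a few random tanh layers and record its length
rng = np.random.default_rng(3)
x, width, layer_in = circle, 32, 2
lengths = [trajectory_length(x)]
for _ in range(4):
    W = rng.normal(scale=2.0 / np.sqrt(layer_in), size=(layer_in, width))
    x = np.tanh(x @ W)   # the circle gets deformed at every layer
    layer_in = width
    lengths.append(trajectory_length(x))
print(lengths)
```

With a large enough weight scale, the length typically grows with depth, which is the behavior the lower bound formalizes.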

Now, in our space we have continuous-time processes: liquid time-constant networks, or LTCs; continuous-time recurrent neural networks; and neural ODE representations. If we parameterize the same neural network f for all of these processes, given their differential-equation representations, we see that we consistently get longer and more complex trajectories out of the LTC network. We analyzed this systematically in an empirical fashion. On the x-axis, you see different types of ODE solvers for these three types of networks: neural ODEs, CTRNNs, and LTCs. The yellow line shows the trajectory lengths. For the LTC networks, even if you change the width of the network, the trajectory length is always higher; and if the initialization of the network changes, there is a dependency on that as well. We also derived, theoretically, a lower bound for the expressivity of these types of networks, where the lower bound is a function of the weight scale, the bias scale, the width of the network, the depth of the network, and the number of discretization steps you take for your ODE; and we instantiated it for LTCs. You cannot compare lower bounds to claim that one network is more expressive than another, but it is a good measure to see where we stand in terms of this type of behavior.

Now that we have this type of measure and have evaluated these networks theoretically, let's really put them into action and see how good they are at representation learning. One of the things we started with is modeling physical dynamics. When I told you that a neural ODE cannot beat an LSTM network, you see that here; and you see that we can get better performance using these networks. You can compare them across a large series of advanced RNNs, and this [INAUDIBLE]-inspired network beats them even on person-activity recognition, a real example with irregularly sampled data.

We also performed some analysis on real-world examples, and we saw that on most of these tasks LTCs are better. On one task the LSTM is better, and that is the task with longer-term dependencies. That is one of the issues you have to solve: gradient propagation in continuous-time processes is problematic, so you always have to take care. If you wrap them inside a well-behaved gradient-propagation mechanism, then you also get better performance there.

We didn't stop there. We scaled the applications up to the end-to-end autonomous driving that I showed you at the beginning. We have human-collected data, and we trained deep learning models. Typically, a deep learning pipeline for this looks like a set of convolutional heads followed by fully connected networks; the over-parameterized part of the network sits there, in the hidden layers. It takes between 5 and 100 million parameters to perform lane-keeping, or this type of task, with these networks. What we did was say: let's replace the fully connected networks with continuous-time processes and see what kind of behavior we get. So we have four types of variants. We take a neural circuit policy, NCP, which is the first one: a four-layer architecture, again nature-inspired, with interneurons, command neurons, and motor neurons, all LTC-based neurons based on the models I showed you before. You can also replace those fully connected layers with LSTMs or with CTRNNs, and you have the plain convolutional neural network. So I'm going to talk about the differences between these four variants.

The first thing: the number of parameters required to perform autonomous driving is significantly reduced when you use these types of networks. Now, remember the earlier visualization, where the representation learned by a fully connected convolutional network could get perturbed. With LTCs, we can have just 19 neurons at the control level. Looking at the attention maps, and note that we are not changing the convolutional structure across these network variants, we see that this architecture imposes an inductive bias on the convolutional layers that lets them learn a causal structure. Even if you add noise, the explanations do not scatter as much as they did for the convolutional neural networks. We also took a real-world measure of this: how many crashes do you get as you increase the amount of input noise? You see that these networks are much more robust to this type of perturbation.

Now let's look at the convolutional attention of these end-to-end trained networks when their heads differ: a CTRNN, an LSTM, or our LTC-based model. We see that the prior each recurrent network puts on the convolutional layers makes them learn different types of weights, so the representations learned by these systems are completely different from each other. The only other one with consistent behavior is the plain CNN, but the CNN consistently focuses on the outside of the road, so we don't want that. The LSTM gives you a good representation most of the time, but it is sensitive to lighting conditions: if I pause the video in some parts, you see that where the shading is bad, the attention of the LSTM gets scattered. And the CTRNN, the neural-ODE head, basically cannot obtain a good representation on this task.

Now, why is this the case? Let's explore the why. If you look at the taxonomy of possible modeling frameworks, at one end of the spectrum we have statistical models. Statistical models are amazing at learning from data and at performing inference, that is, prediction, in the IID setting. That is what statistical models can do. At the other end of the spectrum, we have physical models, usually described by differential equations. When you have differential equations that describe the dynamics of your system, they can actually answer questions; they can account for interventions in the system. So if you can design a universal approximator that is closer to these physical models, then you get a more causal structure by nature. You are also able to gain insights about the system: you can learn from data, answer counterfactual questions, and predict both IID and out of distribution.

So, as I said, physical dynamics can be modeled by ODEs, and a set of ODEs can predict the future evolution of your system and describe the results of interventions in the system. The coupled time evolution helps us define averaging mechanisms for capturing the statistical dependencies in data, and it enhances our understanding of the physical phenomena. Because of that, ODEs are causal structures. Now let me get more formal about this. Say we have a differential equation given by dx/dt = g(x), where g is the nonlinearity of the system. The Picard-Lindelöf theorem shows that this kind of differential equation has a unique solution if the nonlinearity is Lipschitz. Now, if you unroll this system with Euler, then under this uniqueness condition the underlying representation is a causal mapping. Why? Because you can say what happens at future events, x(t + dt), based on the previous events.
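Concretely, one Euler step of $\dot{x} = g(x)$ makes the causal reading explicit:

```latex
x(t + \Delta t) \;=\; x(t) + \Delta t \, g\big(x(t)\big),
```

so, under the Picard-Lindelöf uniqueness condition (a Lipschitz $g$), every future state is a well-defined function of the past state and nothing else.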

There is a framework within this spectrum of causal models called the dynamic causal model. A dynamic causal model has a nonlinearity of the shape you are seeing: it takes a bilinear approximation, a second-order Taylor approximation, of that ODE, and this gives you the coefficients of the system. Coefficient A controls the internal coupling of the system. Coefficient B controls the coupling sensitivity among network nodes, so it accounts for internal interactions and interventions. And coefficient C regulates the external inputs. This framework is a graphical model implemented by ODEs, and you can put these pieces together to create the system. Dynamic causal models allow for feedback, as opposed to the kind of Bayesian network architectures you would otherwise get.
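The bilinear form of a dynamic causal model (the standard formulation due to Friston and colleagues) is, for state $x$ and inputs $u$:

```latex
\frac{dx}{dt} \;\approx\; \Big(A + \sum_{j} u_j\, B^{(j)}\Big)\, x + C\, u,
\qquad
A = \frac{\partial \dot{x}}{\partial x},\quad
B^{(j)} = \frac{\partial^2 \dot{x}}{\partial x\, \partial u_j},\quad
C = \frac{\partial \dot{x}}{\partial u}.
```

$A$ is the internal coupling, each $B^{(j)}$ is how input $j$ changes the coupling (an intervention), and $C$ is the direct external drive.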

Now, if we look at liquid neural networks, or the representation we gain from them, then under two conditions, namely that f is a C1 mapping, meaning f is Lipschitz-continuous and also bounded (I didn't write the boundedness on the slide, but it has to hold as well), and that tau is strictly positive, this network also has a unique solution. Now, say I assume that this nonlinearity f is given by a tangent hyperbolic, that it has recurrent connections, and that it has weights for an input mapping. Then, with this nonlinearity, I can compute the coefficients. If you look at the coefficients for causal models, we can compute the coefficients of this causal behavior. That means there are certain parameters of the system that are responsible for certain types of intervention: internal interventions and external interventions. From the diagram perspective, going back to our diagram, a dynamic causal model has the parameter B that controls the amount of collaboration between two nodes, their interactions, and the coefficient C that controls the external inputs to the system. The liquid network has the same type of behavior: it is a nonlinear version of that dynamic causal model that performs the same thing, with a more sophisticated causal structure.

Now, with that, we did some experiments. They are behavioral-cloning experiments, where we have drone agents moving in an environment. There is a visual target in the environment, and we drive the drones toward that target. From these visual demonstrations, we want to learn this behavior and obtain agents that are good in closed loop, when they are interacting with the environment. Here you see a learned behavior of this system: as soon as the target becomes apparent, the neural network focuses on that target, because that is what matters in this task. So basically, the causal structure of the task is learned by these drone agents. Now, if you compare the focus, or attention, of these networks to other neural networks, the only representation where we see this type of process is the liquid-network-based solution; this attention is not persistent in the other ones. So we cannot say that the other systems learned to navigate toward the target and understood what they were doing.

We also did this in a multi-agent setting. Now you are a follower drone, there is a leader drone in front of it, and the task is to follow this drone. In this type of environment, too, we observe that the attention of the network is always on the lead drone. That means the causal structure is captured. Now, how can you show this even more quantitatively? We looked into closed-loop interaction. We trained these networks in open loop, from training data; now we deploy them in the environment and measure the success rate they achieve on different types of tasks in closed loop. If they have not captured the true causal structure of the task, they will not be able to perform these tasks well. We did this across a spectrum of perturbations on the system, and we see that our systems are able to perform much better than the others. Of course, there is always room for improvement, even for these systems, because we didn't add any kind of constraint to help them learn more and more; we were just trying to see the gap between these types of networks and the others.

So obviously, these types of networks come with certain limitations. The complexity of the networks is tied to the complexity of their ODE solver. As a result, you might have longer training times and longer test times if you use these networks. There are solutions for that. You can use fixed-step ODE solvers. You can use sparse flows, that is, sparsity and a process that optimizes sparse neural networks on whatever hardware you are running, CPUs or GPUs. And you can use hypersolvers, a class of solvers that integrate everything together and run much faster when you have differential equations. You can also use closed-form variants in these scenarios: if you solve these differential equations in closed form, you end up with a nicer representation. That is one of the things we did, and we are very excited about it.

There is another limitation of these ODE-based networks: they might also exhibit the vanishing gradient problem, because they are continuous systems and their memory is given by an exponential decay. So you face difficulty learning long-term dependencies. The solution is to wrap them inside a well-behaved kind of process, for example a gating mechanism, say by having the state of an LSTM network defined by an LTC network. If you do that, then you have a gating mechanism, and you have a gradient propagation that preserves the gradients.
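One illustrative way to realize this wrapping (a loose sketch, not the exact architecture from the talk; every name and size here is an assumption) is a mixed-memory cell: LSTM-style gates carry the long-term memory, and an LTC-style step produces the continuous-time hidden state:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mixed_memory_step(x, h, c, p, dt=0.05):
    """Gated (LSTM-style) update of cell state c, followed by one LTC-style
    step on the hidden state h. The gates give a well-behaved gradient path."""
    Wi, Wf, Wo, Wg, Wltc, tau, A = p
    z = np.concatenate([x, h])
    i, fg, o = sigmoid(Wi @ z), sigmoid(Wf @ z), sigmoid(Wo @ z)
    c = fg * c + i * np.tanh(Wg @ z)      # gated memory: no forced exponential decay
    syn = sigmoid(Wltc @ np.concatenate([x, o * np.tanh(c)]))
    h = (h + dt * syn * A) / (1.0 + dt * (1.0 / tau + syn))  # LTC step on h
    return h, c

rng = np.random.default_rng(4)
nx, nh = 3, 5
p = tuple(0.1 * rng.normal(size=(nh, nx + nh)) for _ in range(5)) \
    + (np.ones(nh), rng.normal(size=nh))

h, c = np.zeros(nh), np.zeros(nh)
for _ in range(50):
    h, c = mixed_memory_step(rng.normal(size=nx), h, c, p)
print(h.shape, c.shape)
```

The design choice is that gradients can flow through the gated cell state c across many steps, while the LTC step keeps the input-dependent continuous-time dynamics on h.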

play45:32

Now, in summary, what I showed you

play45:36

I showed you that you can acquire knowledge

play45:38

by these flexible neural models that can

play45:41

perform inference model-free.

play45:44

They can really capture the temporal aspects of the task

play45:49

at hand better than other models.

play45:53

For tasks that require temporal data processing,

play45:57

they can actually perform inference well--

play45:58

and these are all thanks to their causal structure.

play46:03

And they would be able to perform credit assignment

play46:07

better than the other models that are out there.

play46:10

So you might use them for generative modeling.

play46:14

And if you want to model the world,

play46:17

you basically can use these representations

play46:19

or also get a representation of your world

play46:22

in order to do further inference from those kind of models.

play46:27

So there are certain properties that I mentioned--

play46:29

the layer-wise compositionality of these networks,

play46:32

you can actually put them in different architectures.

play46:35

And you can connect them in a sparse fashion.

play46:38

And the network is actually differentiable.

play46:40

And you can use this.

play46:42

And if you're dealing with visual data or video data,

play46:49

you would add CNN heads or perception modules.

play46:54

And then this can act as your decision-making engine.

play46:57

They're expressive, they're causal,

play46:59

and they add to the interpretability

play47:02

of the networks.
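A toy sketch of this pipeline (all names, shapes, and weights are illustrative assumptions, not the actual architecture from the talk): a small convolutional perception head pools each frame into features, which drive a leaky continuous-time recurrent cell acting as the decision-making engine:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_head(frame, kernels):
    """Toy perception module: valid 2-D correlations, ReLU,
    then global average pooling into one feature per kernel."""
    kh, kw = kernels.shape[1:]
    feats = []
    for k in kernels:
        out = np.zeros((frame.shape[0] - kh + 1, frame.shape[1] - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(frame[i:i + kh, j:j + kw] * k)
        feats.append(np.maximum(out, 0.0).mean())
    return np.array(feats)

def liquid_step(x, u, W_in, W_rec, dt=0.1, tau=1.0):
    """One Euler step of a leaky continuous-time recurrent cell."""
    return x + dt * (-x / tau + np.tanh(W_in @ u + W_rec @ x))

# Wire the perception head into the decision-making cell over frames.
kernels = rng.standard_normal((8, 3, 3))
W_in = 0.1 * rng.standard_normal((4, 8))
W_rec = 0.1 * rng.standard_normal((4, 4))
state = np.zeros(4)
for _ in range(5):                        # 5 synthetic video frames
    frame = rng.standard_normal((16, 16))
    state = liquid_step(state, conv_head(frame, kernels), W_in, W_rec)
steering = np.tanh(state).mean()          # scalar control output
```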

play47:03

So some of the perspectives that we have:

play47:08

I just put two different hundred-year-old models

play47:14

together, and all of these properties emerge

play47:18

from that combination.

play47:20

And you can see how much potential is actually

play47:22

in this type of research that you can put,

play47:25

and you can really explore what's going on in the brain.

play47:28

And why do you need to do that?

play47:30

Because, basically, the research space

play47:32

is huge if you just want to algorithmically implement

play47:35

something intelligent, right?

play47:37

So you can narrow it down if you actually focus on brains

play47:40

and how they acquire knowledge.

play47:43

And definitely, because we have these machine learning tools

play47:46

these days, you would be able to actually do much more

play47:50

than was possible before.

play47:54

We can also work with the objective functions.

play47:56

In this talk, in this research that I showed,

play48:00

we just focused on the model and the properties of the model

play48:03

in a structured fashion.

play48:05

So you can also work with the objective function

play48:07

of your learning problem.

play48:09

For the learning process itself,

play48:12

you can use physics-informed

play48:15

learning processes in order to perform

play48:18

this type of learning.

play48:19

You can do causal entropic forces, for example.

play48:23

This is like defining intelligence

play48:25

as a force that maximizes the future freedom of action.

play48:31

So that would be a new way of formulating intelligence.

play48:34

And then, from there, you would be able to actually get

play48:38

into much more.

play48:39

So this is actually an exciting area of research

play48:42

that could be enabled and scaled by what we showed today.

play48:51

And as I said, one of the properties that we showed today

play48:54

is that there are certain structures that can emerge

play48:59

from these liquid networks.

play49:02

And those structures are good.

play49:04

So you would be able to use these for more complex tasks.

play49:08

So these are good candidates--

play49:10

this could give you candidates

play49:13

for performing better decision-making based

play49:17

on these selective computations.

play49:21

With that, I would like to thank you for your attention.

play49:23

And all this technology is open source.

play49:26

You can actually get it online.
