DiffDock

SBGrid Consortium
17 Nov 2023 · 43:41

Summary

TL;DR: The SBGrid YouTube channel hosts a webinar series where experts discuss advanced topics in structural biology. In a recent webinar, Gabriele Corso, Hannes Stark, and Bowen Jing presented DiffDock, a deep learning approach for small molecule docking. They explained that traditional docking methods can be time-consuming and sensitive to inaccuracies in protein structures, particularly with computationally generated structures like those from AlphaFold. DiffDock addresses these challenges by using a generative model based on diffusion models, which are adept at handling complex probability distributions. The model predicts the 3D coordinates of a small molecule's atoms relative to a protein without prior knowledge of the binding pocket. It samples from a noisy distribution and progressively refines the pose towards the true binding pose. The DiffDock model has shown promising results, outperforming traditional methods, especially when docking to predicted structures. The webinar also touched on the upcoming DiffDock-pocket, which enhances the original DiffDock by allowing for control over a specific binding pocket and predicting side chain rearrangements. The presenters concluded with a Q&A session where they discussed the potential for local refinement, the incorporation of reliability information into the DiffDock process, and the practical aspects of using DiffDock, including its speed and memory requirements.

Takeaways

  • 🎓 **SBGrid Webinar Series**: The video is part of a webinar series by SBGrid, focusing on software tutorials, lectures by structural biologists, and unique content related to structural biology and computational methods.
  • 📅 **Upcoming Talks**: The channel has scheduled talks on 'DeepFoldRNA' by Robin Pearce and 'DIALS' by Graham Winter from Diamond Light Source, indicating a commitment to continuous learning and updates in the field.
  • 🤖 **DiffDock Presentation**: Gabriele Corso, Hannes Stark, and Bowen Jing present DiffDock, a deep learning approach for small molecule docking that predicts 3D coordinates of molecules in relation to a protein structure.
  • 🧠 **Blind Docking**: DiffDock performs blind docking, considering the entire protein structure without prior knowledge of the binding pocket, which is a more challenging task compared to pocket-level docking.
  • 🔍 **Methodology**: The method uses a generative modeling approach with diffusion models to handle the large search space and uncertainty in docking, as opposed to traditional regression-based deep learning methods.
  • 📈 **Performance**: DiffDock demonstrates higher performance in docking tasks, especially when dealing with predicted protein structures like those from AlphaFold, where traditional methods struggle due to inaccuracies.
  • 🔧 **Practical Usage**: The tool is designed to be used in practice with inputs including protein structures and small molecules, providing multiple candidate outputs with scores for further analysis.
  • 🔗 **GitHub and Colab**: Detailed instructions, models, and Colab notebooks for DiffDock are available on GitHub, facilitating easy access and use for the scientific community.
  • 🔄 **DiffDock-Pocket**: An upcoming tool called DiffDock-pocket aims to address the limitations of controlling for specific binding pockets and predicting side chain rearrangements upon binding.
  • ⚙️ **Technical Aspects**: The generative model operates on a non-Euclidean manifold space defined by accessible ligand poses through torsion angle adjustments, which is a key technical detail of the DiffDock approach.
  • 🚀 **Future Research**: The presenters discuss the potential for incorporating prior knowledge into diffusion sampling processes and the active research in this area, suggesting opportunities for further development and improvement of the tool.
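The "non-Euclidean manifold" point above can be made concrete with a small sketch: a ligand pose is parameterized by a global translation (3 degrees of freedom), a global rotation (3 degrees of freedom), and one torsion angle per rotatable bond, rather than by free per-atom coordinates. The function names below are illustrative, not from the DiffDock codebase:

```python
import math

def rotate_z(points, theta):
    """Rotate 3D points about the z-axis by angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y, z) for (x, y, z) in points]

def translate(points, t):
    """Shift 3D points by the translation vector t."""
    tx, ty, tz = t
    return [(x + tx, y + ty, z + tz) for (x, y, z) in points]

def pose_dimensionality(num_rotatable_bonds):
    """Degrees of freedom of the pose manifold:
    3 (translation) + 3 (rotation) + one torsion per rotatable bond."""
    return 3 + 3 + num_rotatable_bonds

# A toy two-atom "ligand": rotate 90 degrees about z, then translate.
ligand = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
posed = translate(rotate_z(ligand, math.pi / 2), (0.0, 0.0, 5.0))
print(pose_dimensionality(8))  # a ligand with 8 rotatable bonds -> 14
```

Because bond lengths, bond angles, and rings stay fixed, the search space stays small even for flexible ligands, which is why DiffDock diffuses over this product space rather than over raw atom coordinates.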

Q & A

  • What is the primary focus of the DiffDock approach?

    -DiffDock is a method for small molecule docking using deep learning approaches. It focuses on blind docking, where the entire protein structure is considered to find the binding site of a small molecule, rather than focusing on a known pocket.

  • How does DiffDock handle the uncertainty in the docking task?

    -DiffDock uses a generative modeling approach with diffusion models to handle the uncertainty in the docking task. It aims to populate all possible modes, accounting for both aleatoric uncertainty (multiple poses) and epistemic uncertainty (model indecision).

  • What are the inputs and outputs of the DiffDock tool?

    -The input to DiffDock is the 3D structure of a protein and the 2D chemical graph of a small molecule. The output is the 3D coordinates of every atom of the small molecule, along with scores for multiple candidate poses.
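As a minimal sketch of this input/output contract, the hypothetical containers below (not DiffDock's actual classes) show the shape of the data: a protein structure plus a 2D ligand description in, a set of scored 3D candidate poses out:

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical containers; DiffDock's own code organizes these differently.

@dataclass
class DockingInput:
    protein_atoms: List[Tuple[float, float, float]]  # 3D protein structure
    ligand_smiles: str                               # 2D chemical graph of the ligand

@dataclass
class DockingOutput:
    ligand_coords: List[Tuple[float, float, float]]  # 3D position of every ligand atom
    confidence: float                                # score used to rank candidates

def rank_candidates(candidates: List[DockingOutput]) -> List[DockingOutput]:
    """Blind docking returns several candidates; sort best-scoring first."""
    return sorted(candidates, key=lambda c: c.confidence, reverse=True)

best = rank_candidates([
    DockingOutput([(0.0, 0.0, 0.0)], confidence=-1.2),
    DockingOutput([(5.0, 0.0, 0.0)], confidence=0.8),
])[0]
print(best.confidence)  # 0.8
```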

  • How does DiffDock differ from traditional docking methods?

    -Traditional docking methods use a scoring function and a search algorithm to find the minimum energy conformation of the ligand with respect to the protein. DiffDock, on the other hand, uses deep learning to predict the binding pose directly, without relying on an energy function.

  • What are the advantages of using DiffDock over traditional docking methods?

    -DiffDock can handle large search spaces more efficiently than traditional methods, is less sensitive to inaccuracies in the protein structure, and can deal better with scenarios where the binding pocket is not already known.

  • How does the generative model in DiffDock work?

    -The generative model in DiffDock uses diffusion models to gradually remove noise from an initial, randomly positioned ligand pose. It predicts vectors for translation, rotation, and torsional adjustments to iteratively refine the pose towards the true binding pose.
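A toy illustration of that iterative refinement, reduced to translation only: start from a random position and repeatedly step along a predicted direction toward the binding pose. In the real model the direction comes from a learned score network acting on translations, rotations, and torsions; here an "oracle" stands in for it, so this is only a sketch of the sampling loop, not the method itself:

```python
import random

def denoising_step(pos, predict_shift, step):
    """Move the pose by a fraction of the predicted shift."""
    return tuple(p + step * d for p, d in zip(pos, predict_shift(pos)))

def toy_reverse_diffusion(target, steps=50, step=0.2, seed=0):
    """Start from a random 3D position and iteratively refine it toward
    `target`. Stands in for DiffDock's learned updates over
    translation/rotation/torsion angles."""
    rng = random.Random(seed)
    pos = tuple(rng.uniform(-10, 10) for _ in range(3))
    # Oracle "score": points straight at the target. DiffDock instead
    # learns this denoising direction from training data.
    oracle = lambda p: tuple(t - x for t, x in zip(target, p))
    for _ in range(steps):
        pos = denoising_step(pos, oracle, step)
    return pos

final = toy_reverse_diffusion(target=(1.0, 2.0, 3.0))
print(final)
```

Running the loop from several random starting points is what lets the model populate multiple binding modes instead of collapsing to a single mean prediction.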

  • What is the role of the confidence model in DiffDock?

    -The confidence model in DiffDock is used to rank the generated poses. It is trained to classify poses as being within two angstroms RMSD of the ground truth pose or not, helping to select the most accurate poses for further analysis.
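The 2 Å criterion the confidence model is trained against can be written down directly. This sketch only shows the label computation (RMSD against the ground-truth pose, assuming matched atom ordering and pre-aligned frames); the confidence model itself is a neural network that predicts this label without access to the ground truth:

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched sets of 3D coordinates."""
    assert len(coords_a) == len(coords_b)
    sq = sum((ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
             for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b))
    return math.sqrt(sq / len(coords_a))

def within_two_angstroms(pose, ground_truth):
    """Training label for the confidence model: is the pose under 2 A RMSD?"""
    return rmsd(pose, ground_truth) < 2.0

truth = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
good  = [(0.1, 0.0, 0.0), (1.4, 0.1, 0.0)]
bad   = [(4.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
print(within_two_angstroms(good, truth), within_two_angstroms(bad, truth))
# True False
```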

  • How does DiffDock handle the issue of steric clashes?

    -DiffDock does not explicitly handle steric clashes during its training or generative process. It focuses on achieving a geometrically close pose to the ground truth without considering whether the pose clashes with the protein side chains.

  • What are the computational requirements for running DiffDock?

    -DiffDock is designed to run on GPU and can produce samples in either 10 or 40 seconds per run, depending on the number of samples taken. The speed can be adjusted by changing the number of samples and the batch size.

  • How does DiffDock perform on predicted protein structures, such as those from AlphaFold?

    -DiffDock retains a good level of performance on predicted structures like those from AlphaFold, even when the side chains are not accurate, which is a challenge for traditional docking methods.

  • What are the potential applications of DiffDock?

    -DiffDock can be used for blind docking to discover new binding sites, for virtual screening of potential drug candidates, and for understanding the mechanism of action of new drugs by reverse screening on specific pathways.

  • What is the DiffDock-pocket and how does it improve upon the original DiffDock?

    -DiffDock-pocket is a follow-up work that addresses the ability to control for a specific binding pocket and predict the rearrangement of side chains upon binding. It introduces pocket conditioning and side chain torsional flexibility into the diffusion process, improving performance on these tasks.

Outlines

00:00

😀 Introduction to SBGrid Webinar Series

The first paragraph introduces the SBGrid YouTube channel, which features software tutorials, lectures, and unique content for structural biologists. The host announces upcoming webinars with Robin Pearce discussing DeepFoldRNA and Graham Winter from Diamond Light Source talking about DIALS. The current webinar features a group presentation by Gabriele Corso, Hannes Stark, and Bowen Jing on DiffDock, a deep learning approach for small molecule docking. The host encourages audience interaction and questions.

05:01

🔬 DiffDock: An Overview and Methodology

The second paragraph delves into the specifics of DiffDock, a tool for small molecule docking using deep learning. The presenters, Hannes, Gabriele, and Bowen, explain the task at hand, which involves predicting the 3D coordinates of a small molecule's atoms in relation to a protein's 3D structure. They discuss the limitations of traditional docking methods and the potential of deep learning to overcome these challenges. The paragraph also touches on the generative modeling approach that DiffDock employs, contrasting it with regression-based methods.

10:02

🧬 Generative Modeling and Diffusion Models in DiffDock

The third paragraph focuses on the generative model mechanics within DiffDock, specifically the use of diffusion models. These models have been successful in various fields, including AI-generated art. The explanation covers how diffusion models add noise to data and learn to remove it, using a neural network to approximate a complex function. The generative process is outlined, detailing how the model operates on the space of ligand poses, using chemically consistent noise and training to remove torsional, positional, and orientational noise.

15:03

📈 Results and Performance of DiffDock

The fourth paragraph presents the results and performance benchmarks of DiffDock. The tool is trained on PDBBind, a standard benchmark with high-quality structures. The performance is evaluated on both holo and predicted structures, with DiffDock showing higher reliability and performance on the latter. The paragraph also highlights the successful application of DiffDock in reverse screening by Tim Peterson's group and provides a summary of how to use the tool, including accessing models and notebooks on GitHub.

20:06

🔍 DiffDock-Pocket: Enhancements and Follow-up Work

The fifth paragraph introduces DiffDock-pocket, an enhancement to address specific binding pockets and predict side chain rearrangements upon binding. It discusses the improvements made to the original DiffDock, including pocket conditioning and side chain torsional flexibility. The performance of DiffDock-pocket is compared to traditional methods, showing significant advantages in certain scenarios. The paragraph also outlines the process of using the tool, from input to output, and the incorporation of a confidence model for pose selection.

25:08

🚀 DiffDock's Sampling Process and Future Research

The sixth paragraph discusses the sampling process in DiffDock, emphasizing that it is data-driven and not based on physics. It addresses the possibility of incorporating reliability information about the protein structure into DiffDock and suggests that manual adjustments can be made during inference. The paragraph also explores the idea of using prior information, such as multiple structure predictions, to guide the docking process. It concludes with a discussion on the general benchmarks for speed and the potential for future research in this area.

30:13

🤖 DiffDock's Scoring Function and Its Implications

The seventh paragraph compares the scoring function of DiffDock, particularly the confidence model, with traditional scoring functions. It explains that the confidence model is trained to select poses within two angstroms RMSD of the ground truth, which contrasts with traditional scoring functions that consider interatomic distances and steric clashes. The discussion highlights the advantages of DiffDock's scoring function in terms of simplicity and avoidance of local minima. It also touches on the potential for handling larger ligands and conformational spaces.

35:14

🎨 Analogies Between DiffDock and AI Image Generation Tools

The eighth paragraph draws parallels between DiffDock and AI image generation tools, noting that the same underlying model is used and that concepts from image diffusion have analogs in molecular diffusion. It suggests that intuitions developed from using image generation tools could transfer to DiffDock, although specific adjustments like sampler steps or batch size may vary. The paragraph concludes with a note on the potential for further exploration and research in these areas.

40:18

📝 Final Questions and Closing Remarks

The ninth and final paragraph wraps up the discussion with final questions from the audience. The presenters address questions about the size of ligands that can be sampled with DiffDock and the potential for exploring larger conformational spaces. They also discuss the limitations when predicting side chain flexibility and the advantages of the diffusion approach over traditional methods. The session concludes with thanks to the presenters and the audience for their participation.

Keywords

💡DiffDock

DiffDock is a deep learning approach for small molecule docking, which predicts the 3D coordinates of a small molecule in relation to a protein structure. It is designed to handle blind docking scenarios where the binding pocket is not pre-defined. In the video, DiffDock is presented as a solution that outperforms traditional docking methods, especially when dealing with large search spaces or computationally generated structures like those from AlphaFold.

💡DeepFoldRNA

DeepFoldRNA is mentioned as a topic for an upcoming webinar, indicating it is a relevant and advanced method or tool within the field of structural biology. Although not the main focus of the current video, it suggests the channel's content is cutting-edge and involves various advanced techniques in the study of biological structures.

💡DIALS

DIALS is another topic that will be discussed in a future webinar, suggesting it is a significant tool or method within the field. The mention of DIALS implies the channel offers a deep dive into various specialized areas of structural biology and related computational techniques.

💡Small molecule docking

Small molecule docking is the process of determining how small molecules, such as drugs or ligands, bind to a target protein. It is a crucial step in drug discovery. In the context of the video, it is the primary task that DiffDock aims to improve through the use of deep learning, as opposed to traditional, search-based docking methods.

💡Blind docking

Blind docking refers to the scenario where the algorithm attempts to find the binding site of a small molecule on a protein without prior knowledge of the location of the binding pocket. This is in contrast to pocket-level docking where the search is restricted to a known binding region. The video emphasizes that DiffDock is effective for blind docking, which is more challenging than pocket-level docking.

💡Deep learning

Deep learning is a subset of machine learning that uses neural networks with multiple layers to analyze and learn from data. In the video, deep learning is harnessed by DiffDock to predict the binding positions of small molecules to proteins. It is a key technology that enables DiffDock to potentially overcome limitations of traditional docking methods.

💡Generative modeling

Generative modeling is a type of machine learning that generates new data samples that are similar to the training data. In the context of DiffDock, generative modeling is used to sample possible poses of a ligand and iteratively refine them to find the most probable binding pose. This approach is contrasted with regression-based models, which the video suggests do not perform as well for the docking task.

💡Diffusion models

Diffusion models are a class of generative models that work by gradually transforming a complex data distribution into a simpler one—often a Gaussian distribution—through a learned process. In the video, DiffDock utilizes diffusion models to navigate the space of possible ligand poses and to predict the binding pose by reversing the diffusion process.
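In the standard Euclidean formulation (the general diffusion-model setting, not the manifold version DiffDock uses), the forward noising of data $x_0$ and the score-guided reverse step can be written as:

```latex
% Forward process: progressively noise the data x_0 toward a Gaussian
q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar\alpha_t}\, x_0,\ (1 - \bar\alpha_t)\, I\right)

% Reverse process: a network s_\theta approximates the score
% \nabla_x \log p_t(x) and is followed from noise back toward data
x_{t-\Delta t} = x_t + \epsilon\, s_\theta(x_t, t) + \sqrt{2\epsilon}\, z,
\qquad z \sim \mathcal{N}(0, I)
```

DiffDock applies the same idea with $x$ ranging over ligand poses (translations, rotations, and torsion angles) rather than pixel values.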

💡PDBBind

PDBBind is a benchmark dataset used for evaluating the performance of protein-ligand docking algorithms. It consists of a curated set of protein-ligand complexes extracted from the Protein Data Bank (PDB). In the video, PDBBind is used to train and test the DiffDock model, demonstrating its effectiveness against traditional docking methods.

💡ESMFold

ESMFold is a protein structure prediction tool based on deep learning. It is mentioned in the video as a method to generate protein structures when docking to predicted structures, rather than using the crystallographic or NMR-derived structures. This highlights the potential of using computationally generated structures in the docking process.

💡AlphaFold

AlphaFold is a cutting-edge deep learning algorithm developed by DeepMind that predicts the 3D structure of proteins from their amino acid sequences. The video discusses the use of AlphaFold-generated structures in the context of docking with DiffDock, emphasizing the robustness of DiffDock even when dealing with computationally generated protein structures.

Highlights

SBGrid YouTube channel hosts software tutorials and lectures by structural biologists.

Upcoming webinars feature Robin Pearce discussing DeepFoldRNA and Graham Winter talking about DIALS.

Gabriele Corso, Hannes Stark, and Bowen Jing present DiffDock, a deep learning approach for small molecule docking.

DiffDock is designed for blind docking, using the entire protein structure to find small molecule binding sites.

The output of DiffDock includes 3D coordinates of every atom of the small molecule and scores for multiple candidates.

Traditional docking methods are compared with deep learning approaches, highlighting the challenges in large search spaces.

DiffDock addresses the sensitivity issues related to inaccuracies in protein structures, such as those generated by AlphaFold.

The generative modeling approach is introduced as a solution for the docking problem, as opposed to regression-based methods.

DiffDock utilizes diffusion models to sample from complex probability distributions, inspired by molecular structures.

The generative model for molecular poses is constructed by adding chemically consistent noise and training a model to remove it.

DiffDock can sample multiple poses and is capable of identifying multiple binding modes, unlike regression models.

The model is not trained with any notion of steric clash, focusing solely on geometric accuracy.

DiffDock is tested on PDBBind, a standard benchmark containing about 19,000 high-quality structures from PDB.

Results show DiffDock outperforms traditional methods, especially when docking to predicted structures like those from AlphaFold.

DiffDock has been successfully used in reverse screening to understand the mechanism of action of new drugs.

DiffDock-pocket, an upcoming tool, aims to address the limitations of controlling for specific binding pockets and predicting side chain rearrangements.

DiffDock-pocket shows improved performance in predicting correct side chain rearrangements compared to traditional methods.

The scoring function of DiffDock is trained to select poses within two angstroms RMSD of the ground truth, differing from traditional scoring functions.

DiffDock's smoothed energy surface allows for exploration of larger conformational spaces and more rotatable bonds.

The principles and intuitions from AI-generated image tools can transfer to DiffDock, as they are based on the same type of model.

Transcripts

play00:00

Welcome to the SBGrid YouTube channel,

play00:03

software tutorials by developers,

play00:06

lectures by structural biologists, unique content

play00:09

brought to you by SBGrid.

play00:16

[MUSIC PLAYING]

play00:23

Hello, everybody.

play00:24

Welcome to the SBGrid webinar series continuing next week

play00:30

we're going to be-- oh, next month on December 12th,

play00:33

we're going to have Robin Pearce talking about DeepFoldRNA

play00:37

and then in January, Graham Winter from Diamond Light

play00:40

Source is going to be joining us to talk about the DIALS.

play00:43

And today we have a group presentation

play00:47

from Gabriele Corso, Hannes Stark, and Bowen Jing.

play00:51

They're here to talk to us about DiffDock,

play00:53

which is an approach for using some deep learning

play00:56

approaches for small molecule docking,

play00:58

and they will explain it much better than I will.

play01:01

So I'm happy to hand it over to them.

play01:03

If you have questions, feel free to use the Q&A function

play01:06

or send messages to one of the hosts in the chat

play01:09

and we'll moderate until the end.

play01:11

And with that, Gabriele, Hennes, and Bowen,

play01:14

thank you for joining us again, and take it away.

play01:18

Excellent.

play01:19

Thank you very much for the nice introduction.

play01:22

So we'll be talking about DiffDock here with--

play01:24

and I'm the Hannes guy.

play01:26

This is Grabriele and this Bowen.

play01:29

So first we get a little bit into very concretely

play01:34

what is the task that we're considering here,

play01:37

input, output.

play01:38

And then we'll get a little bit into how the method works,

play01:41

and then we'll get into some results

play01:43

and into how to use it in practice.

play01:46

OK, then let's get started because yeah,

play01:50

the inputs and outputs are very simple.

play01:53

As input, we have the the 3D structure of a protein

play01:57

and the 2D chemical graph of a small molecule.

play02:01

So of the small molecule, we do not know the 3D structure yet

play02:05

and we do not know the 3D structure with respect

play02:07

to the protein.

play02:09

And for the protein, we have the whole protein as input,

play02:13

and we're not considering the pocket level

play02:16

docking scenario, where we maybe have

play02:18

a bounding box of some pocket that we already know.

play02:20

Now instead, we're doing blind docking,

play02:22

where we have the whole protein and want to find out

play02:25

where the small molecule binds.

play02:27

And the output of the tool is the 3D coordinates,

play02:32

the 3D coordinates of every single atom

play02:36

of the small molecule.

play02:39

OK.

play02:39

And there will then also be some further outputs

play02:44

because we can produce multiple candidates,

play02:46

and we also have a score for all the candidates,

play02:50

but we'll get into that later.

play02:52

But then we wanted to motivate this a little bit.

play02:56

So what do we traditionally do with our usual docking methods,

play03:02

and why now do this with deep learning?

play03:05

Well, the traditional deep learning traditional docking

play03:09

methods, they're based on a scoring function that

play03:13

ranks every single conformer that ranks every single 3D

play03:18

position that the ligand can take

play03:20

with respect to the protein.

play03:21

And we have this energy function, this score function,

play03:24

and then we use a search algorithm

play03:26

to search over to find the minimum of this energy

play03:30

function.

play03:31

But of course, if we have a very large search

play03:35

space of blind docking and we don't already know the pocket,

play03:39

then this can be quite--

play03:42

it can take a long time to find the minimum.

play03:45

And another issue is the sensitivity

play03:48

to slight inaccuracies in the protein structure,

play03:52

such as if we, for example, have computationally

play03:56

generated structures where maybe a side chain might

play04:00

be a little bit off.

play04:01

And there has been evidence in some papers

play04:04

that when we're talking to computationally generate

play04:09

AlphaFold structures, for example, then

play04:11

these classical methods, they struggle with this a bit.

play04:16

And with that, we now have our question of,

play04:19

what can we do with deep learning for docking?

play04:22

And for that, so far we've seen the regression-based

play04:26

approaches, say, where the deep learning method would

play04:30

have some graph neural network, where the nodes of the protein

play04:36

are given by the protein residues,

play04:38

the nodes of the small molecule are given by its atoms,

play04:42

and they are also associated with locations.

play04:45

And then we would do some message passing.

play04:48

For example, this regression approach,

play04:50

it would make its prediction by predicting key points

play04:54

for the protein, where the model thinks

play05:00

that small molecule should bind, or where the model think

play05:03

that the pocket is, and it would predict key points,

play05:07

like interaction points for the small molecule.

play05:10

And then we would calculate the translation and the rotation

play05:14

to optimally align those key points

play05:17

and apply the same transformation

play05:19

to the small molecule to end up with the final location

play05:23

prediction.

play05:24

But these types of regression approaches,

play05:28

they did not meaningfully improve the performance

play05:32

that we were able to achieve compared to traditional docking

play05:38

methods, where here in red we're showing deep learning methods,

play05:42

and in blue we're showing traditional search

play05:47

based methods, traditional search based methods.

play05:51

Yeah.

play05:52

And now we argue that--

play05:57

we have this little summary here where we argue.

play06:00

In our search based method, we have our energy function.

play06:07

We learned this ground truth energy function [INAUDIBLE]..

play06:11

We learned the scoring function.

play06:13

And then we use a search algorithm.

play06:16

We start somewhere at a random location,

play06:18

and then we use our search algorithm

play06:21

to find the modes of this energy function,

play06:25

or the minimum of this energy function.

play06:27

But we might very easily get stuck in local minima.

play06:31

Meanwhile, if we have our deep learning regression

play06:34

approaches--

play06:36

yeah, I should also mention here,

play06:38

in green in this visualization, we

play06:41

would have the ground truth what we want to predict,

play06:44

and in yellow, we have what we get as output with the method.

play06:50

So this is for the traditional search based methods.

play06:54

And then here we have our regression based regression

play06:58

based deep learning methods, where we have this distribution

play07:03

which, during training, we sample our data from,

play07:06

and then we make our prediction with our model.

play07:10

And our prediction tries to minimize the mean square error.

play07:15

And then the best it can do if it makes a single prediction

play07:19

to minimize the mean square error to samples

play07:21

from this distribution is to put its prediction at the mean,

play07:25

but this is actually not what we're interested in here.

play07:29

We're interested in the sample from the modes,

play07:31

the global mode, and this is why we argue another approach.

play07:36

Generative modeling should be the approach taken

play07:41

for the docking problem, and that's

play07:42

what Gabriele will talk about now.

play07:47

OK.

play07:47

So Hennes has motivated kind of why traditional docking

play07:54

methods really struggle with a very large search space.

play07:57

Why the previous deep learning methods based on regression

play08:02

also struggle in this task.

play08:05

And to give another kind of intuition

play08:07

for why this is the case, what's particularly

play08:10

hard about docking.

play08:12

And what's particularly hard is that there is

play08:15

a lot uncertainty in the task.

play08:16

And this is both aleatoric, which

play08:18

just means that there might be multiple poses,

play08:21

and epistemic, which basically means

play08:23

that the model will be undecided between multiple poses.

play08:27

And if we have regression models as I'll show you

play08:30

on the next slide, we're going to get some kind of mean that's

play08:34

not very useful, while with our generative model,

play08:39

we'll try to populate all these modes.

play08:41

And let me give you a couple of concrete examples.

play08:44

So here we have this protein in gray.

play08:47

This is actually a drug target against malaria.

play08:50

And you can see in green this chromo inhibitor.

play08:55

The dots in these two sites in the proteins.

play08:59

And if we run one of the regression models,

play09:01

we obtain a prediction that is right in the middle.

play09:03

This is clearly not a useful prediction,

play09:07

while with the generative model that will show,

play09:10

we are actually able to sample both modes.

play09:12

, Similarly we have here another complex where instead we have

play09:18

a single true docking pose.

play09:24

And, however, still these regression

play09:28

methods really struggle, either putting

play09:32

large part of the ligand in steric clash with the protein

play09:36

or having this completely unphysical conformation,

play09:42

and instead we will see that we are actually

play09:45

able to sample relatively accurately the pose

play09:49

with the generative model.

play09:50

And Bowen will kind introduce how we're actually constructing

play09:55

this generative model.

play09:57

OK.

play10:00

All right.

play10:02

All right, so I'll briefly talk about the mechanics

play10:04

of the model and how we actually have this generative model

play10:07

for molecular poses.

play10:10

Now there are many different classes

play10:13

of generative models in deep learning,

play10:14

and what we're going to use is diffusion models.

play10:17

These have recently been quite famous.

play10:20

If you've heard in the news about AI generated

play10:23

art or photorealistic imagery, this all came from diffusion

play10:26

models which are very good at modeling very

play10:28

complex probability distributions,

play10:30

which makes them very well suited for molecular structures

play10:33

as well, and it's this aspect that we leverage

play10:36

in developing DiffDock Now I want

play10:40

to briefly outline how these models work just

play10:42

to ground everyone in some common language.

play10:46

You've probably heard people say that diffusion models add noise

play10:49

to the data and then learn to remove noise.

play10:51

So what that looks like is, you have your data

play10:55

and you can imagine some kind of diffusion process happening

play10:57

here.

play10:58

So you can imagine red is like concentration

play11:00

of the data at a particular point in space,

play11:02

and this diffusion process is adding noise.

play11:05

And then with your neural network,

play11:06

you learn a vector field that points

play11:08

in the direction of higher concentration or probability

play11:11

density.

play11:12

And this, intuitively, is the part of the diffusion framework

play11:15

that corresponds to removing noise.

play11:17

And this vector field is going to be

play11:19

evolving over time, right?

play11:20

So it's a pretty complex function

play11:22

that we're going to approximate with the neural network called

play11:24

the score function.

play11:25

And then what you do at inference time

play11:27

is you draw random samples from your initial noisy distribution

play11:30

and follow the neural network as it tells you

play11:33

how to remove noise by following this vector

play11:35

field into eventually you get to the data distribution.

play11:38

Hopefully the visualization is clear

play11:40

even though the arrows are kind of small.

play11:43

So now, this right here is just a toy diffusion on a 2D space. What we're going to want to do in DiffDock is think about how this generalizes to the space of ligand poses, right? Just to emphasize: even though we're talking about diffusion with a very similar mathematical formalism to physical or chemical diffusion, this is really diffusion over the space that the data distribution lives in. So in the case of ligand poses, we're going to want to think about what that space is and how to diffuse over it.
The space that we're actually going to look at is quite inspired by the way that traditional docking methods have thought about the space of ligand poses. In GLIDE or Vina, you're probably familiar that you provide a conformer of the ligand as input, and then what GLIDE or Vina will do is move this ligand around with rigid body motions and update its torsion angles, but it will not disrupt the bond lengths, bond angles, and ring structures of the ligand.
What we're going to take away from this paradigm is that the space of ligand poses is actually a non-Euclidean manifold, described by the poses accessible by twisting these torsion angles and moving the ligand around. This is the space of ligand poses that takes the place of that 2D toy Gaussian example we saw on the previous slide when it comes to thinking about diffusion models. So the diffusion that we're going to construct, the kind of noise that we're going to add, is going to be chemically consistent with the initial conformer. The noise is not going to be in the ligand's internal structure, but rather in the ligand's torsion angles and its rigid body motion.
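The moves that define this manifold can be sketched in Python. This toy example (illustrative, not DiffDock internals) applies a rigid translation and a single-bond twist to a four-atom chain, and shows that bond lengths survive both moves, exactly as in the GLIDE/Vina-style search space:

```python
import numpy as np

def rodrigues(v, k, angle):
    """Rotate row vectors v about unit axis k by angle (Rodrigues formula)."""
    k = k / np.linalg.norm(k)
    return (v * np.cos(angle)
            + np.cross(k, v) * np.sin(angle)
            + np.outer(v @ k, k) * (1.0 - np.cos(angle)))

atoms = np.array([[0.0, 0.0, 0.0],
                  [1.5, 0.0, 0.0],
                  [2.3, 1.3, 0.0],
                  [3.8, 1.3, 0.0]])

# 1) rigid-body translation of the whole ligand
translated = atoms + np.array([1.0, -2.0, 0.5])

# 2) torsion update: twist the bond between atoms 1 and 2 by 60 degrees,
#    rotating only the atoms past the bond (here, atom 3) about the axis
#    through atoms 1 and 2
axis = atoms[2] - atoms[1]
rotated_tail = rodrigues(atoms[3:] - atoms[1], axis, np.pi / 3) + atoms[1]
twisted = np.vstack([atoms[:3], rotated_tail])

def bond_lengths(coords):
    # Consecutive inter-atom distances along the chain.
    return np.linalg.norm(np.diff(coords, axis=0), axis=1)
```

Neither move changes any bond length or bond angle; only the rigid placement and the dihedral change, which is what makes the noise "chemically consistent with the initial conformer."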

And when we talk about doing diffusion over this space, what we're going to do is train a model that removes noise of this kind. This is going to be a model that removes torsional noise, positional noise, and orientational noise from a randomly seeded initial ligand pose, in order to move it towards the distribution of the true pose.
I will skip the technical details here, but what that really looks like is shown in the bottom left corner. Recall from the earlier visualization that we had a neural network that tried to learn a vector field on a 2D space, so that it could point in the direction of lower noise, in the direction of higher probability density. That's exactly what's happening in the score model that we have in DiffDock, except that the noise is torsional, orientational, and positional.
So what that means is that the score model is going to look at the current ligand pose. When I say current, what I mean is that, because diffusion is an iterative generative process, the input at the beginning of a diffusion generative process is going to be some randomly positioned ligand pose that you initialize the diffusion process with. The assumption is that this random pose has a lot of noise, and the generative model is going to progressively remove that noise, one step at a time. And there are three kinds of noise: rotational noise, translational noise, and torsional noise.
So the score model is going to predict a vector, this brown vector here. That's the direction for removing translational noise; you can think of it as a linear velocity or linear momentum. We're also going to have a rotation vector, which removes orientational noise; this is like an angular momentum or an angular velocity. And then for each torsion angle, we're also going to predict a quantity that tells us how quickly to twist that particular rotatable bond, and in which direction, in order to make the pose look less noisy. This is, I guess you can say, like an angular velocity around that particular torsion angle.
All of this is done with a particular kind of message passing neural network, which I will not get into the details of, but the upshot is that the generative process looks something like the following.
So, again, similar to GLIDE or Vina or AutoDock or any of these very well established docking tools, the input to our method is going to be a conformer, either from RDKit or maybe from the Cambridge Structural Database if you prefer that. What our model will then do is first sample from the equivalent of that initial Gaussian distribution, which in this case means that we randomly position the ligand relative to the protein and completely randomize its orientation and its torsion angles. The distribution of poses then looks something like what you see at the bottom left here. This corresponds to the noisiest state possible for the pose, and then we progressively use our model to figure out how to remove noise from the pose, translationally, orientationally, and conformationally by adjusting the torsion angles. We do this multiple times, in practice about 20 times, and then all of the poses will hopefully move towards a low noise state, which hopefully corresponds to them being in the binding pocket.
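That generative loop can be sketched as follows. The score network is replaced here by a hypothetical stand-in that simply points toward a made-up pocket center, so this only illustrates the shape of the iteration, not the real model:

```python
import numpy as np

rng = np.random.default_rng(1)
pocket_center = np.array([10.0, 4.0, -2.0])   # stand-in for the true pocket

def stand_in_score(center):
    # Placeholder for the score network's translational output: here it
    # simply points at the pocket. The real model predicts this update
    # from the protein and ligand structure, with rotational and
    # torsional components as well.
    return pocket_center - center

center = rng.normal(scale=20.0, size=3)       # fully randomized initial placement

for _ in range(20):                           # ~20 denoising steps in practice
    center = center + 0.3 * stand_in_score(center)

final_dist = np.linalg.norm(center - pocket_center)
```

Even from a far-away random start, twenty partial denoising steps bring the pose center essentially onto the pocket, which is the behavior the bottom-left animation is showing across many independent samples.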

Now, I do want to emphasize that here we are showing independent samples and independent trajectories. When these poses are moving relative to each other, they are not interacting with each other in any way whatsoever. This illustration is just showing multiple instantiations of that process.
Now, of course, finally, you do want the ability to select a high ranking pose from this distribution of poses for downstream analysis, and we also provide a bespoke confidence model. This is trained as an under-two-Angstrom-RMSD classifier, and it will select, out of the many samples that you can draw from the generative model, which pose you would use for downstream analysis.
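The selection step amounts to sorting the sampled poses by the confidence model's output. In this toy sketch the confidence model is an illustrative stand-in, and each "pose" is represented just by its RMSD:

```python
def rank_poses(poses, confidence_model):
    # Highest-confidence pose first; DiffDock's confidence model is a
    # trained under-2-Angstrom-RMSD classifier rather than this stand-in.
    return sorted(poses, key=confidence_model, reverse=True)

# toy stand-in: confidence falls off as RMSD grows
candidates = [3.1, 0.9, 1.7, 5.2]
ranked = rank_poses(candidates, confidence_model=lambda rmsd: 1.0 / (1.0 + rmsd))
best_pose = ranked[0]
```

Only the top-ranked pose is typically handed to downstream analysis, though the full ranked list is written out.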

Here is just a final visualization of that. What we have at the beginning is a cloud of randomly initialized ligand poses, and then, as you can see, they move towards the binding pocket.
One thing that I do want to emphasize here, which is quite interesting, and maybe a key point of difference between this iterative process and a traditional docking process, is that, as you can see, during the course of this denoising the ligand oftentimes will pass straight through the protein. It will look like it passes through very energetically unfavorable regions of state space. Now, this is a good thing, because it is what allows us to actually reach the global minimum of this data-driven energy function in the first place. If we otherwise had to deal with this rugged energy landscape, we would not get there.
But the other aspect that I want to highlight is that DiffDock is not trained with any notion of steric clash. It is trained only with the objective of getting as geometrically close to the ground truth pose as possible. What this means in practice is that, for example, if you're doing cross docking, or if you're docking to an AlphaFold structure or a structure where the side chains are wrong, DiffDock will put the pose in what it thinks is the right binding pocket, but without any regard for whether or not it clashes with the side chains. So when you look at a DiffDock pose and evaluate whether or not you like it, the energy under a scoring function, for example, will probably not be the best metric to use. You will probably want to do some relaxation first, because while DiffDock will generally get the geometry of the pose right, it will not try to do anything about the energetics. That's something to keep in mind.
And then I will hand it over to Gabriele to talk about results.

OK. If there are any questions on the method side, I guess we can take them now as well. But otherwise, I'm also happy to take them at the end.
OK, so let's see some results and some summaries. First of all, what do we train this on? We use PDBBind, which is a standard benchmark. It contains about 19,000 structures from the PDB. They are curated to be of higher quality and, obviously, to contain ligands. We do a time-based split, so we train on complexes that were resolved before 2019 and we test on newer complexes. For the complexes that we test on, we make sure that none of their ligands were in our training set, to test for some kind of generalization. And we have various different kinds of baselines, both traditional methods and deep learning ones.
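The split just described can be sketched like this; the dictionary field names are illustrative, not the PDBBind schema:

```python
def time_split(complexes, cutoff_year=2019):
    # Train on complexes resolved before the cutoff; test on newer
    # complexes whose ligand never appears in the training set.
    train = [c for c in complexes if c["year"] < cutoff_year]
    seen_ligands = {c["ligand"] for c in train}
    test = [c for c in complexes
            if c["year"] >= cutoff_year and c["ligand"] not in seen_ligands]
    return train, test

complexes = [
    {"pdb": "1abc", "year": 2015, "ligand": "ATP"},
    {"pdb": "2def", "year": 2020, "ligand": "ATP"},   # ligand leak: excluded
    {"pdb": "3ghi", "year": 2021, "ligand": "XyZ"},
]
train, test = time_split(complexes)
```

Dropping test complexes whose ligand appears in training is what makes the benchmark probe generalization rather than memorization.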

The first set of results that we are going to see is when we provide the methods with the holo structures. This means that we actually feed into the methods the exact structure that the protein will take when it is bound. This is typically the way that these methods are evaluated, although one could argue that it's not very realistic. But we can still see here that, for this blind docking task, traditional methods and deep learning methods don't really get a high success rate, where we're measuring success by the proportion of predictions with the top-one RMSD below 2 Angstrom. The success rate is below 25%.
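The headline metric can be written down in a few lines (the RMSD values below are made up for illustration):

```python
def success_rate(top1_rmsds, threshold=2.0):
    # Fraction of complexes whose top-ranked prediction lands within
    # the RMSD threshold of the crystal pose.
    hits = sum(1 for r in top1_rmsds if r < threshold)
    return hits / len(top1_rmsds)

rate = success_rate([0.8, 1.9, 2.4, 6.0])
```

Two of the four made-up predictions fall under 2 Angstrom, so the rate here is 0.5; the numbers on the slide are this same fraction computed over the whole test set.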

Now, this can be increased by combining a pocket-finding method such as P2Rank, or EquiBind itself, with one of these traditional methods focused on that specific pocket. But then we actually see that DiffDock itself can achieve significantly higher performance.
We can also look at the performance on predicted structures. This is the setting where, instead of feeding in the ground truth bound structure, we feed in a computationally generated structure. So we feed the sequence of the protein into ESMFold, we get the structure, and then we try to dock the ligand to that structure. As Hannes said at the beginning, the traditional methods have been shown in previous works, and we can also see here, to really struggle in this task. The success rate drops all the way to below 5%. The reason is that the side chains of the predicted structures are often not accurate, in the sense that they will change upon binding. On the other hand, you can see that DiffDock is a lot more reliable and loses a much smaller percentage of its success rate, so it retains a good level of performance on AlphaFold or ESMFold predicted structures as well.
Now, we wanted to give some shout-outs to some of the works that have used DiffDock since we published. This, for example, is a very interesting work where DiffDock was used to do reverse screening. It came from the group of Tim Peterson at Washington University in Saint Louis, where they used DiffDock to dock a new drug that they had discovered against a series of proteins on a particular pathway, to try to understand its mechanism of action. This is a very promising application where blind docking, and in particular blind docking to AlphaFold or, in general, predicted structures, we think will have a big impact.
To summarize a bit on using the tool itself: you can find a more detailed description on our GitHub, where you will also find the models and Colab notebooks, in case you prefer using those. The input is a protein structure. Here you can either give a structure file, if you have, for example, a crystal structure, or you can feed in just the sequence, in which case the model will fold the protein with ESMFold. Then you have to provide the ligand. Again, this can be either a structure file or a SMILES string, but in either setting we don't assume that the given structure is an accurate conformation for the ligand. Then the reverse diffusion runs, we run the confidence model, and the output will be the files with the predicted ligand poses, where you will also find the rank and the confidence of each pose in the name of the file.
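As a small assumed example: in recent DiffDock releases the output pose files are named roughly like rank1_confidence-0.21.sdf, so the rank and confidence can be parsed from the file name. The exact naming pattern may differ between versions, so treat this parser as an assumption:

```python
import re

def parse_pose_filename(name):
    # Assumed pattern "rank<N>_confidence<score>.sdf"; returns None if
    # the name does not match (e.g. on other DiffDock versions).
    m = re.match(r"rank(\d+)_confidence(-?\d+\.?\d*)\.sdf$", name)
    if not m:
        return None
    return {"rank": int(m.group(1)), "confidence": float(m.group(2))}

info = parse_pose_filename("rank1_confidence-0.21.sdf")
```

Note that higher confidence values indicate a more trusted pose, so rank 1 carries the highest confidence.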

And that is mainly what we have. We also wanted to give a brief overview of the follow-up work that we are going to be releasing soon, called DiffDock-pocket, where we tackle two of the problems with the original DiffDock most commonly reported by people who have been using it. One was the inability to control for a specific binding pocket. The second is that, although I've shown you some good performance on predicted structures, there was not really any way of predicting the rearrangement of the side chains upon binding, potentially making the relaxation step that Bowen was talking about harder.
So what we do here is make a few changes to get this pocket conditioning, where we restrict the focus to the pocket and give the model access to the full atomic coordinates, and then we also add side chain torsional flexibility built directly into the diffusion process.
And again, you can see here the holo docking performance on PDBBind when conditioning on a specific pocket. When we are dealing with holo structures, or often with cross docking structures, we are at least as good as or better than some of the traditional methods that were designed for this task. But again, we have significantly better performance when docking to predicted structures, and we can see here also that, when it comes to predicting the correct rearrangement of the side chains, we do significantly better than some of the traditional methods.
And with that, we'd like to thank you for listening and open the floor for any questions.

Thank you. That was a very interesting talk. And I have to admit that DiffDock-pocket answered one of the questions I had queued up before I even got a chance to ask it. So, well prepared.
So we have one question that came in from Joseph DeCorte: beyond relaxation, do you recommend local refinement of the DiffDock pose with one of the more traditional algorithms?
Yeah. People have used DiffDock in combination with different tools. In general, I think it depends on what you're going to use down the line. Whenever you're using, for example, some scoring function or some energy function, it is best to either relax or do some pose refinement using that same scoring function, because different scoring functions, and also the intrinsic scoring function of DiffDock, have slightly different properties. So in general, depending on the tool that you want to use downstream, you should do the relaxation with that same tool.
OK. Thank you. Since you touched on the sampling a little bit, have you tested multiple different ways of doing the scoring during sampling, or could you just say a little bit more about how you're doing that?

Scoring during sampling? What was meant by that?

Is that just purely geometric, or is that one of the traditional atomic force fields?
So the sampling process itself is not driven by any physics-based energy or scoring function. Oh, are you talking about this part, the ranking of the poses?

No, I think going back further. You were answering in the right area. Yeah, the actual sampling itself.
So, as I overviewed here, in diffusion models the neural network is actually predicting the movement that would bring the pose towards the data distribution. Or rather, maybe that was not the best illustration; maybe this is the slide, sorry for so many slides. In this slide here, the neural network is quite literally predicting a set of two vectors, plus a torsional velocity around each one of these rotatable bonds. So it's all coming from the neural network, and it's all data driven. There's no physics-based sampling involved.
So is there a way to incorporate any reliability information about the protein structure into the DiffDock way of thinking about it?

In general, the question of how to incorporate this kind of prior knowledge into these diffusion sampling processes is an active area of research. There are lots of interesting ways to go here.
Yeah, I was just thinking: what if you have B-factors? What if you have multiple AlphaFold or ESMFold predictions and you want to use a cluster? It sounds like I should wait and see for future research.

Yeah. Yeah.
I mean, one very first-pass idea that one could have is that, if you want to restrict to a certain binding pocket, then you can manually adjust this translation vector, the one in brown here. So one thing that you can do, if you want to dock to a specific pocket without retraining DiffDock, is simply to correct the brown vector whenever it points towards a different pocket. But what one generally finds is that when it is possible to retrain the model to explicitly use the prior information as input, it generally does better. Maybe that's why the focus has been a bit more on how to incorporate the different kinds of knowledge that you would want to incorporate. But it is true that there are a number of inference-time tricks that you could consider, and the rough paradigm is just that you can manually adjust any of these updates if you feel like they're wrong.
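That trick, manually blending the predicted translation update toward a chosen pocket, could be hand-rolled like this. It is illustrative only, not a DiffDock feature, and all names are made up:

```python
import numpy as np

def steer_translation(predicted, ligand_center, pocket_center, weight=0.5):
    # Blend the model's translational update with a vector pointing at
    # the pocket of interest; weight=0 trusts the model, weight=1
    # overrides it entirely.
    toward_pocket = pocket_center - ligand_center
    return (1 - weight) * predicted + weight * toward_pocket

pred = np.array([1.0, 0.0, 0.0])   # model wants to move the ligand along +x
adjusted = steer_translation(pred,
                             ligand_center=np.zeros(3),
                             pocket_center=np.array([0.0, 4.0, 0.0]))
```

As the speakers note, retraining with pocket conditioning (as in DiffDock-pocket) generally beats inference-time overrides of this kind.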

One other approach, if you wanted to target a pocket before DiffDock-pocket came out: it sounds like you could just edit your input structure and delete the parts that you didn't want to have docking on. That would be ugly and feel kind of like a hack, but--
Yeah, yeah. And there are different ways that one can think about it. I mean, one could also just sample multiple times until you get poses inside of that pocket and do a manual kind of confidence filtering. But in general, that's not really making the task easier for the model. We would expect the performance when restricting to a particular pocket to be better, because the model should have an easier task, and this was one of the motivations for actually training a model fully on the pocket, and also trying to predict pocket rearrangements at the same time.
Yeah. Thank you. Another question: could you talk a little bit about the general benchmarks as far as speed, like the number of molecules per day per GPU? Obviously, there's going to be a lot of range.
Yeah, let me see, I think it might be on one of the supplementary slides. It's always a bit hard to compare these methods. First of all, here is the general number: depending on how many samples you take, a DiffDock run takes either 10 or 40 seconds on a GPU. Based on that, one can do some calculations on how many samples a day you can obtain. There is a range of tricks that one can use to accelerate things, and the obvious one is that you can take fewer samples if you have access to fewer computational resources. You can see in our paper, or I think we should also have it here, that there is obviously a curve. Here we've presented results with 40 samples; you lose relatively little performance by taking just 10 samples, and that's, for example, four times faster. So there is a range of hyperparameters that one can play with if speed is an important consideration.
Something I should also say is that these models run best on GPU; running them on CPU often takes even longer than the traditional methods when compared on CPU.
And could you say a little bit about the RAM requirements? Because some things are very RAM-heavy and some things are not. Nobody has enough GPU time.
Yeah, so it really depends, to be honest, on the size of the protein and the size of the ligand. But something that can be controlled, and is one of the hyperparameters that can be fed into DiffDock, is what we call the batch size, which is basically how many samples we take in parallel. So if you are taking 40 samples, by default we take them in batches of 10. But if you have less memory, you can scale this down to, for example, four or eight, to fit your GPU memory.
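The batching knob just described can be sketched as follows; smaller batches trade GPU memory for more sequential passes:

```python
def batched(n_samples, batch_size):
    # Split n_samples into sequential batches no larger than batch_size;
    # peak memory scales with batch_size, total work with n_samples.
    sizes = []
    remaining = n_samples
    while remaining > 0:
        take = min(batch_size, remaining)
        sizes.append(take)
        remaining -= take
    return sizes

default = batched(40, 10)   # 4 passes of 10, the stated default
low_mem = batched(40, 8)    # 5 passes of 8 for smaller GPUs
```

Either way all 40 samples are drawn; only the peak memory footprint and the number of passes change.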

Thank you. We've got a couple more questions coming in. Could you say something about the scoring for DiffDock-pocket, or DiffDock in general: how would you compare it to traditional scoring? I believe this is something that you addressed earlier, but I might be wrong about that.
Well, to answer that question without further knowledge of what specifically is meant by the comparison: the scoring by this confidence model is based on a trained classifier, where the model is trained to predict whether the pose is under two Angstrom RMSD or above two Angstrom RMSD. What that means is that, compared to a traditional scoring function based on pairwise interaction terms, this scoring function will give a very good score to a pose that is, say, 1.5 Angstroms away but has overlapping atoms with one of the side chains. But it will not give a good score to a ligand that is in the wrong pocket but happens to have a very good energy inside that pocket. So maybe to summarize and hit home the message: this scoring function is trained with the sole purpose of selecting poses that are geometrically within two Angstroms RMSD of the ground truth ligand pose. If ideally trained, it is a convex energy surface; there would be no spurious local minima. Of course, that may not be the case in practice. But this is in sharp contrast with traditional scoring functions, which can be very, very sensitive, for example, to specific interatomic distances avoiding steric clashes, and which have a very rugged energy surface. Hopefully that answers the question.
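The contrast can be made concrete with two toy stand-ins, neither of which is a real scoring function: the confidence model only asks whether a pose is geometrically close to the truth, while a pairwise energy term blows up on any atomic clash.

```python
def confidence_score(rmsd_to_truth):
    # Idealized under-2-Angstrom classifier behavior.
    return 1.0 if rmsd_to_truth < 2.0 else 0.0

def clash_energy(min_interatomic_distance, hard_sphere=1.2):
    # Idealized hard-sphere pairwise term: any overlap is catastrophic.
    return float("inf") if min_interatomic_distance < hard_sphere else 0.0

# A pose 1.5 Angstroms from the crystal pose, but overlapping one
# side-chain atom:
conf = confidence_score(1.5)   # the geometric classifier still likes it
energy = clash_energy(0.6)     # the physics-style score rejects it outright
```

The two scores disagree on exactly the kind of near-native-but-clashing pose discussed above, which is why relaxation before energy-based evaluation is recommended.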

I think it does, and it illustrates: "Is A better than B?" Well, better for what?

Exactly, exactly.

Jason has his hand up, so I assume he has another question.

Yeah, I did.
If you're taking this approach where you are not really paying attention to clashes, you don't really have to worry about physics, you've got a big set of conformers and you just dive in, what are the ramifications for the size of the ligands that you can sample? Can you go to bigger conformational spaces? Can you use more rotatable bonds? Typically, with these approaches, people use fragments or maybe 20-25 atoms, but can you go even bigger with this, and do you pay a penalty performance-wise?
Yeah, that's definitely a great point. The fact that you're using these kinds of smoothed-out energy surfaces gives particular improvements when the space that you're searching over is higher dimensional. So we do expect it to do better as you add more and more degrees of freedom. Now, I don't know if we have some of these results in the presentation, but the other, somewhat perpendicular, problem is that potentially larger and larger ligands fall outside the domain the model was trained on, so that's a potential caveat. I think one setting where the disadvantage of search-based methods becomes very clear is when you look at predicting the flexibility of side chains. Many of the traditional methods allow you to enable some flexibility in the side chains by also modeling the torsion angles of the side chains, similarly to the way that DiffDock-pocket does. But when you actually see how these models do, they work pretty terribly. My intuition for why they really struggle is that the degrees of freedom really increase, and so the search algorithms that they use really, really struggle. This is one setting where we can really see the advantage of this diffusion-like approach versus the energy-based, search-based approach of traditional methods.
Yeah, that's interesting. I hadn't thought about that: the diffusion-based approach kind of skirts that limitation, in the sense that it doesn't have to worry about steric interactions, but it's limited by what it's trained on. So expanding beyond that is probably challenging.
I think that may have run through all of our questions. So, if there are no more from the audience, then thank you for a great talk. I enjoyed it, and I hope the audience did too.

Thank you.

Thank you very much.

Thank you very much. It was great.
I guess maybe one final question, on a lighter note: you mentioned the image-generation AI tools. Do you think the intuitions people develop playing around with those tools would transfer to something like DiffDock, in terms of "oh, I need more sampler steps," or "oh, I need a different batch size," or is that just, you know...?
Well, I mean, it is the same kind of model, so at the end of the day there's a ton of shared language. Pretty much every concept that has been applied in image diffusion has some kind of analog in molecular diffusion, and if they haven't been explored already, then they're active areas of research.
Cool. Well, thank you again.

Thank you very much.
[MUSIC PLAYING]
