Graph Neural Networks: A gentle introduction

Aladdin Persson
3 Jul 2022 · 29:14

Summary

TL;DR: This video offers a foundational overview of Graph Neural Networks (GNNs), explaining why they're essential for handling data naturally structured as graphs. It covers common applications like social networks and molecules, and introduces fundamental tasks such as node classification, graph classification, and link prediction. The video simplifies complex concepts like permutation invariance and message passing in GNNs, providing a clear pathway for beginners to grasp and apply GNNs in practical scenarios.

Takeaways

  • 🌐 **Graphs as Data Representation**: Graphs are a natural way to represent certain types of data, such as social networks, transportation networks, and molecules.
  • 🚀 **Emerging Field**: Graph Neural Networks (GNNs) are an emerging area in deep learning with significant research activity and potential applications.
  • 🔍 **Beyond Grid-like Structures**: GNNs aim to move beyond the assumptions of grid-like structures inherent in traditional deep learning methods for data like images and videos.
  • 🔑 **Graph Components**: A graph consists of vertices (or nodes) and edges, which can be undirected or directed, and may have weights or features associated with them.
  • 🎯 **Common Graph Tasks**: Key tasks in graph neural networks include node classification, graph classification, node clustering, link prediction, and influence maximization.
  • 📊 **Graph Representation**: Graphs can be represented using feature matrices for node attributes and adjacency matrices to describe connections between nodes.
  • 🤖 **GNN Computation**: GNNs operate on the principle of local neighborhood computation, where information is aggregated from neighboring nodes and updated through layers of computation.
  • 🔄 **Information Propagation**: In GNNs, information can propagate across the graph through multiple layers, allowing the model to capture global structure from local connections.
  • 🔧 **Key GNN Properties**: Important properties of GNN layers include permutation invariance and permutation equivariance, which ensure that the model's output is either unchanged or permuted consistently with the input when the node ordering changes.
  • 🛠️ **GNN Architectures**: Different GNN architectures like Graph Convolutional Networks (GCNs) and Graph Attention Networks (GATs) vary in how they compute messages and aggregate information from the graph structure.

Q & A

  • What is the main goal of the video on graph neural networks?

    -The main goal of the video is to provide a background overview of graph neural networks (GNNs), explaining why they are important, how they work, and what tasks they can be used for, to give viewers the necessary background knowledge to understand and follow more detailed coding implementations for specific graph tasks.

  • Why are graphs considered important in the context of this video?

    -Graphs are considered important because many types of data, especially those from applications like transportation networks, social networks, and biological molecules, are naturally described as graphs. This video aims to explore how graph neural networks can be used to process and analyze such data effectively.

  • What are some common tasks that can be performed using graph neural networks?

    -Common tasks that can be performed using GNNs include node classification, graph classification, node clustering, link prediction, and influence maximization. These tasks are crucial for various applications such as fraud detection, molecule property prediction, social network analysis, recommender systems, and communication network analysis.

  • How does the representation of a graph typically involve vertices and edges?

    -A graph is typically represented by a set of vertices (or nodes) and edges. Vertices represent entities, and edges represent the relationships or interactions between these entities. The graph can be undirected, meaning there is no specific direction of flow between nodes, or directed, indicating a flow from one node to another.
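
A minimal sketch of this definition (not from the video; the node labels and edge weights below are invented): an undirected, weighted graph stored as a set of vertices plus a list of weighted edges, from which each node's neighbourhood can be derived.

```python
# A tiny undirected, weighted graph: 4 vertices, 5 edges (illustrative values).
vertices = {0, 1, 2, 3}
edges = [  # (u, v, weight) -- e.g. the weight could be "10 litres per hour" of flow
    (0, 1, 10.0),
    (0, 3, 4.0),
    (1, 2, 7.0),
    (1, 3, 2.0),
    (2, 3, 5.0),
]

# Because the graph is undirected, an edge (u, v) also implies the edge (v, u).
neighbors = {v: set() for v in vertices}
for u, v, _w in edges:
    neighbors[u].add(v)
    neighbors[v].add(u)

print(neighbors[1])  # {0, 2, 3}
```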

  • What is the significance of permutation invariance in the context of GNNs?

    -Permutation invariance is a key property that GNNs should have, which means the output of a GNN layer should remain the same even if the order of nodes in the input is changed, as long as the underlying graph structure remains unchanged. This property is important for ensuring that the model is robust to the arbitrary ordering of nodes.
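
One way to see the property concretely: if a graph-level readout aggregates node features with a sum, relabelling the nodes cannot change the result. The snippet below is a small illustrative check in NumPy; the feature values and the permutation are arbitrary.

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])          # one feature row per node

perm = np.array([2, 0, 1])          # an arbitrary relabelling of the nodes
X_perm = X[perm]

# A sum (or mean) readout is permutation invariant: same output either way.
assert np.allclose(X.sum(axis=0), X_perm.sum(axis=0))
```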

  • How does information propagate in a graph neural network?

    -Information in a GNN propagates through a mechanism of local neighborhood computation. Each node aggregates information from its neighbors, and this process is done in parallel for all nodes. By stacking multiple layers, information can propagate throughout the entire graph, allowing the model to capture global structure from local connections.
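
As a hedged illustration of how stacking layers widens the receptive field (the 4-node path graph is invented for this example): a signal placed on one node reaches its 1-hop neighbour after one round of neighbourhood averaging and its 2-hop neighbour after two rounds.

```python
import numpy as np

# Path graph 0 - 1 - 2 - 3, written as an adjacency matrix with self-loops.
A = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]], dtype=float)
A_norm = A / A.sum(axis=1, keepdims=True)   # row-normalised neighbourhood average

h = np.array([1.0, 0.0, 0.0, 0.0])          # signal sitting only on node 0
for layer in range(1, 3):
    h = A_norm @ h                          # one round of neighbourhood aggregation
    print(f"after layer {layer}:", np.round(h, 3))
# After layer 1, node 1 is nonzero; after layer 2, node 2 (two hops away) is too.
```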

  • What is the role of the adjacency matrix in representing a graph?

    -The adjacency matrix is used to represent which nodes are connected to other nodes in a graph. It is a matrix where each row corresponds to a node, and the entries indicate connections to other nodes. If the graph is undirected, the adjacency matrix is symmetric, reflecting that the connections are bidirectional.
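
A small sketch of this (the edge list is made up): building the adjacency matrix of an undirected graph from an edge list and confirming it is symmetric.

```python
import numpy as np

n = 4
edges = [(0, 1), (0, 3), (1, 2), (1, 3), (2, 3)]   # undirected edge list

A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = 1.0
    A[v, u] = 1.0          # undirected: fill both directions

# An undirected graph gives a symmetric adjacency matrix (A equals its transpose).
assert np.array_equal(A, A.T)
print(A)
```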

  • Can you explain the concept of message passing in GNNs as described in the video?

    -Message passing in GNNs involves each node performing a local computation on its neighbors, aggregating the results, and then updating the node's representation. This process is done in parallel for all nodes and can include steps like computing messages for each neighbor, aggregating these messages, and updating the node's state based on the aggregated information.
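
The three steps can be written out directly. The sketch below uses an identity message, a mean aggregation, and a tanh update; these are illustrative choices, not the video's exact formulation.

```python
import numpy as np

# Neighbour lists and node features for a toy 4-node undirected graph.
neighbors = {0: [1, 3], 1: [0, 2, 3], 2: [1, 3], 3: [0, 1, 2]}
H = np.random.rand(4, 8)               # current node embeddings, dimension 8

def message(h_u):
    return h_u                          # message = neighbour's embedding

def aggregate(msgs):
    return np.mean(msgs, axis=0)        # order-independent aggregation (mean)

def update(h_v, agg):
    return np.tanh(h_v + agg)           # simple update with a nonlinearity

# One round of message passing, done for every node "in parallel".
H_new = np.stack([update(H[v], aggregate([message(H[u]) for u in neighbors[v]]))
                  for v in range(4)])
```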

  • What is the difference between a graph convolution network and a graph attention network?

    -Both graph convolution networks and graph attention networks are types of GNNs, but they differ in how they compute the updates for node representations. Graph convolution networks typically use a summation of neighbor node features followed by a linear transformation, while graph attention networks compute attention scores to weigh the importance of different neighbors before aggregating their features.

  • How does the video script differentiate between node features and edge features in graph representations?

    -The video script mentions that each vertex or node can be represented by a feature vector or embedding vector, which describes attributes of the node. Similarly, edges can also have features in a feature vector, which describe the relationship or bond between two nodes. This distinction allows for a more detailed and comprehensive representation of the graph's structure and properties.
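
As a rough sketch of that distinction (the numbers are placeholders, not real chemistry): one feature row per atom and one feature row per bond, alongside an edge list that records which atoms each bond connects.

```python
import numpy as np

# One row per atom: e.g. [protons, neutrons, electrons] (placeholder values).
node_features = np.array([
    [6, 6, 6],     # a carbon-like atom
    [8, 8, 8],     # an oxygen-like atom
    [7, 7, 7],     # a nitrogen-like atom
])

# One row per bond: e.g. [is_single, is_double, is_aromatic] as a one-hot vector.
edge_index = [(0, 1), (1, 2)]          # which atoms each bond connects
edge_features = np.array([
    [0, 1, 0],     # bond 0-1: double-like
    [1, 0, 0],     # bond 1-2: single-like
])
```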

Outlines

00:00

🌐 Introduction to Graph Neural Networks

The speaker welcomes viewers to an introductory video on Graph Neural Networks (GNNs), aiming to provide a foundational understanding of GNNs, their functionality, applications, and the types of tasks they can perform. The video is geared towards those who have heard of GNNs but have not yet explored them in depth. The speaker intends to deliver the necessary background for viewers to comprehend and follow detailed coding implementations for specific graph tasks in future videos. The speaker also shares personal experience with GNNs in a startup context, highlighting the current surge in GNN research, likening it to the 'ImageNet moment' in deep learning. The video begins by addressing why graphs are essential, noting that data from various applications like transportation networks, social networks, and biological molecules are naturally graph-structured. The speaker emphasizes that most deep learning methods assume a certain data structure, such as the grid structure of images, and that GNNs aim to extend these methods beyond such assumptions, which is part of the broader field of geometric deep learning.

05:00

🔍 Graph Representation and Features

This paragraph delves into the representation of graphs, explaining that a graph consists of vertices (or nodes) and edges. The graph can be undirected, where the edges have no specific direction, or directed, where the flow between nodes is unidirectional. Edges can also have weights, indicating a measure of connection strength, such as the volume of water flow. Each node can be represented by a feature vector or embedding, which could detail attributes like the number of protons, electrons, or neutrons in a molecular graph. The edges can similarly have feature vectors describing the nature of the connection between nodes. Using caffeine as an example, the speaker illustrates how a molecule can be represented as a graph with atoms as nodes and bonds as edges, each with their respective features.

10:01

🎯 Common Graph Tasks and Their Importance

The speaker outlines common tasks performed using graphs, focusing on five primary applications. Node classification is highlighted, where the goal is to categorize nodes—such as identifying fraudulent bank users based on their transaction patterns. Graph classification involves assessing the entire graph, like determining if a molecule is toxic. Node clustering aims to group subgraphs into clusters, link prediction seeks to identify missing connections, such as recommending products or movies based on user behavior, and influence maximization focuses on finding the most influential node in a network. The speaker stresses that node classification, graph classification, and link prediction are the most crucial tasks and encourages reframing problems to fit these tasks due to the availability of robust methods to solve them.

15:04

📊 Graph Representation Methods and Computation

The discussion shifts to how graphs are represented, with a focus on feature matrices for nodes and adjacency matrices for edges. The speaker mentions the challenges of representing sparse graphs with adjacency matrices due to their potentially large size and numerous empty values. An alternative, the adjacency list, is briefly introduced. The core of the paragraph is an explanation of how GNNs work, starting with a basic understanding of convolution operations adapted for graph structures. Instead of sliding filters as in image processing, GNNs leverage the local neighborhood of each node, aggregating information from neighboring nodes to compute new node representations. The speaker uses a two-layer GNN example to illustrate how information can propagate across the graph, emphasizing the importance of considering node degrees and the shortest path lengths in determining the number of layers needed in a GNN.
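
Since the adjacency list is only mentioned in passing, here is a brief sketch of what it looks like for a sparse graph: rather than an n × n matrix that is mostly zeros, each node keeps just the list of nodes it is connected to. The graph below is invented for illustration.

```python
import numpy as np

A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [1, 1, 1, 0]], dtype=float)     # dense adjacency matrix

# Equivalent adjacency list: only the nonzero entries are stored.
adj_list = {i: np.flatnonzero(A[i]).tolist() for i in range(A.shape[0])}
print(adj_list)   # {0: [1, 3], 1: [0, 2, 3], 2: [1, 3], 3: [0, 1, 2]}
```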

20:05

🔄 Key Properties of GNN Layers and Message Passing

The speaker introduces two key properties desired in GNN layers: permutation invariance and permutation equivariance. Permutation invariance ensures that the output remains unchanged when the order of nodes is altered, as the underlying graph structure remains the same. Permutation equivariance ensures that any permutation applied to the input is reflected in the output, maintaining the correspondence between input and output orderings. The paragraph then describes the message-passing mechanism fundamental to GNNs, where each node performs local computations, aggregates messages from its neighbors, and updates its representation. The speaker contrasts this with traditional convolutional operations, highlighting the importance of aggregation methods that are invariant to the order of nodes.
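
Equivariance can be checked the same way as invariance, except the permutation reappears in the output. The sketch below uses a shared per-node linear map, which is trivially equivariant, purely to illustrate the property f(PX) = P f(X); the matrices are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))     # 4 nodes, 3 features each
W = rng.standard_normal((3, 5))     # shared per-node transformation

def f(X):
    return X @ W                    # applied identically to every node row

perm = np.array([3, 1, 0, 2])
# Permutation equivariance: permuting the input rows permutes the output rows.
assert np.allclose(f(X[perm]), f(X)[perm])
```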

25:06

🛠 Graph Convolution Networks and Attention Mechanisms

The final paragraph provides a deeper look into Graph Convolution Networks (GCNs), explaining the computation process involving message passing and aggregation using node embeddings. The formula for GCNs is discussed, detailing how node features are aggregated and combined with a learned transformation matrix. The speaker also touches on Graph Attention Networks (GATs), which introduce attention mechanisms to weigh the importance of different nodes in the neighborhood, allowing the model to focus on more relevant features. The video concludes with a summary of the key concepts covered and a look ahead to future videos that will delve into coding implementations using PyTorch Geometric.
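
One plausible reading of the GCN update described here, written as a NumPy sketch (the layer sizes, random weights, and tanh nonlinearity are illustrative choices): average the neighbours' embeddings, transform the average with a learned matrix W, add a separate transform B of the node's own embedding, and apply a nonlinearity.

```python
import numpy as np

def gcn_layer(H, neighbors, W, B):
    """One GCN-style layer: average the neighbours' embeddings, transform with W,
    add a separate transform B of the node's own embedding, apply a nonlinearity."""
    H_new = np.empty((H.shape[0], W.shape[1]))
    for v, nbrs in neighbors.items():
        agg = H[nbrs].sum(axis=0) / len(nbrs)       # normalise by node degree
        H_new[v] = np.tanh(agg @ W + H[v] @ B)      # sigma(W * agg + B * h_v)
    return H_new

neighbors = {0: [1, 3], 1: [0, 2, 3], 2: [1, 3], 3: [0, 1, 2]}
rng = np.random.default_rng(1)
H = rng.standard_normal((4, 6))                     # input node embeddings
W, B = rng.standard_normal((6, 4)), rng.standard_normal((6, 4))
H1 = gcn_layer(H, neighbors, W, B)                  # embeddings after one layer
```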

Keywords

💡Graph Neural Networks (GNN)

Graph Neural Networks (GNN) are a type of deep learning model designed to handle data that can be represented as graphs. In the video, GNNs are introduced as a way to process non-Euclidean data structures like social networks, molecules, or transportation networks. GNNs are highlighted as a frontier in deep learning, with significant research and applications in various fields.

💡Node Classification

Node classification is a task mentioned in the script where each node within a graph is classified into different categories. This is exemplified by determining if a bank user is fraudulent or not, based on their transaction network. Node classification is one of the primary tasks in GNNs, demonstrating the practical application of GNNs in analyzing and categorizing entities within a network.

💡Graph Classification

Graph classification is described as a task where the entire graph is classified into categories, such as determining if a molecule is toxic. This task is crucial for applications in chemistry and biology, where the properties of the whole structure need to be inferred from its graph representation.

💡Link Prediction

Link prediction involves predicting missing or potential connections between nodes in a graph. The script uses the example of recommending products or movies to users based on their existing connections or interactions, which is a common application in recommender systems. This task showcases GNNs' ability to model relationships and predict interactions within a network.
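
Once node embeddings have been learned, a common and simple way to score a candidate link is a dot product between the two endpoint embeddings, squashed through a sigmoid. The sketch below is one common recipe, not something prescribed in the video; the embeddings are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)
Z = rng.standard_normal((5, 16))     # learned node embeddings (5 nodes, dim 16)

def link_score(u, v):
    """Probability-like score that an edge (u, v) exists: sigmoid of a dot product."""
    return 1.0 / (1.0 + np.exp(-Z[u] @ Z[v]))

print(link_score(0, 3))              # rank candidate user-item pairs by this score
```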

💡Adjacency Matrix

The adjacency matrix is a key concept in graph theory and is used to represent the connections between nodes in a graph. In the script, it is mentioned as a common way to represent graphs, especially in the context of GNNs, where it helps define the structure of the graph that the neural network will process.

💡Feature Vector

A feature vector is a mathematical representation of an entity, used to describe nodes or edges in a graph. The video script explains that each node can be represented by a feature vector, which could include attributes like the number of protons, electrons, or neutrons in a molecule. Feature vectors are essential for GNNs as they provide the input features that the network learns from.

💡Permutation Invariance

Permutation invariance is a property that GNNs aim to achieve, meaning that the output of the network should remain unchanged if the order of nodes is permuted, as long as the graph structure itself doesn't change. This property is crucial for ensuring that GNN models are robust and can handle different orderings of the same graph data.

💡Permutation Equivariance

Permutation equivariance refers to the property that any permutation applied to the input nodes should correspond to the same permutation in the output nodes. This concept is important for GNNs to maintain the correct relationships between nodes even when their order changes, ensuring that the model's predictions are consistent with the graph's structure.

💡Message Passing

Message passing is a core mechanism in GNNs where information is exchanged between nodes through a series of local computations and aggregations. The script describes how this process allows information to propagate across the graph, enabling the network to learn from both the features of individual nodes and their connections to other nodes.

💡Graph Convolution Network (GCN)

Graph Convolution Network is a specific type of GNN that uses a convolution operation adapted for graph-structured data. The script explains how GCNs aggregate information from neighboring nodes and apply a learnable transformation to update node representations. GCNs are highlighted as a foundational architecture for performing various graph-related tasks.

💡Graph Attention Network (GAT)

Graph Attention Network introduces attention mechanisms to GNNs, allowing the model to weigh the importance of different nodes in a graph. As mentioned in the script, GATs compute attention scores to determine which nodes should be given more significance during the aggregation process. This concept is pivotal for tasks where the relationships between nodes are not uniform and some connections are more influential than others.
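
As a rough single-head sketch of the idea (a simplified variant, not the exact formulation from the GAT paper): compute an unnormalised score per neighbour, normalise the scores with a softmax, and use them as weights in the aggregation. All matrices below are random placeholders.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gat_like_update(h_v, h_neighbors, W, a):
    """Single-head, simplified attention aggregation over one node's neighbourhood."""
    zv = h_v @ W
    zs = h_neighbors @ W
    # Unnormalised attention score per neighbour from the concatenated pair [zv, zu].
    scores = np.array([a @ np.concatenate([zv, zu]) for zu in zs])
    alpha = softmax(scores)                       # importance weights, sum to 1
    return np.tanh((alpha[:, None] * zs).sum(axis=0))

rng = np.random.default_rng(3)
W = rng.standard_normal((8, 4))
a = rng.standard_normal(8)                        # attention vector over [zv, zu]
out = gat_like_update(rng.standard_normal(8), rng.standard_normal((3, 8)), W, a)
```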

Highlights

Introduction to Graph Neural Networks (GNNs) and their significance in deep learning.

GNNs are used for tasks where data is naturally described as graphs, such as social networks and molecular structures.

GNNs aim to go beyond the Euclidean geometry assumption of traditional deep learning methods.

Graphs are defined by vertices (nodes) and edges, which can be directed or undirected, and may have weights.

Nodes and edges in a graph can have feature vectors or embeddings that describe their properties.

Examples of graph representations include feature matrices and adjacency matrices, with the latter showing node connections.

Graph tasks such as node classification, graph classification, and link prediction are common applications of GNNs.

Node classification involves categorizing nodes within a graph, like identifying fraudulent bank users.

Graph classification is used to categorize entire graphs, such as determining if a molecule is toxic.

Link prediction aims to identify missing connections within a graph, useful for recommender systems.

Influence maximization seeks to find the most influential node in a communication network.

GNNs require key properties like permutation invariance and equivariance to ensure consistent outputs regardless of node ordering.

The message passing mechanism in GNNs allows information to propagate through the graph structure.

Graph Convolutional Networks (GCNs) perform local computations and aggregate messages from neighboring nodes.

Graph Attention Networks (GATs) assign importance scores to nodes, focusing on the most relevant neighbors.

GNNs can incorporate edge features into their computations, expanding their representational capabilities.

The video provides an overview of GNNs, setting the stage for more detailed coding implementations in future videos.
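
For readers who want a head start on those implementations, a minimal node-classification skeleton in PyTorch Geometric might look like the following. This is a hedged sketch that assumes torch and torch_geometric are installed; the toy graph, layer sizes, and training loop are placeholders rather than anything shown in the video.

```python
import torch
import torch.nn.functional as F
from torch_geometric.data import Data
from torch_geometric.nn import GCNConv

# Toy graph: 4 nodes with 3 features each, undirected edges listed in both directions.
edge_index = torch.tensor([[0, 1, 1, 2, 2, 3, 0, 3],
                           [1, 0, 2, 1, 3, 2, 3, 0]], dtype=torch.long)
x = torch.randn(4, 3)
y = torch.tensor([0, 1, 0, 1])                  # a class label per node
data = Data(x=x, edge_index=edge_index, y=y)

class GCN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = GCNConv(3, 16)
        self.conv2 = GCNConv(16, 2)             # 2 output classes

    def forward(self, data):
        h = F.relu(self.conv1(data.x, data.edge_index))
        return self.conv2(h, data.edge_index)

model = GCN()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
for epoch in range(100):
    optimizer.zero_grad()
    loss = F.cross_entropy(model(data), data.y)  # node classification loss
    loss.backward()
    optimizer.step()
```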

Transcripts

play00:01

hello and welcome to this introductory

play00:04

video on graph neural networks and the

play00:08

idea or the goal of this video is uh to

play00:11

give you a background overview of you

play00:16

know why we have this area uh how they

play00:21

work uh what they're used

play00:23

for um what kind of tasks we can perform

play00:27

and so on so we give you a good overview

play00:29

um I think if you are one of those who

play00:32

have heard about GNN but you haven't

play00:35

really looked into it too much so my

play00:38

goal is to give you that introduction

play00:40

and then uh give you the necessary

play00:43

background knowledge so that you can

play00:45

understand and follow more detailed uh

play00:48

coding implementations for specific

play00:50

graph tasks which uh my goal is to do in

play00:53

the subsequent uh videos um and uh let's

play00:58

see so I also want to say you know uh I

play01:03

have been doing some projects now with

play01:05

GNN at for the startup that I'm working

play01:08

on uh working with uh the AI

play01:11

framework um and uh so I thought you

play01:15

know this would be a good opportunity to

play01:16

also make this video and hopefully uh

play01:18

there will be more videos uh moving

play01:21

moving forward now but with that said

play01:23

let's let's get started so the first

play01:26

question you know that arises is why

play01:28

graphs um

play01:31

and the reason is really simple in that

play01:35

you know the data that we collect or

play01:38

data for particular types of

play01:40

applications are naturally described as

play01:42

graphs so examples of that could be in

play01:46

the top left you know a transportation

play01:48

network of some kind um maybe that could

play01:51

be Google Maps U maybe that could be a a

play01:55

u Subway uh

play01:58

Network and um another example could be

play02:01

a social network and that could be you

play02:04

know Twitter Facebook Instagram I don't

play02:08

know

play02:09

Tinder um uh could also be some type of

play02:13

um other you know they're really many uh

play02:17

but then we can also have um many applic

play02:20

many sort of um biomedicine and

play02:23

biology applications are also naturally

play02:25

described as graph uh so we could have

play02:28

for example molecules where we have

play02:31

atoms and we have bonds and that can be

play02:34

described as a graph uh but

play02:37

fundamentally you know if you look at

play02:38

what we've been doing so far in in deep

play02:41

learning is uh we we are really assuming

play02:45

some type of uh data some some type of

play02:48

structure on the data so for images for

play02:53

example we're assuming that we have you

play02:55

know a grid of pixels and that they're

play02:58

connected in a particular way using the

play02:59

these XY coordinates and that you know

play03:02

if if you would Shuffle some of the

play03:03

pixels the the the image wouldn't be the

play03:05

same at all um and this um this

play03:11

assumption on the GE the geometry or the

play03:14

spatial loc that the data is having some

play03:17

type of spatial locality is what most of

play03:20

the methods that we we're using is built

play03:22

on so you know if we look at the data

play03:25

like images videos text time series data

play03:29

and so on all of that assume some type

play03:30

of spatial locality so this I just

play03:33

wanted to mention this that um going

play03:36

beyond the particular Euclidean geometry is

play03:39

one of the goals of of graph neural

play03:41

networks and this field this broader

play03:43

field of geometric deep

play03:45

learning all right so now that we've

play03:47

covered why graphs uh and and you know I

play03:49

just want to say too that uh I think you

play03:53

know graphs are really really uh having

play03:56

their sort of they're the New Frontier

play03:58

of uh of Deep learning uh there's a lot

play04:00

of research and active research going on

play04:02

on graphs and it's kind of having some

play04:05

some you know its ImageNet moment so to

play04:08

speak right now uh and so they're an

play04:11

exciting thing to learn about but a

play04:13

quick recap you know just to go back

play04:15

what is actually a graph so graph is a

play04:18

set of vertices and

play04:20

edges and really uh Vertex or node uh

play04:23

they mean the same thing they they're

play04:25

both used

play04:26

interchangeably uh and in this

play04:28

particular graph here we have have four

play04:30

nodes or four vertices uh and we have

play04:34

five edges and the graph can either be

play04:38

uh undirected meaning there's no

play04:40

particular direction of the of the flow

play04:43

uh between nodes so that would be the

play04:45

edge connecting the nodes um or it can

play04:48

be directed where we say that uh the

play04:51

flow goes from one particular node to

play04:54

another node um and uh even more than

play04:58

that we can also have you know if we

play05:00

have some water flow for example

play05:02

Illustrated in this graph uh we could uh

play05:05

show that the flow uh from one

play05:07

particular node to another is I don't

play05:09

know 10 L per hour or something um or 10

play05:14

gallons uh and then

play05:17

um you know uh we uh we can specify that

play05:21

weight on the edge uh describing that so

play05:25

uh The Edge can also have a weight uh

play05:28

but even more General than that uh if we

play05:31

look at sort of the most General way to

play05:32

Des describe it each Vertex or node

play05:36

could be represented as a as having uh

play05:39

some type of feature vector or uh

play05:42

described as having some embedding uh

play05:45

some embedding

play05:47

Vector so the uh the embedding Vector

play05:50

could be of some size and you know to

play05:53

give some context maybe let's say we

play05:54

have a molecule at the atom is uh

play05:57

containing having some specific number

play05:59

number of protons uh electrons neutrons

play06:02

and so on and that this can be described

play06:05

uh more detailed in an embedding Vector

play06:08

uh we can also have that the edges uh

play06:11

can have some type of uh you know some

play06:14

number of features in a in a feature

play06:16

Vector in that uh it it um can describe

play06:21

the particular bond between these two

play06:23

atoms in some more uh detailed way so

play06:27

that is just some idea of of you know

play06:28

what uh what a graph is and what can be

play06:31

in in the most General sense how we

play06:33

describe the

play06:34

graph so an example here could be that

play06:37

we have a molecule and uh the molecule

play06:40

in this case is actually caffeine and

play06:44

then you know the right is the

play06:45

corresponding graph um and um yeah it's

play06:50

an example of how you describe the

play06:52

molecule in this case you know the there

play06:55

are different atoms and there are

play06:56

different bonds and maybe those would be

play06:58

represented as the features for each

play07:00

respective Edge and node um all right

play07:06

so now that we know what a graph is uh

play07:10

what are the the sort of the overview of

play07:12

what we want to do with the graph uh and

play07:15

there are some most common tasks uh and

play07:20

here I've chosen to uh show you five of

play07:24

them and uh you know the the most common

play07:27

one perhaps is node classification where

play07:29

where we have a particular graph uh and

play07:32

we want to know uh let's say that this

play07:34

is some Bank user uh who sends

play07:38

transactions uh you know described by

play07:41

the edges to to other bank users or

play07:44

something uh and we want to know you

play07:46

know is this particular Bank user is he

play07:48

is he is he fraudulent or not and so we

play07:52

want to classify each particular node

play07:54

each uh person or Bank customer perhaps

play07:58

Bank user into different categories in

play08:02

that case you know that would be a

play08:03

binary classification task and so we can

play08:05

maybe say that the the red one is uh is

play08:08

a fraudulent one and the blue one is not

play08:10

so that's a one of the most common ones

play08:13

uh we can also do entire graph

play08:16

classification so that could be you know

play08:19

we have a molecule and we want to know

play08:21

is this a toxic molecule or is it not a

play08:24

toxic molecule that would also be a

play08:25

binary classification task but maybe it

play08:28

could also be you know what is the

play08:30

um odor uh what is the class belonging

play08:34

for this entire graph meaning uh in this

play08:37

case it could be what is does the

play08:38

molecule smell like what is the odor of

play08:40

the graph perhaps uh and then we can

play08:44

have node clustering we want to identify

play08:46

particular subgraphs of the graph uh

play08:49

belonging to to different

play08:51

clusters uh uh the other one is link

play08:54

prediction where we have U maybe we have

play08:58

users buying particular ular products or

play09:01

users watching particular movies um and

play09:05

so or or liking particular movie so user

play09:08

liking a particular movie would be

play09:10

described by an edge for example and the

play09:12

users would be some type

play09:15

of the I guess the users and okay let's

play09:19

not get into too much detail on that

play09:20

let's just ex imagine that example users

play09:23

buying particular product what we want

play09:25

to do is uh we want to do a a link

play09:27

prediction uh where we want to predict

play09:30

the missing edges meaning we want to uh

play09:33

give you know that these items might

play09:36

this user might like these items and

play09:39

that we should recommend them so even

play09:40

now when I said recommend you can

play09:42

imagine this is very uh prevalent in

play09:44

recommender systems uh and so it's a

play09:48

very important graph task another one is

play09:51

influence maximization where we want to

play09:53

find a particular node that is um having

play09:57

the most influence on a some kind of

play10:01

communication Network so maybe the

play10:03

question here would be if we want to

play10:05

remove one particular node which is the

play10:07

node to remove that influences the

play10:09

communication the

play10:10

most um but the three most actually the

play10:13

like the three most important ones to

play10:15

keep in mind is this node classification

play10:16

graph classification and Link prediction

play10:18

so these three are the absolute uh main

play10:22

ones and in many scenarios if you are

play10:26

describing you know if you realize that

play10:28

you're having this problem that you're

play10:29

trying to solve is uh the it's described

play10:33

as a graph um what you want to do even

play10:37

if it's not immediately apparent you

play10:38

want to uh modify your your problem in

play10:43

such a way that it can be described or

play10:45

it can be solved using one of those

play10:47

three tasks and uh meaning node

play10:51

classification graph classification or

play10:52

link prediction so if you can reduce it

play10:55

or sort of someone somehow rephrase the

play10:58

problem such as it can be solved then

play11:00

that's a really good way because we have

play11:03

strong uh stable solid methods to solve

play11:06

those those types of

play11:08

problems so another thing is uh you know

play11:11

how is the graph represented uh

play11:14

and how the graph is represented I just

play11:17

want to give you some overview it can be

play11:19

represented in different ways but the

play11:21

most common one is that we have some

play11:22

feature Matrix uh this one uh where we

play11:25

have n nodes and we have in this case

play11:27

that each node is represented by four

play11:30

dimensional um Vector uh and um the then

play11:35

we have an adjacency Matrix uh which is

play11:38

descri which is

play11:40

describing uh which nodes are connected

play11:42

to what other nodes so here for example

play11:44

the first row corresponds to what nodes

play11:48

are the first node connected to so that

play11:50

will be here number two number four as

play11:53

we can see number two and number four

play11:56

and so on and there's more as well uh

play11:59

and we can also see that in this case

play12:01

The adjacency Matrix is uh is uh

play12:04

symmetric so if we transpose this we get

play12:07

the same Matrix uh that means that the

play12:10

graph is undirected if it would be

play12:12

directed there would this property would

play12:15

not hold um but that's just an idea of

play12:18

how the graph is represented uh just to

play12:20

give an idea here if we have a very

play12:22

sparse graph then this is obviously not

play12:24

the ideal way to to um to sort of store

play12:28

the graph

play12:29

because the adjacency Matrix would be

play12:31

have a lot of zeros or be empty uh and

play12:34

another way to describe the graph

play12:36

instead is using an adjacency

play12:38

list but I don't want to get into too

play12:40

much detail on that you can keep this

play12:42

idea in mind that this is how the graph

play12:45

is is

play12:46

represented so now we get to um maybe

play12:50

the most important thing I don't know

play12:53

which is that uh you know how do they

play12:56

actually work and uh I'm not going to go

play12:59

into like the details details of how

play13:01

each respective architecture of GNN is

play13:04

worked I just want to give you a a

play13:06

strong intuition and we'll look at some

play13:09

of the main uh variants uh later on but

play13:14

first of all let's let's look at a

play13:16

convolution so a convolution um as

play13:19

you're probably as I'm assuming you're

play13:20

familiar with um works by sliding a a

play13:24

filter or a kernel uh with particular

play13:26

weights and it does a DOT product with

play13:28

the underlying

play13:30

uh pixel values for that for where the

play13:34

filter is located in the image and then

play13:36

it does uh it gets a value and that's

play13:38

sort of the next output from that

play13:41

convolution uh for graphs you know we

play13:43

don't slide anything uh because we don't

play13:46

have that type of structure in the data

play13:48

but what we do have is that we have some

play13:50

local neighborhood to each node so for

play13:52

example uh if we want if we're looking

play13:54

at this node um and I'm assuming you can

play13:57

see the mouse point winner here but if

play14:00

we're looking at node three on the left

play14:03

then uh what we can

play14:05

do uh is we can look at the neighbors uh

play14:08

close to that and that would be each

play14:10

node except for the first node um and

play14:15

then we do some computation uh between

play14:18

those uh using the information we have

play14:21

about about those nodes and that will

play14:23

get us some result so the idea here is

play14:26

that we use local neighborhood to that

play14:28

particular node and we do this to all

play14:30

nodes in parallel and get some type of

play14:33

result from

play14:35

that

play14:37

so now I want to give you some intuition

play14:40

as to how information propagates by

play14:42

using this mechanism so if we look at at

play14:45

the computational graph and let's say

play14:48

that we we're look at a two- layer GNN

play14:50

and we're looking at the vertex uh C

play14:54

here um we can see that the the ones

play14:57

Associated the ones connected to the to

play14:59

the green node or the uh node C is uh

play15:03

first of all uh a b e and

play15:07

f so um I will I will describe this

play15:12

there's some things to unpack here but

play15:15

first of all if we're looking at just

play15:16

the first computation using that local

play15:18

neighborhood in the first layer it will

play15:21

look at those four nodes um and if we

play15:25

then look at uh for two-layered um we we

play15:29

know that the computation for the the

play15:31

nodes connected to that will also use

play15:34

the local neighborhood um so for node a

play15:37

that would be D B and C right um

play15:42

so uh we can see that uh information can

play15:46

really propagate uh throughout the

play15:48

network or the throughout the graph uh

play15:51

if we use a if we stack a bunch of

play15:54

layers and here in this case we're

play15:58

actually get information from all

play16:00

nodes because we have that the shortest

play16:03

path uh from C uh is is two so then we

play16:08

only need a two layer GNN but if we have

play16:10

a a sort of if we don't if we have more

play16:14

nodes uh sort of as n nodes um that's

play16:17

bad description if the shortest path is

play16:19

is longer then we would have to have

play16:22

more layers in our GNN um so that's one

play16:26

way that that it can propagate and then

play16:28

I want to show you also some some of the

play16:30

computation here uh the idea and we'll

play16:33

get into this more as well is that first

play16:35

the there's some local uh computation

play16:37

done on the Node itself and then these

play16:40

are aggregated and put together uh in

play16:43

some way you don't have to focus on that

play16:46

too much all I wanted to show you is

play16:47

that information can propagate and this

play16:49

is how the computational graph can look

play16:51

like for a two- layer GNN uh we can also

play16:55

have um you know if we're looking at the

play16:57

D uh vertex then we can see that the

play17:00

computational graph looks very

play17:02

different uh because of the uh the node

play17:05

degree uh is very different for node D

play17:08

and node C then the the computational

play17:10

graphs becomes much narrower than the it

play17:13

is

play17:14

wide all right so now you have I hope

play17:17

you have some idea of the how

play17:19

computation and information can flow uh

play17:21

using this idea of local neighborhood

play17:24

computation now let's get to what we

play17:27

want from a GNN layer uh what is it that

play17:30

we want what are the Key Properties or

play17:33

the key ideas that we want a GNN layer

play17:36

to M to

play17:37

withhold um or be able to withhold um

play17:41

because this difference on the the GNN

play17:43

layer that we use uh the first one is

play17:46

permutation invariance uh permutation

play17:49

invariance means that if we

play17:51

take the uh feature Matrix and we take

play17:54

the adjacency Matrix um and we you know

play17:57

we choose to have particular numbers of

play18:00

the nodes for example here we have n

play18:01

node one uh on the graph to the right

play18:05

and then we have node two that's an

play18:07

arbitrary ordering right but it will

play18:09

matter uh because the feature Matrix

play18:13

will have the first node as the first

play18:15

row and

play18:17

correspondingly uh in The adjacency

play18:22

Matrix uh but really the ordering is

play18:25

arbitrary because we we chose it we can

play18:27

just as well uh swap the what we call

play18:30

node one and node two and it will still

play18:32

be the exact same graph uh even though

play18:34

the ordering is changed so permutation

play18:36

invariance says that if we apply some

play18:39

function that takes all information from

play18:42

all the the nodes and the

play18:45

edges uh and we get a single output uh

play18:50

then that should not matter uh as to how

play18:54

we chose the ordering of the

play18:57

nodes so

play18:59

um you know to the left here we have

play19:01

that P is a

play19:03

permutation don't if this confuses you

play19:06

don't Focus too much on it all it means

play19:08

is that if we change what we call number

play19:10

one and N node number seven if we just

play19:13

swap those two then uh the output should

play19:17

be the exact same because the underlying

play19:19

graph is the exact same uh even though

play19:21

the ordering of what we call node one

play19:23

and node 7 is is changed so that that's

play19:26

one key idea and a key property that we

play19:29

want some GNN layers to be able to have

play19:31

permutation

play19:32

invariance another uh key idea is

play19:35

permutation equivariance which is a

play19:37

little bit more subtle and um is uh a

play19:41

little bit less perhaps

play19:44

strict uh so uh let me explain what it

play19:47

is so we we have some graph on the top

play19:50

left here where we have uh chosen to to

play19:53

draw the graph in particular way uh and

play19:56

on the bottom we have the exact same

play19:58

graph

play19:59

um so when the graphs are structurally

play20:02

identical we call that they are

play20:04

isomorphic so these two graphs are

play20:06

isomorphic uh and if you look at them

play20:08

for a little bit uh you'll see that they

play20:11

are in fact you know exactly the same uh

play20:16

graph and so when we choose to describe

play20:20

the adjacency Matrix and the node uh the

play20:23

feature Matrix uh then we will choose

play20:25

some ordering for Blue uh orange yellow

play20:28

and gray we apply some some GNN layer

play20:31

and we get some output labels so in this

play20:33

case we're doing node classification we

play20:36

get that blue the blue node is

play20:38

corresponding to a cat uh orange is a

play20:41

person and so on now if we instead you

play20:44

know choose to draw the same graph but

play20:48

because of the way we drew it is

play20:49

different we're also choosing a

play20:50

different ordering um so basically all

play20:53

we did is we now permuted the rows um

play20:57

exact same graph though all we did is

play20:58

change the ordering uh when we apply F

play21:02

we still get the same output but the

play21:05

outputs are permuted um so we we see now

play21:08

that the the exact same labels the

play21:09

output labels are identical but they're

play21:11

not in the exact same ordering um and

play21:14

how um permutation equivariance is is

play21:18

that the permutation that we applied to

play21:20

the input should also be the same that

play21:22

is applied to the output um and so um

play21:26

you know more mathematically maybe I

play21:27

should have written the formula for this

play21:29

is that uh the function applied to the

play21:32

permutation of the input is the same as

play21:34

the permutation applied to the output um

play21:37

so this is that's what it means here um

play21:40

you can see that I guess Sigma here

play21:42

would be the permutation the difference

play21:44

is per in the permutation in the output

play21:46

is the difference in the permutation on

play21:48

the output uh input and output has the

play21:51

same

play21:53

permutation okay so that's two key ideas

play21:57

two Key Properties we want to have and a

play22:00

GNN is basically a collection of GNN

play22:03

layers where we want you know each GNN

play22:06

layer to have some type of this property

play22:08

so for example we could have you know we

play22:10

could have GNN layer one is permutation

play22:12

equivariant equivariant for Layer Two

play22:15

equivariant for layer three and then we

play22:16

have a permutation invariant for layer

play22:19

four that's just an

play22:22

example so um now we get you know the

play22:26

computation that we want to do for G

play22:28

follows this uh this idea of message

play22:31

passing so we go back to the to a

play22:34

similar graph that we looked at

play22:36

previously so we know uh that you know

play22:39

we have this computational graph now and

play22:41

we have that each node connected to it

play22:43

uh is performed um on some local

play22:46

computation um where we are trying to

play22:48

compute what is the message that we want

play22:51

to send to this uh this output node here

play22:54

and remember this is done for each node

play22:56

in parallel uh but we're doing some type

play22:58

of message then uh it's very important

play23:01

that we a we aggregate this to the The

play23:04

Ordering of the nodes don't matter uh

play23:06

with the aggregation for example could

play23:07

be a a summation or a

play23:10

mean um but we we put together the

play23:13

messages in some way and then we pass it

play23:15

uh and we update the the node that we're

play23:18

interested in so this is the a general

play23:21

form and no matter what type of uh GNN

play23:25

architecture you look at um and I'll

play23:28

just you know uh give some example maybe

play23:30

we have graph convolution network uh um

play23:34

we have a graph attention Network or we

play23:36

have GraphSAGE um you know don't matter

play23:40

if don't care if you don't recognize

play23:42

those names but really all that's

play23:45

different between those is a a

play23:47

difference in either the message and

play23:50

slash or the aggregation function so uh

play23:54

now I want to you know look at how does

play23:56

a graph convolution Network look like uh

play23:59

what is the the formula for the

play24:01

computation that's

play24:02

performed and um so what we do here um

play24:09

is uh looking at the top right image

play24:10

here we have that some message is

play24:12

performed um and so the message uh

play24:16

that's performed is uh is I'm really

play24:20

hoping you can see the mouse cursor now

play24:22

uh but is the the one to the with the

play24:25

summation here uh where we we here we

play24:28

sum each of the connected node

play24:31

embeddings and we divide by the node

play24:33

degree um and the reason why we do this

play24:35

is just to normalize because

play24:37

otherwise nodes that have larger

play24:40

node degrees would just blow up in the

play24:43

value uh but if we normalize then we

play24:46

will have similar distribution for all

play24:49

nodes so uh you know we we perform some

play24:53

message and that's done locally to the

play24:55

node uh now this looks a little bit

play24:58

confusing perhaps because we're doing a

play24:59

summation first and then we're doing a a

play25:02

matrix multiply with WK uh but remember

play25:06

it's just a linear linear operator so um

play25:10

the the reason why we do the sum first

play25:12

and then the Matrix multiply is because

play25:15

it's we can take out the linear uh

play25:18

operator um but if you choose to choose

play25:21

to view it more as the computation done

play25:23

in the top right then uh you can move

play25:25

the the uh the the weight inside and

play25:29

then take a summation so the summation

play25:31

here is a is the aggregation and the W

play25:34

is the message that's

play25:36

performed uh now one uh thing here is

play25:40

that you know one of the most important

play25:42

nodes to to the output of that

play25:45

particular node that we're looking at in

play25:47

this case this uh top node uh is

play25:50

obviously the node embedding of that

play25:51

node itself so we comp we add an

play25:54

additional computation to the right here

play25:56

with BK times the node embedding of that

play25:59

particular node uh

play26:01

V and then um so we we do that and we we

play26:05

sum that and then we apply a

play26:07

nonlinearity to that and then we produce

play26:11

the output embedding and then K here uh

play26:15

I'm assuming correspond to the layer

play26:17

yeah I think it should be the layer of

play26:19

the of the

play26:20

GNN and so that's a graph convolution

play26:22

Network and I really hope that you get

play26:24

the general idea here and that you know

play26:26

you realize that it's not really that

play26:28

scary as you might have thought that

play26:30

they were I don't know that's what I

play26:32

thought at least when I felt like I

play26:34

understood how they work but uh if we

play26:37

now look at a graph attention network

play26:40

instead uh we uh we do a very similar

play26:43

type of computation uh it's just that uh

play26:46

what we're doing now is that we're also

play26:48

adding some type of uh importance score

play26:51

to the nodes uh you know previously we

play26:53

just did a a summation meaning that we

play26:56

really chose to look at each node uh in

play27:00

the local neighborhood as having equal

play27:02

importance but now we um perform this

play27:05

additional step of of actually Computing

play27:08

uh through multi-headed attention uh you

play27:11

know what nodes we should pay most

play27:13

attention

play27:14

to um and I don't want to get into

play27:17

exactly how that computation is done I

play27:20

might actually do that in future videos

play27:22

if that would be important if that would

play27:23

be of interest um um but I know also

play27:27

that uh uh there have been people that

play27:29

have implemented these from scratch um

play27:33

AI Epiphany has done I think a video

play27:36

where he did this from

play27:39

scratch um so

play27:41

anyways um let me know if you want to

play27:44

see videos on that where we can look at

play27:47

the paper and understand the actual

play27:49

details of it but now for now just

play27:51

understand that the the idea is that we

play27:54

have these attention scores

play27:56

computed uh where we look at the

play27:58

importance of each node and then we do a

play28:01

summation um and uh let's see if there's

play28:04

anything else I want to say on this yeah

play28:06

so now uh you know we we said here that

play28:09

h u uh and HV is the node embedding for

play28:14

node V and node U one thing to keep in

play28:17

mind is that as I said in the beginning

play28:19

we can also have Edge features uh this

play28:22

is not described here in the formula but

play28:25

it would be one additional uh step

play28:27

towards a more General

play28:28

computation uh and it would not look too

play28:31

different it would just include the edge

play28:33

features as part of the uh the

play28:37

computation all right so that was a

play28:40

little bit longer than I hoped the video

play28:42

would actually be um but at least I hope

play28:46

you now have an idea of gnn's and a good

play28:49

overview uh that I think we can build on

play28:52

in uh in subsequent future videos I will

play28:55

make uh where we go into the details of

play28:57

of coding stuff using PyTorch Geometric

play29:00

so very excited for that and thank you

play29:02

so much for watching the video and I

play29:05

hope to see you in the next one



Related tags
Graph Neural Networks, Deep Learning, Machine Learning, AI Frameworks, Data Structures, Node Classification, Graph Classification, Link Prediction, Message Passing, Neural Networks