Graph Neural Networks: A gentle introduction
Summary
TLDR: This video offers a foundational overview of Graph Neural Networks (GNNs), explaining why they're essential for handling data naturally structured as graphs. It covers common applications like social networks and molecules, and introduces fundamental tasks such as node classification, graph classification, and link prediction. The video simplifies complex concepts like permutation invariance and message passing in GNNs, providing a clear pathway for beginners to grasp and apply GNNs in practical scenarios.
Takeaways
- 🌐 **Graphs as Data Representation**: Graphs are a natural way to represent certain types of data, such as social networks, transportation networks, and molecules.
- 🚀 **Emerging Field**: Graph Neural Networks (GNNs) are an emerging area in deep learning with significant research activity and potential applications.
- 🔍 **Beyond Grid-like Structures**: GNNs aim to move beyond the assumptions of grid-like structures inherent in traditional deep learning methods for data like images and videos.
- 🔑 **Graph Components**: A graph consists of vertices (or nodes) and edges, which can be undirected or directed, and may have weights or features associated with them.
- 🎯 **Common Graph Tasks**: Key tasks in graph neural networks include node classification, graph classification, node clustering, link prediction, and influence maximization.
- 📊 **Graph Representation**: Graphs can be represented using feature matrices for node attributes and adjacency matrices to describe connections between nodes.
- 🤖 **GNN Computation**: GNNs operate on the principle of local neighborhood computation, where information is aggregated from neighboring nodes and updated through layers of computation.
- 🔄 **Information Propagation**: In GNNs, information can propagate across the graph through multiple layers, allowing the model to capture global structure from local connections.
- 🔧 **Key GNN Properties**: Important properties of GNN layers include permutation invariance and permutation equivariance, ensuring that the model's output remains consistent regardless of node ordering.
- 🛠️ **GNN Architectures**: Different GNN architectures like Graph Convolutional Networks (GCNs) and Graph Attention Networks (GATs) vary in how they compute messages and aggregate information from the graph structure.
Q & A
What is the main goal of the video on graph neural networks?
-The main goal of the video is to provide a background overview of graph neural networks (GNNs), explaining why they are important, how they work, and what tasks they can be used for, to give viewers the necessary background knowledge to understand and follow more detailed coding implementations for specific graph tasks.
Why are graphs considered important in the context of this video?
-Graphs are considered important because many types of data, especially those from applications like transportation networks, social networks, and biological molecules, are naturally described as graphs. This video aims to explore how graph neural networks can be used to process and analyze such data effectively.
What are some common tasks that can be performed using graph neural networks?
-Common tasks that can be performed using GNNs include node classification, graph classification, node clustering, link prediction, and influence maximization. These tasks are crucial for various applications such as fraud detection, molecule property prediction, social network analysis, recommender systems, and communication network analysis.
How does the representation of a graph typically involve vertices and edges?
-A graph is typically represented by a set of vertices (or nodes) and edges. Vertices represent entities, and edges represent the relationships or interactions between these entities. The graph can be undirected, meaning there is no specific direction of flow between nodes, or directed, indicating a flow from one node to another.
What is the significance of permutation invariance in the context of GNNs?
-Permutation invariance is a key property that GNNs should have: a graph-level output should remain the same even if the numbering of the nodes in the input is changed, as long as the underlying graph structure remains unchanged. This property is important for ensuring that the model is robust to the arbitrary ordering of nodes.
How does information propagate in a graph neural network?
-Information in a GNN propagates through a mechanism of local neighborhood computation. Each node aggregates information from its neighbors, and this process is done in parallel for all nodes. By stacking multiple layers, information can propagate throughout the entire graph, allowing the model to capture global structure from local connections.
What is the role of the adjacency matrix in representing a graph?
-The adjacency matrix is used to represent which nodes are connected to other nodes in a graph. It is a matrix where each row corresponds to a node, and the entries indicate connections to other nodes. If the graph is undirected, the adjacency matrix is symmetric, reflecting that the connections are bidirectional.
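To make this concrete, here is a minimal sketch in Python (the graph, node count, and feature values are invented for illustration, not taken from the video):

```python
import numpy as np

# Feature matrix X: one row per node, one column per feature.
# 4 nodes, each with a 4-dimensional feature vector (made-up values).
X = np.array([
    [0.1, 0.2, 0.3, 0.4],   # node 0
    [0.5, 0.1, 0.0, 0.2],   # node 1
    [0.9, 0.3, 0.7, 0.1],   # node 2
    [0.2, 0.8, 0.4, 0.6],   # node 3
], dtype=np.float32)

# Adjacency matrix A: A[i, j] = 1 if there is an edge between i and j.
A = np.array([
    [0, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
    [1, 1, 1, 0],
], dtype=np.float32)

# For an undirected graph the adjacency matrix is symmetric.
assert (A == A.T).all()
```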
Can you explain the concept of message passing in GNNs as described in the video?
-Message passing in GNNs involves each node performing a local computation on its neighbors, aggregating the results, and then updating the node's representation. This process is done in parallel for all nodes and can include steps like computing messages for each neighbor, aggregating these messages, and updating the node's state based on the aggregated information.
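The three steps described here can be sketched schematically as follows; this is a generic skeleton for intuition, not any specific published architecture, and the message and update functions shown are placeholders:

```python
import numpy as np

def message_passing_step(X, A, message_fn, update_fn):
    """One round of message passing over all nodes in parallel.

    X: (n, d) node features; A: (n, n) adjacency matrix.
    message_fn maps a neighbor's features to a message; the
    aggregation (a sum here) must not depend on neighbor order.
    """
    n, d = X.shape
    X_new = np.zeros_like(X)
    for v in range(n):
        neighbors = np.nonzero(A[v])[0]
        # 1. Compute a message from each neighbor.
        messages = [message_fn(X[u]) for u in neighbors]
        # 2. Aggregate with a permutation-invariant function (sum).
        aggregated = np.sum(messages, axis=0) if messages else np.zeros(d)
        # 3. Update the node's own representation.
        X_new[v] = update_fn(X[v], aggregated)
    return X_new

# Example run: identity messages, simple blending update.
X_out = message_passing_step(
    X=np.random.rand(4, 4),
    A=np.ones((4, 4)) - np.eye(4),
    message_fn=lambda h: h,
    update_fn=lambda h, agg: 0.5 * h + 0.5 * agg,
)
```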
What is the difference between a graph convolution network and a graph attention network?
-Both graph convolution networks and graph attention networks are types of GNNs, but they differ in how they compute the updates for node representations. Graph convolution networks typically use a summation of neighbor node features followed by a linear transformation, while graph attention networks compute attention scores to weigh the importance of different neighbors before aggregating their features.
How does the video script differentiate between node features and edge features in graph representations?
-The video script mentions that each vertex or node can be represented by a feature vector or embedding vector, which describes attributes of the node. Similarly, edges can also have features in a feature vector, which describe the relationship or bond between two nodes. This distinction allows for a more detailed and comprehensive representation of the graph's structure and properties.
Outlines
🌐 Introduction to Graph Neural Networks
The speaker welcomes viewers to an introductory video on Graph Neural Networks (GNNs), aiming to provide a foundational understanding of GNNs, their functionality, applications, and the types of tasks they can perform. The video is geared towards those who have heard of GNNs but have not yet looked into them in depth. The speaker intends to deliver the necessary background for viewers to comprehend and follow detailed coding implementations for specific graph tasks in future videos. The speaker also shares personal experience with GNNs in a startup context, highlighting the current surge in GNN research, likening it to the 'ImageNet moment' in deep learning. The video begins by addressing why graphs are essential, noting that data from various applications like transportation networks, social networks, and biological molecules are naturally graph-structured. The speaker emphasizes that most deep learning methods assume a certain data structure, such as the grid structure of images, and that GNNs aim to extend these methods beyond such assumptions, which is part of the broader field of geometric deep learning.
🔍 Graph Representation and Features
This paragraph delves into the representation of graphs, explaining that a graph consists of vertices (or nodes) and edges. The graph can be undirected, where the edges have no specific direction, or directed, where the flow between nodes is unidirectional. Edges can also have weights, indicating a measure of connection strength, such as the volume of water flow. Each node can be represented by a feature vector or embedding, which could detail attributes like the number of protons, electrons, or neutrons in a molecular graph. The edges can similarly have feature vectors describing the nature of the connection between nodes. Using caffeine as an example, the speaker illustrates how a molecule can be represented as a graph with atoms as nodes and bonds as edges, each with their respective features.
🎯 Common Graph Tasks and Their Importance
The speaker outlines common tasks performed using graphs, focusing on five primary applications. Node classification is highlighted, where the goal is to categorize nodes—such as identifying fraudulent bank users based on their transaction patterns. Graph classification involves assessing the entire graph, like determining if a molecule is toxic. Node clustering aims to group subgraphs into clusters, link prediction seeks to identify missing connections, such as recommending products or movies based on user behavior, and influence maximization focuses on finding the most influential node in a network. The speaker stresses that node classification, graph classification, and link prediction are the most crucial tasks and encourages reframing problems to fit these tasks due to the availability of robust methods to solve them.
📊 Graph Representation Methods and Computation
The discussion shifts to how graphs are represented, with a focus on feature matrices for nodes and adjacency matrices for edges. The speaker mentions the challenges of representing sparse graphs with adjacency matrices due to their potentially large size and numerous empty values. An alternative, the adjacency list, is briefly introduced. The core of the paragraph is an explanation of how GNNs work, starting with a basic understanding of convolution operations adapted for graph structures. Instead of sliding filters as in image processing, GNNs leverage the local neighborhood of each node, aggregating information from neighboring nodes to compute new node representations. The speaker uses a two-layer GNN example to illustrate how information can propagate across the graph, emphasizing the importance of considering node degrees and the shortest path lengths in determining the number of layers needed in a GNN.
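As a sketch of the adjacency-list alternative mentioned above (node IDs are invented for illustration):

```python
# Adjacency list: store only the edges that exist, instead of an
# adjacency matrix that would mostly contain zeros for a sparse graph.
adjacency_list = {
    0: [1, 3],
    1: [0, 2, 3],
    2: [1, 3],
    3: [0, 1, 2],
}

# Neighborhood lookup is direct, and memory scales with the number
# of edges rather than with the square of the number of nodes.
neighbors_of_1 = adjacency_list[1]  # [0, 2, 3]
```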
🔄 Key Properties of GNN Layers and Message Passing
The speaker introduces two key properties desired in GNN layers: permutation invariance and permutation equivariance. Permutation invariance ensures that the output remains unchanged when the order of nodes is altered, as the underlying graph structure remains the same. Permutation equivariance ensures that any permutation applied to the input is reflected in the output, maintaining the correspondence between input and output orderings. The paragraph then describes the message-passing mechanism fundamental to GNNs, where each node performs local computations, aggregates messages from its neighbors, and updates its representation. The speaker contrasts this with traditional convolutional operations, highlighting the importance of aggregation methods that are invariant to the order of nodes.
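A toy check of permutation equivariance (a shared per-node linear map stands in for a GNN layer here; the features and weights are random placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((4, 3))          # node features, one row per node
W = rng.random((3, 2))          # shared weights, applied to every node

f = lambda X: X @ W             # a permutation-equivariant map

P = np.eye(4)[[2, 0, 3, 1]]     # a permutation matrix (row reordering)

# f(P X) == P f(X): permuting the input permutes the output identically.
assert np.allclose(f(P @ X), P @ f(X))
```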
🛠 Graph Convolution Networks and Attention Mechanisms
The final paragraph provides a deeper look into Graph Convolution Networks (GCNs), explaining the computation process involving message passing and aggregation using node embeddings. The formula for GCNs is discussed, detailing how node features are aggregated and combined with a learned transformation matrix. The speaker also touches on Graph Attention Networks (GATs), which introduce attention mechanisms to weigh the importance of different nodes in the neighborhood, allowing the model to focus on more relevant features. The video concludes with a summary of the key concepts covered and an anticipation for future videos that will delve into coding implementations using PyTorch Geometric.
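As a preview of what the PyTorch Geometric implementations might look like (a hedged sketch: the two-layer design and layer sizes are arbitrary choices for illustration, not the video's code):

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv, GATConv

class TwoLayerGNN(torch.nn.Module):
    def __init__(self, in_dim, hidden_dim, num_classes, use_attention=False):
        super().__init__()
        # Swap between the two architectures discussed above:
        # GCNConv aggregates degree-normalized neighbor features;
        # GATConv learns attention weights over the neighbors.
        Conv = GATConv if use_attention else GCNConv
        self.conv1 = Conv(in_dim, hidden_dim)
        self.conv2 = Conv(hidden_dim, num_classes)

    def forward(self, x, edge_index):
        # x: (num_nodes, in_dim); edge_index: (2, num_edges), COO format.
        x = F.relu(self.conv1(x, edge_index))
        return self.conv2(x, edge_index)  # per-node class scores
```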
Keywords
💡Graph Neural Networks (GNN)
💡Node Classification
💡Graph Classification
💡Link Prediction
💡Adjacency Matrix
💡Feature Vector
💡Permutation Invariance
💡Permutation Equivariance
💡Message Passing
💡Graph Convolution Network (GCN)
💡Graph Attention Network (GAT)
Highlights
Introduction to Graph Neural Networks (GNNs) and their significance in deep learning.
GNNs are used for tasks where data is naturally described as graphs, such as social networks and molecular structures.
GNNs aim to go beyond the Euclidean geometry assumption of traditional deep learning methods.
Graphs are defined by vertices (nodes) and edges, which can be directed or undirected, and may have weights.
Nodes and edges in a graph can have feature vectors or embeddings that describe their properties.
Examples of graph representations include feature matrices and adjacency matrices, with the latter showing node connections.
Graph tasks such as node classification, graph classification, and link prediction are common applications of GNNs.
Node classification involves categorizing nodes within a graph, like identifying fraudulent bank users.
Graph classification is used to categorize entire graphs, such as determining if a molecule is toxic.
Link prediction aims to identify missing connections within a graph, useful for recommender systems.
Influence maximization seeks to find the most influential node in a communication network.
GNNs require key properties like permutation invariance and equivariance to ensure consistent outputs regardless of node ordering.
The message passing mechanism in GNNs allows information to propagate through the graph structure.
Graph Convolutional Networks (GCNs) perform local computations and aggregate messages from neighboring nodes.
Graph Attention Networks (GATs) assign importance scores to nodes, focusing on the most relevant neighbors.
GNNs can incorporate edge features into their computations, expanding their representational capabilities.
The video provides an overview of GNNs, setting the stage for more detailed coding implementations in future videos.
Transcripts
hello and welcome to this introductory video on graph neural networks. The goal of this video is to give you a background overview: why we have this area, how GNNs work, what they're used for, what kinds of tasks we can perform, and so on. I think this is for you if you're one of those who have heard about GNNs but haven't really looked into them too much; my goal is to give you that introduction and the necessary background knowledge, so that you can understand and follow more detailed coding implementations for specific graph tasks, which I plan to do in subsequent videos. I also want to say that I've been doing some projects with GNNs at the startup I'm working at, so I thought this would be a good opportunity to make this video, and hopefully there will be more videos moving forward. But with that said, let's get started. The first question that arises is: why graphs?
And the reason is really simple: the data we collect for particular types of applications is naturally described as graphs. Examples of that could be, in the top left, a transportation network of some kind; maybe that could be Google Maps, maybe a subway network. Another example could be a social network, and that could be Twitter, Facebook, Instagram, I don't know, Tinder; there are really many. Many biomedicine and biology applications are also naturally described as graphs: we could have, for example, molecules, where we have atoms and we have bonds, and that can be described as a graph. But fundamentally, if you look at what we've been doing so far in deep learning, we are really assuming some type of structure on the data. For images, for example, we're assuming that we have a grid of pixels, that they're connected in a particular way through their x-y coordinates, and that if you were to shuffle some of the pixels, the image wouldn't be the same at all. This assumption that the data has some type of spatial locality is what most of the methods we're using are built on: images, videos, text, time-series data and so on all assume some type of spatial locality. So I just wanted to mention that going beyond that particular Euclidean geometry is one of the goals of graph neural networks and the broader field of geometric deep learning.
All right, so now we've covered why graphs. I also just want to say that I think graphs are really the new frontier of deep learning; there's a lot of active research going on on graphs, and the field is kind of having its ImageNet moment right now, so to speak, so they're an exciting thing to learn about. But a quick recap, just to go back: what is actually a graph? A graph is a set of vertices and edges, and 'vertex' and 'node' mean the same thing; they're used interchangeably. In this particular graph here we have four nodes, or four vertices, and we have five edges. The graph can either be undirected, meaning there's no particular direction of flow between nodes along the edge connecting them, or it can be directed, where we say that the flow goes from one particular node to another node. Even more than that, if we have some water flow, for example, illustrated in this graph, we could show that the flow from one particular node to another is, I don't know, 10 liters per hour or something, or 10 gallons, and then we can specify that weight on the edge describing it; so an edge can also have a weight. Even more general than that, in the most general way to describe it, each vertex or node can be represented as having some type of feature vector, or described as having some embedding vector. The embedding vector could be of some size; to give some context, let's say we have a molecule where an atom contains some specific number of protons, electrons, neutrons and so on, and this can be described in more detail in an embedding vector. We can also have edges carrying some number of features in a feature vector, describing the particular bond between two atoms in some more detailed way. So that's just some idea of what a graph is and, in the most general sense, how we describe a graph.
An example here could be that we have a molecule, and the molecule in this case is actually caffeine; on the right is the corresponding graph. It's an example of how you describe the molecule: there are different atoms and there are different bonds, and those would be represented as the features for each respective node and edge.
All right, so now that we know what a graph is, what is the overview of what we want to do with a graph? There are some common tasks, and here I've chosen to show you five of them. The most common one, perhaps, is node classification, where we have a particular graph and we want to know, let's say, whether a particular bank user, who sends transactions (described by the edges) to other bank users, is fraudulent or not. So we want to classify each particular node, each person or bank user, into different categories; in that case it would be a binary classification task, and we can maybe say that the red node is a fraudulent user and the blue one is not. That's one of the most common tasks. We can also do entire-graph classification: that could be, we have a molecule and we want to know whether it is a toxic molecule or not, which would also be a binary classification task. But it could also be, for example, what is the odor class for this entire graph, meaning what does the molecule smell like. Then we have node clustering, where we want to identify particular subgraphs of the graph belonging to different clusters. The next one is link prediction, where maybe we have users buying particular products, or users watching or liking particular movies; a user liking a particular movie would be described by an edge, for example. Let's just imagine that example of users buying particular products: what we want to do is link prediction, where we want to predict the missing edges, meaning we want to say that this user might like these items and that we should recommend them. Even now, when I said 'recommend', you can imagine this is very prevalent in recommender systems, so it's a very important graph task. Another one is influence maximization, where we want to find the particular node that has the most influence on some kind of communication network; the question here might be, if we want to remove one particular node, which node's removal influences the communication the most? But the three most important ones to keep in mind are node classification, graph classification and link prediction. These three are the absolute main ones, and in many scenarios, if you realize that the problem you're trying to solve is described as a graph, what you want to do, even if it's not immediately apparent, is to modify your problem in such a way that it can be solved using one of those three tasks: node classification, graph classification or link prediction. If you can reduce it, or somehow rephrase the problem so that it can be solved that way, that's a really good approach, because we have strong, stable, solid methods to solve those types of problems.
Another thing is how the graph is represented. I just want to give you some overview: it can be represented in different ways, but the most common one is that we have a feature matrix, where we have n nodes and, in this case, each node is represented by a four-dimensional vector. Then we have an adjacency matrix, which describes which nodes are connected to which other nodes. Here, for example, the first row corresponds to the nodes the first node is connected to; that would be number two and number four, as we can see, and so on. We can also see that in this case the adjacency matrix is symmetric: if we transpose it, we get the same matrix, which means that the graph is undirected. If it were directed, this property would not hold. That's just an idea of how the graph is represented. To give one more idea: if we have a very sparse graph, then this is obviously not the ideal way to store it, because the adjacency matrix would have a lot of zeros or be mostly empty; another way to describe the graph instead is using an adjacency list. I don't want to get into too much detail on that; just keep in mind that this is how the graph is represented.
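As a side note on tooling: PyTorch Geometric, which is used later in this series, stores edges of sparse graphs as a 2 × num_edges index array (COO format) rather than as a dense adjacency matrix. A small sketch, with an invented example graph:

```python
import torch

# Each column is one directed edge (source, target).
# An undirected edge appears twice, once in each direction.
edge_index = torch.tensor([
    [0, 1, 1, 2, 0, 3],   # source nodes
    [1, 0, 2, 1, 3, 0],   # target nodes
])

# Memory grows with the number of edges, not num_nodes ** 2.
```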
So now we get to maybe the most important thing, which is how GNNs actually work. I'm not going to go into the details of how each respective GNN architecture works; I just want to give you a strong intuition, and we'll look at some of the main variants later on. But first of all, let's look at a convolution. A convolution, as I'm assuming you're familiar with, works by sliding a filter or kernel with particular weights over an image: it takes a dot product with the underlying pixel values where the filter is located, and that gives a value which is the next output from that convolution. For graphs, we don't slide anything, because we don't have that type of structure in the data. But what we do have is a local neighborhood around each node. For example, if we're looking at node three on the left (I'm assuming you can see the mouse pointer here), what we can do is look at the neighbors close to it, which would be every node except the first one, and then we do some computation between those, using the information we have about those nodes, and that gets us some result. So the idea is that we use the local neighborhood of each particular node, we do this for all nodes in parallel, and we get some type of result from that.
Now I want to give you some intuition as to how information propagates using this mechanism. Looking at the computational graph, let's say we have a two-layer GNN and we're looking at the vertex C here. We can see that the nodes connected to the green node, node C, are, first of all, A, B, E and F. There are some things to unpack here, but first of all, if we look at just the first computation using that local neighborhood, the first layer will look at those four nodes. If we then look at the two-layer case, we know that the computation for the nodes connected to C will also use their own local neighborhoods; for node A that would be D, B and C. So we can see that information can really propagate throughout the graph if we stack a bunch of layers, and in this case we actually get information from all nodes, because the shortest path from C to any other node is at most two, so we only need a two-layer GNN. If the shortest paths were longer, we would have to have more layers in our GNN. So that's one way information can propagate. I also want to show you some of the computation here; the idea, and we'll get into this more as well, is that first there's some local computation done per node, and then the results are aggregated and put together in some way. You don't have to focus on that too much; all I wanted to show you is that information can propagate, and this is how the computational graph can look for a two-layer GNN. If we instead look at the D vertex, we can see that its computational graph looks very different: because the node degree is very different for node D than for node C, the computational graph becomes much narrower than it is wide.
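To make the "information reaches everything within k hops" point concrete, here is a toy check (the graph is made up for illustration): the nonzero pattern of (A + I)^k marks exactly which nodes can influence which after k rounds of neighborhood aggregation.

```python
import numpy as np

A = np.array([            # a small example graph (5 nodes in a chain-ish shape)
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=int)

# (A + I)^k: the identity keeps each node's own information in the mix.
reach = np.linalg.matrix_power(A + np.eye(5, dtype=int), 2)

# reach[i, j] > 0  <=>  node j is within 2 hops of node i, i.e. a
# 2-layer GNN lets node j's features influence node i's output.
print((reach > 0).astype(int))
```

In this example, node 4 is three hops from node 0, so its entry stays zero at k = 2: a deeper GNN would be needed for that information to arrive.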
All right, so now I hope you have some idea of how computation and information can flow using this idea of local neighborhood computation. Now let's get to what we want from a GNN layer: what are the key properties or key ideas that we want a GNN layer to uphold? The first one is permutation invariance. Permutation invariance means the following: we take the feature matrix and the adjacency matrix, and we choose particular numbers for the nodes; for example, here we have node one on the graph to the right, and then we have node two. That's an arbitrary ordering, but it matters for the representation, because the feature matrix will have the first node as its first row, and correspondingly in the adjacency matrix. But really the ordering is arbitrary, because we chose it: we can just as well swap what we call node one and node two, and it will still be the exact same graph, even though the ordering is changed. So permutation invariance says that if we apply some function that takes all the information from the nodes and edges and produces a single output, then how we chose the ordering of the nodes should not matter. To the left here, P is a permutation; if this confuses you, don't focus on it too much. All it means is that if we swap what we call node number one and node number seven, the output should be exactly the same, because the underlying graph is exactly the same even though the labels are swapped. So that's one key idea and one key property that we want some GNN layers to have: permutation invariance.
invariance another uh key idea is
permutation equivariance which is a
little bit more subtle and um is uh a
little bit less perhaps
strict uh so uh let me explain what it
is so we we have some graph on the top
left here where we have uh chosen to to
draw the graph in particular way uh and
on the bottom we have the exact same
graph
um so when the graphs are structurally
identical we call that they are
isomorphic so these two graphs are
isomorphic uh and if you look at them
for a little bit uh you'll see that they
are in fact you know exactly the same uh
graph and so when we choose to describe
the adjacency Matrix and the node uh the
feature Matrix uh then we will choose
some ordering for Blue uh orange yellow
and gray we apply some some GNN layer
and we get some output labels so in this
case we're doing node classification we
get that blue the blue node is
corresponding to a cat uh orange is a
person and so on now if we instead you
know choose to draw the same graph but
because of the way we drew it is
different we're also choosing a
different ordering um so basically all
we did is we now permuted the rows um
exact same graph though all we did is
change the ordering uh when we apply F
we still get the same output but the
outputs are permuted um so we we see now
that the the exact same labels the
output labels are identical but they're
not in the exact same ordering um and
how um permutation equivariance is is
that the permutation that we appli to
the input should also be the same that
is applied to the output um and so um
you know more mathematically maybe I
should have written the formula for this
is that uh the function applied to the
permutation of the input is the same as
the permutation applied to the output um
so this is that's what it means here um
you can see that I guess Sigma here
would be the permutation the difference
is per in the permutation in the output
is the difference in the permutation on
the output uh input and output has the
same
permutation okay so that's two key ideas
two Key Properties we want to have and a
GNN is basically a collection of GNN
layers where we want you know each GNN
layer to have some type of this property
so for example we could have you know we
could have GNN layer one is permutation
equivariant equivariant for Layer Two
equivariant for layer three and then we
have a permutation invariant for layer
four that's just an
example so um now we get you know the
computation that we want to do for G
follows this uh this idea of message
passing so we go back to the to a
similar graph that we looked at
previously so we know uh that you know
we have this computational graph now and
we have that each node connected to it
uh is performed um on some local
computation um where we are trying to
compute what is the message that we want
to send to this uh this output node here
and remember this is done for each node
in parallel uh but we're doing some type
of message then uh it's very important
that we a we aggregate this to the The
Ordering of the notes don't matter uh
with the aggregation for example could
be a a summation or a
mean um but we we put together the
messages in some way and then we pass it
uh and we update the the node that we're
interested in so this is the a general
form and no matter what type of uh GNN
architecture you look at um and I'll
just you know uh give some example maybe
we have graph convolution network uh um
we have a graph attention Network or we
have graph Sage um you know don't matter
if don't care if you don't recognize
those names but really all that's
different between those is a a
difference in either the message and
slash or the aggregation function so uh
now I want to you know look at how does
a graph convolution Network look like uh
what is the the formula for the
computation that's
performed and um so what we do here um
is uh looking at the top right image
here we have that some message is
performed um and so the message uh
that's performed is uh is I'm really
hoping you can see the mouse cursor now
uh but is the the one to the with the
summation here uh where we we here we
sum each of the connected node
embeddings and we divide by the node
degree um and the reason why we do this
is just to normalize because
otherwise degrees nodes that have larger
node degrees would just blow up in the
value uh but if we normalize then we
will have similar distribution for all
nodes so uh you know we we perform some
message and that's done locally to the
node uh now this looks a little bit
confusing perhaps because we're doing a
summation first and then we're doing a a
matrix multiply with WK uh but remember
it's just a linear linear operator so um
the the reason why we do the sum first
and then the Matrix multiply is because
it's we can take out the linear uh
operator um but if you choose to choose
to view it more as the computation done
in the top right then uh you can move
the the uh the the weight inside and
then take a summation so the summation
here is a is the aggregation and the W
is the message that's
performed uh now one uh thing here is
that you know one of the most important
nodes to to the output of that
particular node that we're looking at in
this case this uh top node uh is
obviously the node embedding of that
node itself so we comp we add an
additional computation to the right here
with BK times the node embedding of that
particular node uh
V and then um so we we do that and we we
sum that and then we apply a
nonlinearity to that and then we produce
the output embedding and then K here uh
I'm assuming correspond to the layer
yeah I think it should be the layer of
the of the
GNN and so that's a graph convolution
Network and I really hope that you get
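As a minimal sketch of that one layer in numpy (not the video's code; the weights, sizes and random graph below are placeholders for illustration):

```python
import numpy as np

def gcn_layer(H, A, W, B):
    """One GCN layer as described above:
    h_v' = ReLU( W_k * (sum of neighbor embeddings / degree) + B_k * h_v )
    H: (n, d_in) node embeddings at layer k; A: (n, n) adjacency matrix;
    W, B: (d_out, d_in) learned weight matrices for layer k.
    """
    deg = A.sum(axis=1, keepdims=True)            # node degrees
    neighbor_mean = (A @ H) / np.maximum(deg, 1)  # aggregate + normalize
    # Neighbor message + self term, then a ReLU nonlinearity.
    return np.maximum(0, neighbor_mean @ W.T + H @ B.T)

n, d_in, d_out = 4, 8, 16
H = np.random.rand(n, d_in)
A = (np.random.rand(n, n) > 0.5).astype(float)
A = np.triu(A, 1); A = A + A.T                    # random undirected graph
H_next = gcn_layer(H, A, np.random.rand(d_out, d_in), np.random.rand(d_out, d_in))
```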
I really hope you get the general idea here, and that you realize it's not really as scary as you might have thought; that's what I thought, at least, once I felt like I understood how they work. If we now look at a Graph Attention Network instead, we do a very similar type of computation; it's just that we're now also adding some type of importance score to the nodes. Previously we just did a summation, meaning that we treated each node in the local neighborhood as having equal importance, but now we perform the additional step of actually computing, through multi-headed attention, which nodes we should pay the most attention to. I don't want to get into exactly how that computation is done; I might do that in future videos if that would be of interest. I also know there are people who have implemented these from scratch; The AI Epiphany has done a video, I think, where he did this from scratch. So anyway, let me know if you want to see videos on that, where we can look at the paper and understand the actual details. For now, just understand that the idea is that we compute these attention scores, where we look at the importance of each node, and then we do a weighted summation.
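A stripped-down, single-head sketch of that idea (a toy dot-product score stands in for GAT's learned attention mechanism, which actually uses a learned attention vector and a LeakyReLU; everything below is illustrative):

```python
import numpy as np

def attention_aggregate(H, A, score_fn):
    """Weight each neighbor's embedding by an importance score,
    normalized with a softmax over the neighborhood, then sum."""
    n = H.shape[0]
    out = np.zeros_like(H)
    for v in range(n):
        nbrs = np.nonzero(A[v])[0]
        scores = np.array([score_fn(H[v], H[u]) for u in nbrs])
        alpha = np.exp(scores) / np.exp(scores).sum()  # attention weights
        out[v] = sum(a * H[u] for a, u in zip(alpha, nbrs))
    return out

# Toy scorer: dot-product similarity in place of the learned one.
H = np.random.rand(4, 8)
A = np.ones((4, 4)) - np.eye(4)
H_new = attention_aggregate(H, A, score_fn=lambda hv, hu: hv @ hu)
```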
Let's see if there's anything else I want to say on this. Yes: we said here that h_u and h_v are the node embeddings for node u and node v. One thing to keep in mind is that, as I said in the beginning, we can also have edge features. That's not described in the formula here, but it would be one additional step towards a more general computation, and it would not look too different; it would just include the edge features as part of the computation. All right, so that was a little longer than I hoped the video would be, but at least I hope you now have an idea of GNNs and a good overview that we can build on in subsequent videos, where we'll go into the details of coding things up using PyTorch Geometric. I'm very excited for that. Thank you so much for watching the video, and I hope to see you in the next one.