Stanford CS224W: ML with Graphs | 2021 | Lecture 6.1 - Introduction to Graph Neural Networks

Stanford Online
29 Apr 2021 · 10:31

Summary

TLDR: This lecture introduces deep learning for graphs, focusing on graph neural networks (GNNs) and their applications. The professor reviews previous topics like node embeddings, where similar nodes in a graph are mapped close together in an embedding space. Shallow encoders, such as DeepWalk, have limitations like high computational cost and inability to handle unseen nodes. GNNs address these limitations by using deep, nonlinear transformations to create embeddings. They can handle more complex data types, such as dynamic graphs with multimodal features, enabling applications like node classification, link prediction, and graph similarity.

Takeaways

  • 🧠 Introduction to deep learning for graphs, focusing on graph neural networks as a key topic of the course.
  • 🖇️ Previously discussed node embeddings aimed to map nodes to a d-dimensional space where similar nodes are close.
  • 📊 Node embeddings are learned in an encoder-decoder framework, where similarity between nodes in the graph should match similarity in the embedding space.
  • 📝 Shallow encoding approaches, such as DeepWalk, directly learn an embedding matrix but have limitations like high parameter costs and lack of node features.
  • ⚠️ Shallow methods are transductive, meaning they can't generate embeddings for unseen nodes or transfer embeddings across graphs.
  • 💡 Deep graph encoders (graph neural networks) use multiple layers of nonlinear transformations for encoding, enabling better generalization and learning.
  • 🌐 Graph neural networks can be trained for tasks like node classification, link prediction, and clustering, with an end-to-end approach.
  • 🤖 Modern deep learning excels at fixed-size data types like images or sequences, but graph neural networks extend this to more complex, arbitrary graph structures.
  • 🔄 Graphs have unique challenges, such as arbitrary size, complex topological structure, and no fixed ordering of nodes, making them different from grid-based data like images.
  • 🌟 Graph neural networks enable representation learning for more complex data types, making them versatile for domains where graph structures are essential.

Q & A

  • What is the main topic of the lecture?

    -The main topic of the lecture is deep learning for graphs, specifically focusing on graph neural networks (GNNs) and their application to graph-structured data.

  • What are node embeddings, and why are they important in graph neural networks?

    -Node embeddings are d-dimensional representations of nodes in a graph, where the goal is to map similar nodes in the graph to similar positions in the embedding space. They are important because they enable the application of machine learning models to graph data by encoding node relationships and properties in a compact, meaningful way.

  • What is the limitation of shallow encoders like DeepWalk and Node2Vec?

    -The main limitations of shallow encoders like DeepWalk and Node2Vec include high memory and computation costs, lack of parameter sharing between nodes, and their transductive nature, meaning they can only make predictions for nodes seen during training. They also do not incorporate node features, which limits their ability to leverage additional information.

  • How do deep graph encoders differ from shallow encoders?

    -Deep graph encoders, like graph neural networks, apply multiple layers of nonlinear transformations based on the graph structure, allowing for more expressive and generalizable representations. They can also incorporate node and edge features, which shallow encoders cannot.

  • What are some tasks that graph neural networks (GNNs) can be applied to?

    -Graph neural networks can be applied to various tasks, including node classification, link prediction, clustering, community detection, and measuring similarity or compatibility between different graphs or sub-networks.

  • What makes graphs a challenging data structure for deep learning?

    -Graphs are challenging for deep learning because they have arbitrary size and complex topological structures. They lack spatial locality, fixed reference points, or defined node ordering, unlike grids or sequences. Additionally, graphs can be dynamic and contain multimodal features on nodes and edges.

  • What is the significance of non-linear transformations in deep graph encoders?

    -Non-linear transformations in deep graph encoders allow for capturing more complex relationships and dependencies between nodes in a graph. They enable the model to progressively refine the embeddings through multiple layers, leading to richer, more expressive representations.

  • Why is transductive learning a limitation in shallow encoding methods?

    -Transductive learning is a limitation in shallow encoding methods because it restricts the model to making predictions only for nodes that were present during training. It cannot generalize to unseen nodes or transfer knowledge between different graphs.

  • How do graph neural networks handle node features compared to shallow encoders?

    -Graph neural networks can directly incorporate node and edge features into the learning process, which helps them make more informed predictions. In contrast, shallow encoders like DeepWalk and Node2Vec do not use node features and only rely on the graph structure.

  • What are some examples of simple data types that traditional deep learning excels at processing?

    -Traditional deep learning excels at processing simple data types like fixed-size grids (e.g., images) and linear sequences (e.g., text or speech). These data types have well-defined spatial or sequential structures, making them easier to handle with standard deep learning techniques.

Outlines

00:00

📊 Introduction to Graph Neural Networks and Node Embeddings

In this section, the speaker introduces a new topic: deep learning for graphs, with a focus on graph neural networks (GNNs). This marks an important transition in the course. The speaker explains how previous discussions have centered on node embeddings, where the goal was to map graph nodes into d-dimensional embeddings, ensuring that similar nodes are close in the embedding space. The idea was to learn a function that encodes graph nodes into vectors and measures node similarity through techniques like cosine distance or dot products. The speaker emphasizes that the existing node embedding techniques discussed so far—referred to as shallow encoding—are limited in terms of scalability and parameter efficiency, especially for large graphs. Each node requires unique parameters, leading to inefficiencies, and shallow encoding models, like DeepWalk, do not allow for generalization to unseen nodes or graphs.

05:00

🧠 Limitations of Shallow Encoders and Introduction to Deep Graph Encoders

The speaker addresses key limitations of shallow encoders, such as DeepWalk and node2vec, which do not incorporate node features and are computationally expensive due to the need for learning individual embeddings for each node. The speaker introduces the concept of deep graph encoders, particularly Graph Neural Networks (GNNs), as a more efficient alternative. These deep encoders use multiple layers of nonlinear transformations based on the graph structure, allowing for more complex learning and incorporating node and edge features. Deep encoders can be trained for various tasks, including node classification, link prediction, and clustering, offering flexibility and efficiency across a range of graph-based tasks.

10:03

🔄 The Challenge of Applying Deep Learning to Graphs

This paragraph discusses the challenges of applying traditional deep learning to graphs. Current deep learning methods excel with simple data types like fixed-size matrices (used for images) or linear sequences (used for text or speech), but graphs are more complex: they have arbitrary size and complex topological structure, and they lack the spatial locality, fixed node ordering, and reference points that grids and sequences provide. This makes it difficult to apply deep learning techniques that were originally designed for simpler, more structured data. Graphs are also often dynamic and multi-modal, varying in size and topology, which necessitates new approaches, such as Graph Neural Networks (GNNs), for effective representation learning.

Keywords

💡Graph Neural Networks (GNNs)

Graph Neural Networks (GNNs) are a type of deep learning model specifically designed to process and analyze graph-structured data. In the video, GNNs are introduced as a central topic, offering a way to perform deep encoding on graphs. Unlike shallow encoders, GNNs utilize multi-layer transformations based on graph structures to capture complex relationships between nodes, which is important for tasks like node classification and link prediction.
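
As a rough illustration of what "multi-layer transformations based on graph structures" can look like, here is a minimal sketch of a single message-passing layer in PyTorch. It is not the lecture's exact formulation; the dense adjacency matrix, the mean-over-neighbors aggregation, and the two weight matrices are illustrative choices.

```python
import torch
import torch.nn as nn

class SimpleGNNLayer(nn.Module):
    """One message-passing layer: average neighbor embeddings, transform, apply a nonlinearity."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W_neigh = nn.Linear(in_dim, out_dim)  # transforms the averaged neighbor embeddings
        self.W_self = nn.Linear(in_dim, out_dim)   # transforms the node's own embedding

    def forward(self, A, H):
        # A: (N, N) dense adjacency matrix; H: (N, in_dim) current node embeddings
        deg = A.sum(dim=1, keepdim=True).clamp(min=1)  # node degrees (avoid division by zero)
        neigh_mean = (A @ H) / deg                     # mean over each node's neighbors
        return torch.relu(self.W_neigh(neigh_mean) + self.W_self(H))
```

Note that the same two weight matrices are applied to every node, which is exactly the parameter sharing that shallow encoders lack.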

💡Node Embeddings

Node embeddings are low-dimensional vector representations of nodes in a graph that preserve the graph’s structure and the relationships between nodes. The goal, as explained in the video, is to position similar nodes closer together in the embedding space. This concept plays a foundational role in understanding how GNNs transform graph data into useful embeddings for downstream tasks.

💡Shallow Encoding

Shallow encoding refers to a simpler approach to node embeddings where each node’s representation is learned directly, without involving complex transformations. The video contrasts shallow encoding methods, like DeepWalk, with GNNs. Shallow encoders are noted for being computationally expensive and limited in generalization since they do not share parameters across nodes or handle unseen nodes.
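
For contrast, a shallow encoder reduces to a learnable lookup table. The sketch below (PyTorch, with illustrative sizes) just memorizes one d-dimensional vector per node, which is why the parameter count grows with the number of nodes and why nodes unseen during training have no embedding at all.

```python
import torch
import torch.nn as nn

num_nodes, d = 10_000, 64          # illustrative sizes: |V| nodes, d-dimensional embeddings

Z = nn.Embedding(num_nodes, d)     # the embedding matrix Z: one learnable vector per node

v = torch.tensor([42])
z_v = Z(v)                         # "encoding" node 42 is just a table lookup, shape (1, d)

print(sum(p.numel() for p in Z.parameters()))   # 640000 = d * |V| parameters

# A node with no entry in Z (e.g. one added after training) simply cannot be embedded,
# and nothing is shared between entries -- the transductive limitation described above.
```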

💡Similarity Function

A similarity function is used to define how relationships between nodes in the graph map to relationships in the embedding space. In the video, it’s explained that the goal is to ensure that similar nodes in the graph are similarly represented in the embedding space. Examples of similarity measures mentioned include cosine distance and dot products, which are crucial in determining how well the embeddings capture the structure of the graph.
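
A minimal sketch of the decoder side, assuming we already have two node embeddings: the dot product or cosine similarity computed in the embedding space is what gets matched against similarity in the graph.

```python
import torch
import torch.nn.functional as F

z_u = torch.randn(64)    # embedding of node u (random vectors, purely illustrative)
z_v = torch.randn(64)    # embedding of node v

dot_sim = torch.dot(z_u, z_v)                    # dot-product decoder
cos_sim = F.cosine_similarity(z_u, z_v, dim=0)   # cosine similarity

# Training adjusts the embeddings so that these scores are high for node pairs that are
# similar in the graph (e.g. co-occur on random walks) and low otherwise.
```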

💡Transductive Learning

Transductive learning is a type of machine learning where the model is only capable of making predictions on the examples it has seen during training. In the context of the video, shallow encoders are described as transductive because they cannot generate embeddings for unseen nodes, limiting their application to new graphs or nodes that weren’t part of the training set.

💡Non-linear Transformations

Non-linear transformations refer to the process of applying multiple layers of transformations in a neural network to capture complex patterns and relationships in data. In GNNs, these transformations are applied to the graph structure, enabling deeper and more expressive node embeddings. The video emphasizes that GNNs use non-linear transformations to build sophisticated representations of nodes.
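
Stacking layers is what makes the encoder "deep": each additional nonlinear layer refines the previous layer's embeddings and lets information travel one more hop. A minimal sketch, reusing the hypothetical SimpleGNNLayer from the earlier sketch (so it is not standalone):

```python
import torch.nn as nn

class TwoLayerGNN(nn.Module):
    """Two stacked nonlinear message-passing layers (SimpleGNNLayer from the earlier sketch)."""
    def __init__(self, in_dim, hidden_dim, out_dim):
        super().__init__()
        self.layer1 = SimpleGNNLayer(in_dim, hidden_dim)
        self.layer2 = SimpleGNNLayer(hidden_dim, out_dim)

    def forward(self, A, X):
        H = self.layer1(A, X)     # embeddings after aggregating 1-hop neighborhoods
        return self.layer2(A, H)  # refined embeddings that also incorporate 2-hop information
```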

💡Node Classification

Node classification is a task where the goal is to predict a label or category for each node in a graph based on its features and connections. The video mentions that GNNs can be trained end-to-end for node classification, utilizing graph structures and node features to improve accuracy in classifying nodes across various types of graphs.
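
A sketch of what "trained end-to-end for node classification" could look like, reusing the hypothetical TwoLayerGNN above; the toy graph, feature matrix, labels, train_mask, and hyperparameters below are all invented for illustration and do not come from the lecture.

```python
import torch
import torch.nn.functional as F

# Toy stand-ins: 5 nodes, 8 features, 3 classes, a small ring-like graph.
N, num_feats, num_classes = 5, 8, 3
A = torch.tensor([[0, 1, 0, 0, 1],
                  [1, 0, 1, 0, 0],
                  [0, 1, 0, 1, 0],
                  [0, 0, 1, 0, 1],
                  [1, 0, 0, 1, 0]], dtype=torch.float)
X = torch.randn(N, num_feats)                    # node feature matrix
y = torch.randint(0, num_classes, (N,))          # node labels
train_mask = torch.tensor([True, True, False, True, False])   # which nodes are labeled

model = TwoLayerGNN(in_dim=num_feats, hidden_dim=16, out_dim=num_classes)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(200):
    optimizer.zero_grad()
    logits = model(A, X)                                        # forward pass over the whole graph
    loss = F.cross_entropy(logits[train_mask], y[train_mask])   # loss only on labeled nodes
    loss.backward()                                             # gradients flow from the labels
    optimizer.step()                                            # back through the graph encoder
```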

💡Link Prediction

Link prediction is the task of predicting whether a link (or edge) exists between two nodes in a graph. In the video, link prediction is highlighted as one of the applications of GNNs. By leveraging the graph’s structure and node embeddings, GNNs can predict new or missing links, making them useful for tasks such as recommending friends on social networks.
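
One common way to turn node embeddings into link predictions is to score a candidate edge by the dot product of its endpoints' embeddings and train against sampled non-edges. A hedged sketch; the embeddings and edge lists below are toy stand-ins, not the lecture's setup.

```python
import torch
import torch.nn.functional as F

def edge_scores(Z, edge_index):
    # Z: (N, d) node embeddings; edge_index: (2, E) source/target node ids
    src, dst = edge_index
    return (Z[src] * Z[dst]).sum(dim=1)   # one dot-product score per candidate edge

# Toy stand-ins: 6 nodes with 16-dim embeddings, 3 observed edges, 3 sampled non-edges.
Z = torch.randn(6, 16)
pos_edge_index = torch.tensor([[0, 1, 2], [1, 2, 3]])   # observed edges (0-1), (1-2), (2-3)
neg_edge_index = torch.tensor([[0, 1, 4], [5, 4, 0]])   # sampled "negative" node pairs

logits = torch.cat([edge_scores(Z, pos_edge_index), edge_scores(Z, neg_edge_index)])
labels = torch.cat([torch.ones(3), torch.zeros(3)])
loss = F.binary_cross_entropy_with_logits(logits, labels)   # high score should mean "edge exists"
```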

💡Parameter Sharing

Parameter sharing refers to the idea that a model can reuse the same parameters across different parts of the data to reduce computational complexity. In the video, shallow encoders are criticized for lacking parameter sharing, meaning each node has to learn its own set of parameters. GNNs, however, allow for parameter sharing across nodes, making them more scalable to large graphs.
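
A back-of-the-envelope comparison of the two parameter budgets, under illustrative sizes and counting only the main weight matrices:

```python
num_nodes, num_features, d, num_layers = 1_000_000, 128, 64, 2

shallow_params = num_nodes * d                 # one d-dim vector memorized per node
print(shallow_params)                          # 64,000,000 parameters

# A GNN layer's weights are shared by every node, so their size depends only on the
# feature/embedding dimensions, never on |V| (biases and extra matrices omitted).
gnn_params = num_features * d + (num_layers - 1) * d * d
print(gnn_params)                              # 12,288 parameters -- orders of magnitude fewer
```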

💡Dynamic Graphs

Dynamic graphs refer to graphs that evolve over time, with changes in node properties, edges, or topology. The video mentions that graphs often have dynamic and multi-modal features, which adds complexity to learning from them. GNNs are positioned as a powerful tool to handle such dynamic graphs, as they can continuously update node embeddings and predictions as the graph changes.

Highlights

Introduction to deep learning for graphs, focusing on graph neural networks as a central topic.

Node embeddings are used to map nodes in a graph into d-dimensional spaces, where similar nodes are embedded close together.

The concept of encoder-decoder framework: encoding the nodes of a graph and decoding their similarity.

Shallow encoding techniques like DeepWalk and node2vec require a number of parameters proportional to the number of nodes, which limits their ability to scale to large graphs.

Shallow encoders suffer from being transductive, meaning they cannot generate embeddings for unseen nodes during training.

A major drawback of shallow encoding is that it doesn’t incorporate node features, limiting its ability to capture node properties.

Deep graph encoders, such as graph neural networks, introduce multi-layer non-linear transformations for node embeddings based on the graph structure.

Deep graph encoders can handle tasks such as node classification, link prediction, clustering, and graph comparison.

Graph neural networks allow for end-to-end training, connecting graph structures to task-specific labels.

Classical deep learning tools are well-suited for simple data types like fixed-size matrices and linear sequences, but struggle with complex data types like graphs.

Graph neural networks provide a way to extend deep learning to complex data types with arbitrary size and complex topological structures.

Unlike images and text, graphs have no fixed reference point, no spatial locality, and no ordering for their nodes.

Graphs are often dynamic and multimodal, with features assigned to both nodes and edges, creating additional complexity in representation.

Graph neural networks expand the scope of deep learning, applying it to areas that require more than fixed-size matrices or linear sequences.

Graphical representations are crucial for numerous domains where traditional deep learning approaches are insufficient.

Transcripts

play00:04

Great. So, uh, welcome back to the class.

play00:07

Uh, we're starting a very exciting, uh,

play00:10

new topic, uh, for which we have been building up,

play00:13

uh, from the beginning of the course.

play00:15

So today, we are going to talk about, uh,

play00:17

deep learning for graphs and in particular,

play00:19

uh, techniques around graph neural networks.

play00:22

And this should be one of the most central topics of the- of the class.

play00:26

And we are going to spend the next, uh, two weeks, uh,

play00:29

discussing this and going, uh,

play00:31

deeper, uh, into this exciting topic.

play00:34

So the way we think of this is the following: so far,

play00:37

we have been talking about node embeddings,

play00:39

where our intuition was to map nodes of

play00:42

the graph into d-dimensional embeddings such that,

play00:46

uh, nodes similar in the graph are embedded close together.

play00:50

And our goal was to learn this, uh,

play00:52

function f that takes in a graph and gives us the positions,

play00:57

the embeddings of individual nodes.

play01:00

And the way we thought about this is,

play01:03

we thought about this in this encoder-decoder framework where we said

play01:07

that we want the similarity of nodes in the network,

play01:12

uh, denoted as here to be similar or to

play01:15

match the similarity of the nodes in the embedding space.

play01:19

And here, by similarity,

play01:21

we could measure distance.

play01:22

Uh, there are many types of distances you can- you can quantify.

play01:26

One type of a distance that we were interested in was cosine distance, uh, dot products.

play01:30

So we say that, you know,

play01:32

the dot product of the embeddings of the nodes

play01:34

has to match the notion of similarity of them in the network.

play01:38

So the goal was, that given an input network,

play01:40

I want to encode the nodes by computing

play01:43

their embeddings such that if nodes are similar in the network,

play01:47

they are similar in the embedding space as well.

play01:50

So their similarity, um, is higher.

play01:52

And of course, when we talked about this,

play01:55

we were defining about what does it mean to be

play01:57

similar and what does it mean, uh, to encode?

play02:00

And, ah, as I mentioned,

play02:02

there are two key components in

play02:04

this node embedding framework that we talked to- in- in the class so far.

play02:09

So our goal was to map each node to a low-dimensional vector.

play02:13

And, um, we wanted to encode the embedding of a node in this vector, uh, z_v.

play02:19

And this is a d-dimensional embedding.

play02:23

So, you know, a vector of d numbers.

play02:25

And, uh, we also needed to specify the similarity function that specifies how

play02:30

the relationships in the vector space

play02:33

mapped to the relationships in the original network, right?

play02:36

So we said, um, you know,

play02:37

this is the similarity defined in terms of the network.

play02:40

This is the similarity defined, uh,

play02:42

in terms of the, um, embedding space,

play02:46

and this similarity computation in the embedding space,

play02:49

we can call this a dot,

play02:51

uh, uh, we can call this a decoder.

play02:52

So our goal was to encode the coordinates so that when we decode them,

play02:56

they decode the similarity in the network.

play03:00

Um, so far, we talked about what is called shallow encoding,

play03:04

which is the simplest approach to, uh,

play03:06

learning that encoder which is that basically encoder is just an embedding-look- look-up,

play03:11

which means we said we are going to learn this matrix z,

play03:15

where every node will have a column,

play03:17

uh, reserved in this matrix.

play03:19

And this- the way we are going to decode the- the embedding of a given node will simply

play03:24

be to basically look at the appropriate column of

play03:27

this matrix and say here is where its embedding is stored.

play03:30

So this is what we mean by shallow because basically,

play03:33

you are just memorizing the embedding of every node.

play03:36

We are directly learning,

play03:38

directly determining the embedding of every node.

play03:41

Um, so what are some of the limitations of the approaches?

play03:45

Um, like DeepWalk or node2vec that we have talked about, uh,

play03:49

so far that apply this, uh,

play03:51

shallow approach to- to learning node embeddings.

play03:55

First is that this is extremely expensive in

play03:58

terms of the number of parameters that are needed, right?

play04:01

The number of parameters that the model has,

play04:03

the number of variables it has to

play04:05

learn is proportional to the number of nodes in the network.

play04:08

It is basically d times number of nodes.

play04:11

This- the reason for this being is that for every node we have

play04:14

to determine or learn d-parameters,

play04:17

d-values that determine its embedding.

play04:19

So, um, this means that, uh,

play04:21

for huge graphs the parameter space will be- will be giant.

play04:25

Um, there is no parameter sharing, uh, between the nodes.

play04:29

Um, in some sense, every node has to determine its own unique embedding.

play04:33

So there is a lot of computation that we need to do.

play04:36

Then, um, this is what is called, uh, transductive.

play04:40

Um, this means that in transductive learning,

play04:43

we can only make predictions over the examples that we have actually seen,

play04:47

uh, during the training phase.

play04:49

So this means, in this case,

play04:50

we cannot generate an embedding for

play04:52

a node that- for a node that was not seen during training.

play04:56

We cannot transfer embeddings for one graph to another because for every graph,

play05:00

for every node, we have to directly learn- learn that embedding in the training space.

play05:05

And then, um, another important,

play05:08

uh, uh, drawback of this shallow encoding- encoders,

play05:11

as I said, like DeepWalk or, uh,

play05:13

node2vec, is that they do not incorporate node features, right?

play05:16

Many graphs have features,

play05:18

properties attached to the nodes of the network.

play05:21

Um, and these approaches do not naturally, uh, leverage them.

play05:25

So today, we are going to talk about deep graph encoders.

play05:30

So we are going to discuss graph neural networks that are examples of deep, um,

play05:36

graph encoders where the idea is that this encoding of, uh, er, er,

play05:41

of an embedding of our node v is

play05:44

a multi-layers of nonlinear transformations based on the graph structure.

play05:48

So basically now, we'll really think about

play05:50

deep neural networks and how they are transforming

play05:53

information through multiple layers of

play05:56

nonlinear transforms to come up with the final embedding.

play06:00

And important to note that all these deep encoders

play06:04

can be combined also with the node similarity functions defined in Lecture 3.

play06:07

So we could say I want to learn a deep encoder that encodes the similarity function,

play06:13

uh, for example, the random walk similarity function that we used,

play06:16

uh, in the previous lectures.

play06:17

Or in this case,

play06:19

we can actually directly learn the- the-

play06:21

the encoder so that it is able to decode the node labels,

play06:26

which means we can actually directly train these models for, uh,

play06:30

node classification or any kind of,

play06:32

uh, graph prediction task.

play06:35

So intuitively, what we would like to do,

play06:38

or what we are going to do is we are going to develop

play06:40

deep neural networks that on the left will- on the input will take graph,

play06:45

uh, as a structure together with, uh, properties,

play06:48

features of nodes, uh,

play06:50

and potentially of edges.

play06:52

We will send this to the multiple layers of

play06:54

non-linear transformations in the network so that at the end,

play06:58

in the output, we get, for example,

play07:00

node embeddings; we can also embed entire sub-graphs.

play07:04

We can embed the pairs of nodes and make various kinds of predictions, uh, in the end.

play07:10

And the good thing would be that we are able to train this in an end-to-end fashion.

play07:14

So from the labels, uh,

play07:16

on the right all the way down to the,

play07:19

uh, graph structure, uh, on the left.

play07:21

Um, and this would be in some sense,

play07:24

task, uh, agnostic or it will be applicable to many different tasks.

play07:28

We'll be able to do, uh, node classification,

play07:31

we'll be able to do, uh, link prediction,

play07:33

we'll be able to do any kind of clustering community detection,

play07:37

as well as to measure similarity, um, or,

play07:40

um, compatibility between different graphs or different sub-networks.

play07:44

So, uh, this, uh, will really, uh,

play07:46

allow us to be applied to any of these,

play07:49

uh, different, uh, tasks.

play07:51

So why is this interesting or why is it hard or why is it different from,

play07:57

let's say, classical, uh,

play07:58

machine learning or classical deep learning?

play08:00

If you think about the classical deep learning toolbox,

play08:03

it is designed for simple data types.

play08:06

Essentially, what tra- traditional or current toolbox is really good at is,

play08:11

we know how to process fixed-size matrices, right?

play08:15

So basically, I can resize every image and I represent it as

play08:17

a fixed-size matrix or as a fixed-size grid graph.

play08:20

And I can also take,

play08:22

for example, texts, speech,

play08:23

and represent- think of it basically as a linear sequence,

play08:26

as a chain graph.

play08:28

And we know how to process that, uh, really, really well.

play08:30

So kind of the claim, uh,

play08:32

and motivation for this is that

play08:34

modern deep learning toolbox is designed for simple data types,

play08:38

meaning sequences, uh, linear sequences,

play08:40

and, uh, fixed-size grids.

play08:42

Um, and of course the question is,

play08:44

how can we generalize that?

play08:45

How can we apply deep learning representation learning to more complex data types?

play08:51

And this is where graph neural networks come into play because they allow

play08:57

us to apply representation learning to

play08:59

much more complex data types than just the span of two very simple,

play09:03

uh, uh, data types,

play09:05

meaning the fixed size grids,

play09:06

uh, and linear sequences.

play09:08

And why is this hard?

play09:10

Why is this non-trivial to do?

play09:12

It's because networks have a lot of complexity, right?

play09:15

They have arbitrary size and they have complex topological structure, right?

play09:20

There is no spatial locality like in grids.

play09:23

Uh, this means there is also no reference point and no fixed ordering of the nodes,

play09:29

which means that in a graph there is no top left or bottom right as there is in a grid,

play09:35

or you know there is no left and right,

play09:37

as you can define in a text because it's a sequence.

play09:40

In a graph, there is no notion of a reference point.

play09:42

There is no notion of direction.

play09:44

Um, and more interestingly, right,

play09:46

these graphs are often dynamic and have multiple,

play09:49

um multi-modal features assigned to the nodes and edges of them.

play09:54

So this becomes, uh,

play09:55

very- very interesting because it really, um, expands, uh,

play10:00

ways in which we can describe the data and in which we

play10:02

can represent the underlying domain and the underlying data.

play10:06

Because not everything can be represented as

play10:09

a ma- fixed size matrix or as a linear sequence.

play10:13

And there are- as I showed in the beginning,

play10:15

right, in the first lecture,

play10:17

there is a lot of domains,

play10:18

a lot of use cases where, um,

play10:21

proper graphical representation is,

play10:24

uh, very, very important.
