Stanford CS224W: ML with Graphs | 2021 | Lecture 2.3 - Traditional Feature-based Methods: Graph

Stanford Online
15 Apr 2021 | 20:10

Summary

TLDR: This lecture introduces graph-level features and kernels for making predictions about entire graphs. It explains the concept of kernels, which measure similarity between graphs without requiring an explicitly constructed feature vector. The lecture discusses the graphlet and Weisfeiler-Lehman kernels, which represent graphs as 'bags' of graphlets or colors, respectively. The Weisfeiler-Lehman kernel, in particular, is highlighted for its efficiency and effectiveness in graph similarity tasks, using a color refinement algorithm to enrich node features based on neighborhood information.

Takeaways

  • 📊 **Graph-Level Features**: The lecture focuses on features that characterize the entire structure of a graph, not just individual nodes or edges.
  • 🔍 **Kernel Methods**: Kernel methods are introduced as a technique to make predictions for entire graphs by designing a kernel rather than a feature vector.
  • 🤖 **Graph Kernels**: A graph kernel measures the similarity between two graphs and is used in machine learning for graph-level prediction.
  • 📏 **Positive Semi-Definite**: For a kernel to be valid, its kernel matrix must be positive semi-definite, meaning it has non-negative eigenvalues and is symmetric.
  • 🔑 **Feature Representation (Phi)**: Kernels imply the existence of a feature representation such that the kernel between two graphs is the dot product of their feature vectors.
  • 🛍️ **Bag-of-Words Model**: Graphs can be represented with a 'bag-of-words'-style model, where structures such as node degrees, graphlets, or colors play the role of words and their frequencies are counted.
  • 🔄 **Graphlet Kernel**: The graphlet kernel represents a graph as a count of different graphlets (small subgraphs) within the graph.
  • 🚀 **Efficiency of Weisfeiler-Lehman (WL) Kernel**: The WL kernel is highlighted for its efficiency and effectiveness in measuring graph similarity through iterative color refinement.
  • 🚩 **Normalization**: Graphlet kernel features are normalized to account for graph size and density, making them more representative.
  • 🚨 **Limitations**: Counting graphlets is computationally expensive, leading to limitations in scalability for larger graphs.
  • 🌐 **Graph Representation**: The lecture concludes that representing graphs as bags of graphlets or colors is crucial for traditional machine learning on graphs.

Q & A

  • What are graph-level features?

    -Graph-level features are characteristics that describe the overall structure of an entire graph, as opposed to individual nodes or edges.

  • What is the purpose of graph kernels?

    -Graph kernels are designed to measure the similarity between two graphs or the similarity between different data points in general, allowing for predictions for entire graphs.

  • What is the significance of a kernel matrix in graph analysis?

    -A kernel matrix measures the similarity between all pairs of data points or graphs and must be positive semi-definite, meaning it has non-negative eigenvalues and is symmetric.

  • How does the graphlet kernel represent a graph?

    -The graphlet kernel represents a graph as a count of the number of different graphlets within the graph, which are small subgraphs of a certain size.

  • What is the difference between graphlets in node-level features and graphlet kernels?

    -In node-level features, graphlets are rooted and connected subgraphs, whereas graphlets in the graphlet kernel are not rooted and do not need to be connected.

  • Why is counting graphlets computationally expensive?

    -Counting graphlets is computationally expensive because it takes time exponential in the size of the graphlet and polynomial in the number of nodes in the graph.

  • What is the Weisfeiler-Lehman graph kernel and how does it work?

    -The Weisfeiler-Lehman graph kernel is an efficient graph feature descriptor that uses a neighborhood structure to iteratively enrich node vocabulary through a color refinement process.

  • How does the Weisfeiler-Lehman kernel handle the similarity between graphs?

    -The Weisfeiler-Lehman kernel counts the number of nodes with a given color after several iterations of color refinement and measures similarity by taking the dot product of these color count vectors for two graphs.

  • What is the computational complexity of the Weisfeiler-Lehman graph kernel?

    -The computational complexity of the Weisfeiler-Lehman graph kernel is linear in the number of edges in the two graphs, making it extremely fast and efficient.

  • How does the bag-of-words representation relate to graph kernels?

    -The bag-of-words representation is analogous to representing a graph as a bag of graphlets or colors, which is used in graph kernels to define a feature vector for a given graph.

  • What are some other graph kernels mentioned in the script?

    -Other graph kernels mentioned include the random-walk kernel and the shortest-path kernel, though they are not discussed in detail within the scope of the lecture.

Outlines

00:00

🌐 Graph-Level Features and Kernels

The paragraph introduces the concept of graph-level features and graph kernels for predicting the properties of entire graphs, as opposed to individual nodes or edges. It explains the need for features that can characterize the structure of a whole graph. The lecturer uses an example graph with two loosely connected parts to illustrate the idea. The paragraph then delves into kernel methods, which are widely used in machine learning for graph-level prediction. Kernels are described as functions that measure the similarity between two graphs, or data points in general, and the importance of the kernel matrix being positive semi-definite and symmetric is emphasized. The concept of a feature representation Phi is introduced, which allows the kernel to be computed as the dot product of the feature vectors of two graphs. The paragraph concludes by mentioning that kernel methods can be used with existing machine learning models like kernel support vector machines for predictions.
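A minimal Python sketch of the kernel-matrix idea (an illustration, not the lecture's implementation; the adjacency-dict graphs and the toy feature map standing in for Phi are assumptions): any explicit feature map induces a kernel via the dot product, and stacking all pairwise values gives a kernel matrix that is symmetric and positive semi-definite by construction.

```python
def phi(adj):
    """Toy explicit feature map Phi(G): here simply (node count, edge count).
    Any such map induces a valid kernel K(G, G') = Phi(G) . Phi(G')."""
    n_nodes = len(adj)
    n_edges = sum(len(neighbors) for neighbors in adj.values()) // 2
    return (n_nodes, n_edges)

def kernel(adj1, adj2):
    """Kernel value as the dot product of the two feature vectors."""
    return sum(a * b for a, b in zip(phi(adj1), phi(adj2)))

def kernel_matrix(graphs):
    """Similarity between all pairs of graphs: symmetric and positive
    semi-definite by construction, because every entry is an inner product
    of explicit feature vectors."""
    return [[kernel(g, h) for h in graphs] for g in graphs]

# Hypothetical graphs as undirected adjacency dicts (node -> set of neighbors).
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
path3 = {0: {1}, 1: {0, 2}, 2: {1}}
print(kernel_matrix([triangle, path3]))  # [[18, 15], [15, 13]]
```

A precomputed matrix of exactly this form is what an off-the-shelf kernel method such as a kernel SVM would consume.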

05:01

🔑 Kernels and Graph Representation

This paragraph discusses the concept of kernels in more detail, focusing on their use in graph-level tasks. It explains that kernels can be thought of as a 'bag-of-words' type representation for graphs, drawing an analogy with text documents. The paragraph points out the limitations of using simple node counts as features, as it does not capture the structural differences between graphs. It then introduces the idea of using more complex features like node degrees to represent graphs. The concept of the graphlet kernel is introduced, which represents a graph as a count of different graphlets within the graph. The paragraph also explains the normalization of graphlet count vectors to account for graph size and density. The computational expense of counting graphlets is acknowledged as a limitation of the graphlet kernel.
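A minimal sketch of the graphlet kernel for size-3 graphlets, assuming undirected graphs given as adjacency dicts (the two example graphs are hypothetical, not the lecture's): the brute-force enumeration of node triples mirrors the O(n^k) counting cost discussed in the lecture, and the counts are normalized into proportions before the dot product.

```python
from itertools import combinations

def graphlet3_counts(adj):
    """Count size-3 graphlets by enumerating every node triple.  As in the
    graphlet kernel, graphlets are not rooted and need not be connected, so a
    triple is classified by how many of its 3 possible edges are present:
    3 -> triangle, 2 -> path of length 2, 1 -> one edge plus an isolated node,
    0 -> three isolated nodes."""
    counts = {0: 0, 1: 0, 2: 0, 3: 0}
    for u, v, w in combinations(list(adj), 3):
        edges = (v in adj[u]) + (w in adj[u]) + (w in adj[v])
        counts[edges] += 1
    # Report in the lecture's order: triangle, path, single edge, empty triple.
    return [counts[3], counts[2], counts[1], counts[0]]

def graphlet_kernel(adj1, adj2):
    """Normalized graphlet kernel: dot product of graphlet *proportion* vectors,
    which corrects for the two graphs having different sizes and densities."""
    f1, f2 = graphlet3_counts(adj1), graphlet3_counts(adj2)
    h1 = [c / sum(f1) for c in f1]
    h2 = [c / sum(f2) for c in f2]
    return sum(a * b for a, b in zip(h1, h2))

# Hypothetical examples: a triangle with a pendant node vs. a 4-node path.
g1 = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
g2 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(graphlet3_counts(g1))   # [1, 2, 1, 0]
print(graphlet3_counts(g2))   # [0, 2, 2, 0]
print(round(graphlet_kernel(g1, g2), 3))
```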

10:02

🔍 Weisfeiler-Lehman Graph Kernel

The paragraph introduces the Weisfeiler-Lehman graph kernel as an efficient alternative to the graphlet kernel. It describes the process of iteratively enriching node vocabulary using neighborhood structures through a color refinement algorithm. The algorithm starts by assigning an initial color to each node and then iteratively aggregates and hashes colors from neighbors to create new colors. This process is likened to a multi-hop generalization of node degrees. The paragraph explains how this method can distinguish between graphs with similar structures but subtle differences. It also highlights the computational efficiency of the Weisfeiler-Lehman kernel, which has a linear time complexity relative to the size of the graph, making it practical for use in machine learning tasks.
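Below is a compact sketch of the color-refinement procedure and the resulting WL kernel, under simplifying assumptions (undirected graphs as adjacency dicts, and a lookup table standing in for the hash function); it is an illustration, not the lecture's code. Refining both graphs against one shared palette keeps their colors comparable, and the kernel value is the dot product of the accumulated color-count bags.

```python
from collections import Counter

def wl_color_counts(adjs, k):
    """Weisfeiler-Lehman color refinement run jointly over a list of graphs.
    Every node starts with the same color; in each round a node's new color is
    looked up from the pair (own color, sorted multiset of neighbor colors) --
    the lookup table plays the role of the hash function.  Returns one bag
    (Counter) of colors per graph, accumulated over all k rounds."""
    palette = {}                                        # signature -> color id
    colorings = [{v: 0 for v in adj} for adj in adjs]   # round 0: one shared color
    bags = [Counter(c.values()) for c in colorings]
    for _ in range(k):
        next_colorings = []
        for adj, colors in zip(adjs, colorings):
            new_colors = {}
            for v in adj:
                sig = (colors[v], tuple(sorted(colors[u] for u in adj[v])))
                new_colors[v] = palette.setdefault(sig, len(palette) + 1)
            next_colorings.append(new_colors)
        colorings = next_colorings
        for bag, colors in zip(bags, colorings):
            bag.update(colors.values())
    return bags

def wl_kernel(adj1, adj2, k=3):
    """WL kernel value: dot product of the two graphs' color-count vectors.
    Each refinement round costs time linear in the number of edges."""
    c1, c2 = wl_color_counts([adj1, adj2], k)
    return sum(c1[color] * c2[color] for color in c1)

# Hypothetical example: two graphs that differ only by one triangle-closing edge.
g1 = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
g2 = {0: {1, 2}, 1: {0}, 2: {0, 3}, 3: {2}}
print(wl_kernel(g1, g2, k=2))
```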

15:02

📊 Summary of Traditional Graph Machine Learning

The final paragraph summarizes the lecture's discussion on traditional machine learning approaches to graph data. It reviews the different types of features discussed: node-level features like degree, centrality, and graphlets; edge-level features based on distance and neighborhood overlap; and graph-level features using graph kernels. The paragraph emphasizes the computational efficiency and effectiveness of the Weisfeiler-Lehman kernel for measuring graph similarity. It concludes by thanking the audience and noting that the lecture has covered various scalable and insightful ways to create feature vectors from nodes, links, and entire graphs.

Keywords

💡Graph-level features

Graph-level features refer to the characteristics that describe the overall structure of a graph, rather than individual nodes or edges. In the context of the video, these features are crucial for making predictions about entire graphs. The script discusses how to create descriptors that capture the structure of a graph, such as the presence of two loosely connected parts with many internal edges but only a few between them.

💡Kernel methods

Kernel methods are a family of algorithms used in machine learning that rely on the concept of a kernel function, which measures the similarity between data points. In the video, kernel methods are highlighted as a way to make predictions for entire graphs by designing a kernel function that can capture the similarity between different graphs.

💡Kernel matrix

A kernel matrix is a matrix that measures the similarity between all pairs of data points (or graphs in this case). Each element of the matrix is the result of applying the kernel function to a pair of graphs. The script mentions that for a kernel matrix to be valid, it must be positive semi-definite and symmetric.

💡Graph kernels

Graph kernels are specific types of kernels designed to measure the similarity between graphs. The video discusses different graph kernels, such as the graphlet kernel and the Weisfeiler-Lehman kernel, which are used to compare the structure of entire graphs for tasks like graph classification.

💡Graphlet kernel

The graphlet kernel is a type of graph kernel that represents a graph as a count of different graphlets (small subgraphs) within the graph. The script explains that graphlets in this context are not rooted and do not need to be connected. The kernel is then defined as the dot product of the graphlet count vectors of two graphs.

💡Weisfeiler-Lehman kernel

The Weisfeiler-Lehman (WL) kernel is an efficient graph kernel that uses a color refinement process to iteratively enrich node vocabulary based on neighborhood structure. The script describes how this kernel generalizes node degrees to multi-hop neighborhood information, creating a feature vector that represents the graph.

💡Bag-of-words

The bag-of-words model is a simple way to represent text documents where each word's frequency is counted regardless of its order or context. The video script extends this concept to graphs, suggesting that nodes or graphlets could be treated as 'words' in a 'bag-of-words' type representation of a graph.

💡Feature vector

A feature vector is a numerical representation that captures the characteristics of an object or observation. In the video, feature vectors are used to represent graphs, with the script discussing how to create these vectors through graph kernels and how they can be used for machine learning tasks.

💡Color refinement

Color refinement is a process used in the Weisfeiler-Lehman algorithm where node colors are iteratively updated based on the colors of their neighbors. The script uses this process to create a feature representation of a graph that captures multi-hop neighborhood information.

💡Graph isomorphism

Graph isomorphism is the problem of determining whether two graphs are structurally the same. The script mentions that counting graphlets is computationally expensive because it involves sub-graph isomorphism, which is NP-hard.

💡Machine learning models

Machine learning models are algorithms that learn from data to make predictions or decisions. The video script discusses how once a graph kernel is defined, traditional machine learning models like kernel support vector machines can be applied to make predictions on graph data.

Highlights

Introduction to graph-level features and graph kernels for predicting entire graphs.

Goal is to create features that characterize the structure of an entire graph.

Kernel methods are widely used for graph-level prediction in traditional machine learning.

A kernel measures similarity between graphs or data points.

Kernel matrix is a matrix that measures similarity between all pairs of graphs.

For a kernel to be valid, its kernel matrix must be positive semi-definite.

Kernels have a feature representation such that the kernel value is the dot product of the feature vectors.

Kernels can be computed without explicitly creating the feature representation.

Graph kernels allow the use of off-the-shelf machine learning models for predictions.

Different graph kernels discussed: graphlet kernel and Weisfeiler-Lehman kernel.

Graph kernels provide competitive performance in graph-level tasks.

Key idea behind kernels is to define a feature vector for a given graph.

The graphlet kernel represents a graph as a count of different graphlets in the graph.

Graphlets in the graphlet kernel are not rooted and don't have to be connected.

Graphlet kernel defines the graphlet count vector as the number of instances of a given graphlet.

Graphlet kernel is defined as the dot product between the graphlet count vectors of two graphs.

Normalization of the feature vector representation is common due to different graph sizes.

Graphlet kernel has an important limitation due to the high computational cost of counting graphlets.

Weisfeiler-Lehman graph kernel is introduced as a more efficient alternative.

Weisfeiler-Lehman kernel uses a neighborhood structure to iteratively enrich node vocabulary.

The algorithm generalizes node degrees to multi-hop neighborhood information.

Weisfeiler-Lehman kernel is based on the color refinement algorithm.

The kernel counts the number of nodes with a given color after color refinement.

Weisfeiler-Lehman kernel is computationally efficient with linear time complexity.

Weisfeiler-Lehman kernel is closely related to graph neural networks.

Conclusion of the lecture on traditional machine learning approaches to graphs.

Transcripts

play00:04

So far we discussed node and edge level features,

play00:11

uh, for a prediction in graphs.

play00:13

Now, we are going to discuss graph-level features and

play00:17

graph kernels that are going to- to allow us to make predictions for entire graphs.

play00:23

So the goal is that we want

play00:26

features that characterize the structure of an entire graph.

play00:30

So for example, if you have a graph like I have here,

play00:33

we can think about just in words,

play00:34

how would we describe the structure of this graph,

play00:37

that it seems it has, kind of,

play00:38

two loosely connected parts that there are quite a few edges, uh,

play00:42

ins- between the nodes in each part and that there is

play00:45

only one edge between these two different parts.

play00:47

So the question is,

play00:48

how do we create the feature description- descriptor that will allow us to characterize,

play00:53

uh, the structure like I just, uh, explained?

play00:56

And the way we are going to do this,

play00:57

is we are going to use kernel methods.

play01:00

And kernel methods are widely used for

play01:02

traditional machine learning in, uh, graph-level prediction.

play01:06

And the idea is to design a kernel rather than a feature vector.

play01:10

So let me tell you what is a kernel and give you a brief introduction.

play01:15

So a kernel between graphs G,

play01:17

and G', uh, returns a real value,

play01:20

and measures a similarity between these two graphs,

play01:23

or in general, measure similarity between different data points.

play01:27

Uh, kernel matrix is then a matrix which

play01:31

simply measures the similarity between all pairs of data points,

play01:35

or all pairs of graphs.

play01:36

And for a kernel to be a valid kern- kernel this ma- eh,

play01:40

kernel matrix, uh, has to be positive semi-definite.

play01:44

Which means it has to have non-negative eigenvalues,

play01:47

for exam- and- and- as a consequence,

play01:49

it has to be, symet- it is a symmetric, uh, ma- matrix.

play01:54

And then what is also an important property of kernels,

play01:57

is that there exist a feature representation, Phi,

play02:00

such that the kernel between two graphs is simply a feature representation,

play02:06

uh, wa-, uh, of the first graph dot product

play02:09

with the feature representation of the second graph, right?

play02:12

So Phi of G is a vector,

play02:14

and Phi of G' is a- is another vector,

play02:16

and the value of the kernel is simply a dot product of this vector representation,

play02:22

uh, of the two, uh, graphs.

play02:24

Um, and what is sometimes nice in kernels,

play02:27

is that this feature representation, Phi,

play02:30

doesn't even need to- to be explicitly created for us to be able to compute the value,

play02:36

uh, of the kernel.

play02:37

And once the kernel is defined,

play02:40

then off-the-shelf machine learning models,

play02:42

such as kernel support vector machines,

play02:44

uh, can be used to make, uh, predictions.

play02:48

So in this le- in this part of the lecture,

play02:51

we are going to discuss different, uh,

play02:54

graph kernels, which will allow us to measure similarity between two graphs.

play02:58

In particular, we are going to discuss

play03:01

the graphlet kernel as well as Weisfeiler-Lehman kernel.

play03:06

There are oth- also other kernels that are proposed in the literature,

play03:10

uh, but this is beyond the scope of the lecture.

play03:12

For example, random-walk kernel,

play03:14

shortest-path kernel, uh, and many, uh, others.

play03:18

And generally, these kernels provide a very competitive performance in graph-level tasks.

play03:25

So what is the key idea behind kernels?

play03:28

The key idea in the goal of kernels is to define a feature vector,

play03:33

Phi of a given graph,

play03:35

G. And the- the idea is that,

play03:37

we are going to think of this feature vector, Phi,

play03:40

as a bag-of-words type representation of a graph.

play03:43

So what is bag of words?

play03:45

When we have text documents,

play03:47

one way how we can represent that text document,

play03:50

is to simply to represent it as a bag of words.

play03:52

Where basically, we would say,

play03:54

for every word we keep a count of how often that word appears in the document.

play03:59

So we can think of, let's say,

play04:00

words sorted alphabetically, and then,

play04:02

you know, at position, i,

play04:05

of this bag-of-words representation,

play04:07

we will have the frequency,

play04:08

the number of occurrences of word,

play04:10

i, in the document.

play04:12

So in our- in the same way,

play04:15

a naive extension of this idea to graphs would be to regard nodes as words.

play04:20

However, the problem is that since both- since graphs can have very different structure,

play04:25

but the same number of nodes,

play04:27

we would get the same feature vector,

play04:29

or the same representation for two very different graphs, right?

play04:32

So if we re- regard nodes as words,

play04:35

then this graph has four nodes,

play04:36

this graphs has four nodes,

play04:38

so their representation would be the same.

play04:41

So we need a different candidate for- for the-

play04:44

for the word in this kind of bag-of-words representation.

play04:47

To be, for example, a bit more expressive,

play04:50

we could have what we could call,

play04:52

uh, degree kernel, where we could say,

play04:54

w- how are we going to represent a graph?

play04:56

We are going to represent it as a bag-of-node degrees, right?

play04:59

So we say, "Aha,

play05:01

we have one node of degree 1,

play05:03

we have three nodes of degree 2,

play05:08

and we have 0 nodes of degree, uh, 3."

play05:12

In the same way,

play05:14

for example, uh, here,

play05:16

we could be asking,

play05:17

how many nodes, uh,

play05:19

of different degrees do we have here?

play05:20

We have 0 nodes of degree, um, 1,

play05:23

we have two nodes, uh, of degree 2,

play05:25

and two nodes, uh, of degree, um, 3.

play05:28

So, um, and this means that now we would, er,

play05:31

obtain different feature representations for these,

play05:34

uh, different, uh, graphs,

play05:36

and that would allow us to distinguish these different, uh, graphs.
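As a minimal sketch of this bag-of-node-degrees idea in Python (the example graphs below are hypothetical placeholders, not the graphs on the slide):

```python
from collections import Counter

def degree_histogram(adj):
    """'Bag of node degrees': how many nodes have each degree value."""
    return Counter(len(neighbors) for neighbors in adj.values())

def degree_kernel(adj1, adj2):
    """Dot product of the two degree histograms, treated as sparse vectors."""
    h1, h2 = degree_histogram(adj1), degree_histogram(adj2)
    return sum(h1[d] * h2[d] for d in h1)

# A 4-node path and a 4-node cycle: same node count, different degree bags,
# so, unlike plain node counts, this feature tells the two graphs apart.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(degree_histogram(path))      # two nodes of degree 1, two of degree 2
print(degree_histogram(cycle))     # four nodes of degree 2
print(degree_kernel(path, cycle))  # 2 * 4 = 8
```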

play05:40

And now, both the graphlets kernel as well as the Weisfeiler-Lehman kernel,

play05:45

use this idea of a bag-of-* representation of a graph, where the star,

play05:51

this something is more sophisticated than node degree.

play05:55

So let's, uh, first talk about the graphlets kernel.

play05:59

The idea is the following:

play06:01

we represent the graph as a count of the number of different graphlets in the graph.

play06:06

Here, I wanna make,

play06:08

uh, an important point:

play06:10

the definition of graphlets for a graphlet kernel,

play06:13

is a bit different than the definition of a graphlet in the node-level features.

play06:18

And there are two important differences: unlike graphlets in

play06:21

the node-level features, these graphlets do not need to be connected,

play06:25

um, and, um also that they are not, uh, uh, rooted.

play06:30

So graphlets, uh, in this- in the, eh,

play06:33

graphlets kernel are not rooted,

play06:35

and don't have to be connected.

play06:37

And to give you an example,

play06:39

let me, uh, show you, uh, the next slide.

play06:42

So for example, if you have a list of graphlets that we are interested

play06:47

in little g_1 up to the little g_n_k,

play06:52

let's say these are graphlets of size k,

play06:55

then let say for k equals 3,

play06:57

there are four different graphlets, right?

play06:59

There are four different con- graphs on three nodes,

play07:02

undirected: fully connected, two edges,

play07:05

one edge, and no edges.

play07:07

So this is the definition of graphlets in the graph kernel.

play07:10

And for example, for k equals 4,

play07:13

there are 11 different graphlets,

play07:15

fully connected graph all the way to the graph on four nodes without any connections.

play07:21

And now, uh, given a graph,

play07:23

we can simply represent it as count of the number of structures,

play07:28

um, er, different graphlets that appear, uh, in the graph.

play07:31

So for example, given a graph,

play07:33

and the graphr- graphlet list,

play07:35

we define the graphlet count vector f,

play07:38

simply as the number of instances of a given graphlet that appears,

play07:42

uh, in our graph of interest.

play07:45

For example, if this G is our graph of interest,

play07:49

then in this graph,

play07:50

there resides one triangle,

play07:53

there reside three different paths of length 2,

play07:58

there reside six different edges with an isolated node, and there, uh,

play08:04

exist no, uh, triplet, uh, of nodes,

play08:07

uh, that are, uh, that are not connected,

play08:10

uh, in this graph.

play08:11

So the graphlet feature vector in this case, uh,

play08:14

would, here, have entries 1,

play08:16

3, 6, uh, and 0.

play08:19

Now, given two graphs,

play08:21

we can define the graphlet kernel simply as the dot product between the graphlet, uh,

play08:27

count vector of the first graph times,

play08:30

uh, the graphlet count vector of the second graph.

play08:33

Um, this is a good idea,

play08:36

but actually, there is a slight problem.

play08:38

The problem is that graphs G1 and G2 may have different sizes,

play08:42

so the raw counts will be very,

play08:45

uh, different of, uh,

play08:47

graphlets in different graphs.

play08:48

So a common solution people apply is to normalize this feature vector representation,

play08:54

uh, for the graph.

play08:55

So this means that, um,

play08:57

the- the graphlet, uh,

play08:59

vector representation for a given graph is simply the can- the count of

play09:04

individual graphlets divided by the total number of graphlets that appear in the graph.

play09:09

So if the- this essentially

play09:11

normalizes for the size and the density of the underlying graph,

play09:15

and then the graphlet kernel is defined as the dot product between these,

play09:19

um, uh, feature vector representations of graphs,

play09:23

uh, uh, h that capture,

play09:25

uh, the frequency or the proportion of- of our given graphlet,

play09:30

um, in a- in a graph.

play09:31

There is an important limitation of the graphlet graph kernel.

play09:36

And the limitation is that counting graphlets is very expens- expensive.

play09:41

Counting a k-size graphlet in a graph with n nodes, uh,

play09:45

by enumeration takes time on the order of n raised to the power k. So,

play09:50

um, this means that counting graphlets of

play09:54

size k is polynomial in the number of nodes in the graph,

play09:58

but it is exponential in the graphlet size.

play10:01

Um, and this is unavoidable in the worst case

play10:04

since sub-graph isomorphism, judging whether,

play10:08

uh, a sub-graph is a- is a, uh,

play10:11

isomorphic to another, uh,

play10:12

graph, is, uh, NP-hard.

play10:14

Um, and, uh, there are faster algorithms if,

play10:17

uh, the graph's, uh,

play10:19

node degree is bounded by d,

play10:21

then there exist a fa- faster algorithm to count the graphlets of size k. However,

play10:26

the issue still remains that counting

play10:28

these discrete structures in a graph is very time consuming, um, very expensive.

play10:33

So we can only count graphlets up to,

play10:36

uh, you know, uh,

play10:37

a handful of nodes.

play10:39

And then the- and then the exponential complexity takes over and we cannot count,

play10:44

uh, graphlets, uh, that are,

play10:46

uh, larger than that.

play10:47

Um, so the question is,

play10:50

how do we design a more efficient graph kernel?

play10:53

Um, and Weisfeiler-Lehman graph kernel, uh, achieves this goal.

play10:57

The goal here is to design an efficient graph feature descriptor Phi of G, uh,

play11:02

where the idea is that we wanna use a neighborhood structure to iteratively enrich,

play11:06

uh, node, uh, vocabulary.

play11:09

And, um, we generalize the notion of node degrees since node degrees are

play11:14

one hot- one-hop neighborhood information to multi-hop neighborhood information.

play11:19

And the algorithm that achieves this is, uh, uh,

play11:23

called the Weisfeiler-Lehman graph isomorphism test,

play11:26

or also known as color refinement.

play11:29

So, uh, let me explain, uh, this next.

play11:32

So the idea is that we are given a graph G with a set of nodes V,

play11:38

and we're going to assign an initial color,

play11:40

um, c^0, so this is an initial color to each node.

play11:45

And then we are going to iteratively, er,

play11:48

aggregate or hash colors from the neighbors to invent new colors.

play11:53

So the way to think of this, uh,

play11:54

the new color for a given node v will be a hashed value of its own color, um,

play12:00

from the previous time step and a concatenation

play12:05

of colors coming from the neighbors u of the node v of interest,

play12:10

where hash is basically a hash functions that

play12:13

maps different inputs into different, uh, colors.

play12:17

And after k steps of this color refinement,

play12:20

um, um, c, uh,

play12:22

superscript capital K of, uh,

play12:24

node v summarizes the structure, uh,

play12:27

of the graph, uh, at the level of,

play12:30

uh, K-hop, uh, neighborhood.

play12:32

So let me, um, give you an example, uh, and explain.

play12:35

So for example, here I have two, uh,

play12:38

graphs that have very similar structure but are just slightly, uh, different.

play12:42

The difference is, uh, this, uh,

play12:44

edge and here, um, the- the, uh,

play12:46

the diagonal edge, the triangle closing edge,

play12:49

um, um, is, uh, different.

play12:52

So first we are going to assign initial colors to nodes.

play12:55

So every node gets the same color,

play12:57

every node gets a color of one.

play12:59

Now we are going to aggregate neighboring colors.

play13:02

For example, this particular node aggregates colors 1, 1,

play13:06

1, um, and, uh,

play13:07

adds it to it- to itself,

play13:09

while this particular node up here aggregates colors from its neighbors,

play13:13

one and one, uh, and it is here.

play13:16

And the same process, uh,

play13:17

happens in this second,

play13:19

uh, graphs- graph as well.

play13:21

Now that, um, we have collected the colors,

play13:24

uh, we go, uh,

play13:26

and hash them, right?

play13:27

So we apply a hash- hash function that takes

play13:30

nodes' own color plus the colors from neighbors and produces new colors.

play13:35

And let's say here the hash function for the first combination returns one,

play13:39

then two, then three, uh, four, and five.

play13:41

So now we color the graphs,

play13:44

uh, based on these new refined colors, right?

play13:46

So this is the coloring of the first graph,

play13:48

and this is the coloring of the second graph based on the hash values

play13:52

of the- of the aggregated colors from the first step.

play13:56

Now we take these two graphs and,

play13:59

again, apply the same color aggregation scheme, right?

play14:02

So for example, this node, uh,

play14:04

with color 4 aggregates colors from its neighbors,

play14:08

so aggregates the 3, 4, and 5.

play14:10

So we have 3, 4, and 5 here, while, for example,

play14:13

this node here of color 2 aggregates from its neighbor,

play14:17

uh, that is colored 5,

play14:18

so it gets 2, 5.

play14:20

And again, for this graph,

play14:22

the same process happens.

play14:24

Now, again, we take, um, uh, this,

play14:27

uh, aggregated colors, um, and we hash them.

play14:30

And let's say our hash function, uh,

play14:33

assigns different, uh, new colors, uh,

play14:36

to, uh, to this,

play14:37

uh, colors that are,

play14:38

uh, combined aggregated from the previous timesteps.

play14:41

So now we can take this, uh, original, uh,

play14:44

aggregated colored graph and,

play14:46

uh, relabel the colors based on the hash value.

play14:49

So 4,345, um,

play14:52

er, that is, uh,

play14:56

the node with color 4 and aggregated neighbor colors 3, 4, 5, um,

play15:00

hashes into color 10,

play15:02

so we replace it with, uh, color 10 here.

play15:05

And we could keep iterating this and we would come up, uh, with, uh,

play15:09

more and more, uh, refinement,

play15:11

uh, of the, uh,

play15:12

uh, colors of the graph.

play15:13

So now that we have run this color refinement for a,

play15:17

uh, a fixed number of steps,

play15:18

let say k iterations,

play15:20

the Weisfeiler-Lehman, uh, kernel counts the number of nodes with a given color.

play15:26

So in our case,

play15:27

we run- we run this three times,

play15:29

so we have 13 different colors. And now

play15:32

the feature description for a given graph is simply the count,

play15:36

the number of nodes of a given color, right?

play15:37

In the first iteration,

play15:39

uh, all the nodes were colored,

play15:40

um- all six nodes were colored the same one- uh, the same way.

play15:44

Um, so there are six instances of color 1.

play15:46

Then, um, after we iter- um,

play15:49

agg- aggregated the colors and refined them, you know,

play15:52

there were two nodes of color 2,

play15:54

one node of color 3,

play15:55

two nodes of color 4, um, and so on.

play15:58

So here is now the feature description in terms of color counts, uh, for, uh,

play16:03

for, uh, different colors for the first graph and different colors,

play16:07

uh, for the second graph.

play16:09

So now that we have the feature descriptions,

play16:12

the Weisfeiler-Lehman graph kernel would simply take the dot product between these two,

play16:17

uh, uh, feature descriptors and return a value.

play16:19

So for example, in our case,

play16:21

the si- the, uh, Weisfeiler-Lehman, uh,

play16:23

kernel similarity between the two graphs is the dot product between the,

play16:27

uh, feature descriptors, uh, here.

play16:29

These are the two, uh,

play16:31

feature descriptors and we compute the dot product,

play16:33

we would get a value of, uh, 49.
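To spell out that last step with a tiny sketch (the color counts below are made up for illustration; they are not the slide's vectors that give 49):

```python
from collections import Counter

# Hypothetical color-count bags after refinement, e.g. six nodes of color 1,
# two of color 2, and so on, for each of the two graphs.
counts_g1 = Counter({1: 6, 2: 2, 3: 1, 4: 2, 5: 1})
counts_g2 = Counter({1: 6, 2: 1, 3: 2, 4: 1, 5: 2})

# WL kernel value = dot product of the two count vectors over all colors.
k_value = sum(counts_g1[c] * counts_g2[c] for c in counts_g1)
print(k_value)  # 36 + 2 + 2 + 2 + 2 = 44 for these made-up counts
```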

play16:36

So WL kernel is very popular and very strong,

play16:41

uh, gives strong performance,

play16:42

and it is also computationally efficient because the time complexity of

play16:47

this color refinement at each step is linear in the size of the graph.

play16:51

It is linear in the number of edges because all that

play16:55

every node has to do is collect the colors, uh, from its, uh,

play16:58

neighboring nodes and- and produce- and apply

play17:01

a simple hash function to- to come up with a new,

play17:05

um, uh, with a new, uh, color.

play17:08

When computing the kernel value,

play17:10

many colors, uh, appearing in the two graphs need to be tracked.

play17:14

So the number of colors will be at most the number of nodes,

play17:18

uh, in the network.

play17:20

So this again won't be too- too large.

play17:22

And counting the colors again takes linear time because it's just a sweep over the nodes.

play17:27

So the- the total complexity, uh,

play17:30

of computing the Weisfeiler-Lehman graph kernel between a pair of, uh,

play17:34

graphs is simply linear in the number of edges in the two graphs.

play17:38

So this means this is extremely, uh,

play17:41

fast and actually works,

play17:43

uh, really well in practice.

play17:45

So to summarize the graph level features that we have discussed,

play17:49

first we talked about, uh,

play17:51

the notion of graph kernels,

play17:52

where basically a graph is represented as a bag of graphlets or a bag of, uh, colors.

play17:59

Um, and, uh, when we represent the graph as a graph- uh,

play18:02

as a bag of graphlets,

play18:04

this is extremely- this is a very expensive representation because counting the graphlets,

play18:10

uh, takes time exponential in the size of the graphlet.

play18:14

At the same time, Weisfeiler-Lehman, uh,

play18:17

kernel is based on this K-step color refinement algorithm that

play18:22

enriches and produces new node colors that are aggregated from the,

play18:27

um, colors of the immediate neighbors of the node.

play18:30

And as multiple rounds of this color refinement are run,

play18:34

the node kind of gathers color information from farther away,

play18:38

uh, parts of the network.

play18:39

So here we represent the graph as a bag of colors.

play18:43

This is computationally efficient.

play18:45

The time is linear in the size of the graph, um,

play18:48

and it is also closely related to graph neural networks that we are going to study,

play18:54

uh, later in this course.

play18:55

So, um, Weisfeiler-Lehman is a really, uh,

play18:59

good way to measure similarity, um,

play19:01

between graphs, and in many cases,

play19:04

it is, uh, very hard to beat.

play19:06

So this concludes today's lecture where we talked about, um,

play19:11

three different, uh, approaches to traditional,

play19:14

uh, graph, uh, level, um, machine learning.

play19:18

We talked about, um, handcrafted features for node-level prediction,

play19:23

uh, in terms of node degree,

play19:24

centrality, clustering coefficient, and graphlets.

play19:27

We talked about link-level or edge-level features,

play19:31

distance-based, as well as local and global neighborhood overlap.

play19:35

And then lastly we talked about how to characterize the structure of the entire graph.

play19:39

We talked about graph kernels, uh,

play19:41

and in particular about graphlet kernel and the WL,

play19:45

meaning Weisfeiler-Lehman graph kernel.

play19:49

So this concludes our discussion of traditional machine learning approaches, uh,

play19:54

to graphs and how do we create feature vectors from nodes, links, um,

play19:59

and graphs, um, in a- in a scalable and interesting way. Uh, thank you very much.


Related Tags

Graph Learning, Kernel Methods, Graph Kernels, Machine Learning, Feature Vectors, Weisfeiler-Lehman, Graphlets, Neighborhood Analysis, Graph Prediction, Color Refinement