Stanford CS224W: ML with Graphs | 2021 | Lecture 2.2 - Traditional Feature-based Methods: Link

Stanford Online
15 Apr 202116:47

Summary

TLDRThe script discusses traditional machine learning approaches for graph-level predictions, focusing on link prediction tasks. It explores two methods: predicting missing links at random and forecasting links over time in evolving networks. The script introduces various features for node pairs, including distance-based, local neighborhood overlap, and global neighborhood overlap metrics, with examples like Katz index to predict future links based on the network's structure.

Takeaways

  • πŸ” The focus is on link prediction tasks in graph-level predictions, aiming to predict new links based on existing ones.
  • πŸ“ˆ The task involves ranking all unlinked node pairs at test time and identifying the top k pairs as predicted links.
  • 🧠 Key to the task is designing features for node pairs that capture the structure of links in the network.
  • πŸ”— A simple approach is to concatenate features of two nodes, but this can lose important relational information.
  • πŸ“Š Two formulations for link prediction are discussed: 'missing at random' and 'predicting links over time'.
  • πŸ•’ The 'missing at random' formulation assumes links are removed randomly, while 'predicting over time' considers network evolution.
  • πŸ“ Evaluation of predictions is done by comparing the ranked list of predicted links with actual future links.
  • πŸ“Œ Three types of link level features are discussed: distance-based, local neighborhood overlap, and global neighborhood overlap.
  • πŸ›° Distance-based features consider the shortest path between nodes but may lack depth in capturing connections.
  • 🌐 Local neighborhood overlap features like common neighbors and Jaccard coefficient focus on immediate connections between nodes.
  • 🌏 Global neighborhood overlap features, such as the Katz index, consider the entire graph structure for link prediction.

Q & A

  • What is the main focus of the investigation in the transcript?

    -The main focus of the investigation is on traditional machine learning approaches for graph-level predictions, specifically link prediction tasks and features that capture the structure of links in a given network.

  • What is the link level prediction task described in the transcript?

    -The link level prediction task is to predict new links based on the existing links in the network by evaluating all node pairs that are not yet linked at test time, ranking them, and proclaiming the top k node pairs as predicted links that are going to occur in the network.

  • How can the importance of information about the relationship between two nodes be lost?

    -The importance of information about the relationship between two nodes can be lost by simply concatenating the features of the two nodes and training a model on that representation, which often neglects the relational information between them.

  • What are the two different ways to formulate the link prediction task mentioned in the transcript?

    -The two ways to formulate the link prediction task are: 1) Assuming links in the network are missing at random, and 2) Predicting links over time in networks that naturally evolve, such as citation, social, or collaboration networks.

  • How is the performance of a link prediction algorithm evaluated in the context of networks that evolve over time?

    -The performance is evaluated by looking at a graph between two time points, t0 and t0', and based on the structure up to time t0', outputting a ranked list of predicted links that are expected to occur between times T1 and T1'. The algorithm's predictions are then compared to the actual links that appeared in that future time frame.

  • What is a feature descriptor for a pair of nodes in the context of link prediction?

    -A feature descriptor for a pair of nodes is a method to quantify the relationship between the two nodes in such a way that it can be used to predict or learn whether a link exists between them or not.

  • What is a distance-based feature in link prediction?

    -A distance-based feature in link prediction refers to the use of the shortest path distance between two nodes to characterize their relationship.

  • What is the local neighborhood overlap feature and how is it calculated?

    -The local neighborhood overlap feature captures the number of neighboring nodes shared between two nodes. It is calculated by taking the intersection of the neighbors of the two nodes, and can be normalized using the Jaccard coefficient or adjusted using the Adamic-Adar index.

  • What is the global neighborhood overlap feature and how does it differ from the local neighborhood overlap?

    -The global neighborhood overlap feature considers the entire graph structure to score the relationship between a pair of nodes, not just the immediate neighbors. It differs from the local neighborhood overlap by not being limited to two-hop distances and considering all possible paths between nodes.

  • How is the Katz index computed and what does it represent?

    -The Katz index is computed using the closed-form expression involving the inverse of the identity matrix minus beta times the adjacency matrix. It represents the global neighborhood overlap by counting the number of paths of all lengths between a pair of nodes, where paths are discounted exponentially with their length.

  • Why might the assumption of links missing at random not hold true in real-world scenarios?

    -The assumption of links missing at random might not hold true because in real-world scenarios, such as protein-protein interaction networks, the connections are often not tested at random but are influenced by previous positive results, leading to some areas of the network being more explored than others.

Outlines

00:00

πŸ”— Link Prediction in Graph Networks

The paragraph discusses the task of link prediction in graph networks, focusing on predicting new links based on existing ones. It emphasizes the importance of designing features for node pairs to capture the structure of links within the network. The speaker outlines two approaches to formulating link prediction tasks: one is treating links as missing at random and predicting them, and the other is predicting links over time in evolving networks. The paragraph also introduces the concept of evaluating these predictions by comparing the predicted links to actual future occurrences.

05:01

πŸ“ Distance-Based and Local Neighborhood Features

This section delves into the specifics of distance-based features for link prediction, such as the shortest path distance between nodes. However, it points out the limitations of such features in capturing the strength of connections. The paragraph then introduces local neighborhood overlap features, like the number of common neighbors and the Jaccard coefficient, which provide a more nuanced view of node relationships. It also mentions the Adamic-Adar index, which adjusts for the degree of common neighbors to better predict links in social networks.

10:08

🌐 Global Neighborhood Overlap Metrics

The speaker addresses the limitations of local neighborhood overlap metrics, which return zero for nodes without common neighbors, potentially overlooking future connections. To overcome this, global neighborhood overlap metrics like the Katz index are introduced. The Katz index considers all paths of varying lengths between nodes, discounting them by their length to give more importance to shorter paths. The paragraph explains how the Katz index can be computed using the powers of the adjacency matrix and provides a formula for its closed-form calculation.

15:11

πŸ” Link Level Feature Summary

In conclusion, the paragraph summarizes the different types of link level features discussed: distance-based, local neighborhood overlap, and global neighborhood overlap. It highlights the Katz index as a global metric that accounts for all path lengths between nodes, providing a comprehensive score for potential links. The summary underscores the importance of these features in accurately predicting links in graph networks.

Mindmap

Keywords

πŸ’‘Link Prediction

Link prediction is the task of predicting the likelihood of links forming between pairs of nodes in a network. It is a central concept in the video, which focuses on predicting future connections in networks like social or citation networks. The script discusses how to evaluate and rank node pairs that are not yet connected to predict potential future links.

πŸ’‘Node Pairs

Node pairs refer to any two nodes within a network that could potentially be connected by a link. The video script emphasizes the importance of designing features for these pairs to effectively predict new links, which is a core part of the link prediction task.

πŸ’‘Features

In the context of the video, features are attributes or properties used to describe node pairs for the purpose of machine learning. The script discusses various types of features, such as distance-based and neighborhood overlap features, which are crucial for predicting links between nodes.

πŸ’‘Graph Level Predictions

Graph level predictions refer to making predictions about the entire structure of a graph or network, rather than individual nodes. The video's investigation into link prediction tasks is an example of graph level predictions, as it considers the network's structure as a whole.

πŸ’‘Network Structure

Network structure refers to the arrangement of nodes and links within a network. The script discusses how features designed for link prediction must capture the structure of links in a given network to be effective.

πŸ’‘Common Neighbors

Common neighbors is a local neighborhood overlap feature that counts the number of neighboring nodes shared between two nodes. It is used in the video to illustrate how the number of shared connections can indicate the strength of a potential link between nodes.

πŸ’‘Jaccard Coefficient

The Jaccard coefficient is a measure used to capture local neighborhood overlap. It is calculated as the size of the intersection divided by the size of the union of the neighborhoods of two nodes. The video uses this coefficient to normalize the count of common neighbors by the total number of neighbors.

πŸ’‘Adamic-Adar Index

The Adamic-Adar index is another local neighborhood overlap metric mentioned in the script. It sums the inverse logarithm of the degrees of common neighbors, giving more weight to less connected nodes. This index is particularly effective for social networks.

πŸ’‘Katz Index

The Katz index is a global neighborhood overlap metric that counts all paths of different lengths between a pair of nodes, with paths of greater length being discounted. The video explains how this index can be computed using the powers of the adjacency matrix of a graph.

πŸ’‘Adjacency Matrix

An adjacency matrix is a square matrix used to represent a network, where the entry at row u and column v is 1 if nodes u and v are connected, and 0 otherwise. The video script uses the concept of the adjacency matrix to explain how to compute path lengths and the Katz index.

πŸ’‘Graph Evolution

Graph evolution refers to changes in the network over time, such as the addition or removal of nodes and links. The video discusses predicting links over time in evolving networks like social networks, where the structure naturally changes.

Highlights

Investigating traditional machine learning approaches for graph-level predictions.

Focusing on link prediction tasks and features capturing the structure of links in a network.

The task is to predict new links based on existing links in the network.

Evaluating all node pairs not yet linked at test time and ranking them.

Proclaiming the top k node pairs as predicted links by the algorithm.

Designing features for a pair of nodes is key to the link prediction task.

Concatenating features of two nodes may lose important relational information.

Link prediction can be formulated as predicting missing links at random or predicting links over time.

For evolving networks, predicting links between two time points is a natural approach.

Evaluating the approach by comparing predicted links with actual future links.

For static networks, assuming links are missing at random can be useful.

Biologists do not explore protein-protein interaction networks at random, influencing the exploration pattern.

Providing a feature descriptor for a given pair of nodes is crucial.

Scoring pairs of nodes based on the number of common neighbors.

Sorting all pairs according to the score and predicting the top pairs as new links.

Reviewing three different ways to featurize or create a descriptor of the relationship between two nodes.

Discussing distance-based features, such as shortest path distance between nodes.

Local neighborhood overlap features capture the number of shared neighbors.

Global neighborhood overlap features consider the entire graph structure.

Katz index counts all paths of different lengths between a pair of nodes, discounted by their length.

Computing the Katz index in closed form using the inverse of the adjacency matrix.

Transcripts

play00:04

We continue with our investigation of traditional machine learning approaches uh,

play00:10

to uh, graph level predictions.

play00:12

And now we have- we are going to focus on

play00:15

link pre- prediction tasks and features that capture,

play00:19

uh, structure of links,

play00:21

uh, in a given, uh, network.

play00:23

So the link le- level prediction tasks is the following.

play00:28

The task is to predict new links based on the existing links in the network.

play00:33

So this means at test time,

play00:35

we have to evaluate all node pairs that are not yet linked,

play00:39

uh, rank them, and then, uh,

play00:41

proclaim that the top k note pairs, as, um,

play00:45

predicted by our algorithm,

play00:47

are the links that are going to occur,

play00:49

uh, in the network.

play00:50

And the key here is to design features for a pair of nodes.

play00:54

And of course, what we can do, um, uh,

play00:57

as we have seen in the node level, uh,

play01:00

tasks, we could go and, uh,

play01:02

let say concatenate, uh,

play01:04

uh, the features of node number 1,

play01:07

features of the node number 2,

play01:08

and train a model on that type of, uh, representation.

play01:12

However, that would be very, um,

play01:14

unsatisfactory, because, uh, many times this would, uh,

play01:18

lose, uh, much of important information about the relationship between the two nodes,

play01:23

uh, in the network.

play01:24

So the way we will think of this link prediction task is two-way.

play01:29

We can formulate it in two different ways.

play01:31

One way we can formulate it is simply to say,

play01:34

links in the network are,

play01:35

let say missing at random,

play01:37

so we are given a network,

play01:38

we are going to remove, uh,

play01:40

at random some number of links and then trying to predict,

play01:43

uh, back, uh, those links using our,

play01:46

uh, machine learning algorithm.

play01:47

That's one type of a formulation.

play01:49

And then the other type of a formulation is that we are going to predict links over time.

play01:55

This means that if we have a network that naturally evolves over time, for example,

play01:59

our citation network, our social network,

play02:01

or our collaboration network,

play02:03

then we can say, ah,

play02:05

we are going to look at a graph between time zero and time zero,

play02:09

uh, prime, and based on the edges and the structure up to this uh,

play02:13

the time t 0 prime, uh,

play02:15

we are going then to output a ranked list L of

play02:18

links that we predict are going to occur in the future.

play02:22

Let's say that are going to appear between times T1 and T1 prime.

play02:26

And the way, uh, we can then evaluate this type of approach is to say ah,

play02:30

we know that in the future, um,

play02:32

n new links will appear, let's, uh,

play02:35

let's rank, uh, the- the potential edges outputed by

play02:39

our algorithm and let's compare it to the edges

play02:42

that actually really appeared, uh, in the future.

play02:45

Uh, this type of formulation is useful or natural

play02:49

for networks that evolve over time like transaction networks,

play02:52

like social networks, well, edges, uh, keep,

play02:55

um- keep adding, while for example,

play02:58

the links missing at random type formulation is more useful, for example,

play03:03

for static networks like protein- protein interaction networks,

play03:07

where we can assume,

play03:09

even though this assumption is actually heavily violated, that, you know,

play03:12

biologists are testing kind of at random connections between proteins,

play03:17

um, and we'd like to infer what other connections in the future, uh,

play03:21

are going for- for biologists are going to discover, uh, in the future,

play03:25

or which links should they probe with the,

play03:28

uh- that lab, uh, experiments.

play03:30

Of course, in reality,

play03:31

biologists are not exploring the physical, uh,

play03:34

protein-protein interaction network, uh, um, at random.

play03:38

Um, you know, they are heavily influenced by positive results of one another.

play03:42

So essentially some parts of this network suddenly well explored,

play03:46

while others are very much, uh, under explored.

play03:49

So with these two formulations, uh,

play03:52

let's now start thinking about, um,

play03:55

how are we going to, uh,

play03:57

provide a feature descriptor,

play03:59

uh, for a given, uh, pair of nodes?

play04:01

So the idea is that for a pair of nodes x, y,

play04:04

we are going to compute some score,

play04:06

um, uh, c, uh, x, y.

play04:08

For example, a score, uh,

play04:10

could be the number of common neighbors between nodes, uh, X and Y.

play04:14

And then we are going to sort all pairs x,

play04:17

y according to the decreasing, uh,

play04:19

score C, um, and we will predict top end pairs as the new links that are going to appear,

play04:25

uh, in the network.

play04:26

And then we can end, uh, the test-time, right,

play04:28

we can actually go and observe which links actually appear and compare these two lists,

play04:33

and this way determine how well our approach,

play04:35

our algorithm, um, is working.

play04:37

We are going to review, uh,

play04:39

three different ways how to, uh,

play04:42

featurize or create a descriptor of the relationship between two nodes in the network.

play04:47

We are going to talk about distance-based features,

play04:49

local neighborhood overlap features,

play04:51

as well as global neighbor- neighborhood overlap, uh, features.

play04:55

And the goal is that for a given pair of nodes,

play04:57

we are going to describe the relationship, um,

play05:00

between the two nodes, uh,

play05:02

so that from this relationship we can then predict or

play05:04

learn whether there exists a link between them or not.

play05:08

So first, uh, we talk about distance-based feature.

play05:13

Uh, this is very natural.

play05:14

We can think about

play05:15

the shortest path distance between the two nodes and characterize it in this way.

play05:19

So for example, if we have nodes B and H,

play05:22

then the shortest path length between them, uh, equals two.

play05:25

So the value of this feature would be equal to two.

play05:27

However, if you look at this, uh, this does,

play05:30

uh- what this- what this metric does not capture it, it captures the distance,

play05:33

but it doesn't measure,

play05:35

kind of capture the degree of neighborhood overlap or the strength of connection.

play05:40

Because for example, you can look in this network

play05:42

nodes B and H actually have two friends in common.

play05:45

So the- the connection here in some sense is stronger between them.

play05:49

Then for example, the connection between, uh,

play05:51

node D and, uh, node, uh,

play05:54

F, um, because they only have kind of- there is

play05:57

only one path while here there are two different paths.

play06:00

So the way we can, um,

play06:02

try to capture the strength of connection between two nodes would be to ask, okay,

play06:05

how many neighbors, uh,

play06:07

do you have in common, right?

play06:09

What is the number of common friends between a pair of nodes?

play06:12

And this is captured by the notion of local neighborhood overlap,

play06:16

which captures the number of neighboring nodes shared between two nodes,

play06:21

v and, uh- v1 and v2.

play06:23

Uh, one way to capture this is you simply say,

play06:26

what is the- what is the number of common neighbors, right?

play06:28

We take the neighbors of node V1,

play06:31

take the neighbors of node V2,

play06:32

and take the intersection of these two sets.

play06:36

Um, a normalized version of this, uh,

play06:39

same idea is Jaccard coefficient,

play06:42

where we take the intersection-

play06:44

the size of the intersection divided by the size of the union.

play06:47

The issue with common neighbors is that, of course,

play06:49

nodes that have higher degree are more likely to have neighbors with others.

play06:53

While here in the Jaccard coefficient, in some sense,

play06:56

we are norma- we are trying to normalize, um,

play07:00

by the degree, uh,

play07:02

to some degree by saying what is the union of the number of,

play07:05

um, neighbors of the two nodes.

play07:07

Uh, and then, uh,

play07:08

the other type of, uh,

play07:10

uh, local neighborhood overlap, uh,

play07:12

metric that is- that actually works quite well in practice is called,

play07:16

uh, Adamic- Adar index.

play07:17

And simply what this is saying is,

play07:19

let's go over the,

play07:20

um- let's sum over the neighbors that nodes v1 and v2 have in common,

play07:26

and let's take one over the log, uh, their degree.

play07:29

So basically, the idea here is that we count how many neighbors,

play07:33

um, the two nodes have in common,

play07:35

but the importance of uneven neighbor is,

play07:37

uh- is low, uh- decreases, uh, with these degrees.

play07:42

So if you have a lot of,

play07:43

um, neighbors in common that have low degree,

play07:47

that is better than if you have a lot of high-

play07:50

highly connected celebrities as a set of common neighbors.

play07:54

So this is a- a net- a feature that works really well in a social network.

play07:59

Of course, the problem with, uh, local, um,

play08:02

network neighborhood overlap is the limitation is that this, uh,

play08:07

metric always returns zero if two- two nodes are not- do not have any,

play08:13

uh, neighbors in common.

play08:14

So for example, in this case,

play08:16

if we would want to say what is the neighborhood overlap between nodes A and E,

play08:20

because they have no neighbors in common,

play08:22

they are more than, um,

play08:24

uh, two hops away from each other.

play08:26

Then the- if only in such cases,

play08:28

the return- the value of that it will be returned to will always be, zero.

play08:33

However, in reality, these two nodes may still potentially be connected in the future.

play08:38

So to fix this problem,

play08:40

we then define global neighborhood overlap matrix.

play08:44

That is all of this limitation by only, uh,

play08:47

focusing on a hop- two hop distances and

play08:50

two-hop paths between a pairs- pair of nodes and consider,

play08:54

um, all other distances or the entire graph as well.

play08:57

So let's now look at global neigh- neighborhood overlap type, uh, metrics.

play09:03

And the metric we are going to talk about is called Katz index,

play09:07

and it counts the number of all paths, uh,

play09:10

of all different lengths between a given pair of nodes.

play09:15

So now we need to figure out two things here.

play09:18

First is, how do we compute number of paths of a given length,

play09:22

uh, between, uh, two, uh, nodes?

play09:25

This can actually be very elegantly computed by

play09:28

using powers of the graph adjacency matrix.

play09:31

So let me give you a quick illustration or a quick proof why this is true.

play09:36

So the uh, first,

play09:38

I wanna give you the intuition around the powers of adjacency matrix, right?

play09:41

The point is that what we are going to show is

play09:44

that computing number of paths between uh, two nodes um,

play09:48

reduces down to computing powers of the graph adjacency matrix or

play09:52

essentially taking the graph adjacency matrix and multiplying it with itself.

play09:57

So first graph adjacency matrix recall,

play10:00

it has a value 1 at every entry uv if- if nodes u and v are connected.

play10:07

Then let's say that p,

play10:09

uv uh superscript capital K counts the number of paths of

play10:14

length K between nodes u and v. And our goal is to show that uh,

play10:20

uh, if we are interested in the number of paths uh,

play10:22

uh, of length K,

play10:25

then we have to uh,

play10:26

compute A to the power of k and that entry uv will tell us the number of pets.

play10:32

The capital K here is the same as uh,

play10:35

uh, a small case,

play10:37

so the Kth power of A measures the number of paths of a given length.

play10:42

And if you think about it right,

play10:44

how many paths of length 1 are there between a pair of

play10:47

nodes that is exactly captured by the graph adjacency matrix, right?

play10:51

If a pair of nodes is connected,

play10:53

then there is a value 1,

play10:54

and if a pair of nodes is not connected,

play10:57

then there is the value 0.

play10:59

Now that we know how to compute um,

play11:03

the number of paths of length 1 between a pair of nodes.

play11:07

Now we can ask how many- how do we compute

play11:10

the number of paths of length 2 between a pair of nodes u.

play11:13

And we are going to do this via the two- two-step procedure.

play11:16

Uh, and we are going to do this by decompose the path of

play11:20

length 2 into a path of length 1 plus another path of length 1.

play11:25

So the idea is that we compute the number of paths of length 1

play11:29

between each of u's neighbors and v,

play11:33

and then um, add one to that.

play11:36

So the idea is the following,

play11:38

the number of paths between nodes u and v of length 1- of length 2 is

play11:43

simply a summation over the nodes i that are the neighbors of the starting node u,

play11:49

um, times the number of paths now from

play11:52

this neighbor i to the target node v. And this will

play11:56

now give us the number of paths of length 2 between u and v. And now what you can see,

play12:03

you can see a substitute here back, the adjacency matrix.

play12:07

So all these is a sum over i,

play12:09

u- A _ui times A_iv.

play12:13

So if you see this,

play12:16

this is simply the product of matrices uh,

play12:19

of made- of adjacency matrix A_ iu itself.

play12:23

So this is now uh,

play12:24

the entry uv of the adjacency matrix A uh, squared.

play12:29

Um, this is uh,

play12:31

now by basically by induction,

play12:32

we can keep repeating this and get a higher powers that count paths of longer lengths um,

play12:41

as- as this is uh, increasing.

play12:43

Another way to look at this,

play12:45

here is a visual proof is that uh,

play12:47

what is A squared?

play12:49

A squared is A multiplied by itself,

play12:52

so when we are interested in a given,

play12:55

let's say entry, here these are entry,

play12:57

these are neighbors of Node 1.

play12:59

These are um, now the,

play13:02

the number of paths of length 1 between one- one's neighbors and node number 2.

play13:08

So after the multiplication,

play13:11

the value here will be 1,

play13:12

which we mean that there is one path of length 2 between node 1 uh, and Node 2.

play13:19

So this is how powers of adjacency matrix give

play13:23

account paths of length K between a pair of nodes uh, in the network.

play13:29

What this means is that now we can define the- we have developed

play13:34

the first component that will allow us to count- to compute the cuts index,

play13:39

because it allows us to count the number of paths between a pair of nodes for a given K.

play13:45

But what we still need to decide is how do we do this for all the path lengths,

play13:51

from one to infinity.

play13:53

So to compute the pets, as we said,

play13:55

we are going to use powers of the adjacency matrix uh,

play13:58

you know, uh, the adjacency matrix itself tells us powers of length 1,

play14:03

square of it tells us power of,

play14:05

uh squared tells us paths of length 2,

play14:09

and the adjacency matrix raised to

play14:12

the power l counts the number of paths of length l between a pair of nodes.

play14:17

NOW, the Katz index goes over from 1 path lengths all the way to infinity.

play14:24

So the- the Katz index uh,

play14:27

global neighborhood overlap between nodes v1 and

play14:30

v2 is simply a summation of l from one to infinity.

play14:34

We have this Beta raised to the power of l by basically

play14:37

a disc- discount factor that gives lower- lower importance

play14:42

to paths of longer lengths and A to b_l counts

play14:46

the number of paths of length l between nodes of v1 and v2.

play14:51

And now what is interesting about Katz index is that, um,

play14:56

um,uh, we can actually compute this particular expression in a closed-form.

play15:01

And here is- here is the- the formula for the Katz index again,

play15:05

and basically what uh,

play15:06

here is a closed form expression that will exactly compute the sum,

play15:10

and the reason um,

play15:12

why this is true or y there is inequality is this.

play15:15

We notice that this is simply uh,

play15:16

geometric series uh, for matrices,

play15:19

and for that there exists

play15:21

a closed form expression that all that it requires us is take the identity matrix

play15:26

minus Beta times adjacency matrix inverted that and

play15:29

then again then subtract the identity matrix again.

play15:33

And the entries of this matrix S will give us

play15:37

the Katz neighborhood overlap scores for any pair uh, of nodes.

play15:44

So to summarize, uh, link level features.

play15:47

Uh, we- we described three types of them.

play15:50

We talked about distance-based features that users, for example,

play15:53

shortest path between a pair of nodes and does not capture neighborhood overlaps.

play15:58

Then we talked about this neighborhood overlap metrics like common neighbors, Jaccard,

play16:03

and the Dmitry data that captures find- in a fine-grained wait,

play16:08

how many neighbors does a pair of nodes have in common?

play16:12

But the problem with this is that nodes that are more than two hops apart,

play16:15

nodes that have no neighbors in common,

play16:17

the metric will return value 0.

play16:20

So the global neighborhood overlap type metrics,

play16:23

for example, like Katz, uh,

play16:25

uses global graph structure to give us a score for a pair of nodes and Katz index counts

play16:31

the number of pets of all lands between a pair of

play16:35

nodes where these paths are discounted um,

play16:38

exponentially with their length.

Rate This
β˜…
β˜…
β˜…
β˜…
β˜…

5.0 / 5 (0 votes)

Related Tags
Machine LearningLink PredictionGraph AnalysisNetwork EvolutionSocial NetworksCitation NetworksCollaboration NetworksProtein InteractionFeature DescriptorGraph Adjacency