Stanford CS224W: ML with Graphs | 2021 | Lecture 5.1 - Message passing and Node Classification

Stanford Online
27 Apr 2021 · 18:33

Summary

TLDR: This lecture introduces semi-supervised node classification in graph networks, focusing on assigning labels to unlabeled nodes using message passing. It explores the concept of homophily, where similar nodes tend to connect, and leverages this to predict node labels. The discussion covers three techniques: relational classification, iterative classification, and belief propagation, providing foundational knowledge for understanding graph neural networks.

Takeaways

  • The lecture focuses on message passing and node classification as intermediate topics leading to graph neural networks.
  • The challenge is to assign labels to all nodes in a network when only some nodes are labeled, such as identifying fraudsters in a social network.
  • The node embeddings method was discussed previously; this lecture instead introduces semi-supervised node classification, where labeled and unlabeled nodes coexist in one network.
  • Message passing is introduced to infer the labels of unlabeled nodes by exploiting correlations within the network.
  • Correlations in networks arise from homophily, where similar nodes tend to connect, and from social influence, where connections affect individual characteristics.
  • An example online social network illustrates how interests cluster due to homophily.
  • Message passing proceeds iteratively, updating node labels based on neighbors' labels until convergence or a set number of iterations.
  • Three classical techniques are discussed: relational classification, iterative classification, and belief propagation.
  • The main assumption is homophily: the labels of connected nodes tend to be the same, which is key to collective classification.
  • The goal is to predict the labels of unlabeled nodes by considering both node features and network structure.
  • Collective classification involves local classifiers for initial label assignment, relational classifiers to capture correlations between nodes, and collective inference to propagate beliefs across the network.
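The three-stage recipe in the last takeaway can be sketched in a few lines of Python. This is a minimal illustration with a made-up toy graph and hand-written classifiers, not code from the lecture:

```python
# Sketch of collective classification: (1) a local classifier gives each
# unlabeled node an initial belief from its own features, (2) a relational
# classifier re-predicts each node from its neighbors' beliefs, and
# (3) collective inference repeats step 2 until the beliefs stabilize.

def collective_classification(adj, features, labels, local_clf, rel_clf,
                              max_iters=100, tol=1e-6):
    """adj: {node: list of neighbors}; labels: ground truth for the
    labeled nodes only (these stay clamped throughout)."""
    beliefs = {v: labels.get(v, local_clf(features[v])) for v in adj}
    for _ in range(max_iters):
        new = {v: labels[v] if v in labels
               else rel_clf([beliefs[u] for u in adj[v]])
               for v in adj}
        if max(abs(new[v] - beliefs[v]) for v in adj) < tol:
            return new
        beliefs = new
    return beliefs

# Toy path graph 0-1-2-3 with the endpoints labeled; the relational
# classifier is simply the mean of the neighbors' beliefs.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
beliefs = collective_classification(
    adj, features={v: None for v in adj}, labels={0: 1.0, 3: 0.0},
    local_clf=lambda f: 0.5,                  # uninformative local prior
    rel_clf=lambda nb: sum(nb) / len(nb),
)
# beliefs[1] converges toward 2/3 and beliefs[2] toward 1/3
```

The interior nodes end up interpolating between the two clamped labels, which is exactly the homophily assumption at work.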

Q & A

  • What is the main topic of the lecture?

    -The main topic of the lecture is message passing and node classification within the context of graph neural networks.

  • What is the goal when given a network with labels on some nodes?

    -The goal is to assign labels to all other nodes in the network, specifically identifying which nodes are trustworthy or untrustworthy.

  • What is semi-supervised node classification?

    -Semi-supervised node classification is a method where some nodes are labeled, and the task is to predict labels for the unlabeled nodes within the same network.

  • What is message passing in the context of this lecture?

    -Message passing refers to the process of inferring labels of unlabeled nodes by passing information across the network based on the labels of neighboring nodes.

  • What are the three classical techniques discussed for node classification?

    -The three classical techniques discussed are relational classification, iterative classification, and belief propagation.

  • What is the concept of homophily as it relates to networks?

    -Homophily is the tendency of individuals to associate or bond with similar others, meaning that nodes with similar characteristics or labels tend to be connected.

  • How does social influence relate to node classification?

    -Social influence can cause individual characteristics to change to become more similar to those of their connected nodes, thus influencing the node's label.

  • What is the 'guilt by association' principle in the context of node classification?

    -The 'guilt by association' principle suggests that if a node is connected to another node with a known label, it is likely to have the same label.

  • What is the role of the adjacency matrix in this context?

    -The adjacency matrix captures the structure of the graph, representing whether nodes are connected and the nature of those connections.

  • What is the Markov assumption in collective classification?

    -The Markov assumption in collective classification is that the label of a node only depends on the labels of its immediate neighbors.

  • How does the iterative classification process work?

    -The iterative classification process involves repeatedly applying a relational classifier to each node based on the updated labels of its neighbors until the labels converge or a set number of iterations is reached.
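As a concrete instance of the first of those techniques, a probabilistic relational classifier repeatedly replaces each unlabeled node's belief with the weighted average of its neighbors' beliefs, clamping the labeled nodes. A minimal NumPy sketch on an invented 4-node weighted path (illustrative, not from the lecture):

```python
import numpy as np

# Update rule: P(Y_v = 1) <- sum_u A[v,u] * P(Y_u = 1) / sum_u A[v,u]
# Labeled nodes keep their ground-truth value; unlabeled start at 0.5.
A = np.array([[0, 2, 0, 0],
              [2, 0, 1, 0],
              [0, 1, 0, 2],
              [0, 0, 2, 0]], dtype=float)   # weighted adjacency, path 0-1-2-3
p = np.array([1.0, 0.5, 0.5, 0.0])          # node 0 labeled 1, node 3 labeled 0
labeled = np.array([True, False, False, True])

for _ in range(1000):
    new_p = A @ p / A.sum(axis=1)           # weighted average of neighbor beliefs
    new_p[labeled] = p[labeled]             # clamp the labeled nodes
    if np.max(np.abs(new_p - p)) < 1e-8:    # converged to a stable state
        p = new_p
        break
    p = new_p
# p[1] -> 0.75 and p[2] -> 0.25: the heavier edges pull beliefs toward
# the labeled node they attach to
```

The edge weights matter: node 1's strong (weight-2) tie to the green node pushes its belief well above one half.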

Outlines

00:00

๐ŸŒ Introduction to Semi-Supervised Node Classification

The speaker begins by expressing gratitude to the attendees and introduces the topic of message passing and node classification in the context of graph neural networks. The lecture aims to explore how to assign labels to unlabeled nodes within a network where some nodes are already labeled. An example scenario is presented where nodes represent individuals, some of whom are trustworthy and others are fraudsters. The goal is to identify the unlabeled nodes' trustworthiness. The speaker mentions that they previously discussed node embeddings but will now focus on semi-supervised node classification, where both labeled and unlabeled nodes are present. The concept of message passing is introduced as a framework to infer labels of unlabeled nodes by exploiting correlations within the network. The idea is that nodes connected to each other tend to share the same label, a concept known as homophily. The lecture will cover three classical techniques: relational classification, iterative classification, and belief propagation, which will provide intuition for understanding graph neural networks.

05:02

Understanding Correlations in Networks

The speaker delves into the concept of correlations in networks, explaining that nodes with similar characteristics, such as trustworthiness, tend to be connected. This phenomenon is known as homophily, where similar individuals are more likely to associate with each other. The speaker provides examples from social networks, such as researchers collaborating based on shared research interests. A visual example is given of a high school social network where nodes represent students, edges represent friendships, and colors denote interests in sports or arts. The visualization shows distinct groups forming based on interests, demonstrating homophily. The speaker also introduces the concept of social influence, where social connections can affect an individual's characteristics or behaviors. The idea is that if one person influences another's preferences, they become more similar, which strengthens their connection. The speaker emphasizes the importance of leveraging these correlations to predict node labels in a network.

10:03

Collective Classification and Its Principles

The speaker graphically illustrates the concept of collective classification, which involves classifying nodes in a network based on the assumption of homophily. The goal is to predict the class of unlabeled nodes given a few labeled nodes. The speaker introduces the idea of using an adjacency matrix to represent the graph structure and a vector of node labels to represent the class of each node. The problem is set up to predict the labels of unlabeled nodes, which are initially gray. The speaker explains that the label of a node depends on the properties of the node itself and the labels of its neighbors. This concept is referred to as 'guilt by association,' where a node's label is influenced by the labels of its connected nodes. The lecture will focus on developing an algorithm to predict node labels based on these principles.
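The setup described above can be encoded directly. The toy graph below is hypothetical, and using -1 to mark unlabeled nodes is a convention chosen here for illustration:

```python
import numpy as np

# A is the n x n adjacency matrix; Y holds node labels with 1 = green,
# 0 = red, and -1 marking the gray (unlabeled) nodes to be predicted.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]])
Y = np.array([1, -1, -1, 0])

# "Guilt by association": in the first round, each gray node can only see
# the already-known labels among its adjacency-matrix neighbors.
visible = {}
for v in np.where(Y == -1)[0]:
    neighbor_labels = Y[A[v] == 1]
    visible[int(v)] = [int(y) for y in neighbor_labels if y != -1]
# visible == {1: [1, 0], 2: [1]}
```

Node 1 sees one green and one red neighbor, while node 2 sees only a green one; any relational classifier starts from exactly this information.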

15:05

Components of Collective Classification

The speaker outlines the three components necessary for collective classification: a local classifier, a relational classifier, and collective inference. The local classifier assigns initial labels to unlabeled nodes based on their attributes or features, without using network information. The relational classifier captures correlations between nodes by predicting a node's label from the labels or attributes of its neighbors, which is where network information enters. The collective inference stage then applies the relational classifier iteratively, so that the network structure shapes the predictions until they converge to a stable state. The speaker sets the stage for the semi-supervised node classification techniques built on the intuition of homophily: relational classification, iterative classification, and belief propagation.

Keywords

Message Passing

Message passing is a framework used in graph neural networks to update node representations by exchanging information with neighboring nodes. In the context of the video, message passing is a method to infer labels of unlabeled nodes in a network by leveraging the information from labeled nodes. The script mentions message passing as a technique to exploit correlations in the network, where nodes update their label beliefs based on their neighbors' labels.

Node Classification

Node classification is the task of assigning labels to nodes in a network. The video discusses semi-supervised node classification, where some nodes are already labeled, and the goal is to predict labels for the unlabeled nodes. This is central to the video's theme as it sets the stage for discussing how to use the network's structure to predict node labels.

Graph Neural Networks

Graph Neural Networks (GNNs) are a class of neural networks designed to process graph-structured data. They are mentioned as the ultimate application of the techniques discussed, such as message passing and node classification. GNNs learn node representations by aggregating features from their neighbors, which is directly related to the concept of message passing.

Homophily

Homophily is the tendency of nodes to connect with similar others. It is a social science concept used in the video to explain why nodes with the same label tend to be connected. The script uses homophily to motivate the idea that connected nodes are more likely to share the same label, which is crucial for algorithms that predict node labels based on network structure.
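One simple way to quantify this tendency is the edge homophily ratio: the fraction of edges whose two endpoints share a label. This is a standard measure, and the toy friendship graph below is made up for illustration:

```python
# Fraction of edges connecting same-label endpoints; values near 1
# indicate strong homophily, values near 0 indicate the opposite.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2)]
label = {0: "sports", 1: "sports", 2: "sports", 3: "arts", 4: "arts"}

same = sum(label[u] == label[v] for u, v in edges)
homophily = same / len(edges)   # 4 of 5 edges are same-label -> 0.8
```

A value of 0.8 here reflects the high-school network example from the lecture, where sports and arts students mostly befriend others with the same interest.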

Semi-supervised Node Classification

Semi-supervised node classification refers to the scenario where some nodes in the network are labeled, and the task is to classify the remaining unlabeled nodes. The video uses this concept to describe the problem setting where both supervised (labeled nodes) and unsupervised (unlabeled nodes) signals are available to predict labels.

Node Embeddings

Node embeddings are low-dimensional representations of nodes that capture their structural properties within a network. The script mentions node embeddings as a method to build a classifier for predicting node labels. They are a precursor to more complex techniques like message passing.

Relational Classification

Relational classification is a technique mentioned in the script where the label of a node is predicted based on the labels or attributes of its neighboring nodes. It is part of the iterative process in collective classification, highlighting the relational aspect of nodes within a network.

Iterative Classification

Iterative classification is an approach where node labels are updated multiple times based on the labels of their neighbors until convergence. The video describes this process as part of collective classification, emphasizing the iterative nature of refining predictions in a network.
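In the usual formulation, iterative classification trains two classifiers: phi1 predicts a label from a node's own features alone (used to bootstrap), and phi2 predicts from the features plus a summary of the neighbors' current labels. A hand-written sketch follows; the decision rules stand in for trained models, and the graph and features are invented:

```python
def phi1(feat):
    """Features-only classifier, used once for the initial assignment."""
    return 1 if feat > 0.5 else 0

def phi2(feat, neighbor_labels):
    """Classifier over features plus a neighbor-label summary (majority vote)."""
    ones = sum(neighbor_labels)
    zeros = len(neighbor_labels) - ones
    if ones != zeros:
        return 1 if ones > zeros else 0
    return phi1(feat)            # fall back to own features on a tie

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
feat = {0: 0.9, 1: 0.9, 2: 0.3, 3: 0.2}

labels = {v: phi1(feat[v]) for v in adj}        # bootstrap with phi1
for _ in range(10):                              # re-apply phi2 in rounds
    new = {v: phi2(feat[v], [labels[u] for u in adj[v]]) for v in adj}
    if new == labels:                            # labels converged
        break
    labels = new
# The neighborhood majority overturns nodes 2 and 3's feature-only
# guesses: labels == {0: 1, 1: 1, 2: 1, 3: 1}
```

This shows the key difference from relational classification: features and network structure are combined, so a node's weak local evidence can be overridden by its neighborhood.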

Belief Propagation

Belief propagation is a probabilistic technique for performing inference on graphical models. In the video, it is presented as one of the classical techniques for semi-supervised node classification, where beliefs about node labels are updated based on the labels of neighboring nodes.
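On a tree, the sum-product form of belief propagation fits in a few lines. The chain, edge potential psi, and local evidence phi below are illustrative choices rather than values from the lecture; psi rewards neighbors for agreeing, which encodes homophily:

```python
# Sum-product BP on the chain 0 - 1 - 2. psi(y_i, y_j) favors equal
# labels; phi[v][y] is node v's local evidence for label y.
psi = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.9}
phi = [[0.8, 0.2],   # node 0: local evidence favors label 0
       [0.5, 0.5],   # node 1: uninformative
       [0.5, 0.5]]   # node 2: uninformative

def message(src, incoming):
    """m_{src->dst}(y_dst) = sum_{y_src} phi[src][y_src] * psi[(y_src, y_dst)]
    times the product of messages already arriving at src (normalized)."""
    out = []
    for y_dst in (0, 1):
        total = 0.0
        for y_src in (0, 1):
            term = phi[src][y_src] * psi[(y_src, y_dst)]
            for m in incoming:
                term *= m[y_src]
            total += term
        out.append(total)
    z = sum(out)
    return [v / z for v in out]

m01 = message(0, [])          # message from node 0 to node 1
m12 = message(1, [m01])       # message from node 1 to node 2
belief2 = [phi[2][y] * m12[y] for y in (0, 1)]
z = sum(belief2)
belief2 = [b / z for b in belief2]
# node 0's evidence propagates two hops: belief2[0] > belief2[1]
```

Even though node 2 has no local evidence, its belief leans toward label 0 because node 0's evidence is relayed through the homophilous edge potentials.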

Collective Classification

Collective classification is the process of classifying nodes in a network by considering the relationships between nodes. The video explains that this approach leverages the network structure to improve the accuracy of node classification, as nodes that are connected tend to share similar labels.

Social Influence

Social influence is the phenomenon where social connections can affect an individual's characteristics or behaviors. The script uses this concept to explain how correlations in networks can arise, influencing the labels of nodes. It provides a real-world context for why nodes might have similar labels.

Highlights

Introduction to message passing and node classification in graph neural networks.

The problem of assigning labels to unlabeled nodes in a network.

The concept of semi-supervised node classification.

The importance of exploiting correlations in the network.

The idea of collective classification based on node connections.

The three classical techniques for node classification: relational classification, iterative classification, and belief propagation.

The role of homophily in network correlations.

The influence of social connections on individual characteristics.

The motivation for using network data in social science.

The example of an online social network demonstrating homophily.

The concept of guilt by association in network classification.

The two factors determining node classification: node features and neighbor labels.

The probabilistic framework for collective classification.

The Markov assumption in node classification.

The iterative process of collective inference in node classification.

The definition and role of local classifiers in initial label assignment.

The function of relational classifiers in capturing node correlations.

The process of collective inference and label convergence.

The overview of semi-supervised node classification techniques.

Transcripts

[00:04] Thank you so much everyone for attending. Exciting to be here and to talk about the next topic in this class. Today we are going to discuss message passing and node classification. This is an intermediate topic that we are going to use so that we can then move to graph neural networks.

[00:30] So the idea for today is that we are given a network with labels on some nodes, and the question is, how do we assign labels to all other nodes in the network? The example is that in a network, some nodes are, let's say, fraudsters or untrustworthy nodes, and some other nodes are fully trusted. The question becomes, how do we find other fraudsters or trustworthy nodes in the network? We have already discussed node embedding methods in lecture 3, and we could say, let's use these node embedding methods to simply build a classifier that predicts which node is trusted and which node is not trusted. However, today we are going to think about this a bit differently, in what is called semi-supervised node classification, where we will be given both some nodes that are labeled, let's say with the green and the red color, and some other nodes that are unlabeled, and they will all be part of the same network. So the question will be, how do we predict the labels of the unlabeled nodes? This is called semi-supervised node classification: semi-supervised because we are given both the supervised signal, the labels, and the unsupervised signal, the unlabeled nodes, at the same time.

[01:54] The idea is that we want to do this in what is called a message passing framework. Given one big, partially labeled network, we would like to infer the labels of the unlabeled nodes, and we are going to do this by doing what is called message passing over the network. The intuition is that we want to exploit correlations that exist in the network. By correlations I mean that nodes that share labels tend to be connected. So we will have this notion of collective classification, where the idea is that nodes are going to update what they believe is their own label based on the labels of their neighbors in the network. Conceptually, this is similar to what we discussed in the PageRank lecture, where the PageRank score was updated based on the scores of the neighbors. But here we are not going to update a score; we are going to update the belief, the prediction about what the label of a given node is.

[03:01] We are going to talk about three classical techniques. One is called relational classification, the second is iterative classification, and last we are going to talk about belief propagation. All these methods are a bit old school, but they give us a lot of intuition for what we are going to talk about in the next few weeks, when we focus on deep learning on graphs and in particular graph neural networks. So today, as a starter topic into that, we are going to talk about these three approaches to collective classification.

[03:38] When I say correlations exist in networks, what I mean is that individual behaviors are often correlated in the network structure. Correlation means that nearby nodes tend to have the same color, the same label; they tend to belong to the same class. There are two reasons, especially in social networks but also in other types of networks, why this might happen. First is the notion of homophily, where people of similar characteristics tend to link to each other; this is the notion of homophily from social science. And then there is the notion of influence, where the idea is that social connections influence our own characteristics or behaviors.

[04:29] Let me give you a bit of motivation from the social science point of view for why these correlations exist in networks and why network data is so useful. Homophily is defined as the tendency of individuals to associate or bond with similar others. One way to think of this is to say, birds of a feather flock together. People that are similar tend to bond; they tend to link with each other. This phenomenon has been observed in a vast array of social network studies, on a variety of attributes: age, gender, organization, social status, and all kinds of preferences, political preferences, food preferences, and so on. One example would be that researchers who focus on the same research area are more likely to collaborate with each other, or to know each other or be friends with each other, naturally, because they attend the same conferences, they interact with each other, and connections between them get formed. So this is homophily.

[05:40] To give you an example, this is an online social network from a high school, where nodes are people, edges are friendships, and color denotes their interests in terms of sports and arts. What you notice immediately from this visualization is that there seem to be four groups of people, and they are very much grouped based on node color, based on interests: green nodes tend to link with each other, and yellow nodes tend to link with each other. So people with the same interests are more closely connected, due to this phenomenon called homophily.

[06:27] Another force that creates these correlations in networks works the other way around. If in homophily we say that people with similar characteristics tend to link to each other, the notion of social influence flips the arrow, in some sense. It says that social connections can influence the individual characteristics of a person. For example, if I recommend my musical preferences to my friends and I am very, very persistent, perhaps one of them will grow to like my favorite genres, my favorite music. That person has just become more similar to me, and we are very connected. I influenced them to change their behavior, to change their preferences, so that the two of us are more similar, and one explanation would be that this makes our bond even stronger, even easier to maintain. So here it was the social connection that affected or influenced the individual characteristic. Both of these phenomena are very common and very strong in social networks, and the correlations also exist in many other types of networks. This is really the main intuition that we are going to exploit in today's lecture.

[07:50] So the question will be, how do we leverage this notion of correlation across the edges of the observed network to help predict node labels? When I say correlation, I mean that nodes that are connected tend to have the same label, the same preferences. So the question is: given this partially labeled network, where green is the positive class with label 1, red is the negative class with label 0, and the gray nodes are the nodes that do not have a color yet, how would I come up with an algorithm to learn, or to predict, the colors of the gray nodes?

[08:34] The motivation is that similar nodes are typically close together or directly connected in the network. So the principle we are going to use is also known as guilt by association, in the sense that if I am connected to a node with label X, then I am likely to have that label X as well. That is the notion of correlation I was talking about. For example, with malicious and benign web pages, malicious web pages tend to link to one another to increase visibility, look credible, and rank higher in search engines. So if we find out that one web page is malicious, then perhaps other pages that link to it also tend to be malicious. That is the intuition.

[09:22] The way we are going to think of this is that the classification label of a node v in the network will depend on two factors: the properties, the features, of node v, and the labels of the neighbors of node v. And of course, because v's label depends on the labels of the nodes in its neighborhood, and those labels in turn depend on the features of those nodes, the label of node v will also depend on the features of the nodes in its neighborhood.

[10:05] Here is how we are thinking about this graphically. Given a graph and a few labeled nodes, we want to find the label class of the remaining nodes. When I say label, in this case it will be positive or negative; it will be green or it will be red. The main assumption, the main modeling assumption, the main inductive bias in our approach, will be to assume that there is some degree of homophily in the network, so that labels tend to cluster, meaning that nodes of the same label tend to link to each other.

[10:43] To give you an example task: let A be an adjacency matrix over n nodes. This captures the structure of the graph. The adjacency matrix can be unweighted or weighted, and it can be undirected or directed; all the methods generalize to weighted graphs and to both types of graphs. We will use Y as a vector of node labels: Y_v equals 1 if node v belongs to class 1, the green color, and Y_v equals 0 if node v belongs to class 0, meaning it is labeled with the red color. And of course there may also be unlabeled nodes whose label needs to be predicted; right now we do not know their label. The goal is to predict which unlabeled nodes are likely to be of class 1 and which are likely to be of class 0. That is the idea of what we want to do.

[11:49] There are many examples of this notion of collective classification. You can think about document classification, where documents link to each other, or link prediction in graphs, where the links depend on the properties and labels of the neighbors. You can even take other domains, such as optical character recognition, represent them as a graph, and say that the label of a character also depends on the labels of the characters around it, because we know how to form, let's say, English words, and some random sequence of characters is very unlikely. So whether a symbol is the letter A or the letter G depends on what its neighbors in, let's say, the line graph look like. There are many cases where you want to make a prediction about one object based on its relationships to nearby objects: nodes, images, letters in OCR, part-of-speech tagging. In many settings, knowing the labels of the nodes around you helps you determine your own label. That is essentially the idea.

[13:08] So collective classification, which we are going to talk about today, has three different parts. The intuition is that we want to simultaneously classify linked nodes and propagate information across the edges of the network. This will be a probabilistic framework where we make what is called a Markov assumption: the label of a node only depends on the labels of its neighbors. This is a first-order Markov assumption, because we only assume that a label depends on the labels of the immediate neighbors; we do not, for example, assume that the label depends on the labels of neighbors of neighbors, the degree-2 neighborhood. We only look at the degree-1 neighborhood. And the reason we call this collective classification is that we will be classifying all the nodes of the graph together, because every node's label depends on other nodes' labels. So we are going to iteratively reclassify nodes: nodes are going to update their belief, their prediction about the labels, until the process converges.

[14:22] In order to do this kind of collective, iterative classification, we need three types of classifiers. We have a local classifier that assigns the initial label to each node; we then have what we call a relational classifier that captures the correlations between nodes, asking what the labels of the neighbors of the node of interest are; and then we have the notion of collective inference, where we propagate these correlations, these beliefs, over the network until the labels converge to some stable state, or until some fixed number of iterations is reached.

[15:05] So what are these three pieces that we need to define? First, we have the local classifier that assigns initial labels to unlabeled nodes. This is used for initial label assignment, and it predicts the label of a node based on its attributes or features. It is just a standard classification task: given the features of a node, we want to predict its label. This does not use the network information yet, so it is applied only once, at the beginning, to give initial labels to the gray nodes.

[15:42] Then we define the relational classifier that captures the correlations between the nodes. What does this mean? We learn another predictor that predicts the label of one node based on the labels or attributes of the other nodes in its neighborhood. This is where the network information is used, because the relational classifier determines a given node's label based on the labels of the nodes connected to it.

[16:16] And we do not apply this relational classifier only once; we apply it in rounds. So we have the collective inference step, where we keep updating the predictions based on the updated predictions of the neighbors. We apply the relational classifier to each node iteratively, and iterate until the inconsistency between neighboring nodes is minimized, meaning the network structure affects the predictions, and the predictions converge and stabilize. Usually we either run this iteration until it stabilizes or until some maximum number of iterations is reached. I will give you specific examples of what I mean by that.

[17:03] So the problem setting is: how do we predict the labels Y_v of the unlabeled nodes v, here denoted in gray? Each node v has a feature vector f_v. Labels of some nodes are given to us; we use label 1 for green nodes and label 0 for red nodes. The task is to find the probability that a given node is, let's say, positive, is green, based on the features it has, as well as the network structure and the colors of the nodes around it. That is the problem we are trying to solve, and we are going to solve it by propagating the beliefs, propagating the information, across the underlying network structure in an iterative way.

[17:53] So what is the overview of what is coming? We are going to focus on semi-supervised node classification, semi-supervised in the sense that we are given both labeled and unlabeled data at the same time, a partially labeled network. We are going to use the intuition of homophily, that similar nodes are typically close together or directly connected in the network. And we are going to talk about three techniques: relational classification, iterative classification, and, last, belief propagation.


Related Tags
Graph Theory, Machine Learning, Semi-supervised, Node Classification, Homophily, Social Networks, Data Science, Network Analysis, Belief Propagation, Information Propagation