Stanford CS224W: Machine Learning with Graphs | 2021 | Lecture 1.3 - Choice of Graph Representation​

Stanford Online
13 Apr 202120:27

Summary

TLDRThis lecture from Stanford's CS 224W course on machine learning with graphs covers essential concepts in graph theory. It introduces the basic components of graphs: nodes (vertices) and edges (links), and discusses different graph representations, such as directed, undirected, and bipartite graphs. The speaker explains the importance of choosing the right graph representation for different types of data, such as social networks, protein interactions, and citation networks. Additionally, the lecture explores topics like adjacency matrices, node degrees, graph sparsity, and graph connectivity, including strong and weak connections in directed graphs.

Takeaways

  • 📊 A network or graph consists of nodes (entities) and edges (interactions) that define the relationships between nodes.
  • 🔄 Directed graphs have edges with specific directions, while undirected graphs represent reciprocal relationships without direction.
  • 📈 Node degree refers to the number of edges connected to a node, with undirected graphs counting each edge twice.
  • ⚖️ Bipartite graphs are composed of two distinct sets of nodes where edges only exist between nodes from different sets, such as authors linked to papers or customers linked to products.
  • 🧠 Choosing the correct graph representation (nodes and edges) is crucial for accurate machine learning predictions in different contexts, from social networks to biological systems.
  • 🔗 Strongly connected components in directed graphs mean that there is a path from every node to every other node in the component.
  • 📋 Graphs can be represented by adjacency matrices, edge lists, or adjacency lists, each having different strengths depending on the size and sparsity of the network.
  • 🧩 Sparse networks, common in real-world applications, have few edges compared to the number of possible connections, impacting matrix representations and graph analysis.
  • 📚 Graphs can have attributes associated with nodes and edges, such as weights, rankings, or types, which can add depth to the data represented by the network.
  • 🎯 The connectivity of a graph (whether it's connected or disconnected) is crucial in understanding how nodes interact and in identifying isolated or clustered components.

Q & A

  • What are the two main components of a graph or network?

    -A graph or network is composed of nodes (or vertices), which represent objects or entities, and edges (or links), which represent interactions or relationships between these nodes.

  • Why is the choice of graph representation important?

    -The choice of graph representation is crucial because it determines how well the graph captures the underlying relationships of the data. A proper representation allows for more accurate predictions and insights, while an improper representation might lead to misleading or less meaningful conclusions.

  • What are directed and undirected graphs, and when are they used?

    -Directed graphs have edges with a specific direction, indicating a one-way relationship between nodes (e.g., phone calls, financial transactions). Undirected graphs have bidirectional edges, representing reciprocal relationships (e.g., friendships, collaborations).

  • What is the significance of node degree in a graph?

    -Node degree refers to the number of edges connected to a node. In undirected graphs, it is the total number of connections. In directed graphs, it is split into in-degree (edges pointing to the node) and out-degree (edges originating from the node). The node degree helps to understand the node's connectivity and importance in the network.

  • What is a bipartite graph, and can you give an example?

    -A bipartite graph consists of two types of nodes, where edges only connect nodes of different types. An example is an author-paper graph, where authors are linked to the papers they have written, but authors are not directly connected to each other.

  • How does projecting a bipartite graph work?

    -Projecting a bipartite graph involves creating a new graph using only one type of node. Nodes are connected if they share at least one neighbor in the original bipartite graph. For example, projecting an author-paper graph can result in a co-authorship network where authors are connected if they have co-authored a paper.

  • What are the three common ways to represent a graph computationally?

    -The three common ways are: 1) Adjacency matrix: a square matrix indicating connections between nodes; 2) Edge list: a list of node pairs representing edges; 3) Adjacency list: a list of each node's neighbors. Adjacency matrices are good for dense networks, while adjacency lists are better for large, sparse networks.

  • What is the concept of graph sparsity, and why is it important?

    -Graph sparsity refers to the fact that most real-world networks have a low average degree compared to the total number of nodes, meaning they have few connections relative to the number of possible connections. This sparsity is important because it affects the efficiency of graph storage and computation.

  • What are strongly connected components in directed graphs?

    -Strongly connected components are subsets of nodes in a directed graph where there is a directed path between any two nodes within the subset. This means that every node can be reached from every other node in the component.

  • What role do attributes play in graphs, and how can they be represented?

    -Attributes provide additional information about nodes, edges, or the entire graph. For example, an edge can have a weight indicating the strength of a relationship. These attributes can be represented in the adjacency matrix, where entries can hold values other than binary to indicate connection strengths.

Outlines

00:00

🧠 Introduction to Graph Representations and Network Components

The instructor discusses the components of a graph, which consist of nodes (also called vertices) and edges (or links). These can represent various entities such as people, actors, or molecules, and the relationships between them. The same mathematical representation of a graph can be applied across different domains, whether analyzing social networks, protein interactions, or collaboration networks. The choice of graph representation greatly influences the quality of analysis, making it crucial to decide on how to define nodes and edges based on the dataset and domain.

05:00

🔗 Understanding Node Degree and Bipartite Graphs

This paragraph explains node degree in both undirected and directed graphs. In undirected graphs, node degree is the number of edges connected to a node, while in directed graphs, in-degree and out-degree indicate edges pointing to and away from the node, respectively. The concept of bipartite graphs is introduced, where nodes are split into two groups, and edges only connect nodes between these groups. Examples include scientific papers and authors, or customers and products.

10:01

📊 Graph Projections and Adjacency Matrices

The instructor discusses how bipartite graphs can be projected onto one of the partitions to create collaboration networks or co-rating networks. This projection creates connections between nodes in one partition based on shared neighbors. The paragraph also introduces adjacency matrices, a method to represent graphs using a matrix format. The matrix is binary, with 1s indicating connections between nodes and 0s for no connection. In undirected graphs, this matrix is symmetric, while in directed graphs it is not.

15:02

🔢 Graph Sparsity and Representation

The concept of sparse networks is highlighted, where most nodes have very few connections, resulting in sparse adjacency matrices. This characteristic is common in real-world networks such as social networks or scientific collaboration networks. Sparse matrices require special representation methods, such as edge lists or adjacency lists, to efficiently handle large, sparse graphs. Adjacency lists are particularly useful for analyzing large, sparse networks because they allow quick retrieval of a node's neighbors.

20:02

⚖️ Weighted Graphs, Self-loops, and Connectivity

This paragraph introduces weighted graphs, where edges can have weights representing the strength of a relationship. It also discusses self-loops, where nodes connect to themselves, and multi-graphs, which allow multiple edges between nodes. Connectivity is another important concept, where a graph is connected if all nodes can be reached via paths. In directed graphs, weak and strong connectivity are differentiated, with strong connectivity requiring a path between every pair of nodes.

📐 Summary of Graph Representations and Theories

The lecture concludes by summarizing various graph representation choices, including directed and undirected graphs, bipartite graphs, weighted graphs, and adjacency matrices. Important graph theory concepts such as connectivity, weak and strong connectivity, and node degree are also revisited. The importance of selecting the right graph representation based on the problem and dataset is emphasized, influencing the types of machine learning tasks and predictions that can be performed using graph structures.

Mindmap

Keywords

💡Graph

A graph is a mathematical representation consisting of nodes (also called vertices) and edges (also referred to as links), used to model relationships between entities. In the video, the concept of a graph is central to the discussion of machine learning with graphs, where the nodes could represent people, molecules, or objects, and the edges represent the connections between them.

💡Nodes

Nodes, also called vertices, are the individual entities or objects in a graph. For example, in a social network, nodes represent people, and in a molecular network, nodes might represent proteins. The choice of nodes helps define the structure and purpose of the graph, as mentioned in the video when discussing the design of graph representations.

💡Edges

Edges, also known as links, are the connections between nodes in a graph. They represent the relationships or interactions between entities, such as phone calls between people or interactions between proteins. The video emphasizes the importance of selecting the appropriate relationships to form edges, which affects the overall quality of the graph.

💡Directed vs. Undirected Graphs

A directed graph has edges with a specific direction, such as a one-way relationship like following someone on Twitter, while an undirected graph has edges without direction, representing mutual relationships like friendships. In the video, directed and undirected graphs are used to model different types of interactions, depending on whether the relationships are reciprocal or not.

💡Adjacency Matrix

An adjacency matrix is a square matrix used to represent a graph, where each cell indicates whether a pair of nodes is connected by an edge. In the video, it is explained as a useful tool for representing graphs, especially when determining node degrees or identifying connections. The matrix is symmetric for undirected graphs and may contain weights if the edges have associated values.

💡Node Degree

Node degree is the number of edges connected to a node in a graph. For undirected graphs, it refers to the total number of connections, while for directed graphs, there are in-degrees (incoming edges) and out-degrees (outgoing edges). The video explains how node degrees can help describe the structure of a network, such as counting how many people a person is connected to in a social network.

💡Bipartite Graph

A bipartite graph is a type of graph where nodes can be split into two distinct sets, and edges only exist between nodes in different sets. Examples provided in the video include authors and papers, or customers and products, where each set interacts exclusively with the other set, making bipartite graphs useful for modeling certain types of networks.

💡Sparse Matrix

A sparse matrix is a matrix in which most of the elements are zero, often used to represent large, real-world graphs where only a small number of nodes are actually connected. The video emphasizes that real-world networks, like social or communication networks, are typically sparse, with only a small fraction of all possible connections being realized.

💡Weighted Graph

A weighted graph is a graph in which edges have weights or values associated with them, indicating the strength or capacity of the connection. For example, in a network of friendships, weights might represent the strength of the relationship. The video discusses how adjacency matrices of weighted graphs can store these values to represent varying degrees of connection between nodes.

💡Strongly Connected Components

In a directed graph, a strongly connected component is a subset of nodes where every node can be reached from every other node via directed edges. The video describes how identifying these components is important for understanding the structure of directed graphs and determining whether the graph is strongly or weakly connected.

Highlights

Introduction to graph representation and its importance in machine learning with graphs.

A graph is composed of two main components: nodes (or vertices) and edges (or links).

Graph representations can vary depending on the problem; nodes and edges represent different objects and relationships in various datasets.

The choice of a proper graph representation is crucial for successful analysis and predictions.

Graphs can be directed or undirected. Directed graphs are useful for modeling relationships with direction, like phone calls or financial transactions.

Node degree refers to the number of edges adjacent to a given node, and it can be calculated differently for directed and undirected graphs.

Bipartite graphs consist of two different types of nodes, where connections only occur between different node types, e.g., authors linked to papers or customers linked to products.

Projected or folded networks are created from bipartite graphs, such as collaboration networks formed by co-authorship between authors.

Adjacency matrices represent graphs as square matrices where the presence or absence of edges is indicated by binary values (0 or 1).

Real-world networks tend to be sparse, meaning large portions of the adjacency matrix will be empty (no connections).

Sparse graphs are efficiently represented using edge lists or adjacency lists, making it easier to manipulate and analyze.

Nodes and edges can have attributes, such as weights, directions, or other properties, which can be directly represented in the adjacency matrix.

Self-loops and multi-graphs are special graph types where a node may have edges connected to itself, or multiple edges between node pairs.

Connectivity in graphs determines whether nodes can be reached from one another, with concepts of strong and weak connectivity in directed graphs.

Machine learning applications of graphs span across node, edge, and graph-level prediction tasks, all depending on the chosen representation.

Transcripts

play00:04

In this part of, uh, Stanford, CS 224 W, um,

play00:09

machine-learning with graphs course,

play00:11

I wanna talk about the choice of graph representation.

play00:15

[NOISE] So what are components of a graph or a network?

play00:20

So network is composed of two types of objects.

play00:23

We- first, we have objects or entities themselves called,

play00:26

uh, referred to as nodes, uh,

play00:29

and vertices, and then we have interactions or edges between them,

play00:33

uh, called links or, uh, edges.

play00:36

And then the entire system, the entire, um,

play00:39

domain we then call a- a network,

play00:41

uh, or a graph.

play00:43

Usually, for nodes, we will use, uh, uh,

play00:45

the word- the letter capital N or capital V,

play00:49

um, and then for edges,

play00:50

we- we are usually using the-

play00:53

the letter capital E so that the graph G is then composed of a set of nodes,

play00:57

uh, N and a set of edges, uh,

play01:00

E. What is important about graphs is that graphs are a common language,

play01:05

meaning that I can take, for example, uh,

play01:07

actors and connect them based on which movies they appeared in,

play01:11

or I can take people based on the relationships they have with each other,

play01:16

or I can take molecules, like proteins,

play01:19

and build a network based on which proteins interact with each other.

play01:23

If I look at what is the structure of this network,

play01:25

what is the underlying mathematical representation,

play01:28

in all these cases,

play01:29

we have the same underlying mathematical representation,

play01:32

which means that the same machine learning algorithm will be

play01:36

able to make predictions be it that these nodes,

play01:40

um, uh, correspond to actors, correspond to, uh,

play01:44

people, or they correspond to molecules like proteins.

play01:47

[NOISE] Of course, choosing a proper graph representation is very important.

play01:54

So for example, if you have a set of people,

play01:57

we can connect individuals that work with each

play01:59

other and we will have a professional network.

play02:02

However, we can also take the same set of individuals

play02:05

and connect them based on sexual relationships, but then,

play02:09

we'll ab- creating a sexual network, or for example,

play02:12

if we have a set of scientific papers,

play02:14

we can connect them based on citations,

play02:17

which paper cites which other paper.

play02:19

But for example, if we were to connect them based

play02:21

on whether they use the same word in the title,

play02:24

the- the quality of underlying network and the underlying,

play02:28

uh, representations might be, uh, much worse.

play02:31

So the choice of what the- the nodes are and what the links are is very important.

play02:37

So whenever we are given a data set,

play02:39

then we need to decide how are we going to design the underlying graph,

play02:44

what will be the objects of interest nodes,

play02:46

and what will be the relationships between them,

play02:49

what will be the edges.

play02:50

The choice of this proper network representation of a given domain or a given problem

play02:55

deter- will determine our ability to use networks, uh, successfully.

play03:00

In some cases, there will be a unique,

play03:03

unambiguous way to represent this, um, problem,

play03:07

this domain as a graph,

play03:09

while in other cases,

play03:10

this representation may, by no means, be unique.

play03:14

Um, and the way we will assign links between the objects will determine, uh,

play03:19

the nature of the questions we will be able to study and the nature of the,

play03:24

um, predictions we will be able to make.

play03:28

So to show you some examples of design choices we are faced with when co-creating graphs,

play03:35

I will now go through some concepts and different types of graphs,

play03:38

uh, that we can- that we can create from data.

play03:41

First, I wi- I will distinguish between directed and undirected graphs, right?

play03:46

Undirected graphs have links,

play03:48

um, that- that are undirected, meaning,

play03:50

that they are useful for modeling symmetric or reciprocal relationships,

play03:55

like collaboration, friendship, um,

play03:58

and interaction between proteins,

play04:01

and so on, while directed, um,

play04:03

relationships are captured by directed links,

play04:07

where every link has a direction, has a source,

play04:10

and has a destination denoted by a- by an arrow.

play04:14

And examples of these types of, um,

play04:16

links occurring in real-world would be phone calls,

play04:19

financial transactions, uh, following on Twitter,

play04:22

where there is a source and there is a destination.

play04:27

The second type of, um- um, uh,

play04:30

graphs that we are going to then talk about is that as we have, um,

play04:35

created undirected graphs, then,

play04:39

um, we can talk about the notion of a node degree.

play04:41

And node degree is simply the number of edges,

play04:45

um, adjacent to a given, uh, node.

play04:48

So for example, the node a in this example has degree 4.

play04:51

The average node degree is simply the- is

play04:55

simply the average over the degrees of all the nodes in the network.

play04:58

And if- if you work this out,

play05:00

it turns out to be twice number of edges divided by the number of nodes,

play05:04

uh, in the network.

play05:05

The reason there is this number 2 is

play05:08

because when we are computing the degrees of the nodes,

play05:10

each edge gets counted twice, right?

play05:13

Each endpoint of the n- of the edge gets

play05:16

counted once because the edge has two end points,

play05:19

every edge gets counted twice.

play05:21

This also means that having a self edge or self-loop, um,

play05:25

adds a degree of two to the node,

play05:28

not a degree of one to the node because both end points attach to the same, uh, node.

play05:34

This is for undirected networks.

play05:37

In directed networks, we distinguish between, uh,

play05:39

in-degree and out-degree, meaning

play05:42

in-degree is the number of edges pointing towards the node.

play05:45

For example, node C has in-degree 2 and the out-degree, um, 1,

play05:49

which is the number of edges pointing outside- outward

play05:53

from the- from the node, uh, c. Um,

play05:58

another, uh, very popular type of graph structure

play06:01

that is- that is used a lot and it's very natural in different domains,

play06:05

it's called a bipartite graph.

play06:07

And bipartite graph is a graph generally of nodes of two different types,

play06:12

where nodes only interact with the other type of node,

play06:16

but not with each other.

play06:17

So for example, a bipartite graph is a graph where nodes can be split

play06:22

into two partitions and the- the edges only go from left,

play06:27

uh, to the right partition and not inside the same partition.

play06:31

Examples of, uh, bipartite graphs that naturally occur are,

play06:35

for example, uh, scientific authors linked to the papers they authored,

play06:40

actors linked to the movies they appeared in,

play06:43

users linked to the movies they rated or watched,

play06:47

um, and so on.

play06:49

So- or for example,

play06:51

customers buying products, uh,

play06:53

is also a bipartite graph where we have a set of customers,

play06:57

a set of products,

play06:58

and we link, uh, customer to the product, uh, she purchased.

play07:03

Now that we have defined a bipartite network,

play07:06

we can also define the notion of a folded or projected network, where we can create,

play07:11

for example, author collaboration networks,

play07:14

or the movie co-rating network.

play07:17

And the idea is as follows: if I have a bipartite graph,

play07:20

then I can project this bipartite graph to either to the left side or to the right side.

play07:25

And when- and when I project it, basically,

play07:28

I only use the nodes from one side in my projection graph,

play07:32

and the way I connect the nodes is to say,

play07:35

I will create a connection between a pair of nodes

play07:37

if they have at least one neighbor in common.

play07:40

So if these are authors and these are scientific papers,

play07:44

then basically, it says,

play07:45

I will create a co- collaboration or a co-authorship graph where I will

play07:49

connect a pair of authors if they co-authored at least one paper in common.

play07:54

So for example, 1, 2,

play07:55

and 3 co-authored this paper,

play07:58

so they are all connected with each other.

play08:00

For example, 3 and 4 did not co-author a paper,

play08:04

so there is no link between them.

play08:06

But for example, 5 and 2 co-authored a paper,

play08:09

so there is a link between them because they co-authored this,

play08:12

uh, this paper here.

play08:14

And in analogous way,

play08:16

you can also create a projection of

play08:18

this bipartite network to the- to the right-hand side,

play08:21

and then you will- you would obtain a graph like this.

play08:24

And as I said, bipartite graphs or multipartite graphs,

play08:28

if you have multiple types of edges,

play08:30

are very popular, especially,

play08:32

if you have two different types of nodes,

play08:34

like users and products, um,

play08:36

uh, users and movies, uh,

play08:38

authors and papers, um,

play08:40

and so on and so forth.

play08:42

[NOISE] Another interesting point about graphs is how do we represent them,

play08:49

um, and representing graphs,

play08:51

uh, is an interesting question.

play08:53

One way to represent a graph is to represent it with an adjacency matrix.

play08:58

So essentially, if for a given,

play09:00

uh, undirected, for example, graph,

play09:02

in this case on end nodes, in our case,

play09:06

4, we will create a square matrix,

play09:08

where this matrix will be binary.

play09:10

It will o- only take entries of 0 and 1.

play09:13

And essentially, an entry of matrix ij will be set to 1 if nodes i and j are connected,

play09:20

and it will be set to 0 if they are not connected.

play09:23

So for example, 1 and 2 are connected,

play09:25

so at entry 1, row 1,

play09:27

column 2, there is a 1.

play09:29

And also, because 2 is connected to 1 at row 2,

play09:33

column 1, we also have a 1.

play09:35

So this means that adjacency matrices of,

play09:38

uh, undirected graphs are naturally symmetric.

play09:41

If the graph is directed,

play09:44

then the matrix won't be symmetric because 2 links to 1.

play09:48

We have a 1 here,

play09:49

but 1 does not link back to 2,

play09:52

so there is a 0.

play09:54

Um, and in similar way,

play09:57

we can then think of node degrees, um, uh,

play10:00

simply as a summation across a given row or

play10:03

across a given one column of the graph, uh, adjacency matrix.

play10:07

So rather than kind of thinking here how many edges are adjacent,

play10:11

we can just go and sum the- basically,

play10:14

count the number of ones,

play10:15

number of other nodes that this given node is connected to.

play10:19

Um, this is for, um, undirected graphs.

play10:22

For directed graphs, uh,

play10:24

in and out degrees will be sums over columns and sums over rows, uh,

play10:28

of the graph adjacency matrix,

play10:32

as- as I illustrate here, uh,

play10:34

with this, um, illustration.

play10:38

One important consequence of a real-world network is that they are extremely sparse.

play10:44

So this means if you would look at the adjacency matrix,

play10:48

series on adjacency matrix of a real-world network where basically for every, um, row I,

play10:54

column J, if there is an edge,

play10:55

we put a dot and otherwise the cell is empty, uh,

play10:59

you get these types of super sparse matrices where,

play11:03

where there are large parts of the matrix that are empty, that are white.

play11:07

Um, and this has important consequences for properties

play11:11

of these matrices because they are extremely, uh, sparse.

play11:14

To show you an example, right?

play11:17

Uh, if you have a network on n nodes,

play11:20

nodes, then the maximum degree of a node,

play11:23

the number of connections a node has is n minus one

play11:26

because you can connect to every oth- in principle,

play11:29

connect to every other node in the network.

play11:32

So for example, if you are a human and you think about human social network, uh,

play11:37

the maximum degree that you could have,

play11:40

the maximum number of friends you could have is every other human in the world.

play11:44

However, nobody has seven billion friends, right?

play11:48

Our number of friendships is much, much smaller.

play11:51

So this means that, let's say the human social network is extremely sparse,

play11:55

and it turns out that a lot of other,

play11:58

uh, different types of networks,

play12:00

you know, power-grids, uh, Internet connection,

play12:03

science collaborations, email graphs,

play12:06

uh, and so on and so forth are extremely sparse.

play12:09

They have average degree that these, you know,

play12:11

around 10 maybe up to, up to 100.

play12:15

So, uh, what is the consequence?

play12:17

The consequence is that the underlying adjacency matrices,

play12:21

um, are extremely sparse.

play12:23

So we would never represent the matrix as a dense matrix,

play12:27

but we've always represent it as a sparse matrix.

play12:31

There are two other ways to represent graphs.

play12:34

One is simply to represent it as a edge list,

play12:37

simply as a list of edges.

play12:39

Uh, this is a representation that is quite popular, um,

play12:42

in deep learning frameworks because we can simply

play12:45

represent it as a two-dimensional matrix.

play12:48

The problem of this representation is that it is very

play12:51

hard to do any kind of graph manipulation or

play12:54

any kind of analysis of the graph because even

play12:56

computing a degree of a given node is non-trivial,

play13:00

uh, in this case.

play13:02

A much, uh, better, uh,

play13:04

representation for a graph analysis and manipulation is the notion of adjacency list.

play13:10

Um, and adjacency lists are good because they are easier to

play13:13

work with if for large and sparse networks.

play13:17

And adjacency list simply allows us to

play13:19

quickly retrieve al- all the neighbors of a given node.

play13:22

So you can think of it, that for every node,

play13:24

you simply store a list of its neighbors.

play13:27

So a list of nodes that the,

play13:30

that the- a given node is connected to.

play13:32

If the graph is undirected,

play13:34

you could store, uh, neighbors.

play13:36

If the graph is connected,

play13:37

you could store both the outgoing neighbors,

play13:40

as well as, uh, incoming neighbors based on the direction of the edge.

play13:45

And the last important thing I want to mention here is that of course,

play13:50

these graph can- can have attached attributes to them.

play13:54

So nodes address, as well as

play13:57

entire graphs can have attributes or properties attached to them.

play14:01

So for example, an edge can have a weight.

play14:04

How strong is the relationship?

play14:06

Perhaps it can have my ranking.

play14:07

It can have a type.

play14:09

It can have a sign whether this is a friend-based relationship or whether it's animosity,

play14:14

a full distrust, let say based relationships.

play14:17

Um, and edges can have di- many different types of properties,

play14:21

like if it's a phone call, it's,

play14:23

it's duration, for example.

play14:25

Nodes can have properties in- if these are people,

play14:27

it could be age, gender,

play14:29

interests, location, and so on.

play14:32

If a node is a, is a chemical,

play14:34

perhaps it is chemical mass,

play14:37

chemical formula and other properties of the- of

play14:40

the chemical could be represented as attributes of the node.

play14:44

And of course, also entire graphs can have features or, uh,

play14:48

attributes based on, uh,

play14:51

the properties of the underlying object that the graphical structure is modeling.

play14:55

So what this means is that the graphs you will be considering

play14:58

are not just the topology nodes and edges,

play15:01

but it is also the attributes,

play15:05

uh, attached to them.

play15:07

Um, as I mentioned,

play15:09

some of these properties can actually be

play15:12

represented directly in the adjacency matrix as well.

play15:15

So for example, properties of edges like

play15:18

weights can simply be represented in the adjacency matrix, right?

play15:21

Rather than having adjacency matrix to be binary,

play15:24

we can now have adjacency matrix to have real values where

play15:28

the strength of the connection corresponds simply to the value,

play15:32

uh, in that entry.

play15:33

So two and four are more strongly linked,

play15:36

so the value is four,

play15:37

while for example, one and three are linked with

play15:40

a weak connection that has weight only 0.5.

play15:43

Um, as a- um,

play15:46

another important thing is that when we create the graphs is that we also

play15:49

can think about nodes having self-loops.

play15:53

Um, for example, here,

play15:54

node four has a self-loop, uh,

play15:57

and now the degree of node four equals to three.

play16:00

Um, self-loops are simply correspond to

play16:03

the entries on the diagonal of the adjacency matrix.

play16:06

And in some cases,

play16:08

we may actually create a multi-graph where we

play16:11

allow multiple edges between a pair of nodes.

play16:15

Sometimes we can, we can think of a multi-graph as

play16:18

a weighted graph where the entry on the matrix counts the number of edges,

play16:22

but sometimes you want to represent every edge individually,

play16:25

separately because these edges might have different properties,

play16:29

um, and different, um, attributes.

play16:32

Both, um, the self-loops,

play16:35

as well as multi-graphs occur quite frequently in nature.

play16:39

Uh, for example, if you think about phonecalls transactions,

play16:43

there can be multiple transactions between a pair of nodes

play16:45

and we can accurately represent this as a multi-graph.

play16:49

Um, as we have these graphs, I,

play16:52

I also want to talk about the notion of connectivity,

play16:56

in a sense, whether the graph is connected or disconnected.

play16:59

And graph is connected if any pair of nodes in, uh, in this, uh,

play17:04

graph can be, can be connected via a path along the edges of the graph.

play17:10

So for example, this particular graph is

play17:12

connected while this other graph is not connected,

play17:15

it has three connected components.

play17:17

This is one connected component, second connected component,

play17:21

then a third connected component,

play17:23

the node h, which is an isolated node.

play17:26

This is the notion of connectivity for undirected graphs, uh,

play17:31

and what is interesting in this notion is,

play17:34

that when we, um,

play17:35

have graphs that are,

play17:37

for example, disconnect it and look at what is

play17:39

the structure of the underlying adjacency matrix,

play17:42

we will have these block diagonal structure, where, basically,

play17:45

if this is a graph that is composed of two components, then we will have,

play17:50

um, um, block diagonal structure where the edges only go between the,

play17:55

um, nodes inside the same, um, connected component,

play17:59

and there is no edges in the off-diagonal part,

play18:02

which would mean that there is no edge between,

play18:04

uh, red and blue,

play18:06

uh, part of the graph.

play18:08

The notion of connectivity also generalizes to directed graphs.

play18:13

Here, we are talking about two types of connectivity,

play18:16

strong and weak connectivity.

play18:19

A weakly connected directed graph is simply a graph that is connected,

play18:23

uh, in- if we ignore the directions of the edges.

play18:27

A strongly connected graph, um,

play18:30

or a graph is strongly connected if for every pair of

play18:33

nodes there exists a directed path between them.

play18:37

So, um, this means that there has to exist a directed path from, for example,

play18:43

from node A to node B,

play18:45

as well as from node B back to, uh,

play18:48

node A if the graph is strongly connected.

play18:51

What this also means is that we can talk about notion of

play18:56

strongly connected components where strongly connected components are,

play19:01

uh, sets of nodes in the graph, uh,

play19:03

such that every node, uh,

play19:05

in that set can visit each other via the- via a directed path.

play19:09

So for example, in this case here,

play19:12

nodes, uh, A, B,

play19:14

and C form a strongly connected component because they are on a cycle.

play19:20

So we ca- any- from any node we can visit, uh, any other node.

play19:24

Uh, the example here shows, uh,

play19:26

directed graph with two strongly connected component,

play19:29

again, two cycles on, um three nodes.

play19:33

So this concludes the discussion of the- er- the graph representations,

play19:40

um, that- and ways how we can create graphs from real data.

play19:44

Um, in this lecture,

play19:46

we first talked about machine-learning with graphs and various applications in use cases.

play19:52

We talked about node level, edge level,

play19:54

and graph level machine-learning prediction tasks.

play19:57

And then we discussed the choice of a graph representation in terms of directed,

play20:01

undirected graphs, bipartite graphs,

play20:04

weighted, uh, unweighted graphs,

play20:06

adjacency matrices, as well as some definitions from graph theory,

play20:11

like the connectivity, um, of graphs,

play20:14

weak connectivity, strong connectivity,

play20:16

as well as the notion of node degree.

play20:19

Um, thank you very much.

Rate This

5.0 / 5 (0 votes)

Étiquettes Connexes
Machine LearningGraph TheoryNetwork AnalysisAdjacency MatrixBipartite GraphDirected GraphsUndirected GraphsConnectivitySparse NetworksNode Degree
Besoin d'un résumé en anglais ?