# Stanford CS224W: Machine Learning with Graphs | 2021 | Lecture 1.1 - Why Graphs

### Summary

TL;DR: CS224W, Machine Learning with Graphs, is an introductory course led by Associate Professor Jure Leskovec at Stanford University. The course emphasizes the significance of graph-structured data in various domains, such as social networks, biomedicine, and computer science. It explores the application of novel machine learning methods to model and analyze these complex relational structures. The curriculum covers traditional machine learning methods, node embeddings, graph neural networks, and their scalability. The course also delves into heterogeneous graphs, knowledge graphs, and their applications in logical reasoning, biomedicine, and industry, aiming to equip students with the tools to harness the power of graph data for accurate predictions and insights.

### Takeaways

- 🌟 Introduction to CS224W, a course on Machine Learning with Graphs, taught by Jure Leskovec, an Associate Professor at Stanford University.
- 📈 Graphs are a fundamental data structure for representing entities, their relations, and interactions, moving beyond isolated data points to a network perspective.
- 🔍 Graphs can model various domains effectively, including computer networks, disease pathways, social networks, economic transactions, and more, capturing the relational structure of these domains.
- 🧠 The importance of graph representation is highlighted by its ability to capture complex relationships, such as those in the brain's neurons or molecules' atomic structures.
- 📊 The course aims to explore how machine learning, particularly deep learning, can be applied to graph-structured data to improve predictions and model accuracy.
- 🌐 The challenge of processing graphs in deep learning is addressed, noting their complex topology and lack of spatial locality compared to sequences and grids.
- 🤖 The development of neural networks for graph data is a new frontier in deep learning and representation learning research, focusing on end-to-end learning without manual feature engineering.
- 📈 The concept of representation learning is introduced, where the goal is to map graph nodes to d-dimensional embeddings for better data analysis and machine learning.
- 🎓 The course will cover a range of topics, from traditional machine learning methods for graphs to advanced deep learning approaches like graph neural networks.
- 🔗 Special attention will be given to graph neural network architectures, including Graph Convolutional Neural Networks, GraphSage, and Graph Attention Networks.
- 📚 The course will also delve into heterogeneous graphs, knowledge graphs, logical reasoning, and applications in biomedicine, scientific research, and industry.
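To make the "entities, relations, and interactions" framing above concrete, here is a minimal sketch of a graph stored as a plain adjacency dict; the node names are made up for illustration, and a real project would typically use a library such as NetworkX instead:

```python
# A tiny social graph: nodes are people, edges are friendships.
# Stored as an adjacency dict mapping each node to its neighbor set.
graph = {
    "alice": {"bob", "carol"},
    "bob": {"alice"},
    "carol": {"alice", "dan"},
    "dan": {"carol"},
}

def neighbors(g, node):
    """Return the set of nodes directly connected to `node`."""
    return g.get(node, set())

def degree(g, node):
    """Number of relations a node participates in."""
    return len(neighbors(g, node))
```

Even this trivial structure already supports the relational queries (who is connected to whom, how strongly a node is embedded in the network) that the course builds on.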

### Q & A

### What is the primary focus of CS224W, Machine Learning with Graphs course?

The primary focus of the CS224W course is to explore graph-structured data and teach students how to apply novel machine learning methods to it, with an emphasis on understanding and utilizing the relational structure of data represented as graphs.

### Why are graphs considered a powerful language for describing and analyzing entities and their interactions?

Graphs are a powerful language because they allow us to represent the world or a given domain not as isolated data points but as networks with relations between entities. This representation enables the construction of more faithful and accurate models of the underlying phenomena in various domains.

### Provide examples of different types of data that can be naturally represented as graphs.

Examples of data that can be represented as graphs include computer networks, disease pathways, networks of particles in physics, food webs, social networks, economic networks, communication networks, scene graphs, computer code, and molecules.
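As one concrete case from this list, a molecule such as water can be sketched as a graph with atoms as nodes and covalent bonds as edges. The snippet below is a minimal hand-rolled illustration, not a chemistry toolkit:

```python
# Water (H2O) as a graph: atoms are nodes, bonds are edges.
atoms = {0: "O", 1: "H", 2: "H"}
bonds = [(0, 1), (0, 2)]  # the oxygen bonds to each hydrogen

def adjacency(n, edges):
    """Build an undirected adjacency list from a bond list."""
    adj = {i: [] for i in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj

adj = adjacency(len(atoms), bonds)
```

The same node-and-edge encoding carries over unchanged to citation networks, food webs, or computer networks; only the meaning of nodes and edges differs.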

### What are natural graphs or networks, and provide an example?

Natural graphs or networks are domains that can inherently be represented as graphs. An example is a social network, which is a collection of individuals and connections between them, such as societies with billions of people and their interactions through electronic devices and financial transactions.

### How does the course address the challenges of processing graphs in deep learning?

The course discusses the challenges of processing graphs in deep learning, such as their arbitrary size, complex topology, and lack of spatial locality. It then explores how to develop neural networks that are more broadly applicable to complex data types like graphs and delves into the latest deep learning approaches for relational data.

### What is representation learning in the context of graph-structured data?

Representation learning for graph-structured data involves automatically learning a good representation of the graph so that it can be used for downstream machine learning algorithms. It aims to map nodes of a graph to d-dimensional embeddings, capturing the structure and relationships within the data without the need for manual feature engineering.
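As a rough illustration of how such embeddings are often learned, DeepWalk-style methods first sample random walks over the graph and then treat the walks like sentences for a skip-gram model. The walk-sampling step can be sketched as follows (the graph and parameters are toy values, and the skip-gram training itself is omitted):

```python
import random

# Toy undirected graph as an adjacency list (node -> neighbors).
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

def random_walk(g, start, length, rng):
    """Sample a random walk of `length` nodes starting at `start`."""
    walk = [start]
    for _ in range(length - 1):
        walk.append(rng.choice(g[walk[-1]]))
    return walk

rng = random.Random(0)
# DeepWalk-style corpus: several walks from every node. A skip-gram
# model trained on this corpus would yield d-dimensional embeddings.
walks = [random_walk(graph, node, 5, rng) for node in graph for _ in range(10)]
```

Node2Vec follows the same recipe but biases the walk sampling to trade off breadth-first and depth-first exploration of each node's neighborhood.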

### Name some of the graph neural network architectures that will be covered in the course.

The course will cover graph neural network architectures such as Graph Convolutional Neural Networks (GCN), GraphSage, and Graph Attention Networks (GAT), among others.
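The core idea shared by these architectures is neighborhood aggregation: each node updates its feature vector from its neighbors' vectors. Below is a minimal pure-Python sketch of one such message-passing layer using mean aggregation, a linear map, and a ReLU. Real GCN layers use degree-normalized sums and trained weight matrices, so treat this as an illustration of the idea only:

```python
def gcn_layer(adj, features, weight):
    """One message-passing step: each node averages its own and its
    neighbors' feature vectors, then applies a linear map and a ReLU.
    A simplified sketch of the idea behind GCN-style layers."""
    out = {}
    for node, feats in features.items():
        group = [features[n] for n in adj[node]] + [feats]
        mean = [sum(col) / len(group) for col in zip(*group)]
        h = [sum(m * w for m, w in zip(mean, col)) for col in zip(*weight)]
        out[node] = [max(0.0, x) for x in h]
    return out

# Toy path graph 0-1-2 with 2-d features and an identity weight matrix
# (hand-picked for clarity; in practice the weights are learned).
adj = {0: [1], 1: [0, 2], 2: [1]}
x = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
W = [[1.0, 0.0], [0.0, 1.0]]
h = gcn_layer(adj, x, W)
```

Stacking several such layers lets each node's representation depend on its multi-hop neighborhood; GraphSage and GAT differ mainly in how the neighbor features are aggregated (sampled aggregators, attention weights).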

### What are the main differences between traditional machine learning approaches and representation learning?

Traditional machine learning approaches require significant effort in designing proper features and ways to capture the structure of the data. In contrast, representation learning aims to automatically extract or learn features in the graph, eliminating the need for manual feature engineering and allowing the model to learn from the graph data directly.
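To make "manual feature engineering" concrete, here are two classic hand-designed node features, degree and the local clustering coefficient, computed on a toy adjacency list. In the traditional pipeline, features like these are computed up front and fed to a standard classifier; representation learning replaces this step with learned embeddings:

```python
# Toy graph: a triangle (0, 1, 2) with a pendant node 3 attached to 0.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}

def degree(adj, node):
    """Number of edges incident to a node."""
    return len(adj[node])

def clustering_coefficient(adj, node):
    """Fraction of a node's neighbor pairs that are themselves connected,
    a classic hand-designed node feature."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in adj[nbrs[i]])
    return 2 * links / (k * (k - 1))
```

Designing such features is domain-specific and labor-intensive, which is exactly the cost that end-to-end representation learning aims to remove.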

### How will the course structure its content over the 10-week period?

The course will be structured week by week, covering traditional methods for machine learning and graphs, generic node embeddings, graph neural networks, expressive power and scaling of GNNs, heterogeneous graphs, knowledge graphs, logical reasoning, deep generative models for graphs, and various applications in biomedicine, science, and industry, with a particular focus on graph neural networks and representation learning.

### What are some of the applications of graph-structured data and machine learning in biomedicine?

In biomedicine, graph-structured data and machine learning can be applied to model genes and proteins regulating biological processes, analyze connections between neurons in the brain, and understand complex disease pathways, among other applications.

### Can you explain the concept of a scene graph as mentioned in the script?

A scene graph is a representation of relationships between objects in a real-world scene. It captures the interactions and spatial or functional relationships among various elements within the scene, organizing them into a graph structure where nodes represent objects and edges represent the relationships between those objects.
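A scene graph is often stored as a list of (subject, relation, object) triples. The sketch below uses illustrative names, not output from any particular scene-graph dataset:

```python
# A scene graph as (subject, relation, object) triples.
scene = [
    ("person", "riding", "horse"),
    ("horse", "standing_on", "grass"),
    ("person", "wearing", "hat"),
]

def objects_related_to(triples, subject):
    """All (relation, object) pairs attached to a given subject node."""
    return [(r, o) for s, r, o in triples if s == subject]
```

The same triple format also underlies knowledge graphs, which the course covers later; there the subjects and objects are entities and the relations are facts.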

### Outlines

### 🌟 Introduction to CS224W: Machine Learning with Graphs

The first paragraph introduces the course CS224W, Machine Learning with Graphs, and the instructor, Jure Leskovec, Associate Professor of Computer Science at Stanford University. The main theme is to motivate students about graph-structured data and the application of machine learning methods to it. Graphs are presented as a universal language for describing and analyzing entities and their relations and interactions, emphasizing the shift from isolated data points to a network perspective. The paragraph highlights the versatility of graph representation across various domains such as computer networks, disease pathways, social networks, and more. It also touches on the importance of capturing relationships for building accurate models and introduces the concept of scene graphs and abstract syntax trees as forms of graph representation. The paragraph sets the stage for the class's focus on leveraging relational data in graphs for improved predictions and modeling.

### 🤖 Harnessing the Power of Graphs in Machine Learning

This paragraph delves into the significance of utilizing the relational structure of data for better predictions and performance in machine learning. It contrasts the simplicity of traditional data types like sequences and grids with the complexity of graphs, which lack fixed sizes, topology, and spatial locality. The paragraph emphasizes the challenges of processing graphs and the dynamic, multi-modal nature of network data. It introduces the course's objective of developing neural networks applicable to complex data types like graphs, positioning relational data graphs as the new frontier in deep learning and representation learning. The concept of representation learning is explained, focusing on automatically learning a good representation of the graph data for downstream machine learning algorithms without the need for human feature engineering. The paragraph outlines the course's aim to explore the latest deep learning approaches for graph-structured data.

### 📚 Course Curriculum: Topics and Methodologies

The final paragraph outlines the course curriculum, detailing the topics that will be covered over 10 weeks. It begins with traditional methods for machine learning and graphs, such as graphlets and graph kernels, and moves on to generic node embeddings methods like DeepWalk and Node2Vec. A significant portion of the course is dedicated to graph neural networks, including popular architectures like Graph Convolutional Neural Networks, GraphSage, and Graph Attention Networks, along with their expressive power and scalability. The course also touches on heterogeneous graphs, knowledge graphs, and their applications in logical reasoning, using methods like TransE and BetaE. Additionally, it discusses the development of deep generative models for entire newly generated graphs and their applications in biomedicine, scientific research, recommender systems, and fraud detection. The paragraph concludes with the course schedule, highlighting the comprehensive coverage of graph neural networks and representation learning in graphs.

### Keywords

- 💡 Graphs
- 💡 Machine Learning
- 💡 Structured Data
- 💡 Networks
- 💡 Deep Learning
- 💡 Representation Learning
- 💡 Graph Neural Networks
- 💡 Heterogeneous Graphs
- 💡 Knowledge Graphs
- 💡 Applications

### Highlights

Introduction to CS224W, Machine Learning with Graphs, by Jure Leskovec, Associate Professor at Stanford University.

Motivation for studying graph-structured data and its applications with machine learning methods.

Graphs as a general language for describing and analyzing entities with relations and interactions.

The importance of considering the world in terms of networks and relations between entities.

Examples of naturally represented graph data, such as computer networks, disease pathways, and social networks.

The ability of graph representation to build more accurate models of underlying phenomena.

The concept of scene graphs for representing relationships between objects in real-world scenes.

The representation of computer code and software as graphs through function calls and abstract syntax trees.

Molecules as graphs with atoms as nodes and bonds as edges.

The distinction between natural graphs or networks and other domains with relational structures.

The challenge of making better, more accurate predictions using relational data represented as graphs.

The limitations of modern deep learning tools for simple data types compared to the complexity of graphs.

The dynamic and multi-modal nature of networks that lack spatial locality and fixed node ordering.

The goal of developing neural networks applicable to complex data types like graphs.

The concept of representation learning and its role in automatically extracting features from graph data.

Investigation of the latest deep learning approaches for graph-structured data during the course.

Coverage of traditional machine learning methods for graphs, such as graphlets and graph kernels.

Discussion of methods for generating generic node embeddings, like DeepWalk and Node2Vec.

In-depth exploration of graph neural networks and their popular architectures like Graph Convolutional Neural Networks and Graph Attention Networks.

Study of the expressive power and theory behind graph neural networks and scaling them for large graphs.

Focus on heterogeneous graphs, knowledge graphs, logical reasoning, and their applications.

Learning about deep generative models for graphs and the prediction of entirely new generated graphs.

Applications of graph-structured data in biomedicine, scientific research, recommender systems, and fraud detection.

Course outline provided, spanning 10 weeks with 20 lectures on graph neural networks and representation learning.

### Transcripts

Welcome to CS224W, Machine Learning with Graphs.

My name is Jure Leskovec.

I'm Associate Professor of Computer Science at

Stanford University and I will be your instructor.

What I'm going to do in the first lecture is to motivate and get you excited about graph-structured data and how we can apply novel machine learning methods to it.

So why graphs?

Graphs are a general language for describing and analyzing entities with their relations and interactions.

This means that rather than thinking of the world

or a given domain as a set of isolated datapoints,

we really think of it in terms of networks and relations between these entities.

This means that there is

the underlying graph of relations between the entities,

and these entities are related, uh,

to each other, uh,

according to these connections or the structure of the graph.

And there are many types of data that can naturally be

represented as graphs, and modeling this relational structure of the underlying domain,

uh, allows us to, uh,

build much more faithful,

much more accurate, uh,

models of the underlying,

uh, phenomena underlying data.

So for example, we can think of computer networks, disease pathways, uh,

networks of particles in physics, uh,

networks of organisms in food webs,

infrastructure, as well as events, can all be represented as graphs.

Similarly, we can think of social networks,

uh, economic networks, communication networks,

citations between different papers,

Internet as a giant communication network,

as well as ways on how neurons in our brain are connected.

Again, all these domains are inherently networks or graphs.

And that representation allows us to capture

the relationships between different objects or entities,

uh, in these different, uh, domains.

And last, we can take knowledge and

represent facts as relationships between different entities.

We can describe the regulatory mechanisms in our cells,

um, as processes governed by the connections between different entities.

We can even take scenes from the real world and represent them as graphs of relationships between the objects in the scene.

These are called scene graphs.

We can take computer code software and represent it as a graph of, let's say,

calls between different functions or as

the structure of the code captured by the abstract syntax tree.

We can also naturally take molecules, which are composed of atoms and bonds, and represent them as graphs, um, where we represent atoms as nodes and their bonds as edges between them.

And of course, in computer graphics,

we can take three-dimensional shapes and represent them, um, as graphs.

So in all these domains,

graph structure is the important part that allows us to model the underlying domain, the underlying phenomena, in a faithful way.

So the way we are going to think about graph

relational data in this class is that there are essentially two big,

uh, parts, uh, of data that can be represented as graphs.

First are what is called natural graphs or networks,

where underlying domains can naturally be represented as graphs.

For example, social networks,

societies are collections of seven billion individuals and connections between them,

communications and transactions between electronic devices, phone calls,

financial transactions, all naturally form, uh, graphs.

In biomedicine we have genes,

proteins regulating biological processes,

and we can represent interactions between

these different biological entities with a graph.

And- and as I mentioned,

connections between neurons in our brains are,

um, essentially a network of, uh, connections.

And if we want to model these domains, we really should represent them as networks.

A second class of examples is domains that also have relational structure, um, where we can use graphs to represent that relational structure.

So for example, information and knowledge is many times organized and linked.

Software can be represented as a graph.

We can many times take, uh,

datapoints and connect similar data points.

And this will create our graph,

uh, a similarity network.

And we can take other, um, uh,

domains that have natural relational structure like molecules,

scene graphs, 3D shapes, as well as,

you know, in physics,

we can take particle-based simulations of how, uh, particles are related to each other, and represent this with a graph.

So this means that there are many different domains, either, uh,

as natural graphs or natural networks,

as well as other domains that can naturally be

modeled as graphs to capture the relational structure.

And the main question for this class that we are

going to address is to talk about how do we take

advantage of this relational structure to be- to make better, more accurate predictions.

And this is especially important because countless domains have a rich relational structure, uh, which can be represented, uh, with a graph.

And by explicitly modeling these relationships,

we will be able to achieve, uh,

better performance, build more, uh,

accurate, uh, models, make more accurate predictions.

And this is especially interesting and important in the age of deep learning,

where the- today's deep learning modern toolbox is specialized for simple data types.

It is specialized for simple sequences, uh, and grids.

A sequence, uh, like text or speech, has this linear structure, and there have been amazing tools developed to analyze this type of structure.

Images can all be resized and have this spatial locality, so they can be represented as fixed-size grids, uh, fixed-size tensors.

And again, deep learning methodology has been very good at processing this type of,

uh, fixed size images.

However, um, graphs, networks are much harder to process because they are more complex.

First, they have arbitrary size and complex topology.

Um, and there is also no spatial locality as in grids or as in text.

In text we know left and right,

in grids we have up and down, uh, left and right.

But in networks, there is no reference point,

there is no notion of,

uh, uh, spatial locality.

The second important thing is there is no reference point,

there is no fixed node ordering that would allow us,

uh, uh, to do, uh,

to do deep learning.

And often, these networks are dynamic and have multi-modal features.

So in this course,

we are really going to, uh,

talk about how do we develop neural networks that are much more broadly applicable?

How do we develop neural networks that are applicable to complex data types like graphs?

And really, it is relational data graphs that are the- the new frontier,

uh, of deep learning and representation learning, uh, research.

So intuitively, what we would like to do is, uh, build neural networks that on the input will take our graph, and on the output they will be able to make predictions.

And, uh, these predictions can be at the level of individual nodes,

can be at the level of pairs of nodes or links,

or it can be something much more complex like a brand new generated graph or, uh,

prediction of a property of a given molecule that can be represented,

um, as a graph on the input.

And the question is,

how do we design this neural network architecture

that will allow us to do this end to end,

meaning there will be no human feature engineering, uh, needed?

So what I mean by that is that, um,

in traditional, uh, machine learning approaches,

a lot of effort goes into designing proper features,

proper ways to capture the structure of the data so that machine learning models can,

uh, take advantage of it.

So what we would like to do in this class,

we will talk mostly about representation learning

where this feature engineering step is taken away.

And basically, as soon as we have our graph,

uh, our graph data,

we can automatically learn a good representation of the graph so that it can be used for,

um, downstream machine learning algorithms.

So representation learning is about automatically extracting or learning features,

uh, in the graph.

The way we can think of representation learning is to map

nodes of our graph to a d-dimensional embedding,

to d-dimensional vectors, such that similar nodes in the network are embedded close together in the embedding space.

So the goal is to learn this function f that will take

the nodes and map them into these d-dimensional,

um, real valued vectors,

and we will call this vector, uh, a representation, uh,

or a feature representation or an embedding of a given node,

an embedding of an entire graph,

an embedding of a given link,

um, and so on.

So a big part of our class will be, uh, investigating and learning about the latest representation learning,

deep learning approaches that can be applied,

uh, to graph, uh, structured data.

And we are going to, uh, uh,

talk about many different topics in

machine learning and representation learning for graph-structured data.

So first, we're going to talk about traditional methods

for machine learning and graphs like graphlets and graph kernels.

We are then going to talk about methods to generate, um,

generic node embeddings, methods like DeepWalk and Node2Vec.

We are going to spend quite a bit of time talking about

graph neural networks and popular graph neural network architectures like graph,

uh, convolutional neural network,

the GraphSage architecture or Graph Attention Network, uh, architecture.

We are also going to study the expressive power of graph neural networks,

um, the theory behind them,

and how do we scale them up to very large graphs.

Um, and then in the second part of this course,

we are also going to talk about heterogeneous graphs,

knowledge graphs, and applications,

uh, to logical reasoning.

We will learn about methods like TransE and BetaE.

We are also going to talk about how do we build deep generative models for

graphs where we can think of the prediction of the model to

be an entire newly generated graph.

And we are also going to discuss applications to biomedicine, um,

various scientific applications, as well

as applications to industry in terms of recommender systems,

fraud detection, and so on.

So here is the outline of this course.

Week by week, 10 weeks, starting, uh,

starting today and all the way to the middle of March,

um, where, uh, the course will finish.

We will have 20 lectures and we will cover all the topics,

uh, that I have discussed,

and in particular focus on

graph neural networks and the representation learning in graphs.
