Stanford CS224W: ML with Graphs | 2021 | Lecture 2.2 - Traditional Feature-based Methods: Link

Stanford Online
15 Apr 202116:47

Summary

TLDRThe script discusses traditional machine learning approaches for graph-level predictions, focusing on link prediction tasks. It explores two methods: predicting missing links at random and forecasting links over time in evolving networks. The script introduces various features for node pairs, including distance-based, local neighborhood overlap, and global neighborhood overlap metrics, with examples like Katz index to predict future links based on the network's structure.

Takeaways

  • 🔍 The focus is on link prediction tasks in graph-level predictions, aiming to predict new links based on existing ones.
  • 📈 The task involves ranking all unlinked node pairs at test time and identifying the top k pairs as predicted links.
  • 🧠 Key to the task is designing features for node pairs that capture the structure of links in the network.
  • 🔗 A simple approach is to concatenate features of two nodes, but this can lose important relational information.
  • 📊 Two formulations for link prediction are discussed: 'missing at random' and 'predicting links over time'.
  • 🕒 The 'missing at random' formulation assumes links are removed randomly, while 'predicting over time' considers network evolution.
  • 📝 Evaluation of predictions is done by comparing the ranked list of predicted links with actual future links.
  • 📌 Three types of link level features are discussed: distance-based, local neighborhood overlap, and global neighborhood overlap.
  • 🛰 Distance-based features consider the shortest path between nodes but may lack depth in capturing connections.
  • 🌐 Local neighborhood overlap features like common neighbors and Jaccard coefficient focus on immediate connections between nodes.
  • 🌏 Global neighborhood overlap features, such as the Katz index, consider the entire graph structure for link prediction.

Q & A

  • What is the main focus of the investigation in the transcript?

    -The main focus of the investigation is on traditional machine learning approaches for graph-level predictions, specifically link prediction tasks and features that capture the structure of links in a given network.

  • What is the link level prediction task described in the transcript?

    -The link level prediction task is to predict new links based on the existing links in the network by evaluating all node pairs that are not yet linked at test time, ranking them, and proclaiming the top k node pairs as predicted links that are going to occur in the network.

  • How can the importance of information about the relationship between two nodes be lost?

    -The importance of information about the relationship between two nodes can be lost by simply concatenating the features of the two nodes and training a model on that representation, which often neglects the relational information between them.

  • What are the two different ways to formulate the link prediction task mentioned in the transcript?

    -The two ways to formulate the link prediction task are: 1) Assuming links in the network are missing at random, and 2) Predicting links over time in networks that naturally evolve, such as citation, social, or collaboration networks.

  • How is the performance of a link prediction algorithm evaluated in the context of networks that evolve over time?

    -The performance is evaluated by looking at a graph between two time points, t0 and t0', and based on the structure up to time t0', outputting a ranked list of predicted links that are expected to occur between times T1 and T1'. The algorithm's predictions are then compared to the actual links that appeared in that future time frame.

  • What is a feature descriptor for a pair of nodes in the context of link prediction?

    -A feature descriptor for a pair of nodes is a method to quantify the relationship between the two nodes in such a way that it can be used to predict or learn whether a link exists between them or not.

  • What is a distance-based feature in link prediction?

    -A distance-based feature in link prediction refers to the use of the shortest path distance between two nodes to characterize their relationship.

  • What is the local neighborhood overlap feature and how is it calculated?

    -The local neighborhood overlap feature captures the number of neighboring nodes shared between two nodes. It is calculated by taking the intersection of the neighbors of the two nodes, and can be normalized using the Jaccard coefficient or adjusted using the Adamic-Adar index.

  • What is the global neighborhood overlap feature and how does it differ from the local neighborhood overlap?

    -The global neighborhood overlap feature considers the entire graph structure to score the relationship between a pair of nodes, not just the immediate neighbors. It differs from the local neighborhood overlap by not being limited to two-hop distances and considering all possible paths between nodes.

  • How is the Katz index computed and what does it represent?

    -The Katz index is computed using the closed-form expression involving the inverse of the identity matrix minus beta times the adjacency matrix. It represents the global neighborhood overlap by counting the number of paths of all lengths between a pair of nodes, where paths are discounted exponentially with their length.

  • Why might the assumption of links missing at random not hold true in real-world scenarios?

    -The assumption of links missing at random might not hold true because in real-world scenarios, such as protein-protein interaction networks, the connections are often not tested at random but are influenced by previous positive results, leading to some areas of the network being more explored than others.

Outlines

plate

This section is available to paid users only. Please upgrade to access this part.

Upgrade Now

Mindmap

plate

This section is available to paid users only. Please upgrade to access this part.

Upgrade Now

Keywords

plate

This section is available to paid users only. Please upgrade to access this part.

Upgrade Now

Highlights

plate

This section is available to paid users only. Please upgrade to access this part.

Upgrade Now

Transcripts

plate

This section is available to paid users only. Please upgrade to access this part.

Upgrade Now
Rate This

5.0 / 5 (0 votes)

Related Tags
Machine LearningLink PredictionGraph AnalysisNetwork EvolutionSocial NetworksCitation NetworksCollaboration NetworksProtein InteractionFeature DescriptorGraph Adjacency