Cheat Sheet
1. GNN Basics, Centrality, PageRank, Community Detection
Graph Definition
- Basic Graph: Consists of nodes (vertices) and edges (connections between nodes).
- Types of Graphs:
  - Undirected: Edges have no direction.
  - Directed (Digraphs): Edges have direction.
  - Weighted: Edges have weights representing the strength of connection.
  - Unweighted: Edges are equal, no weights.
Centrality Measures
- Degree Centrality: Number of edges connected to a node.
- Betweenness Centrality: Fraction of shortest paths between other pairs of nodes that pass through a node.
- Closeness Centrality: Reciprocal of the average shortest-path distance from a node to all other nodes.
- Eigenvector Centrality: A node's score is proportional to the sum of its neighbors' scores, so connections to influential nodes count more.
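A minimal sketch comparing these measures with networkx (assumed installed); the built-in karate club graph is only a stand-in example:

```python
import networkx as nx

G = nx.karate_club_graph()  # small built-in example graph

measures = {
    "degree": nx.degree_centrality(G),
    "betweenness": nx.betweenness_centrality(G),
    "closeness": nx.closeness_centrality(G),
    "eigenvector": nx.eigenvector_centrality(G),
}

# Show the three highest-ranked nodes under each measure.
for name, scores in measures.items():
    top = sorted(scores, key=scores.get, reverse=True)[:3]
    print(name, top)
```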
PageRank
- Algorithm: Measures the importance of each node within the graph.
- Steps:
  - Initialize the Graph: Assign an initial rank to each node, typically 1/N where N is the total number of nodes.
  - Distribute Rank: Distribute the rank of each node evenly across its outgoing edges.
  - Rank Aggregation: Sum up the proportional ranks coming into each node.
  - Apply Damping Factor: Adjust the ranks by applying a damping factor to account for random jumps.
  - Iterate: Repeat the process until the ranks converge (change is below a threshold).
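A toy power-iteration implementation of these steps (dense adjacency matrix for readability; real implementations use sparse matrices):

```python
import numpy as np

def pagerank(adj, damping=0.85, tol=1e-6, max_iter=100):
    """Toy PageRank by power iteration on a dense adjacency matrix."""
    n = adj.shape[0]
    ranks = np.full(n, 1.0 / n)                       # initialize ranks to 1/N
    out_degree = adj.sum(axis=1, keepdims=True)
    # Row-stochastic transition matrix; dangling nodes jump uniformly.
    transition = np.where(out_degree > 0, adj / np.maximum(out_degree, 1e-12), 1.0 / n)
    for _ in range(max_iter):
        # Distribute rank along outgoing edges, aggregate incoming rank,
        # then apply the damping factor to model random jumps.
        new_ranks = (1 - damping) / n + damping * (transition.T @ ranks)
        if np.abs(new_ranks - ranks).sum() < tol:     # iterate until convergence
            return new_ranks
        ranks = new_ranks
    return ranks

A = np.array([[0, 1, 1],
              [1, 0, 0],
              [0, 1, 0]], dtype=float)
print(pagerank(A))                                    # ranks sum to 1
```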
Community Detection
- Louvain Algorithm:
  - Initialization: Each node starts in its own community.
  - Modularity Optimization: Move each node to the community that maximizes the modularity gain.
  - Community Aggregation: Form a new graph where nodes represent communities.
  - Repeat: Repeat the process until there is no further modularity improvement.
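With networkx >= 2.8, Louvain is available directly (the python-louvain package is an alternative); a minimal sketch:

```python
import networkx as nx

G = nx.karate_club_graph()

# Greedy modularity optimization + community aggregation, repeated until convergence.
communities = nx.community.louvain_communities(G, seed=42)
for i, nodes in enumerate(communities):
    print(f"community {i}: {sorted(nodes)}")

# Modularity of the final partition (the quantity Louvain greedily improves).
print("modularity:", nx.community.modularity(G, communities))
```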
2. Topology Inference, Random Walk Based Techniques
Topology Inference
- Definition: The process of discovering the structure (edges) of a graph from observed data, e.g. signals measured at the nodes.
- Approaches to Topology Inference:
  - Probabilistic Models: Infer the topology that best explains the observed data under a statistical model.
  - Optimization Techniques: Use optimization algorithms to find the best-fitting topology.
  - Heuristic Approaches: Apply heuristic rules to infer structure.
- Graphical Lasso:
  - Initialization: Start from an initial estimate of the inverse covariance matrix, typically based on the empirical covariance of the observations.
  - Iterative Optimization: Solve an L1-penalized (lasso) problem to estimate a sparse inverse covariance (precision) matrix, iterating until convergence; non-zero off-diagonal entries are interpreted as edges.
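A minimal sketch with scikit-learn's GraphicalLasso (the random data is only a placeholder for real observed node signals):

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))        # 200 observations of 5 variables (nodes)

model = GraphicalLasso(alpha=0.1)    # alpha controls the sparsity of the estimate
model.fit(X)

precision = model.precision_         # estimated sparse inverse covariance matrix
# Treat significant off-diagonal entries as edges of the inferred graph.
edges = np.argwhere(np.abs(np.triu(precision, k=1)) > 1e-4)
print(edges)
```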
Random Walk Based Techniques
- Random Walks (general recipe):
  - For each node in the graph, perform multiple random walks of fixed length. Each walk produces a sequence of nodes (analogous to a sentence of words). Train a Skip-gram model to predict the context nodes of each node within the walks; the learned vectors are the node embeddings (see the sketch after this list).
- DeepWalk:
  - Uses uniform random walks to generate node sequences, which are then used to train a Skip-gram model to learn node embeddings. Inspired by Word2Vec from natural language processing.
- Node2Vec:
  - Extends DeepWalk with biased random walks whose parameters balance between breadth-first search (BFS) and depth-first search (DFS), capturing both homophily and structural equivalence.
- Residual2Vec:
  - Aims to reduce bias towards hub nodes by adjusting the transition probabilities during random walks, yielding embeddings that are not overly dominated by high-degree nodes.
- Graph2Vec:
  - Extends node embedding to whole-graph embedding: rooted subgraphs extracted from each graph play the role of words, each graph plays the role of a document, and a Doc2Vec model learns a vector representation for every entire graph.
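A compact DeepWalk-style sketch (networkx and gensim >= 4 assumed); Node2Vec would replace the uniform `rng.choice` step with sampling biased by its return and in-out parameters p and q:

```python
import random
import networkx as nx
from gensim.models import Word2Vec

def random_walks(G, walk_length=10, walks_per_node=5, seed=0):
    """Uniform random walks; each walk is a list of node ids as strings."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in G.nodes():
            walk = [start]
            while len(walk) < walk_length:
                neighbors = list(G.neighbors(walk[-1]))
                if not neighbors:
                    break
                walk.append(rng.choice(neighbors))
            walks.append([str(n) for n in walk])
    return walks

G = nx.karate_club_graph()
walks = random_walks(G)

# Skip-gram (sg=1) treats each walk like a sentence, as in DeepWalk.
model = Word2Vec(sentences=walks, vector_size=64, window=5, sg=1, min_count=0, epochs=5)
print(model.wv["0"].shape)   # 64-dimensional embedding of node 0
```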
3. Convolutional Graph Neural Networks, Deep Graph Infomax
Convolutional Graph Neural Networks (ConvGNNs)
- Introduction: Generalize convolution operations to graph-structured data.
- Key Concepts:
  - Graph Convolution: Aggregate feature information from a node’s neighbors.
  - Pooling: Downsample nodes to reduce graph size.
- Algorithm Steps:
  - Initialization: Initialize node features and parameters.
  - Graph Setup: Construct the adjacency matrix and feature matrix.
  - Feature Aggregation: Aggregate features from neighboring nodes using convolution operations.
  - Update Node Features: Apply a non-linear transformation to the aggregated features.
  - Pooling: Optionally pool the nodes to reduce the graph size.
  - Iteration: Repeat the process for a predefined number of layers.
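A minimal single GCN-style layer in plain PyTorch (no torch_geometric), implementing H' = ReLU(D^-1/2 (A + I) D^-1/2 H W):

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One graph convolution: aggregate neighbor features, then transform them."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, A, H):
        A_hat = A + torch.eye(A.size(0))            # add self-loops
        deg = A_hat.sum(dim=1)
        D_inv_sqrt = torch.diag(deg.pow(-0.5))      # D^-1/2
        A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt    # symmetric normalization
        return torch.relu(self.linear(A_norm @ H))  # aggregate, then transform

A = torch.tensor([[0., 1., 0., 0.],
                  [1., 0., 1., 0.],
                  [0., 1., 0., 1.],
                  [0., 0., 1., 0.]])
H = torch.randn(4, 3)
print(GCNLayer(3, 2)(A, H).shape)                   # torch.Size([4, 2])
```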
Deep Graph Infomax
- Objective: Maximize mutual information between local (node-level) representations and a global (graph-level) summary.
- Algorithm Steps:
  - Graph Encoder: Encodes the input graph into node representations.
  - Corruption: Creates a corrupted version of the graph (e.g., by shuffling node features) to provide negative examples.
  - Readout: Summarizes the node representations into a single graph-level vector.
  - Discriminator: Distinguishes true (node, summary) pairs from corrupted ones.
  - Mutual Information Maximization: Optimize the encoder and discriminator to maximize mutual information between the node representations and the summary.
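A simplified sketch of the DGI objective, assuming `encoder` is any GNN that maps (A, X) to node embeddings (for instance the GCNLayer above); the bilinear discriminator and feature-shuffling corruption follow the general recipe:

```python
import torch
import torch.nn as nn

class BilinearDiscriminator(nn.Module):
    """Scores each node embedding against the graph-level summary vector."""
    def __init__(self, dim):
        super().__init__()
        self.W = nn.Parameter(torch.randn(dim, dim) * 0.01)

    def forward(self, H, s):
        return H @ self.W @ s                    # one logit per node

def dgi_loss(encoder, discriminator, A, X):
    H = encoder(A, X)                            # embeddings of the true graph
    X_corrupt = X[torch.randperm(X.size(0))]     # corruption: shuffle node features
    H_corrupt = encoder(A, X_corrupt)
    s = torch.sigmoid(H.mean(dim=0))             # readout: graph-level summary
    pos = discriminator(H, s)                    # true (node, summary) pairs
    neg = discriminator(H_corrupt, s)            # corrupted pairs
    labels = torch.cat([torch.ones_like(pos), torch.zeros_like(neg)])
    # Binary cross-entropy on these pairs acts as the mutual-information proxy.
    return nn.functional.binary_cross_entropy_with_logits(torch.cat([pos, neg]), labels)
```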
4. Graph Attention Networks, Graph Transformer Network, Working with Dynamic Graphs
Graph Attention Networks (GATs)
- Introduction: Apply attention mechanisms to graphs.
- Key Features:
  - Attention Coefficients: Compute attention scores for edges.
  - Weighted Aggregation: Aggregate neighbor features using attention coefficients.
- Algorithm Steps:
  - Compute Attention Scores: Calculate attention coefficients for each edge.
  - Weighted Aggregation: Aggregate the features of neighboring nodes using the attention coefficients.
  - Update Node Features: Apply a non-linear transformation to the aggregated features.
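A single-head GAT layer sketch in plain PyTorch (dense adjacency for clarity; practical implementations work on edge lists and use multiple heads):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    """Simplified single-head graph attention layer."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        self.a = nn.Linear(2 * out_dim, 1, bias=False)    # attention scoring vector

    def forward(self, A, H):
        Wh = self.W(H)                                    # (N, out_dim)
        N = Wh.size(0)
        # Pairwise concatenation [Wh_i || Wh_j] for all node pairs.
        pairs = torch.cat([Wh.unsqueeze(1).expand(N, N, -1),
                           Wh.unsqueeze(0).expand(N, N, -1)], dim=-1)
        e = F.leaky_relu(self.a(pairs).squeeze(-1))       # raw attention scores
        e = e.masked_fill(A == 0, float("-inf"))          # only attend to neighbors
        alpha = torch.softmax(e, dim=1)                   # attention coefficients
        return F.elu(alpha @ Wh)                          # weighted aggregation

A = torch.tensor([[1., 1., 0.],
                  [1., 1., 1.],
                  [0., 1., 1.]])                          # adjacency with self-loops
print(GATLayer(4, 8)(A, torch.randn(3, 4)).shape)         # torch.Size([3, 8])
```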
Graph Transformer Networks
- Introduction: Combine the transformer architecture with graph representations.
- Key Concepts:
  - Self-Attention: Learn dependencies between nodes irrespective of their distance in the graph.
  - Positional Encoding: Encodes structural information about where a node sits in the graph.
- Algorithm Steps:
  - Input Embedding: Embed the node features together with the positional information.
  - Self-Attention Layers: Apply self-attention layers to capture node dependencies.
  - Feed-Forward Layers: Apply feed-forward neural networks to transform node features.
  - Output Layer: Generate the final node-level or graph-level predictions.
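A rough sketch of one graph-transformer block: full self-attention over node features augmented with a Laplacian-eigenvector positional encoding (one common choice; the exact encoding and attention masking vary between papers):

```python
import torch
import torch.nn as nn

def laplacian_positional_encoding(A, k):
    """Use the k lowest non-trivial Laplacian eigenvectors as positional features."""
    L = torch.diag(A.sum(dim=1)) - A                 # unnormalized graph Laplacian
    _, eigvecs = torch.linalg.eigh(L)
    return eigvecs[:, 1:k + 1]                       # skip the constant eigenvector

class GraphTransformerBlock(nn.Module):
    def __init__(self, dim, heads=2):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, H):
        attn_out, _ = self.attn(H, H, H)             # self-attention over all nodes
        H = self.norm1(H + attn_out)                 # residual + layer norm
        return self.norm2(H + self.ff(H))            # feed-forward sub-layer

A = torch.tensor([[0., 1., 1., 0.],
                  [1., 0., 1., 0.],
                  [1., 1., 0., 1.],
                  [0., 0., 1., 0.]])
X = torch.randn(4, 5)                                # node features
pe = laplacian_positional_encoding(A, k=3)           # (4, 3) positional features
H = torch.cat([X, pe], dim=-1).unsqueeze(0)          # (1, N, 8): a batch of one graph
print(GraphTransformerBlock(dim=8)(H).shape)         # torch.Size([1, 4, 8])
```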
Working with Dynamic Graphs
- Definition: Graphs whose nodes, edges, or features change over time.
- Challenges: Capturing temporal changes and updating representations efficiently.
- EvolveGCN (see the sketch below):
  - Graph Convolution: Apply graph convolutional layers to each graph snapshot to extract features.
  - Temporal Updates: An RNN (GRU or LSTM) evolves the GCN weights from one time step to the next, so the convolution adapts to the changing graph.
  - Output Generation: Produce updated node-level or graph-level outputs at each time step.
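A very rough sketch of the EvolveGCN-O idea (a GRU cell evolves the flattened GCN weight matrix across snapshots; adjacency normalization and other details are omitted, so treat this only as an illustration):

```python
import torch
import torch.nn as nn

class EvolveGCNSketch(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.W0 = nn.Parameter(torch.randn(dim, dim) * 0.1)   # initial GCN weights
        self.gru = nn.GRUCell(dim * dim, dim * dim)            # evolves flattened weights

    def forward(self, snapshots):
        """snapshots: list of (A_t, X_t) pairs, one per time step."""
        W_flat = self.W0.flatten().unsqueeze(0)
        outputs = []
        for A_t, X_t in snapshots:
            W_flat = self.gru(W_flat, W_flat)                  # evolve the weights
            W_t = W_flat.view(self.W0.shape)
            outputs.append(torch.relu(A_t @ X_t @ W_t))        # simple graph convolution
        return outputs

dim = 4
snaps = [(torch.eye(5), torch.randn(5, dim)) for _ in range(3)]
print([h.shape for h in EvolveGCNSketch(dim)(snaps)])          # three (5, 4) outputs
```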
5. Graph Autoencoders and Graph Generation
Graph Autoencoders
- Introduction: Unsupervised models that learn node embeddings by reconstructing the graph.
- Algorithm Steps:
  - Graph Encoder: Encodes the input graph into a latent representation.
  - Message Passing: Nodes aggregate information from their neighbors during encoding.
  - Graph Decoder: Reconstructs the graph (typically the adjacency matrix) from the latent representation.
  - Reconstruction Loss: Optimize the model to minimize the difference between the original and reconstructed graph.
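A minimal GAE-style sketch: a one-layer GCN encoder, an inner-product decoder, and a binary cross-entropy reconstruction loss on the adjacency matrix:

```python
import torch
import torch.nn as nn

class GraphAutoencoder(nn.Module):
    """Minimal GAE sketch: one-layer GCN encoder, inner-product decoder."""
    def __init__(self, in_dim, latent_dim):
        super().__init__()
        self.encoder = nn.Linear(in_dim, latent_dim)

    @staticmethod
    def normalize(A):
        A_hat = A + torch.eye(A.size(0))                       # add self-loops
        d_inv_sqrt = A_hat.sum(dim=1).pow(-0.5)
        return d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]

    def forward(self, A, X):
        Z = torch.relu(self.encoder(self.normalize(A) @ X))    # encode via message passing
        return Z, Z @ Z.T                                      # latent codes, edge logits

A = torch.tensor([[0., 1., 0.],
                  [1., 0., 1.],
                  [0., 1., 0.]])
X = torch.randn(3, 5)
Z, A_logits = GraphAutoencoder(5, 8)(A, X)
# Reconstruction loss: compare decoded edge probabilities with the true adjacency.
print(nn.functional.binary_cross_entropy_with_logits(A_logits, A).item())
```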
Graph Generation
- Variational Autoencoders (VAEs):
  - Graph Encoder: Encodes the input graph into a latent distribution (a mean and variance per node).
  - Latent Space Sampling: Sample from the latent space to generate new representations.
  - Graph Decoder: Decodes the sampled representations to generate new graphs.
  - Optimization: Optimize the encoder and decoder to maximize a variational lower bound on the likelihood of the data.
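A sketch of the VGAE-style twist on the autoencoder above: the encoder outputs a mean and log-variance per node, samples via the reparameterization trick, and adds a KL regularizer to the reconstruction loss:

```python
import torch
import torch.nn as nn

class VariationalGraphEncoder(nn.Module):
    """VGAE-style encoder: a mean and log-variance vector per node."""
    def __init__(self, in_dim, latent_dim):
        super().__init__()
        self.mu = nn.Linear(in_dim, latent_dim)
        self.logvar = nn.Linear(in_dim, latent_dim)

    def forward(self, X_agg):
        # X_agg: node features already aggregated over neighbors (e.g. A_norm @ X).
        mu, logvar = self.mu(X_agg), self.logvar(X_agg)
        Z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return Z, kl

enc = VariationalGraphEncoder(5, 8)
Z, kl = enc(torch.randn(3, 5))
A_logits = Z @ Z.T           # inner-product decoder, as in the GAE sketch above
# Total loss = reconstruction loss against the observed adjacency + the KL term.
print(Z.shape, kl.item())
```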
6. Interpretability of Graph Neural Networks
Interpretability
- Importance: Understanding how GNNs make decisions.
- Perturbation-Based Explanations (general idea):
  - Simplified Input: Use a simplified input x' in which features are either included or excluded.
  - Data Perturbation: Generate new samples by perturbing the original input and observe how the prediction changes.
  - Rule Selection: Start from a set of candidate rules and keep those that best describe the model's behavior.
- GNNExplainer:
  - Neighborhood Sampling: Define the k-hop neighborhood of the node of interest (the computation graph of its prediction).
  - Identify Influential Subgraph: Determine the subgraph G_s(v) that most influences the prediction for node v, typically by optimizing a sparse mask over edges and features (see the sketch below).
- Temporal Explanations: Sample motifs (repeating patterns) from a temporal graph to explain predictions on dynamic graphs.
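A hypothetical sketch of the GNNExplainer mask-optimization idea; `model(A, X)` stands in for any trained GNN that returns per-node class logits (the function and parameter names are illustrative, not a library API):

```python
import torch

def explain_node(model, A, X, node, target_class, steps=200, lam=0.05):
    """Learn a soft edge mask that keeps the prediction while staying sparse."""
    edge_mask = torch.nn.Parameter(torch.randn_like(A))       # one logit per potential edge
    opt = torch.optim.Adam([edge_mask], lr=0.01)
    for _ in range(steps):
        masked_A = A * torch.sigmoid(edge_mask)                # soften edges by the mask
        logits = model(masked_A, X)
        # Preserve the original prediction (cross-entropy term) plus a sparsity penalty.
        loss = -torch.log_softmax(logits[node], dim=-1)[target_class]
        loss = loss + lam * torch.sigmoid(edge_mask).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Edges with a high mask value form the influential subgraph G_s(v).
    return (torch.sigmoid(edge_mask) * A).detach()
```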