
4. Graph Attention Networks, Graph Transformer Network, Working with Dynamic Graphs

Graph Clustering

What is Graph Clustering?

Graph clustering is the task of grouping the nodes of a graph into clusters such that nodes within the same cluster are more densely connected to each other than to nodes in other clusters. This helps to uncover community structures or other meaningful groups within the graph.

Where and How is it Used?

  • Social Networks: Identifying communities or groups of users with similar interests.
  • Biological Networks: Discovering functional modules within biological systems such as protein interaction networks.
  • Recommendation Systems: Grouping items or users to provide better recommendations.

Main Methods

  • Node-Level or Graph-Level Embeddings: Use embeddings like node2vec or GraphSAGE followed by classical clustering algorithms (e.g., k-means, hierarchical clustering).
  • Specialized Algorithms: Tailored specifically for graphs, such as spectral clustering or modularity-based methods.
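The embedding-then-cluster route can be sketched in a few lines. The embeddings below are a random stand-in for node2vec/GraphSAGE output, and the tiny `kmeans` helper (with deterministic farthest-point initialisation) is illustrative, not a library function:

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Tiny k-means with deterministic farthest-point initialisation."""
    centroids = [X[0]]
    for _ in range(k - 1):
        # next centroid: the point farthest from all chosen so far
        dist = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[dist.argmax()])
    centroids = np.array(centroids, dtype=float)
    for _ in range(iters):
        # assign each point to its nearest centroid, then recompute means
        labels = np.linalg.norm(X[:, None] - centroids[None], axis=-1).argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(axis=0)
    return labels

# Stand-in for node2vec/GraphSAGE output: two well-separated groups of nodes.
rng = np.random.default_rng(0)
emb = np.vstack([rng.normal(0.0, 0.1, (5, 8)), rng.normal(3.0, 0.1, (5, 8))])
labels = kmeans(emb, k=2)
```

In practice one would use a real embedding model and scikit-learn's k-means; the point is only that clustering happens in embedding space, not on the graph directly.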

Desired Properties for Clustering Algorithms

  • End-to-End Training: Capture graph structure and attributes.
  • Unsupervised Training: Learn clusters without labeled data.
  • Node Aggregation: Aggregate node features effectively.
  • Sparsity: Use only a subset of neighbors to reduce computational time.
  • Soft Assignments: Allow nodes to belong to multiple clusters.
  • Stability: Provide consistent results for similar graphs.

Clustering Quality Metrics

Cut-Based Metrics

  • Minimal Cuts: Partition the graph along minimal cuts to recover candidate communities.
  • Issue: Real-world communities often overlap, so hard cuts can misrepresent them.

Modularity Metrics

  • Modularity: Measures the strength of division of a network into clusters.
  • Formula: Q = \frac{1}{2m} \sum_{ij} \left[ A_{ij} - \frac{d_i d_j}{2m} \right] \delta(c_i, c_j)
    • A_{ij}: Adjacency matrix.
    • d_i: Degree of node i.
    • m: Number of edges.
    • \delta(c_i, c_j): 1 if nodes i and j are in the same cluster, 0 otherwise.
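The modularity definition translates directly into NumPy. The two-triangle graph below is a made-up example; grouping each triangle as its own cluster gives Q = 5/14:

```python
import numpy as np

def modularity(A, communities):
    """Q = (1/2m) * sum_ij [A_ij - d_i d_j / (2m)] * delta(c_i, c_j)."""
    d = A.sum(axis=1)                     # node degrees
    two_m = d.sum()                       # 2m: twice the edge count
    B = A - np.outer(d, d) / two_m        # modularity matrix
    same = communities[:, None] == communities[None, :]
    return (B * same).sum() / two_m

# Toy graph: two triangles joined by a single edge.
A = np.zeros((6, 6))
for i, j in [(0,1),(1,2),(0,2),(3,4),(4,5),(3,5),(2,3)]:
    A[i, j] = A[j, i] = 1.0
c = np.array([0, 0, 0, 1, 1, 1])
print(round(modularity(A, c), 3))         # → 0.357
```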

Clustering Algorithms

Spectral Modularity Maximization

Idea:

Maximizing modularity Q is NP-hard, but it can be approximated using spectral methods based on eigenvalue decomposition.

Algorithm:

  1. Modularity Matrix: Compute the modularity matrix B = A - \frac{d d^T}{2m}, where d is the (column) vector of node degrees.
  2. Cluster Assignment Matrix: Initialize the cluster assignment matrix C.
  3. Spectral Modularity: Maximize the trace of the matrix C^T B C.
  4. Formula: Q = \frac{1}{2m} \operatorname{Tr}(C^T B C)
    • C: Cluster assignment matrix (binary values indicating cluster membership).
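For the two-cluster case the spectral idea reduces to Newman's bisection: take the eigenvector of B with the largest eigenvalue and split nodes by its sign. A sketch on a made-up toy graph:

```python
import numpy as np

# Newman-style spectral bisection: split nodes by the sign of the
# leading eigenvector of the modularity matrix B = A - d d^T / (2m).
# Toy graph: two triangles joined by a single bridge edge.
A = np.zeros((6, 6))
for i, j in [(0,1),(1,2),(0,2),(3,4),(4,5),(3,5),(2,3)]:
    A[i, j] = A[j, i] = 1.0
d = A.sum(axis=1)
two_m = d.sum()
B = A - np.outer(d, d) / two_m
vals, vecs = np.linalg.eigh(B)
leading = vecs[:, np.argmax(vals)]      # eigenvector of the largest eigenvalue
labels = (leading > 0).astype(int)      # sign split = two clusters
```

Here the sign split recovers the two triangles, which is also the modularity-optimal bipartition of this graph.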

DMoN: Deep Modularity Networks

Idea:

DMoN provides a framework to optimize cluster assignments using deep learning techniques.

Algorithm:

  1. Cluster Assignment: Use a GCN with softmax to determine cluster assignments.
    • Formula: C = \operatorname{softmax}(\operatorname{GCN}(\tilde{A}, X))
  2. Convolution Operation: Perform graph convolution.
    • Formula: X^{(l+1)} = \operatorname{SeLU}(\tilde{A} X^{(l)} W^{(l)})
  3. Loss Function: Optimize the modularity with regularization.
    • Formula: \mathcal{L} = -\frac{1}{2m} \operatorname{Tr}(C^T B C) + \frac{\sqrt{k}}{n} \left\lVert \sum_i C_i^T \right\rVert_F - 1
    • The regularization term prevents trivial solutions where all nodes are assigned to one cluster.
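A sketch of the DMoN objective evaluated for a fixed soft assignment C (no GCN and no training loop here; `dmon_loss` is a hypothetical helper, and the graph is a toy example). The collapse regularizer is zero for perfectly balanced clusters and positive when all mass falls into one cluster:

```python
import numpy as np

def dmon_loss(A, C):
    """Negative (soft) modularity plus a collapse regularizer."""
    n, k = C.shape
    d = A.sum(axis=1)
    two_m = d.sum()
    B = A - np.outer(d, d) / two_m
    soft_modularity = np.trace(C.T @ B @ C) / two_m
    collapse = np.sqrt(k) / n * np.linalg.norm(C.sum(axis=0)) - 1
    return -soft_modularity + collapse

# Two triangles joined by one edge; compare a good and a collapsed assignment.
A = np.zeros((6, 6))
for i, j in [(0,1),(1,2),(0,2),(3,4),(4,5),(3,5),(2,3)]:
    A[i, j] = A[j, i] = 1.0
C_good = np.repeat(np.eye(2), 3, axis=0)       # triangles as two clusters
C_bad = np.tile([1.0, 0.0], (6, 1))            # everything in cluster 0
```

`dmon_loss(A, C_good)` is lower than `dmon_loss(A, C_bad)`: the good assignment earns the full modularity with zero collapse penalty, while the collapsed one has zero modularity and pays the penalty.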

Heterogeneous Graph Embeddings

What is Heterogeneous Graph Embedding?

Heterogeneous graph embedding maps the nodes of a graph that contains multiple node and edge types into a continuous vector space. This makes it possible to capture the complex relationships and interactions within the graph.

Where and How is it Used?

  • Recommendation Systems: Capturing user-item interactions.
  • Knowledge Graphs: Embedding entities and relationships for better inference.
  • Social Networks: Understanding interactions between different types of users and content.

Main Concept

  • Meta Path: A sequence of node types and edge types that captures relationships in heterogeneous graphs.
  • Example: P = A \xrightarrow{R_1} B \xrightarrow{R_2} C

MAGNN: Metapath Aggregated Graph Neural Network

Idea:

MAGNN aggregates information from multiple metapaths to capture the rich semantics in heterogeneous graphs.

Algorithm:

  1. Node Attribute Transformation: Convert node attributes to vectors of the same length.
    • Formula: h_v' = W_A \cdot x_v, with a separate projection matrix W_A per node type A.
  2. Intra-Metapath Aggregation: Aggregate instances of the same metapath.
    • Formula: h_v^P = \sigma\left( \sum_{u \in N_v^P} \alpha_{vu}^P \, h_{P(v,u)} \right)
    • Use attention to combine the metapath instances.
  3. Inter-Metapath Aggregation: Combine different metapaths to form the node representation.
    • Formula: h_v = \sigma\left( \sum_{P} \beta_P \, h_v^P \right)
    • Use attention to weigh the different metapaths.
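Both aggregation steps share the same core operation: an attention-weighted sum of vectors. A heavily simplified sketch, with a made-up query vector `q` standing in for MAGNN's learned attention parameters:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_aggregate(instances, q):
    """Score each metapath-instance vector against a query q, then
    take the softmax-weighted sum (simplified attention aggregation)."""
    alpha = softmax(instances @ q)    # one attention weight per instance
    return alpha @ instances          # weighted combination

# Two toy instance encodings; q strongly favours the first one.
instances = np.array([[1.0, 0.0], [0.0, 1.0]])
q = np.array([5.0, 0.0])
out = attention_aggregate(instances, q)
```

The output is dominated by the instance the query scores highest, which is exactly the behaviour the attention weights \alpha and \beta provide in the formulas above.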

GNN with Attention

What is GNN with Attention?

GNNs with attention mechanisms allow the model to focus on the most relevant parts of the graph, improving the ability to capture important patterns and relationships.

Where and How is it Used?

  • Node Classification: Identifying important features for classifying nodes.
  • Graph Classification: Aggregating critical information from nodes to classify entire graphs.
  • Link Prediction: Focusing on relevant nodes to predict missing links.

What is Attention?

Attention is a mechanism that helps the model weigh the importance of different parts of the input. In the context of graphs, it helps determine the influence of neighboring nodes.

Attention Layer

  • Input: Features from a set of neighboring nodes h = \{h_1, h_2, \ldots, h_N \}.
  • Attention Coefficients: Compute the importance of each neighbor.
    • Formula: e_{ij} = \text{LeakyReLU}\left( a^T [W h_i \,\Vert\, W h_j] \right)
    • W: Learnable weight matrix; a: learnable attention vector; \Vert denotes concatenation.
  • Softmax Normalization: Transform the coefficients into probabilities.
    • Formula: \alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k \in N_i} \exp(e_{ik})}
  • Node Representation Update: Combine neighbor features weighted by the attention coefficients.
    • Formula: h_i' = \sigma\left( \sum_{j \in N_i} \alpha_{ij} W h_j \right)
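The three steps can be sketched as a single-head layer in NumPy. For simplicity every node attends to every node here (a real GAT restricts the sum to graph neighbors N_i), and the shapes are toy choices:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gat_layer(h, W, a):
    """Single-head attention layer in the GAT style:
    e_ij = LeakyReLU(a^T [W h_i || W h_j]), alpha_i = softmax_j(e_ij),
    h_i' = sum_j alpha_ij W h_j.  Every node attends to every node here."""
    z = h @ W.T                                   # transformed features W h_j
    out = np.zeros_like(z)
    for i in range(len(z)):
        e = np.array([np.concatenate([z[i], z[j]]) @ a for j in range(len(z))])
        e = np.where(e > 0, e, 0.2 * e)           # LeakyReLU, slope 0.2
        out[i] = softmax(e) @ z                   # attention-weighted sum
    return out

rng = np.random.default_rng(0)
h = rng.normal(size=(4, 3))                       # 4 nodes, 3 input features
W = rng.normal(size=(2, 3))                       # project to 2 features
a = rng.normal(size=4)                            # attention vector (2 + 2)
out = gat_layer(h, W, a)
```

Each output row is a convex combination of the transformed neighbor features, with the mixture weights learned via a.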

Graph Transformer Network

What is a Graph Transformer Network?

Graph Transformer Networks (GTNs) extend the transformer model to graphs, enabling the model to capture long-range dependencies and complex interactions within the graph.

Where and How is it Used?

  • Graph Classification: Classifying entire graphs based on complex patterns.
  • Node Classification: Leveraging long-range dependencies for better node classification.
  • Link Prediction: Capturing complex interactions for predicting links.

Fingerprinting

  • Concept: Using metapaths to sample the graph and generate unique fingerprints for different parts of the graph.
  • Metapath Example: P = A \xrightarrow{R_1} B \xrightarrow{R_2} C

Metapath Generation

  • Idea: Generate new metapaths by combining existing ones.
  • Selection: Softly select metapaths (edge types) using a learned parameter vector.
    • Formula: Q = \sum_{t} \alpha_t A_t, \quad \alpha = \operatorname{softmax}(W_\phi)
    • W_\phi: Learnable parameter vector; A_t: adjacency matrix of edge type t.
  • Combination: Multiply the selected matrices to create new metapaths.
    • Formula: A^{(new)} = Q_1 Q_2
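A sketch of soft selection followed by composition on a made-up 3-node heterogeneous graph; `soft_select` is an illustrative stand-in for GTN's 1×1 convolution over the stack of edge-type adjacency matrices:

```python
import numpy as np

def soft_select(adjs, w):
    """Softmax-weighted combination of edge-type adjacency matrices."""
    alpha = np.exp(w) / np.exp(w).sum()
    return np.einsum('t,tij->ij', alpha, adjs)

# Toy heterogeneous graph on 3 nodes with two edge types.
A1 = np.array([[0,1,0],[0,0,0],[0,0,0]], float)   # relation R1: 0 -> 1
A2 = np.array([[0,0,0],[0,0,1],[0,0,0]], float)   # relation R2: 1 -> 2
adjs = np.stack([A1, A2])

# Two soft selections; their product composes the length-2 metapath R1 -> R2.
Q1 = soft_select(adjs, np.array([5.0, -5.0]))     # ~selects R1
Q2 = soft_select(adjs, np.array([-5.0, 5.0]))     # ~selects R2
A_meta = Q1 @ Q2                                  # strong 0 -> 2 connection
```

Because the selection is a softmax over a learned vector rather than a hard choice, the whole metapath-generation step stays differentiable and can be trained end to end.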

Dynamic Graph Algorithms

What is a Dynamic Graph Algorithm?

Dynamic graph algorithms handle graphs that change over time, capturing evolving structures and relationships.

Where and How is it Used?

  • Social Networks: Analyzing evolving user interactions.
  • Financial Networks: Tracking changes in financial transactions.
  • Communication Networks: Monitoring dynamic communication patterns.

EvolveGCN: Evolving Graph Convolutional Networks

Idea:

EvolveGCN (Evolving Graph Convolutional Networks) is designed to handle dynamic graphs by capturing their evolving nature. It leverages Long Short-Term Memory (LSTM) networks to model temporal dependencies, enabling the model to adapt to changes in the graph structure over time.

Algorithm:

  1. Graph Convolution: Perform graph convolution on the graph at time t to update the node embeddings.
    • Formula: H_t^{(l+1)} = \sigma\left( \hat{A}_t H_t^{(l)} W_t^{(l)} \right)
    • \hat{A}_t: Normalized adjacency matrix at time t.
  2. Weight Evolution: Update the weight matrices over time using an LSTM to capture temporal dependencies.
    • Formula: W_t^{(l)} = \operatorname{LSTM}\left( W_{t-1}^{(l)} \right)
  3. Evolving Graph Convolution Unit (EGCU): Combine graph convolution and weight evolution to update node embeddings and weights simultaneously.
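One such step can be sketched as follows: a minimal LSTM cell evolves each column of the GCN weight matrix (treating the previous weights as both input and hidden state, in the spirit of the EvolveGCN-O variant), then the graph convolution is applied. All shapes, names, and the toy data are illustrative assumptions:

```python
import numpy as np

def lstm_cell(x, h, c, Wx, Wh, b):
    """Minimal LSTM cell; gates i, f, o and candidate g stacked in Wx/Wh/b."""
    z = Wx @ x + Wh @ h + b
    d = len(h)
    i, f, o = (1 / (1 + np.exp(-z[k*d:(k+1)*d])) for k in range(3))
    g = np.tanh(z[3*d:])
    c = f * c + i * g          # new cell state
    h = o * np.tanh(c)         # new hidden state
    return h, c

def evolve_step(A_hat, H, W, c, Wx, Wh, b):
    """One EvolveGCN-O-style step: evolve each column of the GCN weight
    matrix with the LSTM, then apply the graph convolution."""
    W_new = np.empty_like(W)
    c_new = np.empty_like(c)
    for j in range(W.shape[1]):
        W_new[:, j], c_new[:, j] = lstm_cell(W[:, j], W[:, j], c[:, j], Wx, Wh, b)
    H_new = np.maximum(A_hat @ H @ W_new, 0)      # ReLU(A_hat H W)
    return H_new, W_new, c_new

rng = np.random.default_rng(0)
n, d = 4, 3
A_hat = np.eye(n) + 0.1                           # toy normalized adjacency
H = rng.normal(size=(n, d))                       # node embeddings at time t
W = rng.normal(size=(d, d))                       # GCN weights from time t-1
c = np.zeros((d, d))                              # LSTM cell state per column
Wx = rng.normal(size=(4*d, d))
Wh = rng.normal(size=(4*d, d))
b = np.zeros(4*d)
H1, W1, c1 = evolve_step(A_hat, H, W, c, Wx, Wh, b)
```

The key design point survives even in this sketch: the weights themselves are the recurrent state, so the model adapts to graph changes without re-training from scratch at each snapshot.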