
2. Topology Inference, Random Walk Based Techniques

Topology Inference

Definition

Topology inference is the process of deducing the structure of a graph from observed data. It aims to infer the relationships between nodes (vertices) based on observed states or measurements. This process is essential for understanding complex systems where the underlying network structure is not directly observable but can be inferred from data.

Usage

Topology inference is used in various domains, including:

  • Gene expression: inferring gene interaction networks to understand how genes regulate each other.
  • Computer service dependencies: understanding dependencies and interactions between services in IT infrastructure to ensure reliability and efficiency.
  • fMRI data: identifying functional connectivity in the brain to study brain activity and its correlation with cognitive functions and behaviors.

Approaches to Topology Inference

Statistical Approach

  1. Concept:

    • A statistical approach to topology inference involves constructing a graph G where the structure (edges) determines the joint probability distribution of the observed data.
    • Each vertex in the graph represents a random variable, and edges represent conditional dependencies between these variables.
  2. Markov Random Field (MRF):

    • An MRF models the relationships between random variables in a way that the probability of a random variable depends only on its immediate neighbors.
    • Formula: (v_i, v_j) \notin E \;\Rightarrow\; x_i \perp x_j \mid x_{V \setminus \{i, j\}}
      • This means that if there is no edge between v_i and v_j, then the variables x_i and x_j are conditionally independent given the rest of the variables.
  3. Gaussian Markov Random Fields (GMRFs):

    • For continuous variables, GMRFs are used where the random variables follow a multivariate normal distribution.
    • Formula: p(x) \propto \exp\left(-\tfrac{1}{2} x^\top \Theta x\right)
      • Here, \Theta is the precision (inverse covariance) matrix, and x is the (zero-mean) vector of observed values; a zero entry \Theta_{ij} = 0 means x_i and x_j are conditionally independent, i.e. there is no edge between v_i and v_j.
  4. Covariance Estimation:

    • The simplest approach to infer the graph structure is by estimating the covariance matrix of the observed data.
    • Formula: \hat{\Sigma} = \frac{1}{N} X^\top X
      • Where X is the N \times d matrix of (centered) observed data, and N is the number of observations.
      • Prune the smallest entries of the matrix (those below some threshold) to introduce sparsity (a minimal sketch follows this list).
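The sketch below thresholds the empirical covariance matrix of centered data; the toy data and the threshold value are illustrative assumptions:

```python
import numpy as np

def covariance_graph(X, threshold=0.1):
    """Infer a graph by thresholding the empirical covariance matrix.

    X: (N, d) array of N observations of d variables (one node per variable).
    Returns a (d, d) boolean adjacency matrix.
    """
    X_centered = X - X.mean(axis=0)               # center each variable
    cov = X_centered.T @ X_centered / X.shape[0]  # empirical covariance (1/N) X^T X
    A = np.abs(cov) > threshold                   # keep only strong co-variations
    np.fill_diagonal(A, False)                    # no self-loops
    return A

# Toy usage: three strongly correlated variables plus one independent variable
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 1))
X = np.hstack([z + 0.1 * rng.normal(size=(500, 1)) for _ in range(3)]
              + [rng.normal(size=(500, 1))])
print(covariance_graph(X, threshold=0.5).astype(int))
```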

Physics Modelling Approach

  1. Concept:

    • This approach models the observations as outcomes of some physical phenomena. The inference problem consists of capturing the structure inherent to the physics of the observed data.
  2. Applications:

    • Network tomography: Inferring the internal network structure (e.g., internet topology) from end-to-end measurements.
    • Epidemic models: Understanding the spread of diseases through contact networks.
    • Information propagation models: Analyzing how information spreads through social networks.

Graphical Lasso

Graphical Lasso is a technique used to infer the structure of a graph by estimating a sparse inverse covariance (precision) matrix. Pairs of variables (nodes) whose precision-matrix entry is zero are conditionally independent given all the others and receive no edge; the remaining non-zero entries define the edges of the graph.

Why Use Graphical Lasso?

  • Identify important relationships between variables.
  • Make the graph sparse by removing weak connections, which simplifies the structure and makes it easier to understand.
  • Handle high-dimensional data efficiently.

Steps to Implement Graphical Lasso

  1. Initialize the Graph

    • Start with the data represented as a sample covariance matrix S.
    • Add a small regularization term to S to initialize W: W = S + \rho I
    • Here, I is the identity matrix, and \rho is a regularization parameter that controls the sparsity of the graph.
  2. Iterate to Update the Precision Matrix

    • Repeat the following steps until the values in W stop changing significantly (convergence):
  3. Rotate the Matrix

    • Rotate the matrix W one step to the right and down. This step helps to systematically update the matrix.
  4. Update Each Element

    • For each element in the rotated matrix, update its value using the following simplified steps:
      1. Estimate Coefficients: Use the current values in the matrix to estimate the coefficients \beta. Think of this step as finding the best way to combine the current values to predict a specific value.
      2. Thresholding: Apply a threshold function to make small values zero. This step ensures that the matrix becomes sparse by removing weak connections.
      3. Update: Replace the old values with the new estimates. This gradually improves the precision matrix \Theta.
  5. Optimize the Precision Matrix

    • The goal is to find the precision matrix \Theta that maximizes the likelihood of the observed data while keeping the matrix sparse. This is done by balancing the fit to the data (how well \Theta represents the data) against the sparsity (how many zeros are in \Theta). A sketch using an off-the-shelf implementation follows this list.
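The sketch below uses scikit-learn's GraphicalLasso estimator rather than re-implementing the update steps above; the chain-graph data and the alpha value (playing the role of \rho) are illustrative:

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

# Simulate data from a known sparse precision matrix (chain graph 0-1-2-3)
rng = np.random.default_rng(0)
theta = np.eye(4)
for i in range(3):
    theta[i, i + 1] = theta[i + 1, i] = 0.4
X = rng.multivariate_normal(np.zeros(4), np.linalg.inv(theta), size=2000)

# alpha plays the role of the regularization parameter rho above:
# larger alpha -> more zeros in the estimated precision matrix -> sparser graph
model = GraphicalLasso(alpha=0.05).fit(X)
precision = model.precision_

# Edges correspond to non-zero off-diagonal entries of the precision matrix
adjacency = (np.abs(precision) > 1e-4) & ~np.eye(4, dtype=bool)
print(np.round(precision, 2))
print(adjacency.astype(int))
```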

Random Walk Techniques

What is a Random Walk?

A random walk is a mathematical process that describes a path consisting of a succession of random steps. In the context of graphs, a random walk is a sequence of vertices, where each vertex is chosen randomly from the neighbors of the previous vertex.
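A minimal sketch of such a uniform random walk, using networkx and its karate-club example graph (the graph and walk length are illustrative):

```python
import random
import networkx as nx

def random_walk(graph, start, length):
    """Uniform random walk: at each step, move to a randomly chosen neighbor."""
    walk = [start]
    for _ in range(length - 1):
        neighbors = list(graph.neighbors(walk[-1]))
        if not neighbors:          # dead end (isolated node)
            break
        walk.append(random.choice(neighbors))
    return walk

G = nx.karate_club_graph()         # small example graph
print(random_walk(G, start=0, length=10))
```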

Why is it Used?

Random walks are used to explore the structure of graphs. They help in tasks like:

  • Node classification: Identifying the category to which a node belongs.
  • Link prediction: Predicting the existence of a link between two nodes.
  • Graph embedding: Representing nodes or entire graphs in a continuous vector space for various machine learning tasks.

Where is it Used?

Random walk techniques are widely used in:

  • Social networks: Analyzing user interactions and influence.
  • Recommendation systems: Recommending items based on user behavior.
  • Biological networks: Understanding the interactions between genes or proteins.
  • Natural language processing: Learning word representations in NLP tasks.

DeepWalk

Idea

DeepWalk is inspired by natural language processing techniques, particularly Word2Vec. It uses random walks to explore the local neighborhood of nodes and captures their co-occurrence patterns to learn latent representations.

Algorithm

  1. Generate Random Walks:

    • For each node in the graph, perform multiple random walks of fixed length.
    • Each walk generates a sequence of nodes (like a sentence of words).
  2. Build the Corpus:

    • Treat each random walk as a sentence and each node as a word.
    • This corpus of "sentences" is used to train a Skip-gram model, similar to Word2Vec.
  3. Train the Skip-gram Model:

    • The Skip-gram model predicts the context nodes (neighbors) for each node in the random walks.
    • The objective is to maximize the probability of observing the neighborhood nodes given a node (a sketch of the full pipeline follows this list).
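The following sketch implements this pipeline with networkx for the graph and gensim's Word2Vec for the Skip-gram model (sg=1 selects the Skip-gram objective); the hyperparameters are illustrative, not those of the original DeepWalk paper:

```python
import random
import networkx as nx
from gensim.models import Word2Vec

def generate_walks(graph, num_walks=10, walk_length=20):
    """DeepWalk corpus: several uniform random walks starting from every node."""
    walks = []
    nodes = list(graph.nodes())
    for _ in range(num_walks):
        random.shuffle(nodes)
        for node in nodes:
            walk = [node]
            while len(walk) < walk_length:
                neighbors = list(graph.neighbors(walk[-1]))
                if not neighbors:
                    break
                walk.append(random.choice(neighbors))
            walks.append([str(v) for v in walk])  # Word2Vec expects string tokens
    return walks

G = nx.karate_club_graph()
walks = generate_walks(G)

# sg=1 selects the Skip-gram objective; window is the context size within each walk
model = Word2Vec(walks, vector_size=64, window=5, min_count=1, sg=1, epochs=5)
print(model.wv["0"][:5])  # embedding of node 0 (first 5 dimensions)
```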

Node2Vec

Idea

Node2Vec extends DeepWalk by introducing biased random walks, which can trade-off between breadth-first search (BFS) and depth-first search (DFS) strategies. This allows Node2Vec to capture both homophily (similar nodes) and structural equivalence (nodes with similar roles).

Algorithm

  1. Preprocess Transition Probabilities:

    • Calculate the transition probabilities for each node, controlling the likelihood of revisiting nodes (return parameter p) or exploring new nodes (in-out parameter q).
  2. Generate Biased Random Walks:

    • For each node, perform random walks with the biased transition probabilities.
  3. Train the Skip-gram Model:

    • Use the generated walks to train a Skip-gram model, as in DeepWalk (a sketch of the biased walk follows this list).
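The sketch below implements only the biased walk itself (the alias-table preprocessing the original paper uses for efficiency is omitted); the p and q values are illustrative:

```python
import random
import networkx as nx

def node2vec_walk(graph, start, length, p=1.0, q=1.0):
    """Second-order biased walk: weight 1/p to return to the previous node,
    1 to move to a common neighbor of the previous node, 1/q to move outward."""
    walk = [start]
    while len(walk) < length:
        curr = walk[-1]
        neighbors = list(graph.neighbors(curr))
        if not neighbors:
            break
        if len(walk) == 1:                      # first step: uniform choice
            walk.append(random.choice(neighbors))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)         # return to where we came from
            elif graph.has_edge(nxt, prev):
                weights.append(1.0)             # stay close (BFS-like)
            else:
                weights.append(1.0 / q)         # move outward (DFS-like)
        walk.append(random.choices(neighbors, weights=weights, k=1)[0])
    return walk

G = nx.karate_club_graph()
print(node2vec_walk(G, start=0, length=10, p=0.5, q=2.0))
```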

Residual2Vec

Idea

Residual2Vec aims to reduce bias towards "hub" nodes (nodes with many connections) in the graph. It adjusts the probability of visiting nodes during random walks to correct for the overrepresentation of hub nodes.

Algorithm

  1. Adjust Transition Probabilities:

    • Modify the transition probabilities to reduce the likelihood of visiting hub nodes excessively.
  2. Generate Adjusted Random Walks:

    • Perform random walks with the adjusted transition probabilities.
  3. Train the Skip-gram Model:

    • Use the adjusted random walks to train a Skip-gram model (a toy sketch of one possible degree correction follows this list).
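The actual Residual2Vec correction is defined against a random-graph (null-model) baseline rather than by directly reweighting the walk; the sketch below is only a toy illustration of down-weighting high-degree neighbors, with a hypothetical gamma exponent controlling the strength of the correction:

```python
import random
import networkx as nx

def degree_corrected_walk(graph, start, length, gamma=1.0):
    """Toy hub-debiased walk: the chance of stepping to a neighbor is divided by
    its degree raised to gamma, so high-degree hubs are visited less often than
    under a uniform walk. (Illustrative only; not the exact Residual2Vec scheme.)"""
    walk = [start]
    while len(walk) < length:
        neighbors = list(graph.neighbors(walk[-1]))
        if not neighbors:
            break
        weights = [1.0 / (graph.degree(n) ** gamma) for n in neighbors]
        walk.append(random.choices(neighbors, weights=weights, k=1)[0])
    return walk

G = nx.barabasi_albert_graph(200, 2, seed=0)   # scale-free graph with clear hubs
print(degree_corrected_walk(G, start=0, length=10))
```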

Graph2Vec

Idea

Graph2Vec extends the concept of node embedding to entire graphs. It learns a vector representation for a whole graph, inspired by the Doc2Vec model used in natural language processing.

Algorithm

  1. Extract Rooted Subgraphs:

    • For each node, extract a subgraph rooted at that node, capturing its local structure up to a certain depth.
  2. Generate a Corpus:

    • Treat each graph as a "document" and each rooted subgraph as a "word".
  3. Train a Doc2Vec Model:

    • Use the generated corpus to train a Doc2Vec model, learning vector representations for the entire graphs (a minimal sketch follows this list).
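The sketch below approximates rooted-subgraph extraction with Weisfeiler-Lehman-style relabeling and trains gensim's Doc2Vec on the resulting corpus; the toy graphs and hyperparameters are illustrative:

```python
import networkx as nx
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

def wl_subgraph_words(graph, depth=2):
    """Weisfeiler-Lehman style relabeling: each node's label after k rounds
    encodes its rooted subgraph of depth k; the labels serve as the 'words'."""
    labels = {v: str(graph.degree(v)) for v in graph.nodes()}  # initial labels
    words = list(labels.values())
    for _ in range(depth):
        labels = {
            v: labels[v] + "|" + "-".join(sorted(labels[u] for u in graph.neighbors(v)))
            for v in graph.nodes()
        }
        words.extend(labels.values())
    return words

graphs = [nx.cycle_graph(6), nx.path_graph(6), nx.complete_graph(6)]
corpus = [TaggedDocument(words=wl_subgraph_words(g), tags=[f"g{i}"])
          for i, g in enumerate(graphs)]

model = Doc2Vec(corpus, vector_size=32, min_count=1, epochs=50)
print(model.dv["g0"][:5])   # embedding of the first graph (first 5 dimensions)
```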

Summary

  • DeepWalk: Uses random walks to generate sequences of nodes, which are then used to train a Skip-gram model to learn node embeddings. Inspired by Word2Vec from natural language processing.
  • Node2Vec: Extends DeepWalk by introducing biased random walks that can balance between breadth-first search (BFS) and depth-first search (DFS). This method captures both homophily and structural equivalence.
  • Residual2Vec: Aims to reduce bias towards hub nodes by adjusting the transition probabilities during random walks. This method helps to provide more balanced embeddings that are not overly influenced by high-degree nodes.
  • Graph2Vec: Extends node embedding to graph embedding. It extracts rooted subgraphs from each graph and treats them as documents in a corpus. A Doc2Vec model is then trained to learn vector representations for entire graphs.

Random Walk Graph Neural Networks (Random Walk GNN)

Idea

Random Walk Graph Neural Networks (Random Walk GNN) aim to leverage random walks to learn effective graph representations. The core idea is to combine information from the original graph with additional "hidden" graphs to create a more robust and informative representation. This approach enhances the interpretability of the learned representations by extracting important patterns, or "fingerprints," from the graph.

Algorithm

  1. Initialize the Graphs:

    • Start with a set of graphs G and a set of hidden graphs \hat{G}.
    • Each graph G has an adjacency matrix A, and each hidden graph \hat{G} has an adjacency matrix \hat{A}.
  2. Define the Random Walk Kernel:

    • The random walk kernel for walks of length p between graphs G and \hat{G} is defined to capture the similarity between the graphs based on their random walk behavior.
  3. Compute Random Walks:

    • Perform random walks on the graphs G and the hidden graphs \hat{G}.
    • For a graph G with adjacency matrix A, the entries of A^p count the walks of length p between each pair of vertices.
  4. Direct Product of Graphs:

    • Compute the direct product of the original graph G and the hidden graph \hat{G}.
    • The direct product graph G \times \hat{G} has a vertex set consisting of pairs of vertices from G and \hat{G}, and its adjacency matrix is the Kronecker product A_\times = A \otimes \hat{A}.
  5. Calculate the Random Walk Kernel:

    • Use the adjacency matrices to calculate the random walk kernel: k^{(p)}(G, \hat{G}) = \sum_{i,j} [A_\times^p]_{ij} = \sum_{i,j} [A^p \otimes \hat{A}^p]_{ij}
    • Here, A^p \otimes \hat{A}^p is the Kronecker product of the p-th powers of the adjacency matrices, which equals (A \otimes \hat{A})^p (a numerical sketch follows this list).
  6. Incorporate Attributes:

    • Extend the algorithm to attributed graphs where each node has associated real-valued attributes.
    • Let X be the attribute matrix for graph G, and \hat{X} be the attribute matrix for hidden graph \hat{G}.
  7. Kernel with Attributes:

    • Incorporate node attributes into the kernel calculation, for example by weighting each product-graph vertex (v, \hat{v}) by the similarity x_v^\top \hat{x}_{\hat{v}} of the corresponding attribute vectors.
  8. Train the Model:

    • Use the computed kernel values as input features to a multi-layer perceptron (MLP) or another machine learning model.
    • Multiple hidden graphs and different walk lengths p can be used to create multiple kernel features.
    • Backpropagate the error through the network to update the parameters (adjacency and attribute matrices) of the hidden graphs.
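A minimal numerical sketch of the kernel computation in step 5, using a fixed toy hidden graph in place of a learnable one:

```python
import numpy as np
import networkx as nx

def random_walk_kernel(A, A_hat, p):
    """p-step random walk kernel between a graph with adjacency A and a (hidden)
    graph with adjacency A_hat: the number of common walks of length p, i.e. the
    sum of all entries of (A kron A_hat)^p."""
    A_prod = np.kron(A, A_hat)                 # adjacency of the direct product graph
    return np.linalg.matrix_power(A_prod, p).sum()

G = nx.cycle_graph(5)
G_hidden = nx.path_graph(3)                    # stands in for a small learnable hidden graph
A = nx.to_numpy_array(G)
A_hat = nx.to_numpy_array(G_hidden)

# Kernel values for several walk lengths; in a Random Walk GNN these would be the
# features fed to an MLP, and A_hat's entries would be updated by backpropagation.
print([random_walk_kernel(A, A_hat, p) for p in range(1, 4)])
```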