# Distance Encoding: Design Provably More Powerful Neural Networks for Graph Representation Learning

Distance Encoding (DE) is a general class of graph-structure-related features that graph neural networks (GNNs) can use to improve their structural representation power. Given a node set whose structural representation is to be learned, the DE of a node in the graph is defined as a mapping of the set of landing probabilities of random walks from each node in the node set of interest to this node. DE generalizes measures such as shortest-path distances and generalized PageRank scores. DE can be incorporated into the design of GNNs in simple but effective ways: first, we propose DEGNN, which uses DE as extra node features; we further enhance DEGNN by letting DE control the aggregation procedure of traditional GNNs, which yields another model, DEAGNN. Since DE depends purely on the graph structure and is independent of node identifiers, both models are inductive and generalize well.
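To make the "landing probabilities of random walks" concrete, here is a minimal sketch of how such features could be computed, assuming NetworkX and NumPy; the function name `landing_probabilities` and the interface are ours for illustration, not taken from the paper's released code.

```python
import numpy as np
import networkx as nx

def landing_probabilities(G, target_set, num_steps=3):
    """For each node s in target_set, compute the landing probability of a
    length-t random walk started at s, for t = 0..num_steps, at every node.
    Assumes G has no isolated nodes (so every row of A can be normalized)."""
    nodes = list(G.nodes())
    idx = {v: i for i, v in enumerate(nodes)}
    A = nx.to_numpy_array(G, nodelist=nodes)
    # Row-normalized random-walk transition matrix.
    W = A / A.sum(axis=1, keepdims=True)
    features = {}
    for s in target_set:
        p = np.zeros(len(nodes))
        p[idx[s]] = 1.0          # the walk starts at s with probability 1
        probs = [p.copy()]
        for _ in range(num_steps):
            p = p @ W            # one random-walk step
            probs.append(p.copy())
        features[s] = np.stack(probs, axis=1)  # shape (|V|, num_steps + 1)
    return features

# Illustrative example on a 6-cycle.
G = nx.cycle_graph(6)
f = landing_probabilities(G, {0}, num_steps=2)
```

The column for `t = 1` recovers adjacency information (each neighbor of the source gets probability `1/deg`), and later columns encode multi-hop structure; truncating shortest-path distances or weighting these columns with generalized PageRank coefficients yields the special cases mentioned above.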

The power of DE for structural representation learning can be rigorously characterized. Specifically, we prove that both DEGNN and DEAGNN can distinguish two non-isomorphic equal-sized node sets (including single nodes, node-pairs, ..., entire graphs) embedded in almost all regular graphs, where traditional GNNs always fail when no discriminatory node/edge attributes are available. We also prove that, for learning structural representations of single nodes over distance-regular graphs, these two models are no more powerful than traditional GNNs without discriminatory node/edge attributes, which reveals a limitation of DE. However, we show that both models still have extra power to learn structural representations of node-pairs over distance-regular graphs.

Both DEGNN and DEAGNN are evaluated on tasks at three levels: node-role classification (node-level), link prediction (node-pair-level), and triangle prediction (node-triad-level). Note that triangle prediction targets higher-order network motifs, which are known to be challenging; few prior works address this task. Both models significantly outperform traditional GNNs on all three tasks, with up to 15% improvement in average prediction accuracy, and also outperform other state-of-the-art baselines specifically designed for these tasks.

## Motivation

Structural representation learning of a node set has many applications, such as node-role prediction based on single-node representations, link prediction based on node-pair representations, and molecule property prediction based on entire-graph representations. Graph neural networks (GNNs) have recently become almost the default choice for learning structural representations. Despite their great success, recent works proved that the structural representation power of most GNNs, including GCN, GAT, GraphSAGE, GIN, MPNN, and many other well-known architectures, is bounded by the 1-Weisfeiler-Lehman (1-WL) test (Weisfeiler & Lehman, 1968). We refer to these GNNs as WLGNNs.

The figure below illustrates this limitation. The graph is a 3-regular graph with 8 nodes and no attributes. Nodes of the same color are structurally equivalent, due to the horizontal reflection symmetry and the node permutation shown in the right subfigure; nodes of different colors are not structurally equivalent. However, a WLGNN assigns the same representation to all nodes. Furthermore, a WLGNN cannot distinguish any of the node-pairs highlighted by the dashed circles, regardless of whether these node-pairs correspond to edges.

Using shortest-path distances (SPDs) between nodes as features, we can distinguish these three types of nodes. To distinguish the blue nodes from the red and green ones, we can use the fact that each blue node has another node at SPD 3 from it (one such pair is highlighted by red boxes), while all SPDs to red or green nodes are less than 3. Distance encoding provides a principled way to leverage this type of structural feature.
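A minimal sketch of this SPD-based feature (DE-1 in the discussion below), assuming NetworkX. The specific 8-node graph from the figure is not reproduced here; we use the 3-regular cube graph Q3 purely as an example of the same size and regularity, and the helper name `de1` is ours.

```python
import networkx as nx

def de1(G, u, max_dist=3):
    """DE-1 feature: shortest-path distance from target node u to every node,
    with distances beyond max_dist collapsed into one bucket (max_dist + 1)."""
    spd = nx.single_source_shortest_path_length(G, u)
    return {v: min(spd.get(v, max_dist + 1), max_dist + 1) for v in G.nodes()}

# Illustrative: the cube graph Q3 is 3-regular with 8 nodes (not the exact
# graph from the figure). Its nodes are binary tuples such as (0, 0, 0).
G = nx.hypercube_graph(3)
feats = de1(G, (0, 0, 0))
```

In Q3 the antipodal node `(1, 1, 1)` sits at SPD 3 from `(0, 0, 0)`, so the feature "some node is at distance 3 from me" is exactly the kind of signal DE-1 exposes that a WLGNN on an unattributed regular graph cannot see.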

Distinguishing the green nodes from the red nodes requires a more careful analysis. Here we unroll the computation graphs (actually trees) that GNNs use to compute the representations of a green node and a red node. Below, we draw the two trees, each rooted at a black node denoting the green or red node of interest. The other nodes are colored by their shortest-path distance to the black node, one type of distance encoding that we denote D(istance)E(ncoding)-1. Inspecting the unrolled computation trees, a difference between the two appears (highlighted by the red boxes), which means that a GNN equipped with DE-1 can distinguish the green nodes from the red nodes within 2 layers.

If the node set of interest contains multiple nodes, the distance encoding of any node in the graph is defined as the set of distance encodings between each node in the node set and this node. Distance encoding thus captures correlations among the representations of nodes within the node set of interest, which yields a more powerful representation of the set.

To illustrate this, consider two node-pairs, each corresponding to an edge in one of the two graphs below. The left is the Shrikhande graph; the right is the $4\times4$ Rook's graph. We use GNNs to learn the representations of the node-pairs colored black. Each node is labeled with its distance encoding, denoted D(istance)E(ncoding)-2, which is the set of shortest-path distances to the two nodes in the node-pair of interest. Consider the neighbors (highlighted by red ellipses) of the nodes with DE-2$=\{1,1\}$ (covered by dashed boxes). Because these neighbors have different DE-2's in the two graphs, after one layer of DEGNN the intermediate representations of the nodes with DE-2$=\{1,1\}$ already differ between the two graphs. With one more layer, DEGNN distinguishes the representations of the two node-pairs of interest.
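This comparison can be reproduced in a few lines, assuming NetworkX; the constructions and the helper names (`rook_4x4`, `shrikhande`, `de2`) are ours for illustration. The Shrikhande graph is built as the Cayley graph on $\mathbb{Z}_4\times\mathbb{Z}_4$ with connection set $\{\pm(1,0), \pm(0,1), \pm(1,1)\}$; both graphs are strongly regular with parameters (16, 6, 2, 2).

```python
from itertools import product
import networkx as nx

def rook_4x4():
    """4x4 Rook's graph: cells of a 4x4 board, adjacent iff same row or column."""
    G = nx.Graph()
    V = list(product(range(4), repeat=2))
    G.add_nodes_from(V)
    for (a, b) in V:
        for (c, d) in V:
            if (a, b) != (c, d) and (a == c or b == d):
                G.add_edge((a, b), (c, d))
    return G

def shrikhande():
    """Shrikhande graph as a Cayley graph on Z4 x Z4."""
    S = {(1, 0), (3, 0), (0, 1), (0, 3), (1, 1), (3, 3)}  # closed under negation
    G = nx.Graph()
    V = list(product(range(4), repeat=2))
    G.add_nodes_from(V)
    for (a, b) in V:
        for (da, db) in S:
            G.add_edge((a, b), ((a + da) % 4, (b + db) % 4))
    return G

def de2(G, u, v):
    """DE-2: for every node w, the (sorted) pair of SPDs to u and to v."""
    du = nx.single_source_shortest_path_length(G, u)
    dv = nx.single_source_shortest_path_length(G, v)
    return {w: tuple(sorted((du[w], dv[w]))) for w in G.nodes()}

G_s, G_r = shrikhande(), rook_4x4()
d_s = de2(G_s, (0, 0), (1, 0))  # an edge of the Shrikhande graph
d_r = de2(G_r, (0, 0), (0, 1))  # an edge of the Rook's graph
# The nodes with DE-2 = {1,1} are the common neighbors of each target pair.
ones_s = [w for w in G_s if d_s[w] == (1, 1)]
ones_r = [w for w in G_r if d_r[w] == (1, 1)]
```

Both target edges have exactly two common neighbors (the λ = 2 of the strongly regular parameters), but in the Rook's graph those two DE-2$=\{1,1\}$ nodes are adjacent to each other, while in the Shrikhande graph they are not; so the neighbor multisets of DE-2 labels differ, and one DEGNN layer already separates the two cases.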

Note that distinguishing the node-pairs of these two graphs is remarkably hard: both are strongly regular graphs, and even the 2-WL test fails in this case, which means the recently proposed provably more powerful GNNs Ring-GNN and PPGN, which mimic the 2-WL test, fail as well.

The formal definition of distance encoding, the relevant theoretical analysis, and the empirical evaluation can be found in our paper.

## Code

A reference implementation of DEGNN and DEAGNN in Python is available on GitHub.

## Datasets

The datasets used in the Distance Encoding project are included in the code repository.

## Contributors

The following people contributed to the Distance Encoding project:

- Pan Li
- Yanbang Wang
- Hongwei Wang
- Jure Leskovec

## References

Distance Encoding: Design Provably More Powerful Neural Networks for Graph Representation Learning. Pan Li, Yanbang Wang, Hongwei Wang, Jure Leskovec. NeurIPS 2020.