ID-GNNs are the first class of message passing GNNs that have greater expressive power than 1-Weisfeiler-Lehman (1-WL) graph isomorphism test. ID-GNNs offer consistent performance gains over existing GNNs on node, edge and graph level tasks.
Messaging passing GNNs (MP-GNNs), such as GCN, GraphSAGE, and GAT, are dominantly used today due to their simplicity, efficiency and strong performance in real-world applications. The central idea behind message passing GNNs is to learn meaningful node embeddings via the repeated aggregation of information from local node neighborhoods using non-linear transformations.
However, it has been shown that the expressive power of existing MP-GNNs is upper-bounded by the 1-Weisfeiler-Lehman (1-WL) test.
Concretely, a fundamental limitation of existing MP-GNNs is that two nodes with different neighborhood structure can have the same computational graph, thus appearing indistinguishable. As is shown in the figure below, such failure cases are abundant: in node classification tasks, existing GNNs fail to distinguish nodes that reside in $d$-regular graphs of different sizes; in link prediction tasks, they cannot differentiate node candidates with the same neighborhood structures but different shortest path distance to the source node; and in graph classification tasks, they cannot differentiate $d$-regular graphs.
Figure 1: Overview of the proposed ID-GNN model. Across all examples, the task requires an embedding that allows for the differentiation of the label $A$ vs. $B$ nodes in their respective graphs. However, across all tasks, existing GNNs, regardless of depth, will always assign the same embedding to both classes of nodes, because for all tasks the computational graphs are identical. In contrast, the colored computation graphs provided by ID-GNNs allows for clear differentiation between the nodes of class $A$ and class $B$
We propose two versions of ID-GNNs.
ID-GNN-Full Identity information is incorporated by applying rounds of heterogeneous message passing. Specifically, to embed a given node, ID-GNNs first extract the ego network centered at that node, then apply message passing, where the messages from the center node (colored nodes in Figure 1) and the rest of the nodes are computed using different sets of parameters. This approach naturally applies to applications involving node or edge features. The algorithm is described below.
ID-GNN-Fast A simplified version of ID-GNN, where we inject identity information via cycle counts originating from a given node as augmented node features. These cycle counts capture node identity information by counting the colored nodes within each layer of the ID-GNN computational graph, and can be efficiently computed by powers of a graph's adjacency matrix.
We compare ID-GNNs against GNNs across 8 datasets and 6 different tasks.
We consider a collection of challenging graph property prediction tasks where existing GNNs fail, including predicting node clustering coefficient, predicting shortest path distance, and differentiating random graphs. The results are shown below.
We further apply ID-GNNs to real-world datasets, on node/edge/graph level tasks. The results are shown below.
Please refer to our paper for detailed explanations and more results.