GraphSAGE: Inductive Representation Learning on Large Graphs

GraphSAGE is a framework for inductive representation learning on large graphs. It generates low-dimensional vector representations for nodes and is especially useful for graphs that have rich node attribute information.

Motivation

Low-dimensional vector embeddings of nodes in large graphs have numerous applications in machine learning (e.g., node classification, clustering, link prediction). However, most embedding frameworks are inherently transductive and can only generate embeddings for a single fixed graph. These transductive approaches do not generalize efficiently to unseen nodes (e.g., in evolving graphs) and cannot learn to generalize across different graphs. In contrast, GraphSAGE is an inductive framework that leverages node attribute information to efficiently generate representations on previously unseen data.

GraphSAGE must first be trained on an example graph or set of graphs. After training, it can generate node embeddings for previously unseen nodes or for entirely new input graphs, as long as these graphs have the same attribute schema as the training data.
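To make the inductive idea concrete, here is a minimal sketch of a single layer in the style of the paper's mean aggregator, written in plain Python rather than TensorFlow. It is not the released implementation: the function names (mean_aggregate, sage_layer), the toy graphs, and the random, untrained weight matrix are all assumptions made for illustration. The point it shows is that the same weights can be applied to a graph that was never seen before, provided each node carries a feature vector of the same length.

```python
import random


def mean_aggregate(features, neighbors, node):
    """Element-wise mean of the feature vectors of a node's neighbors."""
    nbrs = neighbors[node]
    dim = len(features[node])
    if not nbrs:
        return [0.0] * dim  # isolated node: fall back to a zero vector
    return [sum(features[n][i] for n in nbrs) / len(nbrs) for i in range(dim)]


def sage_layer(features, neighbors, weights):
    """One layer: ReLU(W . concat(x_v, mean of neighbor x_u)), then L2-normalize."""
    embeddings = {}
    for v in features:
        # Concatenate the node's own features with the aggregated neighborhood.
        z = features[v] + mean_aggregate(features, neighbors, v)
        h = [max(0.0, sum(w * x for w, x in zip(row, z))) for row in weights]
        norm = sum(x * x for x in h) ** 0.5
        embeddings[v] = [x / norm for x in h] if norm > 0 else h
    return embeddings


# Hypothetical setup: 3 input attributes per node, 2-dimensional embeddings.
# The weights are random here; in practice they would be learned during training.
rng = random.Random(0)
in_dim, out_dim = 3, 2
weights = [[rng.uniform(-1, 1) for _ in range(2 * in_dim)] for _ in range(out_dim)]

# Toy "training" graph.
train_features = {"a": [1.0, 0.0, 0.5], "b": [0.0, 1.0, 0.5], "c": [1.0, 1.0, 0.0]}
train_neighbors = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}

# A completely separate graph with the same attribute schema (3 features per node).
new_features = {"x": [0.2, 0.9, 0.1], "y": [0.7, 0.3, 0.4]}
new_neighbors = {"x": ["y"], "y": ["x"]}

train_embeddings = sage_layer(train_features, train_neighbors, weights)
new_embeddings = sage_layer(new_features, new_neighbors, weights)
```

Because the layer's parameters depend only on the attribute dimension and not on node identities, nothing has to be retrained or looked up for the nodes x and y. A transductive method that stores one embedding per training node would have no entry for them.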

Code

GraphSAGE is implemented in TensorFlow and can be easily integrated into other machine learning pipelines. Code and implementation details can be found on GitHub.

Datasets

Links to datasets used in the paper: Please see the GitHub code page for details on the data format. The Web of Science citation data used in the paper can be made available to groups or individuals with valid WoS licenses.

Contributors

The following people contributed to GraphSAGE:
William L. Hamilton
Rex Ying
Jure Leskovec

References

Inductive Representation Learning on Large Graphs. W.L. Hamilton, R. Ying, and J. Leskovec. arXiv:1706.02216 [cs.SI], 2017.