GraphSAGE: Inductive Representation Learning on Large Graphs
GraphSAGE is a framework for inductive representation learning on large graphs.
GraphSAGE is used to generate low-dimensional vector representations for nodes, and is especially useful for graphs that have rich node attribute information.
Motivation
Low-dimensional vector embeddings of nodes in large graphs have numerous applications in machine learning (e.g., node classification, clustering, link prediction).
However, most embedding frameworks are inherently transductive and can only generate embeddings for a single fixed graph.
These transductive approaches do not efficiently generalize to unseen nodes (e.g., in evolving graphs), and they cannot learn to generalize across different graphs.
In contrast, GraphSAGE is an inductive framework that leverages node attribute information to efficiently generate representations on previously unseen data.
GraphSAGE must first be trained on an example graph or set of graphs.
After training, GraphSAGE can be used to generate node embeddings for previously unseen nodes or entirely new input graphs, as long as these graphs have the same attribute schema as the training data.
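The train-then-apply workflow described above can be illustrated with a minimal NumPy sketch. It is not the TensorFlow implementation: it assumes a single layer with a mean aggregator, uses every neighbor instead of a sampled subset, and substitutes random matrices for trained weights. The names `mean_aggregate_layer`, `w_self`, and `w_neigh` are hypothetical and chosen only for illustration. The point is that the learned parameters depend only on the attribute dimension, so the same function can embed nodes of a graph that was never seen during training.

```python
import numpy as np


def mean_aggregate_layer(features, neighbors, w_self, w_neigh):
    """One simplified GraphSAGE-style layer with a mean aggregator.

    features:  (num_nodes, num_attrs) array of node attributes
    neighbors: dict mapping each node index to a list of neighbor indices
    w_self, w_neigh: (num_attrs, dim) weight matrices (hypothetical names)
    """
    out = np.zeros((features.shape[0], w_self.shape[1]))
    for node, nbrs in neighbors.items():
        # Average the attribute vectors of the node's neighbors.
        if nbrs:
            neigh_mean = features[nbrs].mean(axis=0)
        else:
            neigh_mean = np.zeros(features.shape[1])
        # Combine the node's own attributes with the neighborhood summary.
        h = features[node] @ w_self + neigh_mean @ w_neigh
        h = np.maximum(h, 0.0)  # ReLU nonlinearity
        norm = np.linalg.norm(h)
        out[node] = h / norm if norm > 0 else h  # L2-normalize when possible
    return out


rng = np.random.default_rng(0)
num_attrs, dim = 4, 3

# Assumption: random stand-ins for weights that training would produce.
w_self = rng.normal(size=(num_attrs, dim))
w_neigh = rng.normal(size=(num_attrs, dim))

# A small "training" graph with 3 nodes.
train_features = rng.normal(size=(3, num_attrs))
train_neighbors = {0: [1, 2], 1: [0], 2: [0]}
train_emb = mean_aggregate_layer(train_features, train_neighbors, w_self, w_neigh)

# An entirely new graph with 5 nodes and the same attribute schema.
new_features = rng.normal(size=(5, num_attrs))
new_neighbors = {0: [1], 1: [0, 2], 2: [1, 3, 4], 3: [2], 4: [2]}
new_emb = mean_aggregate_layer(new_features, new_neighbors, w_self, w_neigh)

print(new_emb.shape)  # (5, 3)
```

Because the weight shapes are tied to `num_attrs` and not to the number of nodes, a new graph whose nodes have a different number of attributes would fail at the matrix product, which mirrors the requirement that new graphs share the training data's attribute schema.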
Code
GraphSAGE is implemented in TensorFlow and can be easily integrated into other machine learning pipelines.
Code and implementation details can be found on GitHub.
Datasets
Links to datasets used in the paper:
Please see the GitHub code page for details on the data format.
The Web of Science citation data used in the paper can be made available to groups or individuals with valid WoS licenses.
Contributors
The following people contributed to GraphSAGE:
William L. Hamilton
Rex Ying
Jure Leskovec
References
Inductive Representation Learning on Large Graphs. W.L. Hamilton, R. Ying, and J. Leskovec. arXiv:1706.02216 [cs.SI], 2017.