# GraphSAGE: Inductive Representation Learning on Large Graphs

**GraphSAGE** is a framework for inductive representation learning on large graphs.
It generates low-dimensional vector representations for nodes and is especially useful for graphs that have rich node attribute information.

## Motivation

Low-dimensional vector embeddings of nodes in large graphs have numerous applications in machine learning (e.g., node classification, clustering, link prediction).
However, most embedding frameworks are inherently **transductive** and can only generate embeddings for a single fixed graph.
These transductive approaches do not efficiently generalize to unseen nodes (e.g., in evolving graphs), and they cannot learn to generalize across different graphs.
In contrast, GraphSAGE is an **inductive** framework that leverages node attribute information to efficiently generate representations on previously unseen data.

GraphSAGE must first be trained on an example graph or set of graphs.
After training, GraphSAGE can be used to generate node embeddings for previously unseen nodes or entirely new input graphs, as long as these graphs have the same attribute schema as the training data.
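
The inductive idea can be illustrated with a minimal NumPy sketch. It is not the TensorFlow implementation referenced on this page: the function name `sage_mean_layer`, the mean-style neighborhood aggregation, the ReLU nonlinearity, and the random `weight` matrix (a stand-in for trained parameters) are all assumptions chosen for illustration. The point is that the parameters depend only on the attribute dimension, not on the number of nodes, so the same parameters can be applied to a graph that was never seen during training.

```python
import numpy as np


def sage_mean_layer(features, neighbors, weight):
    """One illustrative GraphSAGE-style layer with mean aggregation.

    features:  (num_nodes, d_in) array of node attributes
    neighbors: list of neighbor-index lists, one per node
    weight:    (2 * d_in, d_out) matrix shared by every node
    """
    # Average the attribute vectors of each node's neighbors.
    agg = np.zeros_like(features, dtype=float)
    for v, nbrs in enumerate(neighbors):
        if nbrs:  # nodes without neighbors keep a zero aggregate
            agg[v] = features[nbrs].mean(axis=0)

    # Concatenate each node's own attributes with its neighborhood
    # summary, apply the shared linear map, then a ReLU.
    h = np.concatenate([features, agg], axis=1) @ weight
    h = np.maximum(h, 0.0)

    # Scale each embedding to unit length (all-zero rows stay zero).
    norms = np.linalg.norm(h, axis=1, keepdims=True)
    return h / np.maximum(norms, 1e-12)


rng = np.random.default_rng(0)
d_in, d_out = 3, 4
weight = rng.normal(size=(2 * d_in, d_out))  # stand-in for trained parameters

# Graph used for "training": 4 nodes with 3 attributes each.
train_x = rng.normal(size=(4, d_in))
train_adj = [[1, 2], [0], [0, 3], [2]]
train_emb = sage_mean_layer(train_x, train_adj, weight)

# An entirely new graph with 5 nodes and the same attribute schema.
new_x = rng.normal(size=(5, d_in))
new_adj = [[1], [0, 2], [1, 3, 4], [2], [2]]
new_emb = sage_mean_layer(new_x, new_adj, weight)

print(train_emb.shape, new_emb.shape)
```

Because `weight` has shape `(2 * d_in, d_out)`, the only requirement on the new graph is that its nodes carry the same `d_in` attributes, which mirrors the attribute-schema condition stated above.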

## Code

GraphSAGE is implemented in TensorFlow and can be easily integrated into other machine learning pipelines.
Code and implementation details can be found on GitHub.

## Datasets

Links to datasets used in the paper:

Please see the GitHub code page for details on the data format.
The Web of Science citation data used in the paper can be made available to groups or individuals with valid WoS licenses.

## Contributors

The following people contributed to GraphSAGE:

William L. Hamilton

Rex Ying

Jure Leskovec

## References

Inductive Representation Learning on Large Graphs. W.L. Hamilton, R. Ying, and J. Leskovec. *arXiv:1706.02216 [cs.SI]*, 2017.