node2vec: Scalable Feature Learning for Networks
node2vec is an algorithmic framework for representational learning on graphs. Given any graph, it can learn continuous feature representations for the nodes, which can then be used for various downstream machine learning tasks.
Motivation
Learning useful representations from highly structured objects such as graphs is useful for a variety of machine learning applications. Besides reducing the engineering effort, these representations can lead to greater predictive power.
The
node2vec framework learns low-dimensional representations for nodes in a graph by optimizing a neighborhood preserving objective. The objective is flexible, and the algorithm accomodates for various definitions of network neighborhoods by simulating biased random walks. Specifically, it provides a way of balancing the exploration-exploitation tradeoff that in turn leads to representations obeying a spectrum of equivalences from homophily to structural equivalence.
After transitioning to node
v from
t, the return hyperparameter,
p and the inout hyperparameter,
q control the probability of a walk staying inward revisiting nodes (
t), staying close to the preceeding nodes (
x1), or moving outward farther away (
x2,
x3).
For example, the graph visualization above depicts the color-coded communities exhibiting homophily discovered by
node2vec in the Les Misérables Network.
Code
A reference implementation of
node2vec in Python is available on
GitHub.
A high performance implementation is included in SNAP and available on
GitHub as well.
Datasets
Links to datasets used in the paper:
Contributors
The following people contributed to node2vec:
Aditya Grover
Jure Leskovec
Vid Kocijan
References
node2vec: Scalable Feature Learning for Networks. A. Grover, J. Leskovec.
ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2016.