node2vec: Scalable Feature Learning for Networks

node2vec is an algorithmic framework for representational learning on graphs. Given any graph, it can learn continuous feature representations for the nodes, which can then be used for various downstream machine learning tasks.

Motivation

Learning useful representations from highly structured objects such as graphs is useful for a variety of machine learning applications. Besides reducing the engineering effort, these representations can lead to greater predictive power.

The node2vec framework learns low-dimensional representations for nodes in a graph by optimizing a neighborhood preserving objective. The objective is flexible, and the algorithm accomodates for various definitions of network neighborhoods by simulating biased random walks. Specifically, it provides a way of balancing the exploration-exploitation tradeoff that in turn leads to representations obeying a spectrum of equivalences from homophily to structural equivalence.


After transitioning to node v from t, the return hyperparameter, p and the inout hyperparameter, q control the probability of a walk staying inward revisiting nodes (t), staying close to the preceeding nodes (x1), or moving outward farther away (x2, x3).

For example, the graph visualization above depicts the color-coded communities exhibiting homophily discovered by node2vec in the Les Misérables Network.

Code

A reference implementation of node2vec in Python is available on GitHub. A high performance implementation is included in SNAP and available on GitHub as well.

Datasets

Links to datasets used in the paper:

Contributors

The following people contributed to node2vec:
Aditya Grover
Jure Leskovec
Vid Kocijan

References

node2vec: Scalable Feature Learning for Networks. A. Grover, J. Leskovec. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2016.