Snap.py - SNAP for Python

About Snap.py

Snap.py is a Python interface for SNAP. SNAP is a general-purpose, high-performance system for analysis and manipulation of large networks. SNAP is written in C++ and optimized for maximum performance and compact graph representation. It easily scales to massive networks with hundreds of millions of nodes and billions of edges.

Snap.py provides the performance benefits of SNAP, combined with the flexibility of Python. Most of the SNAP functionality is available via Snap.py in Python.

Download and Installation of Snap.py

The latest version of Snap.py is 3.0 (Sep 14, 2016). Packages for Mac OS X, Linux (as CentOS) and Windows 64-bit are available at Snap.py download.

Snap.py requires that Python 2.7.x is installed on your machine. Prebuilt Snap.py packages support the following versions of Python:

Make sure that your operating system is 64-bit and that you are using 64-bit versions of Python 2.7.x.
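A quick way to confirm both requirements before downloading a package is to ask the interpreter itself. The sketch below uses only the standard library; the helper name describe_python is illustrative and not part of Snap.py.

```python
import struct
import sys


def describe_python():
    """Return (major, minor, pointer_bits) for the running interpreter."""
    # A C pointer ("P") occupies 8 bytes on a 64-bit build and 4 on a 32-bit build.
    pointer_bits = struct.calcsize("P") * 8
    return sys.version_info[0], sys.version_info[1], pointer_bits


major, minor, bits = describe_python()
print("Python %d.%d, %d-bit" % (major, minor, bits))
if (major, minor) != (2, 7) or bits != 64:
    print("The prebuilt Snap.py packages described here expect 64-bit Python 2.7.x")
```

Run it with the same python executable that will later run setup.py, since several interpreters may be installed side by side.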

On Windows, Snap.py requires a 64-bit operating system version. Visual C++ Redistributable for Visual Studio 2012 must be installed on the system. You need to download and install the 64-bit version, vcredist_x64.exe, not the 32-bit version, vcredist_x86.exe.

To install Snap.py, download and unpack the package for your platform and run setup.py. See below for OS specific instructions.

Snap.py is largely self-contained and requires external packages only for drawing and visualization. Gnuplot and Graphviz need to be installed on the system to support drawing and visualization in Snap.py.

Set the system PATH variable, so that Gnuplot and Graphviz are available, or put the executables in the working directory.
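Whether the two tools are reachable can be checked from Python before any plotting call is made. The following is a minimal sketch using only the standard library. The executable names are assumptions (gnuplot for Gnuplot, dot for Graphviz, with an .exe suffix on Windows), and find_on_path is a hypothetical helper, not a Snap.py function.

```python
import os


def find_on_path(names):
    """Return the full path of the first executable in `names` that is found, else None."""
    # The working directory is searched first, then every directory on PATH.
    directories = [os.getcwd()] + os.environ.get("PATH", "").split(os.pathsep)
    for directory in directories:
        for name in names:
            candidate = os.path.join(directory, name)
            if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
                return candidate
    return None


# Assumed executable names; adjust them if your installation differs.
for label, names in [("Gnuplot", ["gnuplot", "gnuplot.exe"]),
                     ("Graphviz", ["dot", "dot.exe"])]:
    location = find_on_path(names)
    print("%s: %s" % (label, location if location else "not found"))
```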

Installation of Snap.py on Mac OS X

On Mac OS X (supported releases are 10.7.5 or later), use the following commands:
tar zxvf snap-3.0.0-3.0-macosx10.7.5-x64-py2.7.tar.gz
cd snap-3.0.0-3.0-macosx10.7.5-x64-py2.7
sudo python setup.py install # omit 'sudo' for Anaconda and Homebrew Python

Installation of Snap.py on Linux

On Linux, use the following commands:
tar zxvf snap-3.0.0-3.0-centos6.5-x64-py2.6.tar.gz
cd snap-3.0.0-3.0-centos6.5-x64-py2.6
sudo python setup.py install

Installation of Snap.py on Windows 64-bit

On Windows, verify that your operating system is 64-bit and that a 64-bit version of Visual C++ Redistributable for Visual Studio 2012 is installed, then unzip the Snap.py package and install it with the following command in the Command Prompt:
cd snap-3.0.0-3.0-Win-x64-py2.7
python setup.py install

Local Install of Snap.py

If you want to use Snap.py in a local directory without installing it system-wide, then download the corresponding Snap.py package for your system, unpack, and copy files snap.py and _snap.so (or _snap.pyd) to your working directory.
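A local copy can be sanity-checked by confirming that both files are present in the working directory. The sketch below is one way to do that with the standard library; missing_local_files is a hypothetical helper introduced for this example.

```python
import os


def missing_local_files(directory):
    """Return the list of Snap.py files still missing from `directory`."""
    missing = []
    if not os.path.isfile(os.path.join(directory, "snap.py")):
        missing.append("snap.py")
    # Either compiled module satisfies the check (.pyd is the Windows extension format).
    binaries = ["_snap.so", "_snap.pyd"]
    if not any(os.path.isfile(os.path.join(directory, b)) for b in binaries):
        missing.append("_snap.so or _snap.pyd")
    return missing


missing = missing_local_files(os.getcwd())
print("local copy looks complete" if not missing else "missing: " + ", ".join(missing))
```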

Documentation and Support

Snap.py Tutorial and Manual are available.

Snap.py is a Python interface for SNAP, which is written in C++. Most of the SNAP functionality is supported. For more details, check out the SNAP C++ documentation.

A tutorial on Large Scale Network Analytics with SNAP, with a significant Snap.py-specific component, was given at the WWW2015 conference in Florence.

Use the SNAP and Snap.py users mailing list for any questions or discussion about Snap.py installation, use, and development. To post to the group, send your message to snap-discuss at googlegroups dot com.

Quick Introduction to Snap.py

This document gives a quick introduction to a range of Snap.py operations.

Several programs are available to demonstrate the use of Snap.py. The programs are also useful as tests to confirm that your installation of Snap.py is working correctly:

The code from intro.py is explained in more detail below. All the code assumes that Snap.py has been imported by the Python program as:

from snap import *

Graph and Network Types

Snap.py supports graphs and networks. Graphs describe topologies, that is, nodes with unique integer ids and directed, undirected, or multiple edges between the nodes of the graph. Networks are graphs with data on nodes and/or edges of the network. Data types that reside on nodes and edges are simply passed as template parameters, which provides a very fast and convenient way to implement various kinds of networks with rich data on nodes and edges.

Graph types in SNAP:

TUNGraph: undirected graph (single edge between an unordered pair of nodes)
TNGraph: directed graph (single directed edge between an ordered pair of nodes)

Network types in SNAP:

TNEANet: directed multigraph with attributes for nodes and edges

Graph Creation

Example of how to create and use a directed graph:

# create a graph PNGraph
G1 = TNGraph.New()
G1.AddNode(1)
G1.AddNode(5)
G1.AddNode(32)
G1.AddEdge(1,5)
G1.AddEdge(5,1)
G1.AddEdge(5,32)

Nodes have explicit (and arbitrary) node ids. There is no restriction for node ids to be contiguous integers starting at 0. In TUNGraph and TNGraph, edges have no explicit ids -- edges are identified by a pair of node ids.

Prefix P in the class name stands for a pointer, while T means a type.

Networks are created in the same way as graphs.
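As an illustration, the same calls applied to an undirected graph might look like the sketch below. It assumes Snap.py is installed, and it imports the module as snap inside a guard (rather than with from snap import *) so that the file still runs where Snap.py is missing. The function name build_undirected is illustrative.

```python
try:
    import snap  # Snap.py; the demonstration is skipped if it is not installed
except ImportError:
    snap = None


def build_undirected():
    """Build the three-node example as an undirected graph and return it."""
    graph = snap.TUNGraph.New()
    for node_id in (1, 5, 32):
        graph.AddNode(node_id)
    graph.AddEdge(1, 5)
    graph.AddEdge(5, 32)
    return graph


if snap is not None:
    G = build_undirected()
    print("nodes: %d, edges: %d" % (G.GetNodes(), G.GetEdges()))
    # The node pair is unordered here, so (5, 1) refers to the edge added as (1, 5).
    print("edge (5, 1) present: %s" % G.IsEdge(5, 1))
```

Replacing TUNGraph with TNEANet in build_undirected should give a network with the same topology, since the node and edge calls are shared.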

Iterators

Many SNAP operations are based on node and edge iterators, which allow for efficient implementation of algorithms that work on networks regardless of their type (directed, undirected, graphs, networks) and specific implementation.

Some examples of iterator usage in Snap.py are shown below:

# create a directed random graph on 100 nodes and 1k edges
G2 = GenRndGnm(PNGraph, 100, 1000)
# traverse the nodes
for NI in G2.Nodes():
    print "node id %d with out-degree %d and in-degree %d" % (
        NI.GetId(), NI.GetOutDeg(), NI.GetInDeg())
# traverse the edges
for EI in G2.Edges():
    print "edge (%d, %d)" % (EI.GetSrcNId(), EI.GetDstNId())
# traverse the edges by nodes
for NI in G2.Nodes():
    for Id in NI.GetOutEdges():
        print "edge (%d %d)" % (NI.GetId(), Id)

In general, node iterators provide the following functionality:

GetId(): return node id
GetOutDeg(): return out-degree of a node
GetInDeg(): return in-degree of a node
GetOutNId(e): return node id of the endpoint of e-th out-edge
GetInNId(e): return node id of the endpoint of e-th in-edge
IsOutNId(n): do we point to node id n
IsInNId(n): does node id n point to us
IsNbrNId(n): is node n our neighbor

For additional information on node and edge iterators, check out the Graph and Network Classes section in the Snap.py reference manual.
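The index-based accessors listed above can be combined with GetOutDeg() to walk a single node's out-edges. The sketch below assumes Snap.py is installed and that the graph's GetNI(NId) method returns a node iterator positioned at the given node, as described in the reference manual. The helper out_neighbors is hypothetical.

```python
try:
    import snap  # Snap.py; the demonstration is skipped if it is not installed
except ImportError:
    snap = None


def out_neighbors(graph, node_id):
    """Return the ids of the nodes that `node_id` points to."""
    node = graph.GetNI(node_id)  # node iterator positioned at node_id
    # e runs over the out-edges of the node, from 0 to out-degree - 1.
    return [node.GetOutNId(e) for e in range(node.GetOutDeg())]


if snap is not None:
    G = snap.TNGraph.New()
    for node_id in (1, 5, 32):
        G.AddNode(node_id)
    G.AddEdge(1, 5)
    G.AddEdge(5, 1)
    G.AddEdge(5, 32)
    print("out-neighbors of 5: %s" % sorted(out_neighbors(G, 5)))
    print("5 points to 32: %s" % G.GetNI(5).IsOutNId(32))
```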

Input/Output

With SNAP it is easy to save and load networks in various formats. Internally, SNAP saves networks in a compact binary format, but functions for loading and saving networks in various text and XML formats are also available (see gio.h).

For example, Snap.py code for saving and loading graphs looks as follows:

# generate a network using Forest Fire model
G3 = GenForestFire(1000, 0.35, 0.35)
# save and load binary
FOut = TFOut("test.graph")
G3.Save(FOut)
FOut.Flush()
FIn = TFIn("test.graph")
G4 = TNGraph.Load(FIn)
# save and load from a text file
SaveEdgeList(G4, "test.txt", "Save as tab-separated list of edges")
G5 = LoadEdgeList(PNGraph, "test.txt", 0, 1)
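The last two arguments of LoadEdgeList are the indices of the columns that hold the source and destination node ids. To make that concrete, the pure-Python sketch below mimics the reading step without Snap.py. It assumes, for illustration only and not as a specification of the file format, that header lines start with # and that columns are separated by whitespace. read_edge_list is a hypothetical helper.

```python
def read_edge_list(lines, src_col=0, dst_col=1):
    """Return a list of (source, destination) node-id pairs."""
    edges = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split()
        edges.append((int(fields[src_col]), int(fields[dst_col])))
    return edges


sample = [
    "# Save as tab-separated list of edges",
    "# FromNodeId\tToNodeId",
    "0\t1",
    "0\t2",
    "2\t1",
]
print(read_edge_list(sample))  # prints [(0, 1), (0, 2), (2, 1)]
```

Swapping the two column indices reverses the direction of every edge, which is useful when a file lists the destination first.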

Manipulating Graphs and Networks

SNAP provides rich functionality to efficiently manipulate graphs and networks. Most functions support all graph/network types.

For example:

# generate a network using Forest Fire model
G6 = GenForestFire(1000, 0.35, 0.35)
# convert to undirected graph
G7 = ConvertGraph(PUNGraph, G6)
# get the largest weakly connected component of G6
WccG = GetMxWcc(G6)
# get a subgraph induced on nodes {0,1,2,3,4}
SubG = GetSubGraph(G6, TIntV.GetV(0,1,2,3,4))
# get 3-core of G6
Core3 = GetKCore(G6, 3)
# delete nodes of out degree 10 and in degree 5
DelDegKNodes(G6, 10, 5)

For more details on Snap.py functionality, check out the Snap.py Manual.
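For readers unfamiliar with the term used in the example above, the k-core of an undirected graph is the largest subgraph in which every node has degree at least k. The pure-Python sketch below finds it by repeatedly removing nodes whose degree is below k. It is a teaching sketch, not SNAP's implementation, and both k_core and the adjacency-dictionary representation are choices made for this example.

```python
def k_core(adjacency, k):
    """Return the node set of the k-core of an undirected graph.

    `adjacency` maps each node id to the set of its neighbors.
    """
    # Work on a copy so that the caller's graph is left untouched.
    remaining = dict((node, set(nbrs)) for node, nbrs in adjacency.items())
    changed = True
    while changed:
        changed = False
        for node in list(remaining):
            if len(remaining[node]) < k:
                # Remove the node and detach it from its surviving neighbors.
                for nbr in remaining.pop(node):
                    if nbr in remaining:
                        remaining[nbr].discard(node)
                changed = True
    return set(remaining)


# A triangle 0-1-2 with a pendant node 3 attached to node 2.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(sorted(k_core(graph, 2)))  # prints [0, 1, 2]
```

Removing the pendant node leaves the triangle, where every node has degree 2. No node has degree 3 within any subgraph, so the 3-core of this example is empty.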

Computing Structural Properties

SNAP provides rich functionality to efficiently compute structural properties of networks. Most functions support all graph/network types.

For example:

# generate a Preferential Attachment graph on 1000 nodes and node out degree of 3
G8 = GenPrefAttach(1000, 3)
# vector of pairs of integers (size, count)
CntV = TIntPrV()
# get distribution of connected components (component size, count)
GetWccSzCnt(G8, CntV)
# get degree distribution pairs (degree, count)
GetOutDegCnt(G8, CntV)
# vector of floats
EigV = TFltV()
# get first eigenvector of graph adjacency matrix
GetEigVec(G8, EigV)
# get diameter of G8
GetBfsFullDiam(G8, 100)
# count the number of triads in G8, get the clustering coefficient of G8
GetTriads(G8)
GetClustCf(G8)

For more details on Snap.py functionality, check out the Snap.py Manual.
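Functions such as GetOutDegCnt return their results by filling the vector passed to them, so the values still have to be read out afterwards. The sketch below shows one way to do that. It assumes Snap.py is installed and that each element of a TIntPrV exposes GetVal1() and GetVal2(). pairs_to_list is a hypothetical helper.

```python
try:
    import snap  # Snap.py; the demonstration is skipped if it is not installed
except ImportError:
    snap = None


def pairs_to_list(pair_vector):
    """Convert a vector of integer pairs into a list of Python tuples."""
    return [(pair.GetVal1(), pair.GetVal2()) for pair in pair_vector]


if snap is not None:
    G = snap.GenRndGnm(snap.PNGraph, 100, 1000)
    CntV = snap.TIntPrV()
    snap.GetOutDegCnt(G, CntV)  # fills CntV with (degree, count) pairs
    for degree, count in pairs_to_list(CntV):
        print("%d nodes with out-degree %d" % (count, degree))
```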