CS224W:

Social and Information Network Analysis

Autumn 2012

What is SNAP?

Stanford Network Analysis Platform (SNAP) is a general purpose network analysis and graph mining library. It is written in C++ and easily scales to massive networks with hundreds of millions of nodes, and billions of edges. It efficiently manipulates large graphs, calculates structural properties, generates regular and random graphs, and supports attributes on nodes and edges. SNAP is also available through the NodeXL which is a graphical front-end that integrates network analysis into Microsoft Office and Excel.

Installation

You can go to http://snap.stanford.edu/snap/download.html to download latest version of SNAP (v1.9). Then, after unzipping, go into directory examples and open SnapExamples.sln (Visual Studio required). If your system is UNIX-based, there is also Makefile in the directory examples.

SNAP in Linux

For Linux system, you have to install g++ first. Then, take a look at the subdirectories in examples/. Copy the Makefile*.* to your folder and change the definition of variable MAIN in Makefile.ex to your program name.
Stanford has plenty of computing resources. http://itservices.stanford.edu/service/sharedcomputing/environments lists the available computing systems. We recommend corn.stanford.edu, which has 32GB of memory to handle most of the tasks. You can simply put the source code in your home directory and use the command make.

SNAP in Windows

Under Windows system, you should install Visual Studio first (from msdn.stanford.edu). Then, open Visual Studio and create a project, add "#include YOUR_SNAP_PATH/snap/Snap.h" in your main .cpp file and include the Snap.h/cpp into your project. Also, due to the settings of SNAP, the character set must be set to Multi-byte. You can right-click on your project and go to Properties --> Configuration Properties --> General --> Projects Defaults --> Character Set --> Select "Use Multi-Byte Character Set". Then you can enjoy the powerful arsenal of SNAP!

Data Structures in SNAP

SNAP contains a STL-like library, specified in directory glib. It contains basic data structures, like vectors, hash-tables and strings. These data structures perform efficiently and supports serialization for loading and saving.

Basic Data Structures

Integer: TInt

Real number: TFlt

String: TStr

TInt a=5; cout << a << endl; --- 5 //In C style, use printf("%d\n", a.Val); ************************************** TStr a="abc"; TStr b="ccc"; cout << a.CStr() << endl; --- abc cout << a.Len() << endl; --- 3 cout << a[0] << endl; --- a cout << (a==b) << endl; --- 0

Composite Data Structures

Pair: TPair<Type1, Type2>

Triple: TTriple<Type1, Type2, Type3>

Quadruple: TQuad<Type1, Type2, Type3, Type4>

Type can also be other complex structures like TVec, TPair

TPair<TInt, TFlt> a; a.Val1=1; a.Val2=2.3;

Containers

Vector: TVec<Type>

Hash table: THash<Key Type, Value Type>

Hash Set: THashSet<Key Type>

When key is string, use TStrHash<Value Type> instead of THash<TStr, Value Type>. TStrHash uses string pool and saves more space.

TVec<TInt> a; a.Add(10); a.Add(20); a.Add(30); cout << a[0] << endl; --- 10 cout << a.Len() << endl; --- 3 *********************************** THash<TInt, TStr> a; a.AddDat(12,"abc"); a.AddDat(34,"def"); cout << a.GetKey(0) << endl; --- 12 for (int i = 0; i < 2; ++i) cout << a[i].CStr() << endl; --- abc --- def cout << a.GetKeyId(12) << endl; --- 0 cout << a.GetDat(34).CStr() << endl; --- def *********************************** THashSet<TInt> a; a.AddKey(12); a.AddKey(34); a.AddKey(56); cout << a.GetKey(2) << endl; --- 56

Saving and Loading

SNAP supports serialization for all the data structures aforementioned, as well as the network/graph structures to be introduced. It is much more efficient to store a data structure to binary files than to text format. Saving and loading binary files for data structures in SNAP is very simple, just via function Save and Load.

TVec<TInt> a; a.Add(10); a.Add(20); a.Add(30); {TFOut fout("a.bin"); a.Save(fout);} {TFIn fin("a.bin"); a.Load(fin);}

Example .cpp for basic data structure and data type in SNAP: BasicDS_DT_SNAP.cpp

Graph Manipulation in SNAP

SNAP contains a powerful graph manupulation library, specified in directory snap. It can deal with generating, manipulating as well as analyzing graphs/networks.

Graph/Network Types

TUNGraph: undirected graph with no multi-edge
TNGraph: directed graph with no multi-edge
TNEGraph: directed graph with multi-edge
TNodeNet<TNodeData>: directed graph with TNodeData object for each node
TNodeEDatNet<TNodeData, TEdgeData>: directed graph with TNodeData on each node and TEdgeData on each edge
TNodeEdgeNet<TNodeData, TEdgeData>: directed multi-edge graph with TNodeData on each node and TEdgeData on each edge
If you use the T..Net class, make sure you implement saving and loading functions. Example: NodeNet.cpp

When creating a graph/network, use smart pointer (e.g. TPt<TNGraph> g) whenever possible to let the library automatically allocating/de-allocating the space for you. Furthermore, when adding an edge, make sure the two nodes have already been added.

PNGraph Graph = TNGraph::New(); //PNGraph is defined in SNAP as TPt<TNGraph> Graph->AddNode(1); Graph->AddNode(5); Graph->AddEdge(1,5); cout << Graph->GetNodes() << endl; --- 2 //number of nodes cout << Graph->GetEdges() << endl; --- 1 //number of edges ****************************** PNGraph Graph = TSnap::GenRndGnm<PNGraph>(100, 1000); //create a directed random graph on 100 nodes and 1,000 edges

Playing with a Graph

We can use SNAP library to explore and analyze a graph easily.

1. Traverse a graph //traverse the nodes for (TNGraph::TNodeI NI = Graph->BegNI(); NI < Graph->EndNI(); NI++) printf("%d %d %d\n", NI.GetId(), NI.GetOutDeg(), NI.GetInDeg()); //traverse the edges for (TNGraph::TEdgeI EI = Graph->BegEI(); EI < Graph->EndEI(); EI++) printf("edge (%d, %d)\n", EI.GetSrcNId(), EI.GetDstNId()); //we can traverse the edges also like this for (TNGraph::TNodeI NI = Graph->BegNI(); NI < Graph->EndNI(); NI++) for (int e = 0; e < NI.GetOutDeg(); e++) printf("edge (%d %d)\n", NI.GetId(), NI.GetOutNId(e)); 2. Get properties of a graph // generate a network using Forest Fire model PNGraph G = TSnap::GenForestFire(1000, 0.35, 0.35); //get the largest weakly connected component of G PNGraph WccG = TSnap::GetMxWcc(G); //get a subgraph induced on nodes {0,1,2,3,4} PNGraph SubG = TSnap::GetSubGraph(G, TIntV::GetV(0,1,2,3,4)); TVec<TPair<TInt, TInt> > CntV; // vector of pairs of integers (size, count) //get distribution of connected components (component size, count) TSnap::GetWccSzCnt(G, CntV); //get degree distribution pairs (degree, count) TSnap::GetOutDegCnt(G, CntV); //convert to undirected graph TUNGraph PUNGraph UG = TSnap::ConvertGraph<PUNGraph, PNGraph>(G);

Example:
1. Generate a G(n,m) graph (Erdos-Renyi model): Gnm.cpp
2. Get size for connected components in an undirected graph: getCC.cpp

Datasets in SNAP

SNAP contains a collection of about 50 large network datasets from tens of thousands of nodes and edges to tens of millions of nodes and edges. It includes social networks, web graphs, road networks, internet networks, citation networks, collaboration networks, and communication networks. A full list is available at http://snap.stanford.edu/data/index.html, with downloadable datasets. Some example networks include:

Social networks: online social networks, edges represent interactions between people
Citation networks: nodes represent papers, edges represent citations
Collaboration networks: nodes represent scientists, edges represent collaborations (co-authoring a paper)
Amazon networks : nodes represent products and edges link commonly co-purchased products
Twitter and Memetracker : Memetracker phrases, links and 467 million Tweets

Loading and Saving

Loading and saving network data mentioned above is similar to that of data structures. Here's an example for data as20graph.txt in the directory example of SNAP.

Top lines of as20graph.txt: # Directed Node Graph # Autonomous systems (graph is undirected, each edge is saved twice) # Nodes: 6474 Edges: 26467 # SrcNId DstNId 1 3 1 6 1 32 1 48 1 70 1 63 ... Loading: PUNGraph g=TSnap::LoadEdgeList<PUNGraph>("as20graph.txt",0,1); //0 is the column id for source node //1 is the column id for target node Saving: TSnap::SaveEdgeList<PUNGraph>(g, "as20graph.txt", "");

Plotting and Visualization in SNAP

Plotting in SNAP

Making a plot in SNAP requires installing the software Gnuplot. Gnuplot is a plotting software with its own grammar. Make sure that the path containing wgnuplot.exe (for Windows) or gnuplot (for Linux) is in your environmental variable $PATH. Fortunately, SNAP contains interfaces to help you directly generate Gnuplot scripts and run Gnuplot for you. Here's a typical example of plotting two curves with known (x,y) coordinates:

TVec<TPair<TFlt,TFlt> > XY1, XY2; for (int i=0; i<10; ++i) { XY1.Add(TPair<TFlt,TFlt>(i+0.0,i+0.0)); XY2.Add(TPair<TFlt,TFlt>(i+0.0,i*i+0.0)); } TGnuPlot Gp("fileName", "titleName"); Gp.AddPlot(XY1, gpwLinesPoints, "curve1"); Gp.AddPlot(XY2, gpwPoints, "curve2"); Gp.SetXYLabel("x-axis name", "y-axis name"); Gp.SavePng(); //or Gp.SaveEps(); //You can also use log-log scale via Gp.SetScale(gpsLog10XY);

Here's the result: Gnuplot example

Fig. 1 Plotting line y=x and points y=x^2 using Gnuplot

Typically, the plotting interface of SNAP will generate three types of files for each plot:

.plt file contains the plotting command for gnuplot

.tab file contains the data

.png or .eps is the plot

You can manipulate .plt and .tab and then run wgnuplot.exe xx.plt (for Windows) or gnuplot xx.plt (for Linux) to replot.
Example .cpp to plot exponential and Poisson distribution using Gnuplot: Gnuplot_example.cpp

Graph Visualization in SNAP

Visualize a graph in SNAP requires installing the software GraphViz. GraphViz is a graph-visualization software. Make sure that the path of bin in your GraphViz directory is in your environmental variable $PATH. Like TGnuplot, SNAP contains interfaces to help you directly manipulate GraphViz. Here's a typical example of visualizing a simple directed graph:

PNGraph g=TNGraph::New(); g->AddNode(1); g->AddNode(2); g->AddNode(3); g->AddNode(4); g->AddEdge(1,2); g->AddEdge(2,3); g->AddEdge(1,3); g->AddEdge(2,4); g->AddEdge(3,4); TIntStrH name; //node labels name.AddDat(1)="David"; name.AddDat(2)="Rachel"; name.AddDat(3)="Jim"; name.AddDat(4)="Sam"; TSnap::DrawGViz<PNGraph>(g, gvlDot, "gviz_plot.png", "", name);

Here's the result: GraphViz example

Fig. 2 Visualizing a graph in SNAP using TGraphViz

Example .cpp to plot network using GraphViz: GraphViz_example.cpp