Stanford CS224W: Resources

Social and Information Network Analysis
Autumn 2011

Pointers to data and code


Stanford Large Network Dataset Collection

Coauthorship and Citation Networks

Internet Topology

  • AS Graphs: AS-level connectivities inferred from Oregon route-views, Looking glass data and Routing registry data

Stack Overflow

Yelp Data

  • Yelp Review Data: reviews of the 250 closest businesses for 30 universities for students and academics to explore and research

Prosper peer to peer money lending dataset

  • Money Lending Data: Borrowers ask for loans and lenders bid (price, interest rate) on loans to fund.

Youtube dataset

  • YouTube data: YouTube videos as nodes. Edge a->b means video b is in the related video list (first 20 only) of video a.
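
The edge definition above can be sketched in a few lines of Python. This is only an illustration, not the dataset's loader: the input format (a dict mapping each video ID to its ordered related-video list) and the names `build_related_graph` and `max_related` are assumptions made for the example.

```python
from collections import defaultdict


def build_related_graph(related_lists, max_related=20):
    """Build a directed graph where edge a->b means b is among the
    first `max_related` related videos of a.

    `related_lists` is a hypothetical input format: a dict mapping a
    video ID to its ordered list of related video IDs.
    """
    out_edges = {}
    in_degree = defaultdict(int)
    for a, related in related_lists.items():
        # Keep only the first `max_related` entries, dropping repeats
        # while preserving the original order.
        targets = list(dict.fromkeys(related[:max_related]))
        out_edges[a] = targets
        for b in targets:
            in_degree[b] += 1
    return out_edges, dict(in_degree)


# Tiny made-up example, not taken from the real data.
sample = {"v1": ["v2", "v3"], "v2": ["v3"], "v3": []}
out_edges, in_degree = build_related_graph(sample)
```

Because each related list is truncated, every node has out-degree at most 20, while in-degree is unbounded, so popularity effects in this graph show up on the in-degree side.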

Amazon product copurchasing networks and metadata

  • Amazon Data: The data was collected by crawling the Amazon website and contains product metadata and review information about 548,552 different products (books, music CDs, DVDs and VHS video tapes).


Wikipedia

  • Wikipedia page to page link data: A list of all page-to-page links in Wikipedia
  • DBPedia: The DBpedia data set uses a large multi-domain ontology which has been derived from Wikipedia.
  • Edits and talks: Complete edit history (all revisions, all pages) of Wikipedia from its inception through January 2008.

Movie Ratings

Who trusts whom data at Trustlet

Mark Newman's pointers

Munmun De Choudhury's pointers

  • Network data: Flickr Image Dataset, YouTube Dataset, Digg Dataset (Social Media), Engadget Dataset (online communities), Dataset (Social bookmarking)
Note: Jure Leskovec will have to apply for any datasets you want, and we must agree not to distribute them further.
There may be a delay, so get your requests in early.

Software Tools

C++ library for working with massive network datasets (Windows, Linux, Mac)

Program for large network analysis (Windows or Linux via Wine)

Python package for the study of the structure of complex networks

Graph visualization software

Exploratory data analysis and visualization tool for graphs and networks

Software framework for information visualization (Linux, MacOSX, Windows)

Software for social network analysis (Windows)

Large-scale network analysis, modeling and visualization toolkit

Tools for fitting heavy-tailed distributions to data


Some websites that may be interesting to analyze:

Similar Courses