Social and Information Network Analysis
Pointers to data and code
Stanford Large Network Dataset Collection
Coauthorship and Citation Networks
- AS Graphs:
AS-level connectivities inferred from Oregon route-views, Looking glass data and Routing registry data
- Yelp Review Data:
Reviews of the 250 closest businesses for 30 universities, provided for students and academics to explore and research
Prosper peer to peer money lending dataset
- Money Lending Data: Borrowers ask for loans and lenders bid (price, interest rate) on
the loans they want to fund.
- Youtube data:
YouTube videos as nodes. Edge a->b means video b is in the related video list (first 20 only) of video a.
Amazon product copurchasing networks and metadata
Data: The data was collected by crawling the Amazon website and contains
product metadata and review information about 548,552 different products
(books, music CDs, DVDs and VHS video tapes).
Page-to-page link data: A list of all page-to-page links in Wikipedia
- DBpedia: The DBpedia data set uses a large multi-domain ontology which has been derived from Wikipedia.
- Edits and talks: Complete edit history (all revisions, all pages) of Wikipedia from its inception until January 2008.
Who trusts whom data at Trustlet
Mark Newman's pointers
Munmun De Choudhury's pointers
Note: Jure Leskovec will have to apply for any sets you want, and we must agree not to distribute them further.
Data: Flickr Image Dataset, YouTube Dataset, Digg Dataset (social media), Engadget Dataset (online communities), Del.icio.us Dataset (social bookmarking)
There may be a delay, so get requests in early.
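Several of the datasets above are directed graphs. In the YouTube data, for example, an edge a->b means video b is in the first 20 entries of the related video list of video a. The following is a minimal Python sketch of that representation, assuming the related lists have already been parsed into a dictionary. The function names, the input format and the sample data are hypothetical and are not taken from any of the datasets.

```python
def build_related_graph(related, limit=20):
    """Build a directed graph as a dict of adjacency sets.

    `related` maps a video id to its ordered related-video list
    (hypothetical input format). An edge a->b is added when b is
    among the first `limit` related videos of a.
    """
    graph = {}
    for video, related_list in related.items():
        targets = graph.setdefault(video, set())
        for other in related_list[:limit]:
            targets.add(other)
            # Make sure every edge target also appears as a node.
            graph.setdefault(other, set())
    return graph


def in_degrees(graph):
    """Count, for each node, how many related lists it appears in."""
    counts = {node: 0 for node in graph}
    for targets in graph.values():
        for target in targets:
            counts[target] += 1
    return counts


# Made-up sample data, for illustration only.
sample = {"a": ["b", "c"], "b": ["c"], "c": []}
graph = build_related_graph(sample)
indeg = in_degrees(graph)
```

The adjacency-set form keeps the sketch dependency-free. For real analysis, the same edges could be loaded into one of the network libraries listed below.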
C++ library for working with massive network datasets (Windows, Linux, Mac)
Program for large network analysis (Windows or Linux via Wine)
Python package for the study of the structure of complex networks
Graph visualization software
Exploratory data analysis and visualization tool for graphs and networks
Software framework for information visualization (Linux, MacOSX, Windows)
Software for social network analysis (Windows)
Large-scale network analysis, modeling and visualization toolkit
Tools for fitting heavy-tailed distributions to data
Some websites that may be interesting to do analysis on: