CS224W:
Social and Information Network Analysis
Autumn 2011
Pointers to data and code
Datasets
Stanford Large Network Dataset Collection
Coauthorship and Citation Networks
Internet Topology
- AS Graphs: AS-level connectivity inferred from Oregon route-views, looking glass data, and routing registry data
Stack Overflow
Yelp Data
- Yelp Review Data: Reviews of the 250 closest businesses to each of 30 universities, provided for students and academics to explore and research
Prosper peer to peer money lending dataset
- Money Lending Data: Borrowers ask for loans and lenders bid (price, interest rate) on the loans they want to fund.
YouTube dataset
- YouTube data: YouTube videos as nodes. An edge a->b means video b is in the related video list (first 20 only) of video a.
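The edge definition above can be sketched as a small directed graph built from (a, b) pairs. This is only an illustration under stated assumptions: the function name `build_related_graph`, the video ids, and the toy edge list are hypothetical, and the actual file format of the dataset is not specified here.

```python
def build_related_graph(edges):
    """Build a directed graph from (a, b) pairs, where a -> b means
    video b appears in the related-video list of video a."""
    out_neighbors = {}  # video -> set of videos it points to
    in_degree = {}      # video -> number of distinct videos pointing to it
    for a, b in edges:
        targets = out_neighbors.setdefault(a, set())
        out_neighbors.setdefault(b, set())  # register b even if it has no out-edges
        in_degree.setdefault(a, 0)
        in_degree.setdefault(b, 0)
        if b not in targets:  # ignore duplicate edges
            targets.add(b)
            in_degree[b] += 1
    return out_neighbors, in_degree


# Hypothetical toy data, not taken from the real dataset.
edges = [("v1", "v2"), ("v1", "v3"), ("v2", "v3"), ("v3", "v1")]
out_neighbors, in_degree = build_related_graph(edges)
```

Because only the first 20 related videos are recorded, every node in such a graph has out-degree at most 20, while in-degree is unbounded.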
Amazon product copurchasing networks and metadata
- Amazon Data: The data was collected by crawling the Amazon website and contains product metadata and review information about 548,552 different products (books, music CDs, DVDs, and VHS video tapes).
Wikipedia
- Wikipedia page to page link data: A list of all page-to-page links in Wikipedia
- DBpedia: The DBpedia data set uses a large multi-domain ontology derived from Wikipedia.
- Edits and talks: Complete edit history (all revisions, all pages) of Wikipedia from its inception until January 2008.
Movie Ratings
Who trusts whom data at Trustlet
Mark Newman's pointers
Munmun De Choudhury's pointers
- Network data: Flickr Image Dataset, YouTube Dataset, Digg Dataset (social media), Engadget Dataset (online communities), Del.icio.us Dataset (social bookmarking)
Note: Jure Leskovec will have to apply for any data sets you want, and we must agree not to distribute them further. There may be a delay, so get your requests in early.
Software Tools
C++ library for working with massive network datasets (Windows, Linux, Mac)
Program for large network analysis (Windows or Linux via Wine)
Python package for the study of the structure of complex networks
Graph visualization software
Exploratory data analysis and visualization tool for graphs and networks
Software framework for information visualization (Linux, MacOSX, Windows)
Software for social network analysis (Windows)
Large-scale network analysis, modeling and visualization toolkit
Tools for fitting heavy-tailed distributions to data
Websites
Some websites that may be interesting to analyze:
Similar Courses