Data

We release cascade data on a range of topics and world news for the 5,000 most active of four million sites, covering March 2011 to February 2012. This dataset is part of the SNAP web and blog datasets.


Data format:

memes-w5-all-2011-03-2012-02-n5000-call-nc10-cl2-quotes.tgz: this compressed file contains a single file listing the id and text of every tracked meme, one meme per line:

<meme id>;<meme text>


Example:

129423843;il est temps qu'ils commencent à construire une équipe avec de nouveaux talents
134423843; rest assured, don't underestimate your daughter.
135423843;our guys thought it was good,
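A minimal Python sketch for loading the quotes file follows. It assumes, as in the example, that the meme id is separated from the text by the first semicolon, so any later semicolons or commas stay part of the quote. It also assumes the archive text is UTF-8. The function names parse_quotes and load_quotes are hypothetical, not part of the dataset.

```python
import tarfile
from typing import Dict, Iterable


def parse_quotes(lines: Iterable[str]) -> Dict[int, str]:
    """Map meme id -> meme text from '<meme id>;<meme text>' records."""
    memes: Dict[int, str] = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        # Split only on the first ';' because the quote itself may contain one.
        meme_id, _, text = line.partition(";")
        # strip() drops the space that sometimes follows the separator.
        memes[int(meme_id)] = text.strip()
    return memes


def load_quotes(path: str) -> Dict[int, str]:
    """Read the single regular file inside the quotes .tgz archive."""
    with tarfile.open(path, "r:gz") as archive:
        member = next(m for m in archive.getmembers() if m.isfile())
        with archive.extractfile(member) as handle:
            # UTF-8 is an assumption; undecodable bytes are replaced.
            text = handle.read().decode("utf-8", errors="replace")
    return parse_quotes(text.splitlines())
```

For example, load_quotes("memes-w5-all-2011-03-2012-02-n5000-call-nc10-cl2-quotes.tgz") would return a dictionary from meme ids to quote strings, which can be joined with the cascades by meme id.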


memes-w5-all-2011-03-2012-02-n5000-call-nc10-cl2-cascades-all.tgz: this compressed file contains a single file with all cascades. The file has two blocks separated by a blank line. Each line in the first block contains the id and name of a site:

<website id>,<website name>


Each line in the second block describes one meme (cascade) as a list of website id and timestamp pairs. Timestamps are UNIX time expressed in hours, that is, fractional hours since 00:00 UTC on 1 January 1970.

<meme id>;<website id>,<timestamp>,<website id>,<timestamp>,<website id>,<timestamp>...


Example:

...
4262,wthr.com
8588,klkntv.com
9995,presseportal.de
10361,wnyc.org
8709,kswt.com
7954,woi-tv.com
...

...
115642731;2838,366110.853056,5344,366113.987500,5726,366113.987500,...
32875877;533,362176.518611,24963,362176.519722,1086,362176.519722,...
32875878;533,362176.518611,24963,362176.519722,1086,362176.519722,...
93254134;115,365054.000000,1214,365054.000000,1086,365054.004722,...
48060773;5004,362899.355833,14638,362899.366667,1086,362899.366667,...
...
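The two-block layout can be read with a short Python sketch like the one below, assuming the file has already been extracted and its lines are available as strings. Site lines are read until the first blank line; each later line is split on its first semicolon, and the remaining comma-separated fields are paired up as (website id, timestamp). Because timestamps are in hours, hours_to_datetime multiplies by 3600 to get UNIX seconds. The names parse_cascades, hours_to_datetime and Cascade are hypothetical.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Tuple

# A cascade: (website id, timestamp in hours) pairs, in file order.
Cascade = List[Tuple[int, float]]


def hours_to_datetime(hours: float) -> datetime:
    """Convert UNIX time in fractional hours to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(hours * 3600.0, tz=timezone.utc)


def parse_cascades(
    lines: Iterable[str],
) -> Tuple[Dict[int, str], Dict[int, Cascade]]:
    """Return (site id -> site name, meme id -> cascade) from the all-cascades file."""
    sites: Dict[int, str] = {}
    cascades: Dict[int, Cascade] = {}
    in_site_block = True
    for raw in lines:
        line = raw.strip()
        if not line:
            # The first blank line after the site list starts the cascade block.
            if sites:
                in_site_block = False
            continue
        if in_site_block:
            site_id, _, name = line.partition(",")
            sites[int(site_id)] = name
        else:
            meme_id, _, rest = line.partition(";")
            fields = [f for f in rest.split(",") if f]
            # Fields alternate: website id, timestamp, website id, timestamp, ...
            cascades[int(meme_id)] = [
                (int(fields[i]), float(fields[i + 1]))
                for i in range(0, len(fields) - 1, 2)
            ]
    return sites, cascades
```

Site names for a cascade can then be looked up with sites.get(site_id), and each timestamp turned into a calendar date with hours_to_datetime.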


memes-w5-all-2011-03-2012-02-n5000-call-nc10-cl2-cascades-keywords.tgz: this compressed file contains several files, one per keyword. Each file contains the cascades built from quotes mentioned in posts containing that file's keyword, and uses the same format as the all-cascades file above.
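Because every keyword file shares the all-cascades layout, the archive can be processed member by member with Python's tarfile module. The sketch below computes, for each file, the size of every cascade (its number of website and timestamp pairs). How a file name maps to its keyword is not described here, so results are keyed by the archive member name; the function name cascade_sizes_by_file and the UTF-8 encoding are assumptions.

```python
import tarfile
from typing import Dict


def cascade_sizes_by_file(path: str) -> Dict[str, Dict[int, int]]:
    """Map each archive member name to {meme id: number of (site, time) pairs}."""
    result: Dict[str, Dict[int, int]] = {}
    with tarfile.open(path, "r:gz") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip directory entries
            with archive.extractfile(member) as handle:
                # UTF-8 is an assumption; undecodable bytes are replaced.
                text = handle.read().decode("utf-8", errors="replace")
            sizes: Dict[int, int] = {}
            seen_site = False
            in_cascades = False
            for raw in text.splitlines():
                line = raw.strip()
                if not line:
                    # A blank line after the site list starts the cascade block.
                    in_cascades = in_cascades or seen_site
                    continue
                if not in_cascades:
                    seen_site = True  # still in the '<website id>,<website name>' block
                    continue
                meme_id, _, rest = line.partition(";")
                fields = [f for f in rest.split(",") if f]
                sizes[int(meme_id)] = len(fields) // 2
            result[member.name] = sizes
    return result
```

Reading each member fully into memory keeps the sketch short; for very large files, iterating over the extracted file object line by line would use less memory.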