This dataset is a collection of monthly user interaction networks from the year 2014 for 2046 subreddit communities from reddit.com. There are two types of networks: chain-based interaction networks link users who comment within the same linear chain (and are separated by at most 2 other comments); reply-based interaction networks only connect users when one has directly replied to the other. The 2046 subreddits were selected by removing subreddits that fell below certain activity thresholds (a subreddit needed at least 100 comments in every week) and discarding two subreddits, /r/counting and /r/CatsStandingUp, whose commenting patterns are significantly anomalous. Only users who commented at least 50 times on Reddit in 2014 are included in these networks, representing about the top 20% of users.
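The difference between the two network types can be illustrated with a minimal sketch. It is not the code used to build the dataset: the function name `interaction_edges`, the `max_gap` parameter, the direction of chain-based links (later commenter to earlier commenter), and the choice to skip self-interactions are all assumptions made for illustration.

```python
from typing import List, Set, Tuple


def interaction_edges(chain_authors: List[str], max_gap: int = 0) -> Set[Tuple[str, str]]:
    """Return directed (later_commenter, earlier_commenter) edges for one linear chain.

    chain_authors lists the comment authors from the top of the chain downward.
    max_gap is the number of intervening comments allowed between two linked
    comments: 0 gives reply-style edges, 2 gives chain-style edges.
    """
    edges = set()
    for i, replier in enumerate(chain_authors):
        # Look back at the parent comment plus up to max_gap further ancestors.
        for j in range(max(0, i - 1 - max_gap), i):
            target = chain_authors[j]
            if target != replier:  # assumption: self-interactions are skipped
                edges.add((replier, target))
    return edges


# Hypothetical chain of five comments, each replying to the one before it.
chain = ["alice", "bob", "carol", "dave", "erin"]
reply_edges = interaction_edges(chain, max_gap=0)
chain_edges = interaction_edges(chain, max_gap=2)
```

In this toy chain, `erin` and `alice` are separated by three other comments, so they are linked in neither network, while `dave` and `alice` (two comments apart) are linked only in the chain-based one.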
Each subreddit has a JSON file, which contains a list of networks defined as adjacency lists with username strings. These raw adjacency lists are directed: the replier links to the individuals they are responding to.
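As a rough sketch of how such a file might be read, the snippet below parses a JSON list of adjacency dictionaries and extracts the directed edges and the set of users for each network. The helper names (`parse_networks`, `directed_edges`, `users`) and the sample usernames are hypothetical; only the overall layout (a list of dictionaries mapping a username to a list of usernames) follows the description above.

```python
import json
from typing import Dict, List, Set, Tuple

Network = Dict[str, List[str]]  # replier -> users they responded to


def parse_networks(json_text: str) -> List[Network]:
    """Parse the contents of one subreddit file into a list of networks."""
    networks = json.loads(json_text)
    if not isinstance(networks, list):
        raise ValueError("expected a JSON list of adjacency dictionaries")
    return networks


def directed_edges(network: Network) -> List[Tuple[str, str]]:
    """Flatten an adjacency dictionary into (replier, target) pairs."""
    return [(src, dst) for src, targets in network.items() for dst in targets]


def users(network: Network) -> Set[str]:
    """Collect every username that appears as a replier or as a target."""
    nodes = set(network)
    for targets in network.values():
        nodes.update(targets)
    return nodes


# Made-up sample with two tiny networks, standing in for a real file's contents.
sample = '[{"alice": ["bob"], "bob": ["alice", "carol"]}, {"carol": ["alice"]}]'
networks = parse_networks(sample)
first_edges = directed_edges(networks[0])
first_users = users(networks[0])
```

For a real file, the text passed to `parse_networks` would come from reading the subreddit's JSON file after extracting one of the archives.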
Dataset statistics | |
---|---|
Number of subreddits | 2046 |
Median number of users per monthly network | 504 |
Timespan | Jan. 27, 2014 - Nov. 30, 2014 |

File | Description |
---|---|
reddit_chain_networks.tar.gz | Networks constructed from comment chains |
reddit_reply_networks.tar.gz | Networks constructed from direct replies |
Both archives contain directed JSON adjacency lists with usernames as identifiers. Each subreddit has a file "[subreddit].json" that contains a list of 11 "monthly" interaction networks (corresponding to ISO 4-week periods starting from Jan. 27, 2014 and ending on Nov. 30, 2014). Each network is represented as a directed adjacency list (a dictionary mapping each user to the list of users they replied to). The December/January holiday periods are excluded due to data quality issues.
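The date range covered by each of the 11 networks can be recovered with simple date arithmetic, since 11 consecutive 4-week periods starting on Jan. 27, 2014 end exactly on Nov. 30, 2014. The sketch below assumes that the networks in each file are stored in chronological order and that the periods are contiguous; neither is stated explicitly above, and the names `FIRST_DAY` and `period_bounds` are illustrative.

```python
from datetime import date, timedelta
from typing import Tuple

FIRST_DAY = date(2014, 1, 27)       # a Monday, start of the first period
NUM_PERIODS = 11
PERIOD_LENGTH = timedelta(weeks=4)


def period_bounds(index: int) -> Tuple[date, date]:
    """Return the (first day, last day) of the period at a 0-based list index.

    Assumption: list position i in a subreddit file is the i-th 4-week period.
    """
    if not 0 <= index < NUM_PERIODS:
        raise IndexError("period index must be between 0 and 10")
    start = FIRST_DAY + index * PERIOD_LENGTH
    end = start + PERIOD_LENGTH - timedelta(days=1)
    return start, end


periods = [period_bounds(i) for i in range(NUM_PERIODS)]
for i, (start, end) in enumerate(periods):
    print(i, start.isoformat(), end.isoformat())
```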