Social Network: Reddit Hyperlink Network
Dataset information
The hyperlink network represents directed connections between subreddits (a subreddit is a community on Reddit). We also provide subreddit embeddings. The network is extracted from roughly 40 months of publicly available Reddit data, from Jan 2014 to April 2017.
Subreddit Hyperlink Network: the subreddit-to-subreddit hyperlink network is extracted from the posts that create hyperlinks from one subreddit to another. We say a hyperlink originates from a post in the source community and links to a post in the target community. Each hyperlink is annotated with three properties: the timestamp, the sentiment of the source community post towards the target community post, and the text property vector of the source post. The network is directed, signed, temporal, and attributed.
Note that each post has a title and a body. The hyperlink can be present in either the title of the post or in the body. Therefore, we provide one network file for each.
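To make the four properties concrete (directed, signed, temporal, attributed), here is a minimal Python sketch. The toy hyperlinks, subreddit names, and variable names are made up for illustration and are not taken from the dataset; the text property vector is left out for brevity.

```python
from collections import defaultdict
from datetime import datetime

# Toy hyperlinks: (source, target, timestamp, sign). All values are hypothetical.
links = [
    ("a", "b", datetime(2014, 1, 5), 1),
    ("a", "b", datetime(2014, 3, 9), -1),
    ("b", "a", datetime(2015, 7, 1), 1),
]

# Directed: the ordered pair (source, target) is the key, so a->b and b->a differ.
# Several hyperlinks may share the same ordered pair, so each key holds a list.
edges = defaultdict(list)
for source, target, timestamp, sign in links:
    edges[(source, target)].append((timestamp, sign))

# Temporal: order each pair's hyperlinks by timestamp.
for pair in edges:
    edges[pair].sort()

# Signed: count the negative (-1) hyperlinks for each ordered pair.
negatives = {
    pair: sum(1 for _, sign in items if sign == -1)
    for pair, items in edges.items()
}
```

Keeping a list per ordered pair preserves every individual hyperlink, which matters because the same pair of subreddits can be linked many times with different signs.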
Subreddit Embeddings: We also provide an embedding vector for each subreddit, available in the subreddit embedding dataset. Please note that embeddings could not be generated for some subreddits, so that file has 51,278 embeddings.
Project website: These files have been generated as part of the research project on how subreddits attack one another. The details of the project can be found here.
Dataset statistics |
Number of nodes (subreddits) | 55,863 |
Number of edges (hyperlinks between subreddits) | 858,490 |
Edge weights (label of hyperlink) | -1 or +1 |
Edge attributes | Text property vectors |
Timespan | Jan 2014 - April 2017 |
Source (citation)
The following BibTeX citation can be used:
@inproceedings{kumar2018community,
title={Community interaction and conflict on the web},
author={Kumar, Srijan and Hamilton, William L and Leskovec, Jure and Jurafsky, Dan},
booktitle={Proceedings of the 2018 World Wide Web Conference on World Wide Web},
pages={933--943},
year={2018},
organization={International World Wide Web Conferences Steering Committee}
}
Files
Data format
The data files are in tab-separated format, with the following columns:
SOURCE_SUBREDDIT tab TARGET_SUBREDDIT tab POST_ID tab TIMESTAMP tab POST_LABEL tab POST_PROPERTIES
leagueoflegends teamredditteams 1u4nrps 2013-12-31 16:39:58 1 345.0,298.0,0.75652173913,0.0173913043478,0.0869565217391,0.150724637681,0.0753623188406,57.0,53.0,10.0,4.78947368421,15.0,0.315789473684,1.0,1.0,345.0,57.0,35.5778947368,0.073,0.08,0.1748,0.3448275862068966,0.05172413793103448,0.034482758620689655,0.0,0.034482758620689655,0.0,0.0,0.0,0.017241379310344827,0.05172413793103448,0.10344827586206896,0.05172413793103448,0.0,0.10344827586206896,0.0,0.034482758620689655,0.034482758620689655,0.06896551724137931,0.017241379310344827,0.034482758620689655,0.0,0.0,0.10344827586206896,0.0,0.0,0.0,0.05172413793103448,0.017241379310344827,0.034482758620689655,0.0,0.0,0.017241379310344827,0.1896551724137931,0.034482758620689655,0.0,0.034482758620689655,0.034482758620689655,0.0,0.0,0.06896551724137931,0.05172413793103448,0.034482758620689655,0.034482758620689655,0.0,0.0,0.017241379310344827,0.017241379310344827,0.0,0.0,0.0,0.06896551724137931,0.017241379310344827,0.05172413793103448,0.0,0.05172413793103448,0.06896551724137931,0.034482758620689655,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
theredlion soccer 1u4qkd 2013-12-31 18:18:37 -1 101.0,98.0,0.742574257426,0.019801980198,0.049504950495,0.0594059405941,0.178217821782,14.0,14.0,2.0,5.71428571429,1.0,0.0714285714286,2.0,0.0,49.5,7.0,16.0492857143,0.472,0.0,0.5538,0.06666666666666667,0.06666666666666667,0.06666666666666667,0.06666666666666667,0.0,0.0,0.0,0.0,0.0,0.0,0.06666666666666667,0.0,0.0,0.06666666666666667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.13333333333333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.06666666666666667,0.0,0.0,0.06666666666666667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.06666666666666667,0.06666666666666667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.06666666666666667,0.0,0.06666666666666667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
inlandempire bikela 1u4qlzs 2014-01-01 14:54:35 1 85.0,85.0,0.752941176471,0.0235294117647,0.0823529411765,0.0117647058824,0.211764705882,10.0,10.0,2.0,7.2,0.0,0.0,1.0,0.0,85.0,10.0,23.605,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.09090909090909091,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.09090909090909091,0.09090909090909091,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.09090909090909091,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
where
- SOURCE_SUBREDDIT: the subreddit where the link originates
- TARGET_SUBREDDIT: the subreddit where the link ends
- POST_ID: the post in the source subreddit that starts the link
- TIMESTAMP: time of the post
- POST_LABEL: label indicating whether the source post is explicitly negative towards the target post. The value is -1 if the source is negative towards the target, and 1 if it is neutral or positive. The label is created by crowd-sourcing and by training a text-based classifier, which works better than simple sentiment analysis of the posts. Please see the reference paper for details.
- POST_PROPERTIES: a vector representing the text properties of the source post, given as a comma-separated list of numbers. The vector elements are the following:
1. Number of characters
2. Number of characters without counting white space
3. Fraction of alphabetical characters
4. Fraction of digits
5. Fraction of uppercase characters
6. Fraction of white spaces
7. Fraction of special characters, such as comma, exclamation mark, etc.
8. Number of words
9. Number of unique words
10. Number of long words (at least 6 characters)
11. Average word length
12. Number of unique stopwords
13. Fraction of stopwords
14. Number of sentences
15. Number of long sentences (at least 10 words)
16. Average number of characters per sentence
17. Average number of words per sentence
18. Automated readability index
19. Positive sentiment calculated by VADER
20. Negative sentiment calculated by VADER
21. Compound sentiment calculated by VADER
22. LIWC_Funct
23. LIWC_Pronoun
24. LIWC_Ppron
25. LIWC_I
26. LIWC_We
27. LIWC_You
28. LIWC_SheHe
29. LIWC_They
30. LIWC_Ipron
31. LIWC_Article
32. LIWC_Verbs
33. LIWC_AuxVb
34. LIWC_Past
35. LIWC_Present
36. LIWC_Future
37. LIWC_Adverbs
38. LIWC_Prep
39. LIWC_Conj
40. LIWC_Negate
41. LIWC_Quant
42. LIWC_Numbers
43. LIWC_Swear
44. LIWC_Social
45. LIWC_Family
46. LIWC_Friends
47. LIWC_Humans
48. LIWC_Affect
49. LIWC_Posemo
50. LIWC_Negemo
51. LIWC_Anx
52. LIWC_Anger
53. LIWC_Sad
54. LIWC_CogMech
55. LIWC_Insight
56. LIWC_Cause
57. LIWC_Discrep
58. LIWC_Tentat
59. LIWC_Certain
60. LIWC_Inhib
61. LIWC_Incl
62. LIWC_Excl
63. LIWC_Percept
64. LIWC_See
65. LIWC_Hear
66. LIWC_Feel
67. LIWC_Bio
68. LIWC_Body
69. LIWC_Health
70. LIWC_Sexual
71. LIWC_Ingest
72. LIWC_Relativ
73. LIWC_Motion
74. LIWC_Space
75. LIWC_Time
76. LIWC_Work
77. LIWC_Achiev
78. LIWC_Leisure
79. LIWC_Home
80. LIWC_Money
81. LIWC_Relig
82. LIWC_Death
83. LIWC_Assent
84. LIWC_Dissent
85. LIWC_Nonflu
86. LIWC_Filler
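The format above can be read with a few lines of Python. The following is a minimal sketch, not an official loader: the function name `parse_row` and the short names for elements 1 to 21 are my own labels, while the `LIWC_*` names follow the list above. It assumes the fields are separated by real tab characters and that the timestamp looks like the sample rows. The example row uses a placeholder property vector (two sample values followed by zeros), not real data.

```python
from datetime import datetime

# Short names for elements 1-21 are illustrative labels, not part of the dataset.
TEXT_FEATURES = [
    "n_chars", "n_chars_no_space", "frac_alpha", "frac_digit", "frac_upper",
    "frac_space", "frac_special", "n_words", "n_unique_words", "n_long_words",
    "avg_word_len", "n_unique_stopwords", "frac_stopwords", "n_sentences",
    "n_long_sentences", "avg_chars_per_sentence", "avg_words_per_sentence",
    "readability_index", "vader_pos", "vader_neg", "vader_compound",
]
# Elements 22-86, in the order given in the list above.
LIWC_FEATURES = ["LIWC_" + name for name in (
    "Funct Pronoun Ppron I We You SheHe They Ipron Article Verbs AuxVb Past "
    "Present Future Adverbs Prep Conj Negate Quant Numbers Swear Social Family "
    "Friends Humans Affect Posemo Negemo Anx Anger Sad CogMech Insight Cause "
    "Discrep Tentat Certain Inhib Incl Excl Percept See Hear Feel Bio Body "
    "Health Sexual Ingest Relativ Motion Space Time Work Achiev Leisure Home "
    "Money Relig Death Assent Dissent Nonflu Filler"
).split()]
FEATURE_NAMES = TEXT_FEATURES + LIWC_FEATURES  # 86 names in total


def parse_row(line):
    """Parse one tab-separated data row into a dictionary."""
    source, target, post_id, timestamp, label, props = line.rstrip("\n").split("\t")
    values = [float(v) for v in props.split(",")]
    if len(values) != len(FEATURE_NAMES):
        raise ValueError(f"expected {len(FEATURE_NAMES)} properties, got {len(values)}")
    return {
        "source": source,
        "target": target,
        "post_id": post_id,
        "timestamp": datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S"),
        "label": int(label),  # -1 (negative) or 1 (neutral or positive)
        "properties": dict(zip(FEATURE_NAMES, values)),
    }


# Placeholder row: the property vector is two sample values padded with zeros.
placeholder_props = ",".join(["101.0", "98.0"] + ["0.0"] * 84)
example = "\t".join(
    ["theredlion", "soccer", "1u4qkd", "2013-12-31 18:18:37", "-1", placeholder_props]
)
row = parse_row(example)
```

If a file starts with a header line naming the columns, skip that line before calling `parse_row`, since the header's label and properties fields are not numeric.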