Reddit User and Subreddit Embeddings
Dataset information
This dataset contains two files: user embeddings and subreddit embeddings for Reddit. Each embedding is a vector representation of a single user or a single subreddit. (A subreddit is a community on Reddit.) The data is extracted from publicly available Reddit data of 2.5 years from Jan 2014 to April 2017. The vectors are generated from the user-to-subreddit posting network using a word2vec-style objective function. Please see the reference paper below for details on how the vectors are generated.
User embeddings: This file contains one numerical vector in a low-dimensional space (an embedding) for each user. Each embedding has 300 dimensions. Two user embeddings are similar if the users post in similar subreddits.
Subreddit embeddings: This file contains one numerical vector in a low-dimensional space (an embedding) for each subreddit. Each embedding has 300 dimensions. Two subreddit embeddings are similar if the users who post in them are similar.
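The notion of "similar" embeddings can be made concrete with a small sketch. This page does not say which similarity measure was used, so the example below assumes cosine similarity, a common choice for word2vec-style vectors. The function names `cosine_similarity` and `most_similar` and the toy 2-dimensional vectors are hypothetical and are not part of the dataset.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (assumed measure)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def most_similar(name, embeddings, top_k=3):
    """Return the top_k (name, score) pairs closest to `name`, best first."""
    target = embeddings[name]
    scores = [
        (other, cosine_similarity(target, vec))
        for other, vec in embeddings.items()
        if other != name
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Toy 2-dimensional vectors with made-up names and values (not from the dataset).
toy = {
    "sub_a": [1.0, 0.0],
    "sub_b": [0.9, 0.1],
    "sub_c": [0.0, 1.0],
}
nearest = most_similar("sub_a", toy, top_k=1)
print(nearest[0][0])
```

With the real files, the same functions would take 300-dimensional vectors keyed by user name or subreddit name.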
Project website: These files were generated as part of a research project on how subreddits attack one another. Details of the project can be found here.
Other related datasets: We have also released two other datasets that are closely related:
- Reddit Hyperlink Network: the subreddit hyperlink dataset contains the links between two subreddits.
- Reddit Posting Network: this dataset contains the who-posts-where network, which is used to create the user and subreddit embeddings available on this web page.
Dataset statistics |
Number of users | 118,381 |
Number of subreddits | 51,278 |
Embedding length | 300 |
Timespan of data | Jan 2014 - April 2017 |
Source (citation)
The following BibTeX citation can be used:
@inproceedings{kumar2019predicting,
title={Predicting dynamic embedding trajectory in temporal interaction networks},
author={Kumar, Srijan and Zhang, Xikun and Leskovec, Jure},
booktitle={Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining},
pages={1269--1278},
year={2019},
organization={ACM}
}
@inproceedings{kumar2018community,
title={Community interaction and conflict on the web},
author={Kumar, Srijan and Hamilton, William L and Leskovec, Jure and Jurafsky, Dan},
booktitle={Proceedings of the 2018 World Wide Web Conference on World Wide Web},
pages={933--943},
year={2018},
organization={International World Wide Web Conferences Steering Committee}
}
Files
Data format
Each data file is in comma-separated format.
USER_ID,VECTOR
rotoreuters,-0.224305,0.034301,-0.082651,0.004676,0.00696,0.892179,-0.309423,0.570185,0.49211,0.667661,0.379927,-0.701833,0.494844,-0.112651,-0.499859,-0.03113,-0.17902,-0.307026,0.804202,-0.126007,0.298278,0.699318,-0.122089,-0.147698,0.347853,-0.171306,-0.324271,-0.599804,0.423248,-0.56949,-0.824675,-0.568197,-0.515359,-0.281378,-0.631208,0.31375,0.43415,0.314626,0.219685,0.177992,0.476424,-0.303418,-0.40719,-0.099023,0.12914,0.437157,0.19942,-0.400879,-0.83451,-0.399204,-0.735938,0.633666,-0.195332,0.006758,-1.091519,0.41688,0.055319,0.40614,-0.087184,0.721328,0.585882,-0.441858,0.011837,-0.358463,-0.323385,0.573054,-0.008566,-0.110555,-0.111838,-0.628141,-0.37604,0.539726,0.022366,0.479097,0.043697,0.132671,0.765249,0.700398,0.493926,0.241689,0.128558,0.253161,0.082354,0.247792,0.15935,-0.504183,0.283101,0.11646,0.109912,0.016254,-0.635325,0.083934,0.400957,0.33653,0.080672,-0.712021,0.02349,-0.499163,0.142773,-0.779104,-0.167535,-0.282673,0.417065,-0.253296,-0.041676,-0.220045,0.491036,-0.031163,0.355421,-0.912913,-0.132866,0.15732,-0.062805,-0.160181,0.041099,-0.245248,-0.054643,0.322623,-0.548176,0.166469,-0.057362,-0.230725,-0.419439,0.18926,-0.664602,-0.567163,-0.546665,-0.244773,-0.021004,-0.403838,-0.029683,-0.02533,0.350426,0.07477,0.065412,0.241725,-0.336525,-0.901883,0.534846,0.030413,-0.63059,-0.361515,-0.630254,-0.002442,-0.144353,0.318511,0.998885,-0.993112,0.701324,-0.352901,-0.257294,0.388479,0.109291,-0.57535,-0.510159,-0.638403,-0.549713,-0.415056,0.247532,0.066906,-0.676021,-0.39411,0.599426,0.896202,0.476426,0.496846,0.5276,-0.144111,-0.240765,0.49653,0.408169,0.165807,-0.210979,0.326131,0.538052,-0.368556,-0.378118,-0.221417,0.038478,-0.326394,-0.623636,-0.045483,-0.35498,0.024394,-0.134996,0.248642,0.708362,0.768013,-0.269403,-0.586033,-0.551153,-0.038667,-0.288946,0.030872,-0.229663,0.43991,-0.58382,-0.764331,0.49603,0.02332,-0.018123,-0.785993,0.336409,0.329915,0.019162,-0.156693,-0.046217,0.341809,0.216982,0.361256,0.765107,0.09945,0.566142,-0.380906,0.073389,-0.833633,-0.444517,-0.529169,-0.350931,-0.112044,0.032254,-0.314222,-0.670453,-0.003535,0.757898,-0.547555,0.356095,-0.237955,-0.169256,0.361111,-0.695178,0.128437,-1.013242,-0.038218,0.192656,-0.044316,0.413002,-0.112519,0.438106,-0.163539,-0.288049,1.116224,0.125394,0.456745,0.619035,-0.194935,0.393341,0.931975,0.101569,-0.384092,0.225502,-0.29988,-0.682437,0.208696,-0.343127,-0.132798,-0.565871,0.261739,-0.560174,-0.000564,0.299804,-0.120867,0.849765,-0.337365,-0.418125,-0.084188,-0.248032,0.35677,0.028407,-0.21356,0.06294,-0.188042,0.431441,-0.472865,0.222936,0.076625,0.285511,0.222161,0.284596,-0.158964,0.182507,0.711164,0.423767,-0.486449,0.403645,-0.716357,-0.359746,0.063134,0.646768,-0.287045,-0.380348,-0.14416,-0.289317,0.471727,-0.174092,0.534364,0.218821,0.269216,-0.412621,-0.469088
fiplefip,-0.306765,0.259314,-0.950335,0.560013,-0.364981,0.073359,-0.256642,-0.348088,-0.030323,-0.284338,0.377343,-0.358473,0.559322,0.062051,0.099554,0.46136,-0.273855,-0.274918,0.725871,-0.230823,-0.436114,0.186223,-0.004017,0.297142,-0.066631,0.16217,-0.364509,0.229731,-0.151828,0.22865,0.171403,-0.334804,-0.408777,-0.165566,0.274575,-0.265074,0.429774,-0.217675,0.195341,-0.343059,-0.232225,0.013402,-1.047794,-0.202717,0.275221,0.022242,0.409946,0.062818,0.061196,-0.250688,-0.633233,0.872642,-0.409842,0.481186,-0.799223,0.050954,-0.54631,0.381634,0.052297,0.402593,-0.433905,-0.739268,0.238811,0.229997,0.274061,-0.264081,0.040341,0.33944,-0.060263,-0.42728,-0.18308,0.076269,-0.233166,-0.049634,0.091474,-0.04185,-0.659204,0.075952,-0.137962,0.525735,-0.363403,-0.270721,-0.286186,0.75313,-0.251231,-0.05558,0.133998,-0.922978,-0.681682,0.379896,-0.114465,-0.403521,0.572923,0.437024,0.191971,-0.145903,0.161456,-0.463453,-0.683026,0.161966,-0.38077,-0.64148,0.344847,-0.537787,-0.515634,0.291856,1.349782,0.622313,0.377038,-0.213636,0.413977,0.6242,0.104531,-0.581911,-0.276961,-0.101371,0.624383,0.504247,0.561515,0.117927,0.614386,0.839709,0.20462,-0.480569,-0.068113,-0.11683,-1.055569,-0.629379,-0.158954,0.287536,0.780041,0.63561,-0.010422,0.192075,0.167964,-0.443677,-0.045857,-0.497096,0.202096,0.280315,-0.439252,0.113552,-0.177334,0.02243,-0.612739,-0.357007,1.117971,-0.476095,-0.036811,-0.524293,-0.441786,-0.076792,0.496846,-0.045843,0.284069,0.137884,0.029353,0.16189,-0.007264,-0.399116,-0.363432,0.110552,-0.224124,-0.134213,0.023973,-0.017608,-0.495291,0.71312,-0.507308,0.123047,-0.084328,0.531125,-0.105674,0.537314,0.325989,0.22315,-0.562248,-0.337955,-0.212933,-0.042747,-0.445113,-0.017054,-0.09796,0.11615,-0.295329,-0.008426,-0.12945,-0.557655,0.267662,-0.038402,-0.643162,0.027822,0.275612,0.260406,-0.157455,-0.505367,0.119877,-0.422747,-0.542611,-0.03355,0.768213,0.08271,0.752942,0.043498,-0.159485,-0.598905,0.219367,0.072352,-0.390551,-0.361184,0.074472,0.103684,0.892971,0.168095,0.359259,0.151044,-0.794605,-0.623525,-0.003719,-0.200213,0.867815,-0.881989,-0.765666,0.24489,0.198795,-0.012775,0.492104,0.911354,-0.553588,0.42052,-0.407224,-0.646224,-0.34462,-0.037624,-0.36168,0.669453,0.435865,0.409443,0.472808,0.175568,0.17398,0.295725,0.133073,-0.141865,0.166259,-0.305745,-0.306207,0.11455,-0.212828,-0.12571,0.241662,0.008276,0.435264,0.50296,-0.220416,0.385967,-0.198875,0.250335,-0.337965,0.026384,0.854753,0.323354,0.050374,-0.007571,-0.811709,0.228903,-0.525756,0.513215,-0.197298,-0.706438,-0.42467,0.410469,0.16759,0.003496,0.130472,-0.59432,-0.076453,0.671613,-0.53084,0.171624,-0.14149,0.264164,-0.338885,-0.125357,-0.206496,0.660746,-0.327274,0.188642,0.439133,-0.158999,-0.432182,-0.769793,-0.434484,0.268733,-0.163076,-0.455654,0.41656,-0.219805,-0.568944,-0.477788
where
- USER_ID: the user id of a Reddit user
- VECTOR: comma-separated vector of 300 numbers
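A file in the format described above can be read with a few lines of standard-library Python. This is a minimal sketch, not an official loader: the function name `load_embeddings` is hypothetical, and whether the real files begin with a `USER_ID,VECTOR` header line is an assumption, so the `has_header` flag lets you switch that off.

```python
import csv
import io

EXPECTED_DIM = 300  # embedding length stated in the dataset statistics


def load_embeddings(handle, expected_dim=EXPECTED_DIM, has_header=True):
    """Read rows of the form ID,v1,...,vN into a dict mapping ID -> list of floats."""
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # assumption: first line is a header such as USER_ID,VECTOR
    embeddings = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        key = row[0]
        values = [float(x) for x in row[1:]]
        if len(values) != expected_dim:
            raise ValueError(
                f"{key}: expected {expected_dim} values, got {len(values)}"
            )
        embeddings[key] = values
    return embeddings


# In-memory demo with 3-dimensional vectors (made-up IDs and values, not from the dataset).
demo = io.StringIO("USER_ID,VECTOR\nalice,0.1,-0.2,0.3\nbob,0.0,0.5,-0.5\n")
vectors = load_embeddings(demo, expected_dim=3)
print(sorted(vectors))
```

For a real file, open it with `open(path, newline="")` and pass the handle with the default `expected_dim`. The length check raises an error for any row that does not carry exactly the expected number of values, which catches rows that were truncated or wrapped.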