Reddit Threads

Dataset information

Discussion-based and non-discussion-based threads from Reddit, which we collected in May 2018. Nodes are Reddit users who participate in a thread, and links are replies between them. The task is to predict whether a thread is discussion based or not (binary classification).
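
As a rough illustration of the representation described above, the sketch below turns a list of reply pairs into an undirected adjacency map. The function name `build_thread_graph`, the sample user IDs, and the label encoding (1 = discussion based) are hypothetical and chosen only for this example; nothing here assumes the layout of the distributed files.

```python
from collections import defaultdict


def build_thread_graph(replies):
    """Build an undirected adjacency map from (replier, replied_to) pairs."""
    adjacency = defaultdict(set)
    for a, b in replies:
        if a == b:
            # Assumption: self-replies are skipped; keeping them is a modelling choice.
            continue
        # Undirected graph: store the link in both directions.
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


# Hypothetical toy thread with four users; repeated replies collapse into one link.
replies = [(1, 0), (2, 0), (3, 1), (0, 1)]
graph = build_thread_graph(replies)
label = 1  # assumed encoding: 1 = discussion based, 0 = not

# Each undirected link is stored twice, so halve the total neighbour count.
num_edges = sum(len(neighbours) for neighbours in graph.values()) // 2
print(len(graph), num_edges)
```

Using sets for the neighbour lists means that several replies between the same two users count as a single link, which matches the unweighted, undirected description of the dataset.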

Properties
Number of graphs: 203,088
Directed: No.
Node features: No.
Edge features: No.
Graph labels: Yes. Binary-labeled.
Temporal: No.
Stats      Min     Max
Nodes      11      97
Density    0.021   0.382
Diameter   2       27
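
The density and diameter statistics listed above can be computed per graph along the following lines. This is a minimal sketch that assumes each graph is a connected, undirected adjacency map (node to set of neighbours); the helper names `density` and `diameter` and the toy path graph are illustrative and not part of the dataset or its tooling.

```python
from collections import deque


def density(adjacency):
    """Fraction of possible undirected links that are present: 2m / (n(n-1))."""
    n = len(adjacency)
    if n < 2:
        return 0.0
    m = sum(len(neighbours) for neighbours in adjacency.values()) / 2
    return 2 * m / (n * (n - 1))


def diameter(adjacency):
    """Longest shortest-path length; assumes the graph is connected."""
    best = 0
    for source in adjacency:
        # Breadth-first search gives shortest hop counts from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best


# Toy example: a path on four nodes, 0 - 1 - 2 - 3.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(density(path), diameter(path))
```

Running one breadth-first search per node is quadratic in the worst case, which should be acceptable for graphs with fewer than a hundred nodes.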

Possible tasks
Graph classification

Paper: https://arxiv.org/abs/2003.04819
GitHub page: https://github.com/benedekrozemberczki/karateclub

Source (citation)

  • B. Rozemberczki, O. Kiss, R. Sarkar: Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs. CIKM 2020.
  •   @inproceedings{karateclub,
        title = {{Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs}},
        author = {Benedek Rozemberczki and Oliver Kiss and Rik Sarkar},
        year = {2020},
        pages = {3125–3132},
        booktitle = {Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM '20)},
        organization = {ACM},
      }
    

Files

File                 Description
reddit_threads.zip   Reddit Threads dataset