LiveJournal social network and ground-truth communities

Dataset information

LiveJournal is a free on-line blogging community where users declare friendship with each other. LiveJournal also allows users to form a group which other members can then join. We consider such user-defined groups as ground-truth communities. We provide the LiveJournal friendship social network and ground-truth communities.

We regard each connected component in a group as a separate ground-truth community. We remove the ground-truth communities that have fewer than 3 nodes. We also provide the top 5,000 communities with the highest quality, as described in our paper. As for the network, we provide the largest connected component.
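
The preprocessing described above (split each group into connected components, then drop components with fewer than 3 nodes) can be sketched as follows. This is a minimal illustration, not the code used to build the dataset: the function names `build_adjacency` and `split_group` and the `min_size` parameter are hypothetical, and the sketch assumes that connectivity is measured only among members of the same group.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def split_group(adj, group, min_size=3):
    """Split one user-defined group into connected components.

    Assumption: only edges between two members of the group count,
    i.e. components are taken in the subgraph induced by the group.
    Components with fewer than `min_size` nodes are discarded.
    """
    members = set(group)
    seen = set()
    communities = []
    for start in sorted(members):
        if start in seen:
            continue
        seen.add(start)
        component = []
        queue = deque([start])
        while queue:  # breadth-first search restricted to group members
            node = queue.popleft()
            component.append(node)
            for nbr in adj.get(node, ()):
                if nbr in members and nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        if len(component) >= min_size:
            communities.append(sorted(component))
    return communities
```

For example, a group whose members form one triangle, one isolated pair, and one three-node path would yield two communities under this sketch, because the pair falls below the 3-node threshold.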

Dataset statistics
Nodes 3997962
Edges 34681189
Nodes in largest WCC 3997962 (1.000)
Edges in largest WCC 34681189 (1.000)
Nodes in largest SCC 3997962 (1.000)
Edges in largest SCC 34681189 (1.000)
Average clustering coefficient 0.2843
Number of triangles 177820130
Fraction of closed triangles 0.04559
Diameter (longest shortest path) 17
90-percentile effective diameter 6.5

Source (citation)


File Description
com-lj.ungraph.txt.gz Undirected LiveJournal network
com-lj.all.cmty.txt.gz LiveJournal communities
com-lj.top5000.cmty.txt.gz LiveJournal communities (Top 5,000)
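
The files listed above can be read with a short script such as the one below. It is a sketch under assumptions that this page does not state: that the network file holds one edge per line as two whitespace-separated integer node ids, that each line of a community file lists the node ids of one community, and that lines beginning with `#` are comments. The helper names (`parse_edges`, `parse_communities`, `read_edges`, `read_communities`) are hypothetical.

```python
import gzip


def parse_edges(lines):
    """Yield (u, v) integer pairs from edge-list lines.

    Assumed layout: two whitespace-separated node ids per line;
    blank lines and lines starting with '#' are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        u, v = line.split()[:2]
        yield int(u), int(v)


def parse_communities(lines):
    """Yield one list of integer node ids per community line.

    Assumed layout: each line holds the members of one community,
    separated by whitespace.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        yield [int(token) for token in line.split()]


def read_edges(path):
    """Read every edge from a gzip-compressed edge list into a list."""
    with gzip.open(path, "rt") as handle:
        return list(parse_edges(handle))


def read_communities(path):
    """Read every community from a gzip-compressed community file."""
    with gzip.open(path, "rt") as handle:
        return list(parse_communities(handle))
```

With roughly 34.7 million edges, holding the whole edge list in a Python list is memory-hungry. Iterating over `parse_edges` on an open `gzip` handle, instead of calling `read_edges`, lets the file be processed as a stream.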