Open positions
We have filled all the positions for this quarter. More info.

Orkut social network and ground-truth communities

Dataset information

Orkut is a free on-line social network where users form friendship each other. Orkut also allows users form a group which other members can then join. We consider such user-defined groups as ground-truth communities. We provide the Orkut friendship social network and ground-truth communities. This data is provided by Alan Mislove et al.

We regard each connected component in a group as a separate ground-truth community. We remove the ground-truth communities which have less than 3 nodes. We also provide the top 5,000 communities with highest quality which are described in our paper. As for the network, we provide the largest connected component.

Dataset statistics
Nodes 3072441
Edges 117185083
Nodes in largest WCC 3072441 (1.000)
Edges in largest WCC 117185083 (1.000)
Nodes in largest SCC 3072441 (1.000)
Edges in largest SCC 117185083 (1.000)
Average clustering coefficient 0.1666
Number of triangles 627584181
Fraction of closed triangles 0.01414
Diameter (longest shortest path) 9
90-percentile effective diameter 4.8

Source (citation)


File Description
com-orkut.ungraph.txt.gz Undirected Orkut network
com-orkut.all.cmty.txt.gz Orkut communities
com-orkut.top5000.cmty.txt.gz Orkut communities (Top 5,000)