Higgs Twitter Dataset

Dataset information

The Higgs dataset was built by monitoring spreading processes on Twitter before, during, and after the 4th July 2012 announcement of the discovery of a new particle with the features of the elusive Higgs boson. It covers the messages posted on Twitter about this discovery between 1st and 7th July 2012.

The four directed networks made available here were extracted from the following user activities and relationships on Twitter:

  1. re-tweeting (retweet network)
  2. replying to existing tweets (reply network)
  3. mentioning other users (mention network)
  4. friends/followers social relationships among the users involved in the above activities (social network)

Note that the user IDs have been anonymized and that the same user ID is used in all four networks. This makes it possible to use the Higgs dataset in studies of large-scale interdependent/interconnected multiplex/multilayer networks, where one layer accounts for the social structure and three layers encode different types of user dynamics.
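
Because user IDs are shared across all four networks, the layers can be kept in a single structure keyed by layer name. The following is a minimal sketch, assuming each layer is already available as a list of (source, target) pairs. The toy edges and the helper names `build_multilayer` and `layers_per_user` are hypothetical and are not part of the dataset.

```python
from collections import defaultdict


def build_multilayer(layers):
    """Map layer name -> adjacency dict (source -> set of targets)."""
    multilayer = {}
    for name, edges in layers.items():
        adjacency = defaultdict(set)
        for source, target in edges:
            adjacency[source].add(target)
        multilayer[name] = dict(adjacency)
    return multilayer


def layers_per_user(layers):
    """Map user ID -> set of layer names in which the user appears."""
    membership = defaultdict(set)
    for name, edges in layers.items():
        for source, target in edges:
            membership[source].add(name)
            membership[target].add(name)
    return dict(membership)


# Toy data, made up for illustration (not taken from the dataset).
toy_layers = {
    "social": [(1, 2), (2, 3), (3, 1)],
    "retweet": [(1, 2)],
    "reply": [(3, 2)],
    "mention": [(1, 3)],
}

multilayer = build_multilayer(toy_layers)
membership = layers_per_user(toy_layers)
```

Here `membership` shows which users take part in more than one layer, which is the kind of cross-layer lookup that the shared IDs make possible.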

For more information about data collection, please refer to our paper.

Dataset statistics are calculated for the graph with the highest number of nodes and edges:

Social Network statistics
Nodes 456631
Edges 14855875
Nodes in largest WCC 456293 (0.999)
Edges in largest WCC 14855497 (0.999)
Nodes in largest SCC 360213 (0.789)
Edges in largest SCC 14102621 (0.949)
Average clustering coefficient 0.1887
Number of triangles 83023455
Fraction of closed triangles 0.0029
Diameter (longest shortest path) 10
90-percentile effective diameter 4.7

Retweet Network statistics
Nodes 425008
Edges 733647
Nodes in largest WCC 424075 (0.998)
Edges in largest WCC 733096 (0.999)
Nodes in largest SCC 13086 (0.031)
Edges in largest SCC 63537 (0.087)
Average clustering coefficient 0.0234
Number of triangles 111100
Fraction of closed triangles 0.0001
Diameter (longest shortest path) 11
90-percentile effective diameter 5.9

Reply Network statistics
Nodes 37366
Edges 30836
Nodes in largest WCC 12092 (0.324)
Edges in largest WCC 13922 (0.451)
Nodes in largest SCC 263 (0.007)
Edges in largest SCC 575 (0.019)
Average clustering coefficient 0.0051
Number of triangles 198
Fraction of closed triangles 0.0001
Diameter (longest shortest path) 19
90-percentile effective diameter 11.1

Mention Network statistics
Nodes 302975
Edges 449827
Nodes in largest WCC 270245 (0.892)
Edges in largest WCC 427573 (0.950)
Nodes in largest SCC 4786 (0.016)
Edges in largest SCC 20706 (0.046)
Average clustering coefficient 0.0944
Number of triangles 88469
Fraction of closed triangles 0.0002
Diameter (longest shortest path) 11
90-percentile effective diameter 7.2
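
The parenthesized values in the tables above are consistent with the share of all nodes or edges that fall in the largest component (for example, 456293 / 456631 ≈ 0.999). As a rough illustration of how such a share could be computed for the weakly connected case, the sketch below runs a union-find over a toy edge list. The helper names `find` and `largest_wcc` and the toy edges are assumptions for illustration; this is not necessarily how the published statistics were produced.

```python
from collections import Counter


def find(parent, node):
    """Return the root of node's component, compressing the path."""
    root = node
    while parent[root] != root:
        root = parent[root]
    while parent[node] != root:
        parent[node], node = root, parent[node]
    return root


def largest_wcc(edges):
    """Return (node count, edge count) of the largest weakly connected component."""
    parent = {}
    for source, target in edges:
        parent.setdefault(source, source)
        parent.setdefault(target, target)
        root_s, root_t = find(parent, source), find(parent, target)
        if root_s != root_t:
            parent[root_s] = root_t  # edge direction is ignored for weak connectivity
    node_counts = Counter(find(parent, node) for node in parent)
    edge_counts = Counter(find(parent, source) for source, _ in edges)
    root, nodes_in_wcc = node_counts.most_common(1)[0]
    return nodes_in_wcc, edge_counts[root]


# Toy directed graph, made up for illustration: a 3-cycle plus one separate edge.
toy_edges = [(1, 2), (2, 3), (3, 1), (4, 5)]
total_nodes = len({node for edge in toy_edges for node in edge})

wcc_nodes, wcc_edges = largest_wcc(toy_edges)
node_fraction = wcc_nodes / total_nodes
edge_fraction = wcc_edges / len(toy_edges)
```

For the toy graph, the largest component holds 3 of 5 nodes and 3 of 4 edges.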

Source (citation)


Files

File Description
social_network.edgelist.gz Friends/followers graph (directed)
retweet_network.edgelist.gz Graph of who retweets whom (directed and weighted)
reply_network.edgelist.gz Graph of who replies to whom (directed and weighted)
mention_network.edgelist.gz Graph of who mentions whom (directed and weighted)
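
The files above could be read with a small streaming parser such as the sketch below. The on-disk layout is an assumption and is not stated in this description: one edge per line, whitespace-separated integer IDs, with a third integer weight column in the weighted files. The function name `read_edgelist` is hypothetical.

```python
import gzip


def read_edgelist(path, weighted=False):
    """Yield (source, target, weight) tuples from a gzipped edge list.

    Assumed layout (not confirmed by the description above): one edge per
    line, whitespace-separated integers, with a third weight column when
    weighted=True. Unweighted edges are given weight 1.
    """
    with gzip.open(path, "rt") as handle:
        for line in handle:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            source, target = int(fields[0]), int(fields[1])
            weight = int(fields[2]) if weighted else 1
            yield source, target, weight
```

Under these assumptions, `read_edgelist("retweet_network.edgelist.gz", weighted=True)` would stream the retweet edges one at a time, which avoids holding the much larger social network in memory as raw text.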