High-energy physics citation network

Dataset information

The arXiv HEP-PH (high energy physics phenomenology) citation graph comes from the e-print arXiv and covers all citations within a dataset of 34,546 papers, with 421,578 edges. If paper i cites paper j, the graph contains a directed edge from i to j. If a paper cites, or is cited by, a paper outside the dataset, the graph contains no information about this.
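
The edge convention above (a directed edge from the citing paper i to the cited paper j) can be sketched in Python. This is a minimal illustration, not the loader used by SNAP. It assumes an edge list with one whitespace-separated "i j" pair per line and comment lines starting with "#". The function name parse_edges and the sample node IDs are hypothetical.

```python
from collections import defaultdict

def parse_edges(lines):
    """Build a citing -> cited adjacency map from 'i j' lines (assumed format)."""
    cites = defaultdict(set)
    for line in lines:
        line = line.strip()
        # Skip blank lines and '#' comment lines (assumed header convention).
        if not line or line.startswith("#"):
            continue
        src, dst = line.split()[:2]
        # Edge direction: src cites dst.
        cites[int(src)].add(int(dst))
    return cites

# Made-up sample: paper 1 cites papers 2 and 3, paper 3 cites paper 2.
sample = ["# FromNodeId ToNodeId", "1 2", "1 3", "3 2"]
cites = parse_edges(sample)

# Out-degree = number of in-dataset papers that a paper cites.
out_degree = {paper: len(refs) for paper, refs in cites.items()}
```

Because citations to papers outside the dataset are not recorded, an out-degree computed this way is a lower bound on the length of a paper's actual reference list.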

The data covers papers in the period from January 1993 to April 2003 (124 months). It begins within a few months of the inception of the arXiv, and thus represents essentially the complete history of its HEP-PH section.

The data was originally released as part of the 2003 KDD Cup.

Dataset statistics
Nodes 34546
Edges 421578
Nodes in largest WCC 34401 (0.996)
Edges in largest WCC 421485 (1.000)
Nodes in largest SCC 12711 (0.368)
Edges in largest SCC 139981 (0.332)
Average clustering coefficient 0.2848
Number of triangles 1276868
Fraction of closed triangles 0.05377
Diameter (longest shortest path) 12
90-percentile effective diameter 5
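
The parenthesized values in the WCC and SCC rows appear to be fractions of the full graph's node and edge counts. The short check below recomputes them from the counts in the table. The variable names are illustrative only, and rounding to three decimals is an assumption based on how the values are printed.

```python
# Totals and component sizes copied from the statistics table above.
nodes, edges = 34546, 421578
largest = {
    "WCC": (34401, 421485),  # largest weakly connected component
    "SCC": (12711, 139981),  # largest strongly connected component
}

# Fraction of all nodes / edges that fall inside each component.
fractions = {
    name: (round(n / nodes, 3), round(e / edges, 3))
    for name, (n, e) in largest.items()
}

for name, (node_frac, edge_frac) in fractions.items():
    print(f"{name}: nodes {node_frac:.3f}, edges {edge_frac:.3f}")
```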

Source (citation)


File Description
cit-HepPh.txt.gz Paper citation network of the arXiv high energy physics phenomenology (HEP-PH) category
cit-HepPh-dates.txt.gz Submission time of each node (paper) to arXiv
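
Both files are gzip-compressed text, so they can be read without unpacking them first. The sketch below uses Python's standard gzip module. The internal layout (whitespace-separated columns, "#" comment lines) is an assumption and should be checked against the downloaded files. The helper name read_data_lines and the stand-in file demo.txt.gz are hypothetical.

```python
import gzip
import os
import tempfile

def read_data_lines(path):
    """Yield non-empty, non-comment lines from a gzipped text file."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            # Assumption: header/comment lines begin with '#'.
            if line and not line.startswith("#"):
                yield line

# Demo on a tiny stand-in file so the sketch runs without the real download.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.txt.gz")
    with gzip.open(demo_path, "wt", encoding="utf-8") as fh:
        fh.write("# FromNodeId\tToNodeId\n1\t2\n3\t2\n")
    edges = [tuple(map(int, line.split())) for line in read_data_lines(demo_path)]
```

To read the real data, pass the local path of the downloaded cit-HepPh.txt.gz to read_data_lines in place of the stand-in file.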