Enron email network

Dataset information

The Enron email communication network covers all the email communication within a dataset of about half a million emails. The data was originally made public, and posted to the web, by the Federal Energy Regulatory Commission during its investigation. Nodes of the network are email addresses, and if an address i sent at least one email to address j, the graph contains an undirected edge between i and j. Note that non-Enron email addresses act as sinks and sources in the network, as we only observe their communication with the Enron email addresses.
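
The construction rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used to build the dataset: the function names, the sample addresses, and the choice to ignore self-addressed mail are all hypothetical.

```python
from collections import defaultdict


def build_undirected_graph(emails):
    """Build an adjacency map from (sender, recipient) address pairs.

    One email from i to j is enough to create the edge {i, j};
    repeated or reverse-direction emails do not add further edges.
    """
    adjacency = defaultdict(set)
    for sender, recipient in emails:
        if sender == recipient:
            continue  # assumption: self-addressed mail creates no edge
        adjacency[sender].add(recipient)
        adjacency[recipient].add(sender)
    return adjacency


def count_edges(adjacency):
    """Each undirected edge appears in two neighbor sets, so halve the sum."""
    return sum(len(neighbors) for neighbors in adjacency.values()) // 2


# Hypothetical sample data, not taken from the Enron corpus.
emails = [
    ("a@enron.com", "b@enron.com"),
    ("b@enron.com", "a@enron.com"),   # reverse direction, same edge
    ("a@enron.com", "x@example.com"), # external address seen only via Enron
    ("a@enron.com", "b@enron.com"),   # duplicate email, same edge
]
graph = build_undirected_graph(emails)
print(len(graph), "nodes,", count_edges(graph), "edges")
```

Storing neighbors in sets makes the "at least one email" rule automatic: the direction and number of messages between two addresses do not matter once the edge exists.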

The Enron email data was originally released by William Cohen at CMU.

Dataset statistics
Nodes: 36692
Edges: 183831
Nodes in largest WCC: 33696 (0.918 of all nodes)
Edges in largest WCC: 180811 (0.984 of all edges)
Nodes in largest SCC: 33696 (0.918 of all nodes)
Edges in largest SCC: 180811 (0.984 of all edges)
Average clustering coefficient: 0.4970
Number of triangles: 727044
Fraction of closed triangles: 0.03015
Diameter (longest shortest path): 11
90-percentile effective diameter: 4.8
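
Several of these statistics can be recomputed from an edge list. The sketch below assumes a plain-text file with one whitespace-separated node pair per line and comment lines starting with `#`; that layout is an assumption about the file format, and the function names and toy graph are hypothetical. Because the graph is undirected, weakly and strongly connected components coincide, which is consistent with the identical WCC and SCC rows. The average clustering coefficient is taken here as the mean of the local coefficients, with nodes of degree below 2 counted as 0; the original statistics may have been computed under a different convention.

```python
from collections import defaultdict, deque
from itertools import combinations


def parse_edge_list(lines):
    """Read 'u v' pairs into an undirected adjacency map, skipping '#' comments."""
    adjacency = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        u, v = line.split()[:2]
        if u != v:  # assumption: self-loops are ignored
            adjacency[u].add(v)
            adjacency[v].add(u)
    return adjacency


def largest_component_size(adjacency):
    """Size of the largest connected component, found by breadth-first search."""
    seen, best = set(), 0
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        best = max(best, size)
    return best


def graph_statistics(adjacency):
    """Node, edge, and triangle counts plus the mean local clustering coefficient."""
    nodes = len(adjacency)
    edges = sum(len(nbrs) for nbrs in adjacency.values()) // 2
    linked_pairs_total, clustering_sum = 0, 0.0
    for neighbors in adjacency.values():
        k = len(neighbors)
        if k < 2:
            continue  # local coefficient taken as 0 for degree < 2
        # Count neighbor pairs that are themselves connected (quadratic in k).
        links = sum(1 for a, b in combinations(neighbors, 2) if b in adjacency[a])
        linked_pairs_total += links
        clustering_sum += 2.0 * links / (k * (k - 1))
    return {
        "nodes": nodes,
        "edges": edges,
        "largest_component": largest_component_size(adjacency),
        "triangles": linked_pairs_total // 3,  # each triangle is seen at 3 nodes
        "average_clustering": clustering_sum / nodes if nodes else 0.0,
    }


# Hypothetical toy graph: a triangle with a tail, plus a separate edge.
toy_lines = ["# toy graph", "1\t2", "2\t1", "2\t3", "1\t3", "3\t4", "5\t6"]
stats = graph_statistics(parse_edge_list(toy_lines))
print(stats)
```

To try it on a downloaded file, the same functions could be fed the lines of `gzip.open(path, "rt")`. The pairwise neighbor check is quadratic in node degree, so this is a readable sketch rather than an efficient triangle counter.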

Source (citation)


File: Description
email-Enron.txt.gz: Enron email network
Enron email data: Complete Enron email dataset (includes full email message text and attachments)