Project in Mining Massive Data Sets
Spring 2012


When working with these datasets, please be careful and responsible. The datasets are to be used strictly for the class project and nothing else. This means: (1) Do not do anything "funny" with the datasets; (2) Do not try to break the anonymization; (3) Do not share the data outside the class; (4) Do not copy the data off Amazon EC2; (5) After the class is over, destroy all data.

Datasets "in progress"

These datasets are still being prepared, but you can find rough descriptions at http://bit.ly/CS341DATA.

Stanford CS341 only datasets

Let us know if you need more info on these datasets. We will upload the datasets to EC2.

Other datasets