When dealing with these datasets, please be careful and responsible. The datasets are to be used strictly for the purposes of the class project and nothing else. This means: (1) Do not do anything "funny" with the data; (2) Do not try to break the anonymization; (3) Do not share the data outside the class; (4) Do not copy the data off Amazon EC2; (5) After the class is over, destroy all data.
Litepoint makes test equipment for the wireless industry. The dataset is real but has been obfuscated and lightly cleansed. It contains tests from more than 300,000 devices over a period of one month, with almost 900 tests per device, so the CSV file holds more than a quarter of a billion numbers (roughly 300,000 × 900 ≈ 270 million) that can be visualized in many ways. A quick intro to test data: some tests have limits, either upper, lower, or both. A test result must be within these limits to pass; if a test has no limits, it passes by default. The percentage of devices that pass all tests is called the yield. The station is the physical location where a device is actually tested.
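The pass/fail and yield rules above can be sketched in Python as follows. This is a minimal illustration under assumptions that the dataset description does not state: each device's results are held as a dictionary mapping a test name to a (value, lower limit, upper limit) tuple, a missing limit is None, and limits are treated as inclusive. The names and the data layout are hypothetical, not the actual CSV schema.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical layout: (measured value, lower limit or None, upper limit or None).
ResultRow = Tuple[float, Optional[float], Optional[float]]


def within_limits(value: float, lower: Optional[float] = None,
                  upper: Optional[float] = None) -> bool:
    """Return True if value lies within the limits (assumed inclusive).

    A test with no limits at all passes by default.
    """
    if lower is not None and value < lower:
        return False
    if upper is not None and value > upper:
        return False
    return True


def device_passes(results: Dict[str, ResultRow]) -> bool:
    """A device passes only if every one of its tests passes."""
    return all(within_limits(v, lo, hi) for v, lo, hi in results.values())


def yield_percent(devices: Iterable[Dict[str, ResultRow]]) -> float:
    """Percentage of devices that pass all of their tests (0.0 if no devices)."""
    devices = list(devices)
    if not devices:
        return 0.0
    passed = sum(1 for d in devices if device_passes(d))
    return 100.0 * passed / len(devices)
```

With this rule, a device whose tests carry no limits always counts as a pass toward the yield.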
So what can be done with the data? We typically calculate yield per station and over time, and we also create fail Paretos (charts ranking tests by how often they fail), again typically by station and over time. We also make simple graphs and statistics for each test, for instance histograms, line charts over time, and scatter plots for comparison. Our intent with this challenge is purely exploratory, so you are free to explore as you please! Submit a video in which you describe your analysis and explain why it is useful. Also include a PDF file that shows how you interpreted the data (images, graphs, tables).
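Yield per station, yield per day, and a fail Pareto can all be computed from flattened rows. The sketch below assumes a hypothetical long layout in which each row records a device ID, station, test day, test name, and whether that test passed, and it further assumes each device is tested at a single station on a single day. The real CSV layout may differ, so treat every name here as a placeholder.

```python
from collections import Counter
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Record(NamedTuple):
    """One test result row in a hypothetical long (flattened) layout."""
    device_id: str
    station: str
    day: str        # test date, e.g. "2024-01-01" (assumed format)
    test_name: str
    passed: bool


def device_outcomes(records: Iterable[Record]) -> Dict[str, Tuple[str, str, bool]]:
    """Collapse rows to one (station, day, passed_all_tests) tuple per device.

    Assumes each device is tested at one station on one day.
    """
    outcomes: Dict[str, Tuple[str, str, bool]] = {}
    for r in records:
        station, day, ok = outcomes.get(r.device_id, (r.station, r.day, True))
        outcomes[r.device_id] = (station, day, ok and r.passed)
    return outcomes


def yield_by(records: Iterable[Record], key: str) -> Dict[str, float]:
    """Yield (%) of devices grouped by "station" or by "day"."""
    if key not in ("station", "day"):
        raise ValueError("key must be 'station' or 'day'")
    totals: Counter = Counter()
    passes: Counter = Counter()
    for station, day, ok in device_outcomes(records).values():
        group = station if key == "station" else day
        totals[group] += 1
        passes[group] += int(ok)
    return {g: 100.0 * passes[g] / totals[g] for g in totals}


def fail_pareto(records: Iterable[Record]) -> List[Tuple[str, int, float]]:
    """Failing tests ranked by failure count, with cumulative % of all failures."""
    fails = Counter(r.test_name for r in records if not r.passed)
    total = sum(fails.values())
    pareto: List[Tuple[str, int, float]] = []
    running = 0
    for name, count in fails.most_common():
        running += count
        pareto.append((name, count, 100.0 * running / total))
    return pareto
```

The per-station and per-day dictionaries map directly onto bar or line charts, and the cumulative percentage is the line usually drawn over the bars of a Pareto chart.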
Let us know if you need more info on these datasets. We will upload the datasets to EC2.