CS246
Mining Massive Data Sets
Winter 2016
Handouts
Sample Final Exams
Assignments
Gradiance (no late periods allowed):
- GHW 1: Due on 1/14 at 11:59pm.
- GHW 2: Due on 1/21 at 11:59pm.
- GHW 3: Due on 1/28 at 11:59pm.
- GHW 4: Due on 2/04 at 11:59pm.
- GHW 5: Due on 2/11 at 11:59pm.
- GHW 6: Due on 2/18 at 11:59pm.
- GHW 7: Due on 2/25 at 11:59pm.
- GHW 8: Due on 3/03 at 11:59pm.
- GHW 9: Due on 3/10 at 11:59pm.
Homeworks (1 late period allowed):
- HW0 (Hadoop tutorial) to help you set up Hadoop: Due on 1/12 at 11:59pm. Solutions: [PDF][Code].
- HW1: Due on 1/21 at 11:59pm. Submission Templates: [pdf | tex | docx] Solutions: [PDF][Code].
- HW2: Due on 2/04 at 11:59pm. Submission Templates: [pdf | tex | docx] Solutions: [PDF][Code].
- HW3: Due on 2/18 at 11:59pm. Submission Templates: [pdf | tex | docx] Solutions: [PDF][Code].
- HW4: Due on 3/03 at 11:59pm. Submission Templates: [pdf | tex | docx] Solutions: [PDF][Code].
Lecture notes
- 01/05: Introduction; MapReduce
Slides: [pptx], [pdf]
Reading: Ch1: Data Mining and Ch2: Large-Scale File Systems and Map-Reduce (Sect. 2.1-2.4)
- 01/07: Frequent Itemsets Mining
Slides: [pptx], [pdf]
Reading: Ch6: Frequent Itemsets
- 01/12: Locality-Sensitive Hashing I
Slides: [pptx], [pdf]
Reading: Ch3: Finding Similar Items (Sect. 3.1-3.4)
- 01/14: Locality-Sensitive Hashing II
Slides: [pptx], [pdf]
Reading: Ch3: Finding Similar Items (Sect. 3.5-3.8)
- 01/19: Clustering
Slides: [pptx], [pdf]
Reading: Ch7: Clustering (Sect. 7.1-7.4)
- 01/21: Dimensionality Reduction
Slides: [pptx], [pdf]
Reading: Ch11: Dimensionality Reduction, and Sect. 9.4
- 01/26: Recommender Systems I
Slides: [pdf]
Reading: Ch9: Recommendation Systems
- 01/28: Recommender Systems II
Slides: [pdf]
Reading: Ch9: Recommendation Systems
- 02/02: Link Analysis I: PageRank
Slides: [pptx], [pdf]
Reading: Ch5: Link Analysis (Sect. 5.1-5.2)
- 02/04: Link Analysis II: Link Spam, HITS
Slides: [pptx], [pdf]
Reading: Ch5: Link Analysis (Sect. 5.3-5.5)
- 02/09: Analysis of Massive Graphs I
Slides: [pdf]
Reading: Ch10: Analysis of Social Networks
- 02/11: Analysis of Massive Graphs II
Slides: [pptx], [pdf]
Reading: Ch10: Analysis of Social Networks
- 02/16: Large-Scale Machine Learning I
Slides: [pptx], [pdf]
Reading: Ch12: Large-Scale Machine Learning
- 02/18: Large-Scale Machine Learning II
Slides: [pptx], [pdf]
Reading: Ch12: Large-Scale Machine Learning
- 02/23: Mining Data Streams I
Slides: [pptx], [pdf]
Reading: Ch4: Mining Data Streams (Sect. 4.1-4.3)
- 02/25: Mining Data Streams II
Slides: [pptx], [pdf]
Reading: Ch4: Mining Data Streams (Sect. 4.4-4.7)
- 03/01: Computational Advertising
Slides: [pptx], [pdf]
Reading: Ch8: Advertising on the Web
- 03/03: Computational Advertising, comparison between MapReduce-like systems and bulk-synchronous systems
Slides: [pptx], [pdf]
- 03/08: [Himabindu Lakkaraju, Tim Althoff]: Submodular Optimization
Slides: [pptx], [pdf]
- 03/10: [Caroline Suen]: Multi-armed Bandits, [Jeff Ullman]: Design of good MapReduce Algorithms
MapReduce Algorithms slides: [pptx], [pdf]
Multi-armed Bandits slides: [pptx], [pdf]
(Tentative) List of future lectures and readings
All readings are drawn from the book Mining of Massive Datasets by J. Leskovec, A. Rajaraman, and J. Ullman.
Gradiance
Recitation sessions documents