CS246
Mining Massive Data Sets
Winter 2014
Handouts
Homeworks
All problem sets must include the cover sheet (PDF, LaTeX). We will take off 2 points from any submission that does not include it.
Lecture notes
- 01/07: Introduction
Slides: Introduction and MapReduce
Reading: Ch1: Data Mining and Ch2: Large-Scale File Systems and Map-Reduce
- 01/09: Frequent itemsets and Association rules
Slides: Association Rules
Reading: Ch6: Frequent itemsets
- 01/14: Locality Sensitive Hashing
Slides: Finding Similar Items: Locality Sensitive Hashing
Reading: Ch3: Finding Similar Items
- 01/16: Theory of Locality Sensitive Hashing
Slides: Theory of Locality Sensitive Hashing
Reading: Ch3: Finding Similar Items
- 01/21: Clustering
Slides: Clustering
Reading: Ch7: Clustering
- 01/23: Dimensionality Reduction: SVD and CUR
Slides: Dimensionality Reduction: SVD and CUR
Reading: Ch11: Dimensionality Reduction
- 01/28: Recommender Systems 1
Slides: Recommender systems: Content-based and Collaborative filtering
Reading: Ch9: Recommendation systems and The Long Tail in Wired.
- 01/30: Recommender Systems 2
Slides: Recommender systems: Latent Factor Models
Reading: Ch9: Recommendation systems
- 02/04: Link Analysis: PageRank
Slides: PageRank
Reading: Ch5: Link Analysis
- 02/06: Link Analysis: Web spam and TrustRank, Random Walks with Restarts
Slides: TrustRank and Web spam
Reading: Ch5: Link Analysis
- 02/11: Analysis of Massive Graphs
Slides: Clustering Networks
Reading: Ch10: Analysis of Social Networks
Using PageRank to Locally Partition a Graph by Andersen, Chung, Lang. FOCS 2006.
- 02/13: Analysis of Massive Graphs
Slides: Detecting overlapping communities
Reading: Ch10: Analysis of Social Networks
Overlapping Community Detection at Scale: A Nonnegative Matrix Factorization Approach by Yang, Leskovec. WSDM 2013.
- 02/18: Large-Scale Machine Learning: Support Vector Machines
Slides: Support Vector Machines
Reading: Ch12: Large-Scale Machine Learning
- 02/20: Large-Scale Machine Learning: Decision Trees
Slides: Decision Trees on MapReduce
Reading: PLANET: Massively Parallel Learning of Tree Ensembles with MapReduce by Panda, Herbach, Basu, Bayardo. VLDB 2009.
- 02/25: Mining Data Streams 1
Slides: Mining Data Streams (Part 1)
Reading: Ch4: Mining data streams
- 02/27: Mining Data Streams 2
Slides: Mining Data Streams (Part 2)
Reading: Ch4: Mining data streams
- 03/04: Web Advertising
Slides: Advertising on the Web
Reading: Ch8: Advertising on the Web
- 03/06: Learning through Experimentation
Slides: Multiarmed bandits
Reading: A Contextual-Bandit Approach to Personalized News Article Recommendation by Li, Chu, Langford, Schapire. WWW 2010.
- 03/11: Optimizing Submodular Functions
Slides: Submodular functions
Reading: Turning Down the Noise in the Blogosphere by El-Arini, Veda, Shahaf, Guestrin. KDD 2009.
- 03/13: Review
Slides: Review
All chapter readings are from Mining of Massive Datasets by Anand Rajaraman and Jeff Ullman.
Sections
- 01/10: Probability Theory Review
Slides: [pdf]
- 01/10: Proof Techniques Review
Slides: [pdf]
- 01/13: Linear Algebra Review Session
Slides: [pdf] [pptx]
Previous Finals
Gradiance