CS246
Mining Massive Data Sets
Winter 2013
Handouts
Homeworks
All problem sets should include the cover sheet (PDF, LaTeX). We will take off 2 points from any problem set submitted without it.
Lecture notes
- 01/08: Introduction
Slides: Introduction and MapReduce
Reading: Ch1: Data Mining and Ch2: Large-Scale File Systems and Map-Reduce
- 01/10: Frequent itemsets and Association rules
Slides: Association Rules
Reading: Ch6: Frequent itemsets
- 01/15: Locality Sensitive Hashing
Slides: Finding Similar Items: Locality Sensitive Hashing
Reading: Ch3: Finding Similar Items
- 01/17: Theory of Locality Sensitive Hashing
Slides: Theory of Locality Sensitive Hashing
Reading: Ch3: Finding Similar Items
- 01/22: Clustering
Slides: Clustering
Reading: Ch7: Clustering
- 01/24: Dimensionality Reduction: SVD and CUR
Slides: Dimensionality Reduction: SVD and CUR
Reading: Ch11: Dimensionality Reduction
- 01/29: Recommender Systems 1
Slides: Recommender systems: Content-based and Collaborative filtering
Reading: Ch9: Recommendation systems and The Long Tail in Wired.
- 01/31: Recommender Systems 2
Slides: Recommender systems: Latent Factor Models
Reading: Ch9: Recommendation systems
- 02/05: Link Analysis: PageRank
Slides: PageRank
Reading: Ch5: Link Analysis
- 02/07: Link Analysis: Web spam and TrustRank, Random Walks with Restarts
Slides: TrustRank and Web spam
Reading: Ch5: Link Analysis
- 02/12: Analysis of Massive Graphs
Slides: Discovering Clusters in Networks
Reading: Ch10: Analysis of Social Networks
- 02/14: Large-Scale Machine Learning: k-Nearest Neighbors, Perceptron
Slides: Trawling and Perceptron
Reading: Ch12: Large-Scale Machine Learning
- 02/19: Large-Scale Machine Learning: Support Vector Machines
Slides: Support Vector Machines
Reading: Ch12: Large-Scale Machine Learning
- 02/21: Large-Scale Machine Learning: Decision Trees
Slides: Decision Trees on MapReduce
Reading: PLANET: Massively Parallel Learning of Tree Ensembles with MapReduce by Panda, Herbach, Basu, Bayardo. VLDB 2009.
- 02/26: Mining Data Streams 1
Slides: Mining Data Streams (Part 1)
Reading: Ch4: Mining data streams
- 02/28: Mining Data Streams 2
Slides: Mining Data Streams (Part 2)
Reading: Ch4: Mining data streams
- 03/05: Web Advertising
Slides: Advertising on the Web
Reading: Ch8: Advertising on the Web
- 03/07: Learning through Experimentation
Slides: Multiarmed bandits
Reading: A Contextual-Bandit Approach to Personalized News Article Recommendation by Li, Chu, Langford, Schapire. WWW 2010.
- 03/12: Optimizing Submodular Functions
Slides: Submodular functions
Reading: Turning Down the Noise in the Blogosphere by El-Arini, Veda, Shahaf, Guestrin. KDD 2009.
Cost-effective Outbreak Detection in Networks by Leskovec, Krause, Guestrin, Faloutsos, VanBriesen, Glance. KDD 2007.
- 03/14: Review
Slides: Review
All chapter readings are from Mining of Massive Datasets by Anand Rajaraman and Jeff Ullman.
Sections
- 01/13: Probability Theory Review Session
Slides: [pdf]
- 01/18: Linear Algebra Review Session
Slides: [pdf]