CS246
Mining Massive Data Sets
Winter 2015
Handouts
Assignments
- Course information handout
- Hadoop tutorial: helps you set up Hadoop and get started. Due on 1/13 at 5:00 PM.
- Homework 1: Out on 1/8; Due on 1/22 at 5:00 PM (max 1 late period allowed). (Solutions) (Code)
- Homework 2: Out on 1/22; Due on 2/5 at 5:00 PM (max 1 late period allowed). (Solutions) (Code)
- Homework 3: Out on 2/5; Due on 2/19 at 5:00 PM (max 1 late period allowed). (Solutions) (Code)
- Homework 4: Out on 2/19; Due on 3/5 at 5:00 PM (max 1 late period allowed). (Solutions) (Code)
Lecture notes
- 01/06: Introduction
Slides: Introduction and MapReduce
Reading: Ch1: Data Mining and Ch2: Large-Scale File Systems and Map-Reduce
- 01/08: Frequent itemsets and Association rules
Slides: Association Rules
Reading: Ch6: Frequent itemsets
- 01/13: Locality Sensitive Hashing
Slides: Finding Similar Items: Locality Sensitive Hashing
Reading: Ch3: Finding Similar Items
- 01/15: Theory of Locality Sensitive Hashing
Slides: Theory of Locality Sensitive Hashing
Reading: Ch3: Finding Similar Items
- 01/20: Clustering
Slides: Clustering
Reading: Ch7: Clustering
- 01/22: Dimensionality Reduction: SVD and CUR
Slides: Dimensionality Reduction: SVD and CUR
Reading: Ch11: Dimensionality Reduction
- 01/27: Recommender Systems 1
Slides: Recommender systems: Content-based and Collaborative filtering
Reading: Ch9: Recommendation systems and The Long Tail in Wired.
- 01/29: Recommender Systems 2
Slides: Recommender systems: Latent Factor Models
Reading: Ch9: Recommendation systems
- 02/03: Link Analysis: PageRank
Slides: PageRank
Reading: Ch5: Link Analysis
- 02/05: Link Analysis: Web spam and TrustRank, Random Walks with Restarts
Slides: TrustRank and Web spam
Reading: Ch5: Link Analysis
- 02/10: Analysis of Massive Graphs
Slides: Clustering Networks
Reading: Ch10: Analysis of Social Networks
Using PageRank to Locally Partition a Graph by Andersen, Chung, Lang. FOCS 2006.
- 02/12: Analysis of Massive Graphs
Slides: Detecting overlapping communities
Reading: Ch10: Analysis of Social Networks
Overlapping Community Detection at Scale: A Nonnegative Matrix Factorization Approach by Yang, Leskovec. WSDM 2013.
- 02/17: Large-Scale Machine Learning: Support Vector Machines
Slides: Support Vector Machines
Reading: Ch12: Large-Scale Machine Learning
- 02/19: Large-Scale Machine Learning: Decision Trees
Slides: Decision Trees on MapReduce
Reading: PLANET: Massively Parallel Learning of Tree Ensembles with MapReduce by Panda, Herbach, Basu, Bayardo. VLDB 2009.
- 02/24: Mining Data Streams 1
Slides: Mining Data Streams (Part 1)
Reading: Ch4: Mining data streams
- 02/26: Mining Data Streams 2
Slides: Mining Data Streams (Part 2)
Reading: Ch4: Mining data streams
- 03/03: Web Advertising
Slides: Advertising on the Web
Reading: Ch8: Advertising on the Web
- 03/05: Learning through Experimentation
Slides: Multiarmed bandits
Reading: A Contextual-Bandit Approach to Personalized News Article Recommendation by Li, Chu, Langford, Schapire. WWW 2010.
- 03/10: Optimizing Submodular Functions
Slides: Submodular functions
Reading: Turning Down the Noise in the Blogosphere by El-Arini, Veda, Shahaf, Guestrin. KDD 2009.
- 03/12: Review
Slides: Review
(Tentative) List of future lectures and readings
All chapter readings are from the book Mining of Massive Datasets by J. Leskovec, A. Rajaraman and J. Ullman.
Previous Finals
Gradiance
Recitation session documents