CS345A:
Data Mining
Winter 2010
Handouts:
1/5: Introduction
1/7: MapReduce
1/12: Recommendation Systems
1/14: Near Neighbor Search in High-Dimensional Data
1/19: Locality Sensitive Hashing (LSH)
1/21: Structure of the webgraph, PageRank, and project ideas
- Project ideas and available datasets [slides]
- Structure of the webgraph and PageRank [slides]
1/22: Section on MapReduce infrastructure
1/26: Link Analysis
1/28: HITS and web spam
2/2: Web spam
2/4: Proximity on Graphs
- Random Walks with Restarts and Center piece subgraphs [slides]
- Readings:
2/9: Dimensionality reduction
- SVD and CUR [slides]
- Readings:
  - Fast Monte Carlo Algorithms for Matrices III: Computing a Compressed Approximate Matrix Decomposition by P. Drineas, R. Kannan, M. W. Mahoney, SIAM Journal on Computing 2007.
  - Tensor-CUR Decompositions for Tensor-Based Data by M. W. Mahoney, M. Maggioni, P. Drineas, KDD 2006.
  - Less is More: Compact Matrix Decomposition for Large Sparse Graphs by J. Sun, Y. Xie, H. Zhang, C. Faloutsos, SDM 2007.
2/11: Clustering
2/16: Mining data streams
2/18: Mining data streams (cont.)
2/23: Large scale supervised machine learning (1)
- k-nearest neighbors, perceptron [slides]
- Readings:
  - Learning Using Large Datasets by L. Bottou, O. Bousquet, MMSD 2009.
  - Map-Reduce for Machine Learning on Multicore by C. Chu, S. Kim, Y. Lin, Y. Yu, G. Bradski, A. Ng, K. Olukotun, NIPS 2006.
2/25: Large scale supervised machine learning (2) (guest lecture by Sugato Basu)
- Classification and regression trees [slides]
- Readings:
3/2: Large scale supervised machine learning (3)
- Support Vector Machines, Cutting plane algorithm, SVM for structured output prediction [slides]
- Readings:
3/4: Association Rules
3/9: Optimizing submodular functions
- Submodular functions, outbreak detection in networks, finding influencers in networks [slides]
- Readings:
  - Near-optimal Nonmyopic Value of Information in Graphical Models by A. Krause, C. Guestrin, UAI 2005.
  - Maximizing the Spread of Influence through a Social Network by D. Kempe, J. Kleinberg, E. Tardos, KDD 2003.
  - Cost-effective Outbreak Detection in Networks by J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen, N. Glance, KDD 2007.
3/11: Mining the Web for Structured Data
- Mining the Web for Structured Data [slides]