The class will be next offered in Winter 2011.
The new course number is CS246.
See for more info.

Data Mining
Winter 2010

Course information:


Jure Leskovec
Office Hours: Wednesdays 9-10am, Gates 418

Anand Rajaraman
Office Hours: Tuesday/Thursday 5:30-6:30pm (after the class in the same room)


Lectures: Tuesday and Thursday, 4:15PM - 5:30PM in 200-203 (History Corner).

Teaching assistants:

Abhishek Gupta
Office Hours: Monday 3:30-5pm in Gates B26A, Friday 3:30-5pm in Gates B24A

Roshan Sumbaly
Office Hours: Monday 1-2:15pm in Gates B24B / Pup Cluster

Staff mailing list:

You can reach us at


Prerequisites: CS145 or equivalent.


Readings are drawn from the book Mining of Massive Datasets. You will also find Chapters 20.2, 22, and 23 of the second edition of Database Systems: The Complete Book (Garcia-Molina, Ullman, Widom) relevant. Slides from the lectures will be made available in PDF format.

Students will use the Gradiance automated homework system, for which a fee will be charged. Note: if you have had Gradiance (GOAL) privileges from CS145 or CS245 within the past year, you should also have access to the CS345A homework without paying an additional fee. Notes and/or slides will be posted on-line.

You can see earlier versions of the notes and slides from the 2008/09 offering of CS345A Data Mining. Not all of these topics will be covered this year.


There will be periodic homeworks (some on-line, using the Gradiance system), a final exam, and a project on web-mining. The homework will count for about 20%, just enough to encourage you to do it. The project and final will account for the bulk of the credit, in roughly equal proportions.


Course outline

See Handouts for a list of topics and reading materials.


Important Dates

Challenge Problems (in addition to the Gradiance homeworks):
Gradiance homeworks:
Final Project:
Finals:
Alternate Finals: