Stanford CS246: Mining Massive Data Sets (Winter 2018)


The course will discuss data mining and machine learning algorithms for analyzing very large amounts of data. The emphasis will be on MapReduce as a tool for creating parallel algorithms that can process data at this scale.

In Spring 2018, we will offer a project-based course, CS341: Project in Mining Massive Data Sets, in which students will apply data mining and machine learning techniques to real-world datasets.

Announcements:

1/9: The first class will be held at 3pm on Tuesday, January 9, in NVIDIA Auditorium, Huang Engineering Center.
The info sheet for the course is available: [Info Sheet]
1/12: Slides from Thursday's Spark tutorial are available: [Spark Slides]

Course information:

Lectures:

Tuesday & Thursday 3pm - 4:20pm in NVIDIA Auditorium, Jen-Hsun Huang Engineering Center.
Watch video lectures on SCPD. Stanford students can see them here.

Instructor:

Jure Leskovec
Office: 418 Gates
Office Hours: Tuesday 9:00-10:00am

Companion course CS246H:

There is a companion course, CS246H, which is completely independent of CS246 and covers Spark programming. It meets Wednesdays 11:30am - 1:20pm in Skilling Auditorium, and the lecture videos can be viewed here.

Office hours:

All office hours for local students will be held in the Huang basement, except Jure's office hours which are in Gates 418. SCPD students can join the office hours via videoconferencing. The videoconferencing link is available on Piazza.

Office hours are managed through QueueStatus. Please join the queue to sign up for office hours. While the queue is open, you may add your name once every two hours, and all students in the queue will be given priority over students not in the queue. This ensures that every student gets to see a TA at least once.

Heather and Hiroto are the Spark TAs; they may be able to help with Spark more than the other TAs. Heather, Jessica, and Kush are the Scala TAs; they may be able to help with Scala more than the other TAs.

Course materials:

Automated Quizzes: We will be using Gradiance. Everyone (on-campus and SCPD students alike) should create an account there and enter the class code 79D9D7F3. Passwords must be at least 10 characters long, consist of letters and digits, and contain at least one of each. Please use your real first and last name with standard capitalization, e.g., "Jeffrey Ullman". Also, please register with the same email address you used for Gradescope so we can match your Gradiance score report to your other class grades.

Books: Leskovec-Rajaraman-Ullman: Mining of Massive Datasets can be downloaded for free. It can also be purchased from Cambridge University Press, but you are not required to do so.

MOOC: You can watch videos from a past Coursera MOOC (similar to this course) on YouTube.

Piazza: Piazza Discussion Group for this class.

Course handouts: Available here.

Staff Email: You can reach us at cs246-win1718-staff@lists.stanford.edu (the TAs and the professor). Please don't email us individually; always use the mailing list or Piazza.

Feedback form: If you have comments about the class, please share them through the anonymous feedback form. We appreciate your feedback and will use it to improve the class for you.