CS246H
Mining Massive Data Sets: Hadoop Labs
Winter 2017
This course is designed to give students a practical understanding of the tools in the Hadoop ecosystem, with a focus on MapReduce and Spark.
It emphasizes the practical application of big data technologies rather than the theory behind them.
This is a partner course to CS246: Mining Massive Datasets and includes limited additional assignments.
The course is adapted from the professional courses taught by Cloudera.
Announcements:
Important course information will be posted on this web page and announced
in class. You are responsible for all material that appears here and should
check this page for updates frequently.
- 1/11: The first class will be held at 11:30 on Wednesday 1/11, in Skilling Auditorium.
We look forward to seeing you there!
- 1/12: We are organizing a VM clinic to help students set up their VMs. Daniel Templeton will be at the session, assisted by several other TAs. Time and location: Monday, January 16, 6PM to 9PM in Gates 415.
Course information:
Lectures:
Wednesdays 11:30-13:20 in Skilling Auditorium
Instructors:
Daniel Templeton (daniel at cloudera dot com), Cloudera
Office Hours: By arrangement
Jure Leskovec
Office Hours: Wednesdays 9-10am, Gates InfoLab
You Will Learn to
- Implement and debug complex data processing applications in Hadoop
- Use some of the tools in the Hadoop ecosystem for data mining and machine learning
Topics Include
- Apache Hadoop
- Apache Spark
- Apache Hive
- Apache Impala
- Apache Kafka
- Other ecosystem tools, e.g. Apache Sqoop and Apache Pig
Automated Quizzes
This course will include eight weekly Gradiance quizzes to check that students are learning the concepts. Some of the quizzes will require students to complete short programming assignments to produce the answers. The Gradiance token for this class is 6A8C4765.
Lecture notes