CS246H - Mining Massive Data Sets Hadoop Lab
Online
Overview
This lab course builds on Hadoop, the MapReduce framework introduced in the first part of Mining Massive Data Sets (CS246), to give students a solid foundation in data mining. Hadoop is covered in depth to provide a more complete understanding of the platform and its role in data mining. This is a partner course to CS246 and does not include additional assignments.
You Will Learn
- How to implement data mining algorithms discussed in CS246 using Hadoop
- How to implement and debug complex MapReduce jobs in Hadoop
- How to use some of the tools in the Hadoop ecosystem for data mining and machine learning
Instructors
- Jure Leskovec, Assistant Professor of Computer Science
- Daniel Templeton, Lecturer in Computer Science
Topics Include
- Hadoop
- MapReduce
- Hive
- Cloudera ML/Oryx
- Mahout
- TF-IDF
- Pig, Sqoop, Oozie, HBase, ZooKeeper, and Impala
Units
1.0
Prerequisites
Computer Organization & Systems (Stanford course CS107) or equivalent.
Other
Tuition & Fees
For course tuition, reduced tuition (SCPD member companies and United States Armed Forces), and fees, see Tuition & Fees.