Mining Massive Data Sets Hadoop Lab

Course Description

Supplement to CS 246 providing additional material on the Apache Hadoop family of technologies. Students will learn how to implement data mining algorithms using Hadoop and Apache Spark, how to implement and debug complex data mining tasks and data transformations, and how to use two of the most popular big data SQL tools. Topics: data mining, machine learning, data ingestion, and data transformations using Hadoop, Spark, Apache Impala, Apache Hive, Apache Kafka, Apache Sqoop, Apache Flume, Apache Avro, and Apache Parquet. Prerequisite: CS 107 or equivalent.

Grading Basis

RSN - Satisfactory/No Credit

Min

1

Max

1

Course Repeatable for Degree Credit?

No

Course Component

Lecture

Enrollment Optional?

No

Programs

CS246H is a completion requirement for: