Hadoop, Data Warehousing, and ETL for Software Developers Training Course


We offer private customized training for groups of 3 or more attendees.
Course Description
This course explains what the Hadoop platform is and provides hands-on lab exercises in which participants apply the concepts and plan, run, and use the platform. Apache Hadoop is one of the most widely used frameworks for processing Big Data. It provides rich and deep analytics capability, and it is making inroads into the traditional BI analytics world. The course introduces the core components of the Hadoop ecosystem and its analytics, as well as planning, running, and administering a Hadoop cluster. It emphasizes the use cases of Hadoop and Data Warehousing, and provides best practices and guidelines on combining the two.

Course Length: 3 Days
Course Tuition: $1190 (US)

Prerequisites
This course is intended for software developers, business analysts, and IT administrators. Participants should be able to navigate the Linux command-line interface and have basic knowledge of a Linux editor, such as vi or nano. Basic knowledge of Java and an understanding of ETL are also required.

Course Outline

Course Topics

• Introduction

• Data Access, Integration

• Transformation, Aggregation

• Feature Generation

• Join Various Data Sources

• Filter, Search, Transpose

• Binning and Smoothing

• More Topics

Course Objectives

Upon completion of this course, participants will be able to:

• Describe what the Hadoop platform is and its purpose.

• Describe the core components of the Hadoop ecosystem.

• Plan, run, and use a Hadoop cluster.

• Describe and apply best practices and guidelines on combining Hadoop and Data Warehousing.

Day 1:

I. Introduction

A. Hadoop Ecosystem Overview

B. HDFS

C. MapReduce

II. Data Access, Integration

A. Navigate in Hadoop

B. Access Data and Files in HDFS and Tables

C. Pig
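
As a rough illustration of the MapReduce model listed under Day 1, the sketch below mimics the map, shuffle, and reduce phases in memory for a word count. It is plain Python, not Hadoop code, and is not taken from the course materials; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample lines are assumptions made for illustration only.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines):
    """Chain the three phases into one in-memory word count."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    # Hypothetical sample input, standing in for lines read from a file.
    sample = ["Hadoop stores data in HDFS", "Hadoop processes data with MapReduce"]
    print(word_count(sample))
```

On a real cluster the map and reduce steps would run in parallel across nodes and the grouping would be handled by the framework; this sketch only shows the shape of the data flow.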

Day 2: Hive

III. Transformation, Aggregation

A. Consume Large Datasets and Tables

B. Working with Dates, Timestamps, Arrays

C. Use Group By and Summarize Various Attributes

D. Converting Strings to Date/Time, Numbers

E. Concatenating Columns

F. Parsing Semi-Structured Data

IV. Feature Generation

A. Create New Attributes, Mathematical Calculations, Windowing Functions

B. Use Character and String Functions

V. Join Various Data Sources

A. Join Multiple Files and Tables in an Optimized Way

VI. Filter, Search, Transpose

A. Ways to Limit the Data Using Various Predicate Methods

B. Pivot the Data in Different Ways: Wide to Long and Vice Versa

C. Find Missing Values

VII. Binning and Smoothing

A. Create Buckets and Groups for Categorization
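
The Day 2 topics of grouping, summarizing, and binning can be pictured with a small language-neutral sketch. The example below is plain Python rather than Hive, and it is not taken from the course materials; the helper names (`bin_label`, `summarize_by_bin`), the bucket edges, and the sample rows are hypothetical. It assigns each numeric value to a half-open bucket and then reports a count and mean per bucket, which is roughly what a group-by over a derived bucket column does.

```python
from collections import defaultdict


def bin_label(value, edges):
    """Return a label such as '[10, 20)' for the half-open bucket containing value."""
    for low, high in zip(edges, edges[1:]):
        if low <= value < high:
            return f"[{low}, {high})"
    return "out of range"


def summarize_by_bin(rows, edges):
    """Group (name, amount) rows by bucket and report count and mean per bucket."""
    buckets = defaultdict(list)
    for _name, amount in rows:
        buckets[bin_label(amount, edges)].append(amount)
    return {
        label: {"count": len(values), "mean": sum(values) / len(values)}
        for label, values in buckets.items()
    }


if __name__ == "__main__":
    # Hypothetical rows: (customer, order amount).
    rows = [("a", 5), ("b", 15), ("c", 17)]
    print(summarize_by_bin(rows, edges=[0, 10, 20]))
```

Half-open buckets are used so that a value sitting exactly on an edge falls into exactly one bucket; values outside the outermost edges are labeled separately instead of being dropped silently.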

Day 3:

VIII. More Topics

A. HBase

B. Others
