Hadoop, Data Warehousing, and ETL for Software Developers Training in Clifton
Enroll in or hire us to teach our Hadoop, Data Warehousing, and ETL for Software Developers class in Clifton, New Jersey by calling us at 303.377.6176. Like all HSG classes, Hadoop, Data Warehousing, and ETL for Software Developers may be offered either onsite or via instructor-led virtual training. Consider looking at our public training schedule to see if it is scheduled: Public Training Classes
Provided there are enough attendees, Hadoop, Data Warehousing, and ETL for Software Developers may be taught at one of our local training facilities.
We offer private customized training for groups of 3 or more attendees.
Course Description
This course enables participants to understand what the Hadoop platform is and provides hands-on lab exercises in which they apply the concepts and plan, run, and use the platform. Apache Hadoop is one of the most widely used frameworks for processing Big Data. It provides broad analytics capabilities and is making inroads into the traditional business intelligence (BI) analytics world. This course introduces participants to the core components of the Hadoop ecosystem and its analytics, as well as to planning, running, and administering a Hadoop cluster. It emphasizes the use cases of Hadoop and data warehousing and provides best practices and guidelines for combining the two.
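MapReduce is one of the core Hadoop components covered on Day 1. As a rough, non-authoritative illustration of its programming model, the plain-Python sketch below mimics the map, shuffle, and reduce phases of a word count inside a single process. It does not use any Hadoop API; the function names are hypothetical. Real Hadoop jobs are typically written against the Java MapReduce API (or expressed through Pig or Hive) and run distributed across a cluster, with the framework performing the shuffle.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Map step (hypothetical helper): emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle step: group emitted values by key, as the framework does between map and reduce."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: dict[str, list[int]]) -> dict[str, int]:
    """Reduce step: combine the values collected for each key (here, by summing the counts)."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Chain the three phases together on an in-memory list of lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    sample = ["Hadoop stores data in HDFS", "MapReduce processes data in Hadoop"]
    print(word_count(sample))
```

The point of the split is that map and reduce are independent per key, which is what lets Hadoop run many mappers and reducers in parallel on different nodes.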
Course Length: 3 Days
Course Tuition: $1190 (US)
Prerequisites
This course is intended for software developers, business analysts, and IT administrators. Participants should be able to navigate the Linux command-line interface and have a basic knowledge of Linux editors such as vi or nano. Basic knowledge of Java and an understanding of ETL concepts are also required.
Course Outline
Course Topics
• Introduction
• Data Access, Integration
• Transformation, Aggregation
• Feature Generation
• Join Various Data Sources
• Filter, Search, Transpose
• Binning and Smoothing
• More Topics
Course Objectives
Upon completion of this course, participants will be able to:
• Describe what the Hadoop platform is and its purpose.
• Describe the core components of the Hadoop ecosystem.
• Plan, run, and use a Hadoop Cluster.
• Describe and apply best practices and guidelines on combining Hadoop and Data Warehousing.
Day 1:
I. Introduction
A. Hadoop Ecosystem Overview
B. HDFS
C. MapReduce
II. Data Access, Integration
A. Navigate in Hadoop
B. Access Data and Files in HDFS and Tables
C. Pig
Day 2: Hive
III. Transformation, Aggregation
A. Consume Large Datasets and Tables
B. Working with Dates, Timestamps, Arrays
C. Use Group By and Summarize Various Attributes
D. Converting Strings to Date/Time, Numbers
E. Concatenating Columns
F. Parsing Semi-Structured Data
IV. Feature Generation
A. Create New Attributes, Mathematical Calculations, Windowing Functions
B. Use Character and String Functions
V. Join Various Data Sources
A. Join Multiple Files and Tables in an Optimized Way
VI. Filter, Search, Transpose
A. Ways to Limit the Data Using Various Predicate Methods
B. Pivot the Data in Different Ways: Wide to Long and Vice Versa
C. Find Missing Values
VII. Binning and Smoothing
A. Create Buckets and Groups for Categorization
Day 3:
VIII. More Topics
A. HBase
B. Others
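The course covers pivoting (wide to long and back) and binning on Day 2 using Hive. The plain-Python sketch below is only a hedged illustration of the underlying ideas, not Hive code: it unpivots wide rows into (id, attribute, value) rows, pivots them back, and assigns values to equal-width buckets. The function names, the "attribute"/"value" column names, and the choice of equal-width bins are all assumptions made for illustration.

```python
def wide_to_long(rows: list[dict], id_col: str) -> list[dict]:
    """Unpivot: turn each non-id column of a wide row into its own (id, attribute, value) row."""
    long_rows = []
    for row in rows:
        for col, value in row.items():
            if col != id_col:
                # "attribute" and "value" are hypothetical column names chosen for this sketch.
                long_rows.append({id_col: row[id_col], "attribute": col, "value": value})
    return long_rows


def long_to_wide(rows: list[dict], id_col: str) -> list[dict]:
    """Pivot: collect (id, attribute, value) rows back into one wide row per id."""
    wide: dict = {}
    for row in rows:
        wide.setdefault(row[id_col], {id_col: row[id_col]})[row["attribute"]] = row["value"]
    return list(wide.values())


def equal_width_bin(value: float, low: float, high: float, n_bins: int) -> int:
    """Return the 0-based bucket index of value for n_bins equal-width buckets over [low, high]."""
    if not low <= value <= high:
        raise ValueError("value outside [low, high]")
    width = (high - low) / n_bins
    # A value exactly at the top edge goes into the last bucket instead of overflowing.
    return min(int((value - low) / width), n_bins - 1)


if __name__ == "__main__":
    sales = [{"store": "A", "q1": 10, "q2": 20}, {"store": "B", "q1": 30, "q2": 40}]
    print(wide_to_long(sales, "store"))
    print(equal_width_bin(49.9, 0, 100, 4))
```

In Hive, the same steps are usually written in SQL (for example, with CASE expressions for bucketing); this sketch does not attempt to reproduce that syntax.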