info@npntraining.com   +91 8095918383 / +91 9535584691

Big Data Hadoop Training

Course description: This section of the training will help you understand how Hadoop handles the storage and processing of large data sets in a distributed environment.

Course Description

The Big Data and Hadoop training course from NPN Training is designed to enhance your knowledge and skills to become a successful Hadoop Developer, Hadoop Tester & Hadoop Analyst.

At NPN Training we believe in the philosophy of “learn by doing”, hence we provide complete hands-on training with real-time project development.

Course Objectives

By the end of the course, you will:

1. Understand the Hadoop 1.x & 2.x Architecture.
2. Set up a Hadoop Cluster and write complex MapReduce programs.
3. Learn the different Hadoop commands.
4. Learn data loading techniques using Sqoop.
5. Perform data analytics using Pig, Hive and YARN.
6. Understand NoSQL & HBase.
7. Implement best practices for Hadoop development.

Candidate Evaluation

We follow an assessment- and project-based approach to maximize your learning. For each module there will be multiple assessments/problem statements.

Each of the assessments in the E-Learning helps students grasp the concepts taught in class and apply them to business problem scenarios.

  • Module Test: You will have a test on the topics covered in the previous class/week. These tests are usually 15-20 minutes in duration.
  • Hands-on Test: Each candidate will be given an exercise to solve for evaluation.
  • Coding Hackathon: A coding hackathon is conducted around the middle of the course. It tests how well you can apply the concepts, tools and techniques covered so far to a given problem statement, and how quickly and accurately you can solve it.
  • Capstone Projects: At the end of each course there will be a real-world Capstone Project that enables you to build an end-to-end solution to a real-world problem. You will be required to write a project report and present it to the audience.

Work on a real-time project on Big Data

This program comes with a portfolio of industry-relevant POCs, use cases and project work. Unlike other institutes, we don’t pass off use cases as projects; we clearly distinguish between a use case and a project.

 

Hadoop 2.x – Distributed Storage + Batch Processing

Module 01 - Understanding Big Data & Hadoop 2.x

Learning Objectives – In this module, you will understand Big Data, the limitations of existing solutions to the Big Data problem, how Hadoop solves it, the common Hadoop ecosystem components, the Hadoop 2.x Architecture, HDFS, and the anatomy of file writes and reads.

Topics –

  • Understanding what Big Data is
  • Business use case – Telecom
  • Challenges of Big Data
  • OLTP VS OLAP Applications
  • Limitations of existing Data Analytics solutions
  • A combined storage + computation layer
  • Introduction to Hadoop
  • Exploring Hadoop 2.x Core Components
  • Understanding Hadoop 2.x Daemon Services
    1. NameNode
    2. DataNode
    3. Secondary NameNode
    4. ResourceManager
    5. NodeManager
  • Understanding NameNode metadata
  • File Blocks in HDFS
  • Rack Awareness
  • Anatomy of File Read and File Write
  • Understanding HDFS Federation
  • Understanding the High Availability Feature in Hadoop 2.x
  • Exploring Big Data ecosystem

Check E-Learning for more Assignments + Use cases + Project work + Materials + Case studies

 

Module 02 - Exploring Administration + File System + YARN Commands

Learning Objectives – In this module, you will learn about formatting the NameNode, HDFS file system commands, MapReduce commands, different data loading techniques, cluster maintenance, etc. A short command-line sketch follows the topic list.

Topics –

  • Analyzing ResourceManager and NameNode UI
  • Exploring HDFS File System Commands – [Hands-on]
  • Exploring Hadoop Admin Commands – [Hands-on]
  • Printing the Hadoop Distributed File System structure
  • Running a MapReduce Program – [Hands-on]
  • Killing a Job
  • Data Loading in Hadoop – [Hands-on]
    1. Copying Files from DFS to Unix File System
    2. Copying Files from Unix File System to DFS
    3. Understanding Parallel copying of data to HDFS – [Hands-on]
  • Executing MapReduce Jobs
  • Different techniques to move data to HDFS – [Hands-on]
  • Backup and Recovery of Hadoop cluster – [Activity]
  • Commissioning and Decommissioning a node in Hadoop cluster. – [Activity]
  • Understanding Hadoop Safe Mode – the maintenance state of the NameNode – [Hands-on]
  • Configuring Trash in HDFS – [POC]
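
A quick command-line sketch of the operations above, for flavour; all paths, host names and application IDs are placeholders, not values from the course material:

    # List HDFS contents and create a directory
    hdfs dfs -ls /
    hdfs dfs -mkdir -p /user/npn/input

    # Copy a file from the Unix file system to DFS, and back again
    hdfs dfs -put sales.csv /user/npn/input/
    hdfs dfs -get /user/npn/input/sales.csv ./sales_copy.csv

    # Parallel copying of data (distcp runs as a MapReduce job)
    hadoop distcp /user/npn/input hdfs://namenode2:8020/backup/input

    # Safe Mode – the maintenance state of the NameNode
    hdfs dfsadmin -safemode get
    hdfs dfsadmin -safemode leave

    # Kill a running job by its YARN application id
    yarn application -kill application_1700000000000_0001

    # Trash is enabled by setting fs.trash.interval (minutes) in core-site.xml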

Check E-Learning for more Assignments + Use cases + Project work + Materials + Case studies

 

Module 03 - MapReduce Programming

Learning Objectives – In this module, you will understand how the MapReduce framework works. A compact word-count example follows the topic list.

Topics –

  • Introduction to MapReduce
  • Understanding Key/Value in MapReduce
    1. What does it mean?
    2. Why key/value data?
  • Hadoop Cluster Topology
  • The 0.20 MapReduce Java API
    1. The Mapper class
    2. The Reducer class
    3. The Driver class
  • Flow of Operations in MapReduce
  • Implementing Word Count Program – [Hands-on]
  • Exploring Default InputFormat – TextInputFormat
  • Submission & Initialization of a MapReduce Job – [Activity]
  • Handling MapReduce Job
  • Exploring Hadoop Datatypes
  • Understanding Data Locality
  • Serialization & DeSerialization
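
Below is a minimal version of the classic word-count program, written against the standard Hadoop 2.x MapReduce Java API: the Mapper emits (word, 1) pairs, the Reducer sums them, and the Driver wires the job together. Input and output paths come from the command line; this is a sketch, not necessarily the exact program built in class.

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

      // Mapper class: emits (word, 1) for every token in the input line
      public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE);
          }
        }
      }

      // Reducer class: sums the counts emitted for each word
      public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) sum += val.get();
          context.write(key, new IntWritable(sum));
        }
      }

      // Driver class: configures and submits the job
      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }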

 View Module Presentation

Module 04 - Hive and Hive QL

Learning Objectives – In this module, you will understand advanced Hive concepts such as Partitioning, Bucketing, Dynamic Partitioning and the different storage formats. A short HiveQL sketch follows the topic list.

Topics –

  • Understanding Hive Complex Data types
    1. Array
    2. Map
    3. Struct
  • Partitioning
  • [Use case] – Use a Telecom dataset to learn which fields to use for Partitioning.
  • Dynamic Partitioning
  • [Use case] – Use an IoT dataset to learn Dynamic Partitioning.
  • Hive Bucketing
  • Bucketing VS Partitioning
  • Dynamic Partitioning with Bucketing
  • Exploring different Input Formats in Hive
    1. TextFile Format – [Activity]
    2. SequenceFile Format – [Activity]
    3. RC File Format – [Activity]
    4. ORC Files in Hive – [Activity]
  • Using different file formats and capturing Performance reports – [POC]
  • Map-side join – [Hands-on]
  • Reduce-side join – [Hands-on]
  • [Use case] – Look at different problems to which Map-side and Reduce-side joins can be applied.
  • Map-side join VS Reduce-side join – [Hands-on]
  • Writing custom UDF – [Hands-on]
  • Accessing Hive with JDBC – [Hands-on]
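
A hedged HiveQL sketch tying partitioning, dynamic partitioning, bucketing and ORC storage together; the table and column names are invented for illustration:

    -- Enable dynamic partitioning before dynamic-partition inserts
    SET hive.exec.dynamic.partition = true;
    SET hive.exec.dynamic.partition.mode = nonstrict;
    SET hive.enforce.bucketing = true;  -- needed on older Hive releases

    -- Partitioned + bucketed table stored as ORC
    CREATE TABLE call_records (
      caller_id STRING,
      duration  INT
    )
    PARTITIONED BY (call_date STRING)
    CLUSTERED BY (caller_id) INTO 8 BUCKETS
    STORED AS ORC;

    -- Dynamic-partition insert: Hive derives call_date from the last SELECT column
    INSERT INTO TABLE call_records PARTITION (call_date)
    SELECT caller_id, duration, call_date FROM staging_calls;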

More Assignments + Use cases + Project work + Materials can be found in the E-Learning

Module 05 - Sqoop

Learning Objectives – In this module you will learn how to use Sqoop to import and export data between traditional SQL databases, such as Oracle, and Hadoop, and to perform various operations. A short command sketch follows the topic list.

Topics –

  • Sqoop Overview
  • Sqoop JDBC Driver and Connectors
  • Sqoop Importing Data
  • Various Options to Import Data
  • Understanding Sqoop Jobs
    1. Table Import
    2. Filtering Import
  • Incremental Imports using Sqoop
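
A minimal sketch of the import variants named above; the connection URL, credentials, table and paths are placeholders, not course values:

    # Table import into HDFS
    sqoop import \
      --connect jdbc:mysql://dbhost:3306/sales \
      --username sqoop_user -P \
      --table orders \
      --target-dir /user/npn/orders

    # Filtering import: selected columns and rows only
    sqoop import \
      --connect jdbc:mysql://dbhost:3306/sales \
      --username sqoop_user -P \
      --table orders \
      --columns "id,amount" --where "amount > 1000" \
      --target-dir /user/npn/big_orders

    # Incremental import: append only rows whose id exceeds the last value seen
    sqoop import \
      --connect jdbc:mysql://dbhost:3306/sales \
      --username sqoop_user -P \
      --table orders \
      --incremental append --check-column id --last-value 10000 \
      --target-dir /user/npn/orders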

 

Module 06 - Apache Pig

Learning Objectives – In this module you will learn Apache Pig by contrasting it with MapReduce. A short Pig Latin sketch follows the topic list.

Topics –

  • Introduction to Apache Pig
  • MapReduce VS Pig
  • Exploring Pig Components and Pig Execution
  • Introduction to Pig Latin
  • Input and Output
    1. Load
    2. Store
    3. Dump
  • Relational Operators
    1. Foreach
    2. Filter
    3. Group
    4. Distinct
    5. Join
    6. Parallel
  • Multi Dataset Operators
    1. Techniques for combining Data sets
    2. Joining Data sets in Pig
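
A short Pig Latin sketch of the operators listed above, using an invented flights schema:

    -- Load a CSV of flights (schema is made up for illustration)
    flights = LOAD '/user/npn/flights.csv' USING PigStorage(',')
              AS (carrier:chararray, origin:chararray, dep_delay:int);

    -- FILTER, GROUP and FOREACH
    delayed    = FILTER flights BY dep_delay > 15;
    by_carrier = GROUP delayed BY carrier;
    counts     = FOREACH by_carrier GENERATE group AS carrier, COUNT(delayed) AS n;

    -- JOIN against a second data set
    carriers = LOAD '/user/npn/carriers.csv' USING PigStorage(',')
               AS (code:chararray, name:chararray);
    named    = JOIN counts BY carrier, carriers BY code;

    STORE named INTO '/user/npn/delayed_by_carrier';
    DUMP named;  -- print to the console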

Understanding & Building a Data Pipeline Architecture using Pig and Hive

Project Description – We will use data from the U.S. Department of Transportation’s (DOT) Bureau of Transportation Statistics (BTS), which tracks the on-time performance of domestic flights operated by large air carriers. Summary information on the number of on-time, delayed, canceled, and diverted flights appears in DOT’s monthly Air Travel Consumer Report, published about 30 days after the month’s end, as well as in summary tables posted on the BTS website.

 

Architecture diagram –
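
Assuming hypothetical script names, the pipeline could be driven end to end roughly like this; the actual project scripts may differ:

    # 1. Pull a monthly BTS on-time performance file into HDFS
    hdfs dfs -put ontime_2018_01.csv /data/raw/flights/

    # 2. Clean and aggregate the raw data with a Pig script
    pig -x mapreduce clean_flights.pig

    # 3. Load the cleaned output into a partitioned Hive table
    hive -f load_flights.hql

    # 4. Query the results
    hive -e "SELECT carrier, AVG(dep_delay) FROM flights GROUP BY carrier;"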

Module 07 - NoSQL & HBase

Learning Objectives – In this module you will learn about NoSQL databases and the differences between HBase and relational databases. Explore the features of NoSQL databases, the CAP theorem, and the HBase architecture. Understand the data model and perform various operations. A short HBase shell sketch follows the topic list.

Topics –

  • Understanding NoSQL Databases
  • Categories of NoSQL
    1. Key-Value Database
    2. Document Database
    3. Column Family Database
    4. Graph Database
  • What is HBase?
  • Row-oriented VS Column-oriented Databases
  • Features of HBase
  • Data Model in HBase
  • HBase Physical Storage
  • Exploring HBase Shell Commands
    1. PUT
    2. GET
    3. DELETE
    4. Filtering Records
  • HBase Client API
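
A brief HBase shell sketch of the commands above; the table, column family and values are made up:

    # Create a table with one column family
    create 'subscribers', 'info'

    # PUT cells, GET a row, DELETE a cell
    put 'subscribers', 'row1', 'info:name', 'Asha'
    put 'subscribers', 'row1', 'info:plan', 'prepaid'
    get 'subscribers', 'row1'
    delete 'subscribers', 'row1', 'info:plan'

    # Filtering records during a scan
    scan 'subscribers', {FILTER => "ValueFilter(=, 'binary:Asha')"}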

Quartz – Enterprise Job Scheduler

Course description: Production Hadoop workloads are rarely run by hand; recurring jobs such as Hive queries and scheduled data exports need to run at fixed times or intervals. This section introduces Quartz, an enterprise-grade Java job scheduling library, and shows how to use it to schedule Hadoop and Hive jobs.

Module - Quartz Scheduler

Learning Objectives – In this module you will learn about the Quartz job scheduler. A short Java sketch follows the topic list.

Topics –

  • What is a Job Scheduling Framework?
  • Role of a Scheduling Framework in Hadoop
  • What is the Quartz Job Scheduling Library?
  • Using Quartz
  • Exploring Quartz API
    1. Jobs
    2. Triggers
  • Scheduling Hive Jobs using Quartz scheduler
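
A minimal Quartz 2.x sketch of a Job plus a cron Trigger. The job body only prints a message where a real implementation would submit the Hive work; the class, group and cron values are invented:

    import org.quartz.CronScheduleBuilder;
    import org.quartz.Job;
    import org.quartz.JobBuilder;
    import org.quartz.JobDetail;
    import org.quartz.JobExecutionContext;
    import org.quartz.Scheduler;
    import org.quartz.Trigger;
    import org.quartz.TriggerBuilder;
    import org.quartz.impl.StdSchedulerFactory;

    public class SchedulerDemo {

      // The Job: what to run on each firing
      public static class HiveExportJob implements Job {
        @Override
        public void execute(JobExecutionContext context) {
          // A real job would submit a Hive query here (e.g. via Hive JDBC)
          System.out.println("Running scheduled Hive export...");
        }
      }

      public static void main(String[] args) throws Exception {
        Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();

        JobDetail job = JobBuilder.newJob(HiveExportJob.class)
            .withIdentity("hiveExport", "hadoopJobs")
            .build();

        // The Trigger: fire every day at 02:00 using a cron expression
        Trigger trigger = TriggerBuilder.newTrigger()
            .withIdentity("dailyTrigger", "hadoopJobs")
            .withSchedule(CronScheduleBuilder.cronSchedule("0 0 2 * * ?"))
            .build();

        scheduler.scheduleJob(job, trigger);
        scheduler.start();
      }
    }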

Process we follow for project development

We follow the Agile methodology for project development:

  1. Each batch will be divided into scrum teams of 4-5 members.
  2. We will start with a Feature Study before implementing a project.
  3. The feature will be broken down into User Stories and Tasks.
  4. For each user story, a proper Definition of Done will be defined.
  5. A test plan will be defined for testing each user story.

Real-time Project 01 - Data Model Development Kit

Project Description :

This project helps data model developers manage Hive tables with the different table definitions, storage types, column types and column properties required for different use case development.

Roles & Responsibility

  1. Building .xml files that define the structure of the Hive tables used to store the generated process data.
  2. Actively involved in developing the code that reads the .xml files, creates data models and loads data into Hive.

Technologies Used

Java, JAXB, JDBC, Hadoop, Hive

Sample User Stories

[Study User Story 01] – Come up with a design to represent the data model required to handle the following scenarios:

  • Handle different operations like “CREATE”, “UPDATE”, “DELETE”
  • A way to define a partitioned table
  • Store columns in order
  • Store column names
  • Handle updates of a column’s type and name

[User Story 02] – HQL Generator – As a developer, I have to provide functionality to create a table (a minimal code sketch appears after the test cases).

**Tasks**
– [ ] Build a Maven project and add dependencies
– [ ] Integrate loggers
– [ ] Commit the code
– [ ] Create a standard package structure
– [ ] Utility to read the XML and create Java objects
– [ ] Utility code to communicate with the Hive DB
– [ ] Check for the Hive service before executing queries
– [ ] Code to construct the HQL query for CREATE
– [ ] Exception handling

Definition of Done
– [ ] Package structure is created
– [ ] Table is created in Hive
– [ ] Validate that all required schemas are created
– [ ] Validation of Hadoop + Hive services

**Test Cases**
1. If the table already exists, print “Table already exists”
2. Verify the schema against the XML
3. If services are not up and running, handle the error and log it
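
For illustration only, a compact sketch of how the HQL Generator story might look using JAXB and Hive JDBC. The XML layout, class names and connection URL are assumptions, not the project’s actual design:

    import java.io.File;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;
    import java.util.List;
    import javax.xml.bind.JAXBContext;
    import javax.xml.bind.annotation.*;

    // Maps an XML file such as:
    // <table name="calls"><column name="caller_id" type="STRING"/></table>
    @XmlRootElement(name = "table")
    @XmlAccessorType(XmlAccessType.FIELD)
    class TableDef {
      @XmlAttribute String name;
      @XmlElement(name = "column") List<ColumnDef> columns;
    }

    @XmlAccessorType(XmlAccessType.FIELD)
    class ColumnDef {
      @XmlAttribute String name;
      @XmlAttribute String type;
    }

    public class HqlGenerator {
      public static void main(String[] args) throws Exception {
        // 1. Read the XML definition into Java objects with JAXB
        TableDef def = (TableDef) JAXBContext.newInstance(TableDef.class)
            .createUnmarshaller().unmarshal(new File("table.xml"));

        // 2. Construct the CREATE TABLE statement
        StringBuilder hql = new StringBuilder("CREATE TABLE IF NOT EXISTS " + def.name + " (");
        for (int i = 0; i < def.columns.size(); i++) {
          if (i > 0) hql.append(", ");
          hql.append(def.columns.get(i).name).append(' ').append(def.columns.get(i).type);
        }
        hql.append(")");

        // 3. Execute it over Hive JDBC (HiveServer2 must be running; the URL is a placeholder)
        try (Connection conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default");
             Statement stmt = conn.createStatement()) {
          stmt.execute(hql.toString());
        }
      }
    }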

Real-time Project 02 - Scheduled Data Export

A telecom operator wants data to be exported from Hive tables at scheduled intervals. The customer wants the below-mentioned features to be implemented; a rough command-line sketch follows the list.

  1. A UI to configure the properties related to the export
  2. Export of a Hive table to the specified location
  3. Compression of the exported data
  4. Export of the data with encryption, plus a decryption feature for the exported data
  5. A utility for the client to decrypt the data
  6. A feature to schedule the export jobs
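
One possible shape for the export, compression and encryption flow, sketched with standard tools; the paths, table name and key file are placeholders:

    # Export a Hive table to an HDFS directory, then pull it to the local disk
    hive -e "INSERT OVERWRITE DIRECTORY '/tmp/export/usage' SELECT * FROM usage_records;"
    hdfs dfs -get /tmp/export/usage ./usage_export

    # Compress and encrypt the exported data
    tar czf usage_export.tar.gz usage_export
    openssl enc -aes-256-cbc -salt -pbkdf2 -in usage_export.tar.gz \
        -out usage_export.tar.gz.enc -pass file:./export.key

    # Client-side utility: decrypt and unpack
    openssl enc -d -aes-256-cbc -pbkdf2 -in usage_export.tar.gz.enc \
        -out usage_export.tar.gz -pass file:./export.key
    tar xzf usage_export.tar.gz

    # Scheduling: wire the script into Quartz or cron, e.g.
    # 0 2 * * * /opt/npn/export_hive.sh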

Sample screenshots

What are the system requirements for this course?

The system requirements are a Windows / Mac / Linux PC with a minimum of 4 GB RAM, 20 GB of HDD storage, and an i3 processor or above.

Dec 01st

Batch: Weekend Sat & Sun
Duration: 3 Months
₹12,000

Enroll Now

Jan 05th

Batch: Weekend Sat & Sun
Duration: 3 Months
₹12,000

Enroll Now

Jan 26th

Batch: Weekend Sat & Sun
Duration: 3 Months
₹12,000

Enroll Now


Course hours

30 hours of extensive classroom training.
10 sessions of 3 hours each.

Course Duration: 3 Months

Assignments

For each module, multiple hands-on exercises, assignments and quizzes are provided in the E-Learning.

Real time project

We follow the Agile methodology for project development. Each project will have a Feature Study followed by User Stories and Tasks.

Mock Interview

There will be a dedicated 1-to-1 interview call between you and a Big Data Architect, so you experience a real mock interview.

Forum

We have a community forum for all our students where you can enrich your learning through peer interaction and knowledge sharing.

Certification

From the beginning of the course, you will be working on a project. On completion of the project, NPN Training certifies you as a “Hadoop Developer” based on that project.

Is Java a prerequisite to learn the Big Data Masters Program?

Yes, Java is a prerequisite. Some institutes claim that Java is not required; that information is false.

Can I attend a demo session before enrollment?

Yes. You will sit in an actual live class to experience the quality of the training.

How will I execute the Practicals?

We will help you set up NPN Training’s Virtual Machine + the Cloudera Virtual Machine on your system with local access. Detailed installation guides for setting up the environment are provided in the E-Learning.

Who is the Instructor at NPN Training?

All the Big Data classes will be driven by Naveen sir, who is a working professional with more than 12 years of experience in IT as well as in teaching.

How do I access the eLearning content for the course?

Once you have registered and paid for the course, you will have 24/7 access to the E-Learning content.

What if I miss a session?

The course validity is one year, so you can attend the missed session in other batches.

Can I avail an EMI option?

Yes, the total fee can be paid in 2 installments.

Are there any group discounts for classroom training programs?

Yes, we have group discount options for our training programs. Contact us using the form “Drop Us a Query” on the right of any page on the NPN Training website, or select the Live Chat link. Our customer service representatives will give you more details.

Do I need to bring my own laptop?

NPN Training will provide students with all the course material in hard copy. However, students should carry their own laptops for the program. Please find the minimum configuration required below:

  • Windows 7 / Mac OS
  • 8 GB RAM is highly preferred
  • 100 GB HDD
  • 64 bit OS