info@npntraining.com   +91 8095918383 / +91 9535584691

Big Data Hadoop Training

The Big Data and Hadoop training course from NPN Training is designed to enhance your knowledge and skills to become a successful Hadoop Developer, Hadoop Tester & Hadoop Analyst.

Course description: This section of the training will help you understand how Hadoop handles the storage and processing of large data sets in a distributed environment.


At NPN Training we believe in the philosophy of “learning by doing,” so we provide fully hands-on training with real-time project development.

Course Objectives

By the end of the course, you will:

1. Understand the Hadoop 1.x and 2.x architecture.
2. Set up a Hadoop cluster and write complex MapReduce programs.
3. Learn the different Hadoop commands.
4. Learn data loading techniques using Sqoop.
5. Perform data analytics using Pig, Hive and YARN.
6. Understand NoSQL and HBase.
7. Implement best practices for Hadoop development.

Work on a real life project on Big Data Analytics

As part of the course work, you will work on the projects mentioned below, where you will use Pig, Hive, HBase and MapReduce to perform Big Data analytics.
The following industry-specific Big Data case studies are included in our Big Data and Hadoop certification, covering domains such as security, retail, banking, education, media and health care.

Project #1 : Analysis of Afghan War Diaries
Industry : Security Agency
The data comprises information gathered by soldiers and intelligence officers of the United States military to examine events that involve explosive hazards and to find events that involve Improvised Explosive Devices (IEDs).

Project #2 : Customer Complaints Analysis about Products
Industry : Retail
The publicly available dataset contains a few lakh observations with attributes such as CustomerId, Payment Mode, Product Details, Complaint, Location and Status of the complaint.
Problem Statement: Analyze the data in the Hadoop ecosystem to:
1. Get the number of complaints filed under each product.
2. Get the total number of complaints filed from a particular location.
3. Get the list of complaints, grouped by location, that did not receive a timely response.

Project #3 : Credit card Analysis
Industry : Banking
XYZ Bank is an Indian multinational banking and financial services company headquartered in Delhi, India. XYZ is a financial institution that provides various financial services, such as accepting deposits and issuing credit cards and loans. XYZ Bank has a range of investment products on offer, such as savings accounts and certificates of deposit. It offers a wide range of banking products and financial services to corporate and retail customers through a variety of delivery channels and specialised subsidiaries in the areas of investment banking, life and non-life insurance, venture capital and asset management.

Project #4 : Scholastic Assessment Analysis
Industry : Education
This dataset is the SAT (College Board) 2010 School Level Results, which gives you information about how students from different schools performed in the tests. It consists of the fields below.
DBN, School Name, Number of Test Takers, Critical Reading Mean, Mathematics Mean, Writing Mean
Here DBN is the unique field for this dataset.

Here we are trying to analyze this data, and below are the few problem statements that we have chosen:
1. Find the total number of test takers.
2. Find the highest mean of the Critical Reading section and the corresponding school name.
3. Find the highest mean of the Mathematics section and the corresponding school name.
4. Find the highest mean of the Writing section and the corresponding school name.

Project #5 : Processing the MovieLens dataset using Pig
Industry : Entertainment
In this project, we will learn about Apache Pig and how to use it to process the MovieLens dataset. We will get familiar with the various Pig operators used for data processing, cover how to use UDFs and write our own custom UDFs, and finally take a look at diagnostics and performance tuning.

Project #6 : Health care Analysis
Industry : Health care
Below are a few of the problem statements that we have chosen to work on for this dataset.
1. How many hospital centres got more than 60% patient satisfaction regarding cleanliness?
2. Which hospital centre got the maximum overall rating between 9 and 10?

Hadoop 2.x – Distributed Storage + Batch Processing

Module 01 - Understanding Big Data & Hadoop 2.x

Learning Objectives – In this module, you will understand Big Data, the limitations of existing solutions to the Big Data problem, how Hadoop solves it, the common Hadoop ecosystem components, the Hadoop 2.x architecture, HDFS, and the anatomy of a file write and read.

Topics –

  • Understanding what Big Data is
  • Business use case – Telecom
  • Challenges of Big Data
  • OLTP vs. OLAP applications
  • Limitations of existing data analytics solutions
  • A combined storage + compute layer
  • Introduction to Hadoop
  • Exploring Hadoop 2.x Core Components
  • Understanding Hadoop 2.x Daemon Services
    1. NameNode
    2. DataNode
    3. Secondary NameNode
    4. ResourceManager
    5. NodeManager
  • Understanding NameNode metadata
  • File Blocks in HDFS
  • Rack Awareness
  • Anatomy of File Read and File Write
  • Understanding HDFS Federation
  • Understanding the High Availability feature in Hadoop 2.x
  • Exploring Big Data ecosystem

Check E-Learning for more Assignments + Use cases + Project work + Materials + Case studies

 

Module 02 - Exploring Administration + File System + YARN Commands

Learning Objectives – In this module, you will learn about formatting the NameNode, HDFS file system commands, MapReduce commands, different data loading techniques, cluster maintenance, etc.

Topics –

  • Analyzing ResourceManager and NameNode UI
  • Exploring HDFS File System Commands – [Hands-on]
  • Exploring Hadoop Admin Commands – [Hands-on]
  • Printing the Hadoop Distributed File System structure
  • Running a MapReduce program – [Hands-on]
  • Killing a job
  • Data Loading in Hadoop – [Hands-on]
    1. Copying Files from DFS to Unix File System
    2. Copying Files from Unix File System to DFS
    3. Understanding Parallel copying of data to HDFS – [Hands-on]
  • Executing MapReduce Jobs
  • Different techniques to move data to HDFS – [Hands-on]
  • Backup and Recovery of Hadoop cluster – [Activity]
  • Commissioning and Decommissioning a node in Hadoop cluster. – [Activity]
  • Understanding Hadoop Safe Mode – the maintenance state of the NameNode – [Hands-on]
  • Configuring Trash in HDFS – [POC]
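
The file system commands above also have programmatic equivalents. Below is a minimal sketch, assuming a locally reachable cluster, that uses the Hadoop FileSystem Java API to copy a local file into HDFS (the equivalent of hdfs dfs -put) and list the target directory; the NameNode address and the paths are placeholders for illustration.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsCopyExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Placeholder NameNode address; adjust to your cluster
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            FileSystem fs = FileSystem.get(conf);
            // Copy a local file into HDFS (equivalent to `hdfs dfs -put`)
            fs.copyFromLocalFile(new Path("/tmp/sales.csv"), new Path("/data/sales.csv"));
            // List the target directory to confirm the copy
            for (FileStatus status : fs.listStatus(new Path("/data"))) {
                System.out.println(status.getPath() + "  " + status.getLen() + " bytes");
            }
            fs.close();
        }
    }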

Check E-Learning for more Assignments + Use cases + Project work + Materials + Case studies

 

Module 03 - MapReduce Programming

Learning Objectives – In this module, you will understand how the MapReduce framework works.

Topics –

  • Introduction to MapReduce
  • Understanding Key/Value in MapReduce
    1. What does it mean?
    2. Why key/value data?
  • Hadoop Cluster Topology
  • The MapReduce Java API (0.20)
    1. The Mapper class
    2. The Reducer class
    3. The Driver class
  • Flow of Operations in MapReduce
  • Implementing Word Count Program – [Hands-on]
  • Exploring Default InputFormat – TextInputFormat
  • Submission & initialization of a MapReduce job – [Activity]
  • Handling MapReduce Job
  • Exploring Hadoop Datatypes
  • Understanding Data Locality
  • Serialization & DeSerialization
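
To make the Mapper, Reducer and Driver classes concrete, here is the classic word count program from the hands-on above, written against the Hadoop 2.x Java API. This is the standard example; the input and output paths are taken from the command line.

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {
        // Mapper: emits (word, 1) for every token in the input line
        public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);
                }
            }
        }
        // Reducer: sums the counts for each word
        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) sum += val.get();
                context.write(key, new IntWritable(sum));
            }
        }
        // Driver: wires the job together
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }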


Module 04 - Hive and Hive QL

Learning Objectives – In this module, you will understand core Hive concepts such as complex data types, partitioning, dynamic partitioning and bucketing.

Topics –

  • Understanding Hive Complex Data Types
    1. Array
    2. Map
    3. Struct
  • Partitioning
  • [Use case] – Using a telecom dataset, learn which fields to use for partitioning.
  • Dynamic Partitioning
  • [Use case] – Using an IoT dataset, learn dynamic partitioning.
  • Hive Bucketing
  • Bucketing vs. Partitioning
  • Dynamic Partitioning with Bucketing

More Assignments + Use cases + Project work + Materials can be found in E-Learning

Module 05 - Hive Optimization

Learning Objectives – In this module, you will understand Hive optimization concepts such as the different storage formats, map-side and reduce-side joins, and custom UDFs.

Topics –

  • Exploring different file formats in Hive
    1. TextFile Format – [Activity]
    2. SequenceFile Format – [Activity]
    3. RC File Format – [Activity]
    4. ORC Files in Hive – [Activity]
  • Using the different file formats and capturing performance reports – [POC]
  • Map-side join – [Hands-on]
  • Reduce-side join – [Hands-on]
  • [Use case] – Looking at different problems to which map-side and reduce-side joins can be applied.
  • Map-side join vs. Reduce-side join – [Hands-on]
  • Writing a custom UDF – [Hands-on] (see the sketch after this list)
  • Accessing Hive with JDBC – [Hands-on]
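
As a taste of the custom UDF hands-on, below is a minimal sketch using the classic org.apache.hadoop.hive.ql.exec.UDF base class; the class name and its behaviour (upper-casing a string column) are illustrative assumptions, not part of the course material.

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // A trivial Hive UDF that upper-cases a string column.
    public class ToUpperUDF extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null;   // Hive passes NULLs through
            }
            return new Text(input.toString().toUpperCase());
        }
    }

Once packaged into a JAR, such a function is registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION, after which it can be called like any built-in function in a query.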

More Assignments + Use cases + Project work + Materials can be found in E-Learning

Module 06 - Sqoop

Learning Objectives – In this module, you will learn how to use Sqoop to import and export data between Hadoop and traditional relational databases such as MySQL and Oracle, and to perform various operations along the way.

Topics –

  • Sqoop Overview
  • Sqoop JDBC Driver and Connectors
  • Sqoop Importing Data
  • Various Options to Import Data
  • Understanding Sqoop Jobs
    1. Table Import
    2. Filtering Import
  • Incremental Imports using Sqoop
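
Sqoop is normally driven from the command line, but it can also be invoked from Java. Below is a minimal sketch, assuming Sqoop 1.x, that runs an incremental import via Sqoop.runTool; the JDBC URL, credentials, table and target directory are placeholder assumptions.

    import org.apache.sqoop.Sqoop;

    public class SqoopImportExample {
        public static void main(String[] args) {
            // Equivalent to running `sqoop import ...` on the command line;
            // all connection details below are placeholders.
            String[] importArgs = {
                "import",
                "--connect", "jdbc:mysql://localhost:3306/retail",
                "--username", "retail_user",
                "--password", "secret",
                "--table", "complaints",
                "--target-dir", "/data/complaints",
                // Incremental import: only rows with id greater than --last-value
                "--incremental", "append",
                "--check-column", "id",
                "--last-value", "0"
            };
            System.exit(Sqoop.runTool(importArgs));
        }
    }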

 

Module 07 - Apache Pig

Learning Objectives – In this module, you will learn Apache Pig by contrasting it with MapReduce.

Topics –

  • Introduction to Apache Pig
  • MapReduce VS Pig
  • Exploring Pig Components and Pig Execution
  • Introduction to Pig Latin
  • Input and Output
    1. Load
    2. Store
    3. Dump
  • Relational Operators
    1. Foreach
    2. Filter
    3. Group
    4. Distinct
    5. Join
    6. Parallel
  • Multi Dataset Operators
    1. Techniques for combining Data sets
    2. Joining Data sets in Pig
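
To preview the operators listed above, here is a minimal sketch that embeds Pig Latin in Java through the PigServer API, running in local mode; the ratings file path and schema are assumptions in the spirit of the MovieLens project.

    import org.apache.pig.ExecType;
    import org.apache.pig.PigServer;

    public class PigRatingsExample {
        public static void main(String[] args) throws Exception {
            // Local mode for experimentation; use ExecType.MAPREDUCE on a cluster
            PigServer pig = new PigServer(ExecType.LOCAL);
            // LOAD with an assumed comma-separated schema
            pig.registerQuery("ratings = LOAD '/data/ml/ratings.csv' USING PigStorage(',') "
                + "AS (userId:int, movieId:int, rating:float, ts:long);");
            // FILTER and GROUP, then count ratings per movie with FOREACH
            pig.registerQuery("good = FILTER ratings BY rating >= 4.0;");
            pig.registerQuery("byMovie = GROUP good BY movieId;");
            pig.registerQuery("counts = FOREACH byMovie GENERATE group AS movieId, COUNT(good) AS n;");
            // STORE the result back to the file system
            pig.store("counts", "/data/ml/rating_counts");
            pig.shutdown();
        }
    }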

Understanding & Building a Data Pipeline Architecture using Pig and Hive

Project Description – We will use the U.S. Department of Transportation’s (DOT) Bureau of Transportation Statistics (BTS) data, which tracks the on-time performance of domestic flights operated by large air carriers. Summary information on the number of on-time, delayed, cancelled and diverted flights appears in DOT’s monthly Air Travel Consumer Report, published about 30 days after the month’s end, as well as in summary tables posted on the BTS website.

 

Architecture diagram –

Module 08 - NoSQL & HBase

Learning Objectives – In this module, you will learn about NoSQL databases and the differences between HBase and relational databases. You will explore the features of NoSQL databases, the CAP theorem and the HBase architecture, understand the HBase data model and perform various operations on it.

Topics –

  • Understanding NoSQL Databases
  • Categories of NoSQL
    1. Key-Value Database
    2. Document Database
    3. Column Family Database
    4. Graph Database
  • What is HBase?
  • Row-Oriented vs. Column-Oriented Databases
  • Features of HBase
  • Data Model in HBase
  • HBase Physical Storage
  • Exploring HBase Shell Commands
    1. PUT
    2. GET
    3. DELETE
    4. Filtering Records
  • HBase Client API
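
To complement the shell commands above, here is a minimal sketch of the same PUT, GET and DELETE operations using the HBase Java client API; the table name "patients" and column family "info" are an assumed example schema, echoing the health care project.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Delete;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseClientExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TableName.valueOf("patients"))) {
                // PUT: insert a cell into column family 'info'
                Put put = new Put(Bytes.toBytes("row1"));
                put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("rating"), Bytes.toBytes("9"));
                table.put(put);
                // GET: read the cell back
                Result result = table.get(new Get(Bytes.toBytes("row1")));
                System.out.println(Bytes.toString(
                    result.getValue(Bytes.toBytes("info"), Bytes.toBytes("rating"))));
                // DELETE: remove the whole row
                table.delete(new Delete(Bytes.toBytes("row1")));
            }
        }
    }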

Quartz – Enterprise Job Scheduler

Course description: Production Hadoop workloads rarely run by hand; they need to run unattended and on a schedule. This section introduces Quartz, an open-source job scheduling library for Java applications, and shows how to use it to schedule Hadoop and Hive jobs.

Module - Quartz Scheduler

Learning Objectives – In this module, you will learn about the Quartz job scheduler.

Topics –

  • What is a job scheduling framework?
  • The role of a scheduling framework in Hadoop
  • What is the Quartz job scheduling library?
  • Using Quartz
  • Exploring Quartz API
    1. Jobs
    2. Triggers
  • Scheduling Hive Jobs using Quartz scheduler
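
As a concrete illustration of Jobs and Triggers, here is a minimal Quartz 2.x sketch that schedules a placeholder Hive job with a cron trigger; the job body and the cron expression (every day at 02:00) are assumptions for illustration.

    import org.quartz.CronScheduleBuilder;
    import org.quartz.Job;
    import org.quartz.JobBuilder;
    import org.quartz.JobDetail;
    import org.quartz.JobExecutionContext;
    import org.quartz.Scheduler;
    import org.quartz.Trigger;
    import org.quartz.TriggerBuilder;
    import org.quartz.impl.StdSchedulerFactory;

    public class HiveJobScheduler {
        // The scheduled unit of work; in the course project this would
        // submit a HiveQL job (for example over JDBC) instead of printing.
        public static class HiveJob implements Job {
            @Override
            public void execute(JobExecutionContext context) {
                System.out.println("Running the nightly Hive aggregation...");
            }
        }

        public static void main(String[] args) throws Exception {
            Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
            JobDetail job = JobBuilder.newJob(HiveJob.class)
                    .withIdentity("hiveAggregation", "hadoop").build();
            // Cron trigger: fire every day at 02:00 (an example schedule)
            Trigger trigger = TriggerBuilder.newTrigger()
                    .withIdentity("nightly", "hadoop")
                    .withSchedule(CronScheduleBuilder.cronSchedule("0 0 2 * * ?"))
                    .build();
            scheduler.scheduleJob(job, trigger);
            scheduler.start();
        }
    }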

This program comes with a portfolio of industry-relevant POCs, use cases and project work. Unlike other institutes, we do not pass off use cases as projects; we clearly distinguish between a use case and a project.

What are the system requirements for this course?

The system requirements are a Windows, Mac or Linux PC with a minimum of 4 GB RAM, 20 GB of HDD storage and an i3 processor or above.


Aug 27th

07:00AM - 09:00AM (1st)
Mon-Fri (21 Days)
₹ 19,795

Enroll Now

July 23rd

08:30PM - 10:30PM (1st)
Mon-Fri (21 Days)
₹ 19,795

Enroll Now


Instructor-led Sessions

33 Hours of Online Live Instructor-Led Classes. Weekend class: 11 sessions of 3 hours each.

 

Real-life Case Studies

Live project based on any of the selected use cases, involving Big Data analytics using Pig, Hive, HBase and MapReduce.

 

Assignments

Every module is followed by assignments, use cases and case studies, available in the E-Learning portal.

 

Can I attend a demo session before enrollment?

We limit the number of participants in a live session to maintain quality standards, so unfortunately participation in a live class without enrollment is not possible. However, you can go through the sample class recording; it will give you a clear insight into how the classes are conducted, the quality of the instructors and the level of interaction in a class.

Will I Get Placement Assistance?

More than 70% of NPN learners have reported a change in job profile (promotion), work location (onsite), lateral transfers and new job offers. NPN certification is well recognized in the IT industry, as it is a testament to the intensive and practical learning you have gone through and the real-life projects you have delivered.

What if I have more queries?

The more queries you come up with, the happier we are, as it is a strong indication of your effort to learn. Our instructors will answer all your queries during classes, PLMs will be available to resolve any functional or technical query, and we will even go to the length of solving your doubts via screen sharing. If you are committed to learning, we are Ridiculously Committed to making you learn.

Who are the Instructors at NPN Training?

Our instructors are expert professionals with more than 10 years of experience, selected through a stringent process. Besides technology expertise, we look for a passion and joy for teaching in our instructors. After shortlisting, they undergo a three-month-long training program.
All instructors are reviewed by learners for every session they take, and they must maintain a consistent rating of 4.5 or above to remain part of the NPN faculty.

How do I avail EMI option as a method of payment?

You no longer need a credit history or a credit card to purchase this course. Using ZestMoney, we allow you to complete your payment with an EMI plan that best suits you. It’s a simple 3-step procedure:
1. Fill in your profile: Complete your profile with Aadhaar, PAN and employment details.
2. Verify your account: Get your account verified using net banking, eKYC or by uploading documents.
3. Activate your loan: Set up automatic repayment using NACH to activate your loan.
