+91 8095918383 / +91 9535584691

Data Science Training in Bangalore

The Big Data Architect Masters Program Training is designed to give you end-to-end coverage of Big Data technologies through the conceptual implementation of Hadoop 2.x + Quartz Scheduler + Spark using Scala + Kafka + Cassandra and Zeppelin. The program is highly recommended for any working professional who intends to become a successful Big Data Developer/Architect.

Course Description

The Big Data Masters Program is designed to empower working professionals to develop relevant competencies and accelerate their career progression in Big Data technologies through complete Hands-on training.

Being a Big Data Architect requires mastery of multiple technologies, and this program will ensure you become an industry-ready Big Data Architect who can provide solutions to Big Data projects.

At NPN Training we believe in the philosophy of “learn by doing”, hence we provide complete hands-on training with real-time project development.

Course Objectives

By the end of the course, you will:

  1. Understand what Big Data is, the challenges of Big Data, and how Hadoop solves the Big Data problem
  2. Understand Hadoop 2.x Architecture, Replication, Single Point of Failure, YARN
  3. Learn HDFS + YARN commands to work with a cluster.
  4. Understand how MapReduce can be used to analyze big data sets
  5. Perform Structured Data Analysis using Hive
  6. Learn different performance tuning techniques in Hive
  7. Learn Data Loading techniques using Sqoop
  8. Use Scala with an intermediate level of proficiency
  9. Use the REPL (the Scala Interactive Shell) for learning
  10. Learn Functional Programming using Scala
  11. Learn Apache Spark 2.x
  12. Use DataFrames and Structured Streaming in Spark 2.x
  13. Analyze and visualize data using Zeppelin
  14. Learn popular NoSQL Cassandra database.

Work on a real-time Big Data project

This program comes with a portfolio of industry-relevant POCs, use cases and project work. Unlike other institutes, we don’t pass off use cases as projects; we clearly distinguish between a use case and a project.

Who is the target audience?

  • Software engineers and programmers who want to understand the larger Big Data ecosystem, and use it to store and analyze data.
  • Project, program, or product managers who want to understand the high-level architecture and projects of Big Data.
  • Data analysts and database administrators who are curious about Hadoop and how it relates to their work.

R – Programming

Course description: This section of the training will introduce you to the R programming language and the core data structures it provides for data analysis.

Module 01 - Getting Started with R

Learning Objectives – In this module, you will learn the fundamentals of R, one of the most widely used languages for statistical computing and data analysis.

Topics –

  • Introduction to R
  • Arithmetic in R
  • Variables
  • R Data Structures
    1. Vectors
    2. Lists
    3. Matrices
    4. Data Frames
    5. Arrays
  • Data Structure : Vector
    1. Introduction to Vector
    2. Subsetting Vector
  • Data Structure : Lists
    1. Introduction to Lists
    2. Subsetting Lists
  • R Matrices
    1. Introduction to R Matrices
    2. Creating a Matrix
    3. Matrix Arithmetic
    4. Matrix Operations
  • R Data Frames
    1. Introduction to R Data Frames
    2. Data Frame Indexing and Selection
  • R Flow Control
    1. R Programming if…else
    2. R ifelse() Function
    3. R Programming for loop
    4. R while Loop
    5. R break & next
    6. R repeat Loop
  • User Defined Functions in R
    1. Declaring Function
    2. Function Return value
    3. Recursive Function
    4. Switch Function
  • In-built Functions
    1. Generating sequence
    2. Sorting
    3. Column Bind : cbind()
    4. Row Bind : rbind()
    5. Merge Functions
    6. Merge using dplyr()

Quartz – Enterprise Job Scheduler

Course description: This course will help you learn Quartz, one of the most popular job scheduling libraries, which can be integrated into a wide variety of Java applications. Quartz is widely used in enterprise-class applications to schedule jobs and build process workflows.

Module - Quartz Scheduler

Learning Objectives – In this module, you will learn about the Quartz job scheduler.

Topics –

  • What is Job Scheduling Framework
  • Role of Scheduling Framework in Hadoop
  • What is Quartz Job Scheduling Library
  • Using Quartz
  • Exploring Quartz API
    1. Jobs
    2. Triggers
  • Scheduling Hive Jobs using Quartz scheduler
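In Quartz, a Job describes the work to run and a Trigger describes when it fires; the scheduler wires the two together. As a rough, dependency-free sketch of that Job/Trigger idea, the JDK's `ScheduledExecutorService` can stand in for the scheduler. This is an analogy only, not the Quartz API, and all names below are illustrative.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SchedulerSketch {
    public static void main(String[] args) throws Exception {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger fired = new AtomicInteger();

        // The "job": in Quartz this would be a class implementing org.quartz.Job.
        Runnable job = fired::incrementAndGet;

        // The "trigger": in Quartz a SimpleTrigger or CronTrigger; here, a fixed rate.
        scheduler.scheduleAtFixedRate(job, 0, 50, TimeUnit.MILLISECONDS);

        Thread.sleep(200); // let the job fire a few times
        scheduler.shutdown();
        scheduler.awaitTermination(1, TimeUnit.SECONDS);

        System.out.println("job fired repeatedly: " + (fired.get() >= 2));
    }
}
```

In real Quartz code the same wiring is done with `JobBuilder`, `TriggerBuilder`, and a `Scheduler` obtained from `StdSchedulerFactory`.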

Scala Programming – Multi Paradigm : Functional + Object Oriented

Module 03 - Functional Programming

Learning Objectives – In this module, you will learn the fundamentals of functional programming constructs and then move on to first-class functions and higher-order methods.

Topics –

  • Imperative vs Declarative Style of coding
  • Introduction to Functional Programming
  • Principles of Functional Programming
    1. Pure Functions
    2. Functions are ‘First Class Values’
  • Pure and Impure functions
  • Implementing Pure functions
  • Exploring ‘First Class’ functions
    1. Storing functions in variables
    2. Method returning a function
    3. Method taking function as arguments
  • Functions vs Methods
  • Converting methods to functions using eta expansion
  • Invoking functions with tuples as parameters
  • Exploring higher-order functions
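The module teaches these ideas in Scala, but since Java is the program's prerequisite, the same principles — pure functions, functions as first-class values, and higher-order methods — can be sketched with Java lambdas. All names below are illustrative.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class FirstClassFunctions {
    // A pure function: its output depends only on its input, with no side effects.
    static int square(int x) { return x * x; }

    // A higher-order method: it takes a function as an argument.
    static List<Integer> mapAll(List<Integer> xs, Function<Integer, Integer> f) {
        return xs.stream().map(f).collect(Collectors.toList());
    }

    // A method returning a function.
    static Function<Integer, Integer> adder(int n) {
        return x -> x + n;
    }

    public static void main(String[] args) {
        // Storing a function in a variable via a method reference
        // (loosely comparable to Scala's eta expansion of a method).
        Function<Integer, Integer> sq = FirstClassFunctions::square;

        System.out.println(mapAll(List.of(1, 2, 3), sq));        // [1, 4, 9]
        System.out.println(mapAll(List.of(1, 2, 3), adder(10))); // [11, 12, 13]
    }
}
```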

[Capstone Project] - Spark Streaming

E-Commerce Data Analysis – [Real-time industry use case]

Use case Description :

  • An e-commerce company wants to build a real-time analytics dashboard to optimize its inventory and operations.
  • The dashboard should show how many products are being purchased, shipped, delivered and cancelled every minute.
  • This dashboard will be very useful for operational intelligence.
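The core computation behind such a dashboard is counting order events per status within each minute. In the project this aggregation runs on Spark Streaming; the plain-Java sketch below, with made-up events, only illustrates the grouping step.

```java
import java.util.Map;
import java.util.TreeMap;

public class StatusDashboardSketch {
    public static void main(String[] args) {
        // (minute, status) pairs as a stand-in for the real-time event stream.
        String[][] events = {
            {"10:01", "purchased"}, {"10:01", "purchased"}, {"10:01", "shipped"},
            {"10:02", "delivered"}, {"10:02", "cancelled"}, {"10:02", "purchased"},
        };

        // Count events per (minute, status) — one cell of the dashboard.
        Map<String, Map<String, Integer>> perMinute = new TreeMap<>();
        for (String[] e : events) {
            perMinute.computeIfAbsent(e[0], k -> new TreeMap<>())
                     .merge(e[1], 1, Integer::sum);
        }
        System.out.println(perMinute);
        // {10:01={purchased=2, shipped=1}, 10:02={cancelled=1, delivered=1, purchased=1}}
    }
}
```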

Zeppelin – Web Based Notebook

Module - Getting Started

Learning Objectives – In this module, you will learn the basic functionalities of Zeppelin, create your own data analysis applications or import existing Zeppelin notebooks, and learn advanced features of Zeppelin such as creating and binding interpreters and importing external libraries.

Topics –

  • Getting Started
    1. Installation
    2. Configuration
    3. Exploring Zeppelin
  • The Zeppelin Interpreter
  • Installing and launching Zeppelin
  • Building Use cases in Zeppelin
  • Building Dynamic forms
  • Publish your Paragraph

Course description: In this course you will learn Zeppelin, one of the most popular web-based notebooks, which enables interactive data analytics.

Kafka – Distributed Messaging System

Course description: Today’s applications are built on a microservices architecture. Having many microservices that need to communicate with each other can be problematic, as they quickly become tightly coupled. Apache Kafka allows us to create services that are loosely coupled and operate in an event-driven way.

Module 01 - Getting Started

Learning Objectives – In this module, you will learn the fundamentals of Kafka: its architecture, its core concepts, and how to produce and consume messages from the command line.

Topics –

  • Introduction to Messaging System
  • Integration without Messaging System
  • What is Kafka
  • Kafka Architecture
  • Kafka Core Concepts
    1. Producer
    2. Consumer
    3. Broker
    4. Cluster
    5. Topic
    6. Partitions
    7. Offset
    8. Consumer Groups
  • Starting Zookeeper
  • Starting Kafka Server
  • Creating a Kafka Topic
  • Kafka Console Producer
  • Kafka Console Consumer
  • Publishing message to a Kafka Topic
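The core concepts above — topic, partition, offset — can be illustrated with a tiny in-memory model. This is not the Kafka API; the classes and names below are invented purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class KafkaConceptsSketch {
    // A topic is split into partitions; each partition is an append-only log,
    // and a record's position within its partition is its offset.
    static class Topic {
        final List<List<String>> partitions = new ArrayList<>();
        Topic(int numPartitions) {
            for (int i = 0; i < numPartitions; i++) partitions.add(new ArrayList<>());
        }
        // Producer side: hash the key to pick a partition (as Kafka's default
        // partitioner does for keyed records), append, and return the offset.
        int send(String key, String value) {
            List<String> p = partitions.get(Math.floorMod(key.hashCode(), partitions.size()));
            p.add(value);
            return p.size() - 1;
        }
    }

    public static void main(String[] args) {
        Topic orders = new Topic(3);
        int o1 = orders.send("user-1", "order-a");
        int o2 = orders.send("user-1", "order-b"); // same key -> same partition, next offset
        System.out.println("offsets for same key: " + o1 + ", " + o2);
    }
}
```

Because records with the same key land in the same partition, their ordering is preserved within that partition — the property consumer groups rely on.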

Apache NiFi – Automate Data Flow

Course description: This course will help you learn Apache NiFi, which is designed to automate the flow of data between software systems.

Module 01 - Getting Started with NiFi

Learning Objectives – In this module, you will get started with Apache NiFi and its core concepts.

Topics –

  • What is Apache NiFi
  • NiFi architecture
  • NiFi core concepts
    1. FlowFile
    2. Processor
    3. Connection
  • Installing and starting NiFi

Module 02 - Apache NiFi in depth

Learning Objectives – In this module, you will explore Apache NiFi in more depth.

Topics – 

  • NiFi templates
  • Building a NiFi Data Flow
  • Monitoring a Data Flow
  • Exploring Data Provenance


Process we follow for project development

We follow the Agile methodology for project development:

  1. Each batch will be divided into scrum teams of 4-5 members.
  2. We will start with a feature study before implementing a project.
  3. The feature will be broken down into user stories and tasks.
  4. For each user story, a proper Definition of Done will be defined.
  5. A test plan will be defined for testing each user story.

Real Time Data Simulator

Project description: A project that generates dynamic mock data in real time based on a schema, which can then be fed into real-time processing systems such as Apache Storm or Spark Streaming.
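A hedged sketch of the idea: generating one mock record from a schema definition. The schema format, field names, and types below are assumptions for illustration, not the project's actual design.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class SchemaMockData {
    public static void main(String[] args) {
        // Hypothetical schema: field name -> type. A real simulator would read
        // this from a user-supplied schema file.
        LinkedHashMap<String, String> schema = new LinkedHashMap<>();
        schema.put("orderId", "int");
        schema.put("status", "string");

        Random rnd = new Random(42); // fixed seed so the sketch is repeatable
        List<String> statuses = List.of("purchased", "shipped", "delivered", "cancelled");

        // Build one CSV record whose fields match the schema's declared types.
        StringBuilder record = new StringBuilder();
        for (Map.Entry<String, String> f : schema.entrySet()) {
            if (record.length() > 0) record.append(",");
            record.append(f.getValue().equals("int")
                    ? String.valueOf(rnd.nextInt(1000))
                    : statuses.get(rnd.nextInt(statuses.size())));
        }
        System.out.println(record); // one CSV record matching the schema
    }
}
```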

Building Complex Real time Event Processing

Project Description:

In this project, you will build a real-time event processing system using Spark Streaming, where even sub-second delays matter for analysis. While this is still not fast enough for ultra-low-latency (microsecond or nanosecond) applications, it suits use cases such as processing CDRs (Call Detail Records) from telecommunications, where you can expect millisecond response times.

User Story 01 – As a developer, we should simulate real-time network data

  1. Task 01 – Use Java socket programming to generate and publish data to a port
  2. Task 02 – Publish the data under different scenarios
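A minimal sketch of Task 01: a publisher thread pushes a few mock records to a socket, and a client reads them back the way Spark Streaming's socket source would. The CSV record format and the `tower-N,lat,lon` fields are assumptions; the project's real record schema will be shared when you start.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.net.Socket;

public class MockDataPublisher {
    public static void main(String[] args) throws Exception {
        ServerSocket server = new ServerSocket(0); // 0 = pick any free port
        int port = server.getLocalPort();

        // Publisher thread: accept one client and push a few mock records.
        Thread publisher = new Thread(() -> {
            try (Socket client = server.accept();
                 PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
                for (int i = 1; i <= 3; i++) {
                    out.println("tower-" + i + ",12.97,77.59"); // mock CDR-style record
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        publisher.start();

        // Consumer side (in the project this is Spark Streaming's socket source).
        int count = 0;
        try (Socket s = new Socket("localhost", port);
             BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()))) {
            while (in.readLine() != null) count++;
        }
        publisher.join();
        server.close();
        System.out.println("records received: " + count);
    }
}
```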

User Story 02 – As a developer, we should be able to consume data using Spark Streaming

User Story 03 – As a developer, we should consume the Google API to convert latitude and longitude into the corresponding region names.

User Story 04 – Perform computations to calculate important KPIs (Key Performance Indicators) on the real-time data.

A more detailed split-up will be shared once you start the project.

Technologies Used :

  • Java Socket Programming
  • Google API
  • Scala Programming
  • Spark Streaming

Data Model Development Kit

Project Description :

This project helps data model developers manage Hive tables with the different storage types, column types and column properties required for different use cases.

Roles & Responsibility

  1. Building .xml files to define the structures of the Hive tables used for storing processed data.
  2. Actively involved in developing code to read the .xml files, create data models and load data into Hive.

Technologies Used

Java, JAXB, JDBC, Hadoop, Hive

Sample User Stories

[Study User Story 01] – Come up with a design to represent the data model required to handle the following scenarios:

  • Handle different operations like “CREATE”, “UPDATE”, “DELETE”
  • A way to define a partitioned table
  • Store columns in order
  • Store column names
  • Handle updates of column type and name

[User Story 02] – HQL Generator – As a developer, we have to provide functionality to create tables

– [ ] Build a Maven project and add dependencies
– [ ] Integrate loggers
– [ ] Commit code
– [ ] Create a standard package structure
– [ ] Utility to read the XML and create Java objects
– [ ] Utility code to communicate with the Hive DB
– [ ] Check for the Hive service before executing queries
– [ ] Code to construct the HQL query for CREATE
– [ ] Exception handling
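A minimal sketch of the read-XML-and-construct-HQL flow, using the JDK's built-in DOM parser for illustration (the project itself lists JAXB). The XML element names and the table definition below are assumptions, not the project's real schema.

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class HqlGenerator {
    public static void main(String[] args) throws Exception {
        // Hypothetical table definition; the project's real .xml layout may differ.
        String xml = "<table name=\"orders\">"
                   + "<column name=\"id\" type=\"INT\"/>"
                   + "<column name=\"status\" type=\"STRING\"/>"
                   + "</table>";

        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));

        // Collect "name type" pairs from each <column> element.
        Element table = doc.getDocumentElement();
        NodeList cols = table.getElementsByTagName("column");
        List<String> defs = new ArrayList<>();
        for (int i = 0; i < cols.getLength(); i++) {
            Element c = (Element) cols.item(i);
            defs.add(c.getAttribute("name") + " " + c.getAttribute("type"));
        }

        // Construct the HQL CREATE statement (User Story 02).
        String hql = "CREATE TABLE IF NOT EXISTS " + table.getAttribute("name")
                   + " (" + String.join(", ", defs) + ")";
        System.out.println(hql);
        // CREATE TABLE IF NOT EXISTS orders (id INT, status STRING)
    }
}
```

In the actual project the generated statement would be submitted to Hive over JDBC, after checking that the Hive service is up.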

Definition of Done

– [ ] Package structure should be created
– [ ] Table has to be created in Hive
– [ ] Validate that all required schemas are created
– [ ] Validation of Hadoop + Hive services

Test Cases

  1. If the table already exists, print “Table already exists”
  2. Verify the schema against the XML
  3. If services are not up and running, handle it and log it


Course hours

90 hours of extensive classroom training
30 sessions of 3 hours each
Course Duration: 5 Months


For each module, multiple hands-on exercises, assignments and quizzes are provided in the E-Learning portal.

Real time project

We follow Agile Methodology for the project development. Each project will have Feature Study followed by User stories and Tasks.

Mock Interview

There will be a dedicated 1 to 1 interview call between you and a Big Data Architect. Experience a real Mock Interview.


We have a community forum for all our students, where you can enrich your learning through peer interaction and knowledge sharing.


From the beginning of the course, you will be working on a project. On completion of the project, NPN Training certifies you as a “Big Data Architect” based on your project work.

Nov 10th

Batch: Weekend Sat & Sun
Duration: 5 Months

Enroll Now

Dec 15th

Batch: Weekend Sat & Sun
Duration: 5 Months

Enroll Now

Jan 12th

Batch: Weekend Sat & Sun
Duration: 5 Months

Enroll Now

Batches not available

Is Java a pre-requisite to learn Big Data Masters Program?

Yes, Java is a prerequisite. Some institutes say Java is not required, but that is false information.

Can I attend a demo session before enrollment?

Yes, you will sit in an actual live class to experience the quality of the training.

How will I execute the Practicals?

We will help you set up NPN Training’s Virtual Machine + the Cloudera Virtual Machine on your system with local access. Detailed installation guides for setting up the environment are provided in the E-Learning portal.

Who are the instructors at NPN Training?

All the Big Data classes will be taught by Naveen sir, a working professional with more than 12 years of experience in IT as well as in teaching.

How do I access the E-Learning content for the course?

Once you have registered and paid for the course, you will have 24/7 access to the E-Learning content.

What If I miss a session?

The course validity is one year, so you can attend any missed sessions with other batches.

Can I avail an EMI option?

Yes, the total fee can be paid in 2 installments.

Are there any group discounts for classroom training programs?

Yes, we have group discount options for our training programs. Contact us using the form “Drop Us a Query” on the right of any page on the NPN Training website, or select the Live Chat link. Our customer service representatives will give you more details.


Anindya Banerjee

After searching extensively on the internet for big data courses, I came to know about NPN Training and Naveen sir. It was a tough call for me to pick the right training institute which would provide me the right blend of practical exposure and theoretical knowledge on big data and Hadoop technologies. After attending a few classes I am mesmerized by Naveen sir’s way of teaching, his command over various topics and his study materials. Unlike other training institutes, Naveen sir believes in extensive learning and hands-on training, which sets NPN Training far apart from other institutes. I would highly recommend anyone to join NPN Training if he/she wants to make a career in big data technologies.

Sarbartha Paul
HCL Technologies

The best thing I liked about this institute is the way Naveen sir teaches, his way of taking care of each person’s doubts and interests, and his tendency to make others learn big data with complete hands-on experience. The theory he teaches is compact and crunchy enough to get a good hold of the basics.
Another thing that sets this institute apart is the way Naveen sir has designed its Big Data Architecture Program course, which covers nearly everything that other institutes lack. The course materials are also very to the point.
In one word, Naveen sir’s way of teaching is a class apart!
I am greatly moved by his ideology and teaching, and this is probably one of the finest institutes in town as far as big data courses are concerned. It is worth joining his classroom in all aspects. Thank you for all your efforts, sir!

Sai Venkata Krishna

Naveen is an excellent trainer. He mainly focuses on HANDS ON and REAL TIME SCENARIOS, which helps one understand the concepts easily. I feel that the NPN Training curriculum is the best in the market for Big Data.
Naveen is very honest in his approach, and he delivers additional concepts which are not present in the syllabus of particular topics. The E-Learning content and assignments are very informative and helpful. The amount you pay for the Big Data course is worth every penny.
Thank you, NPN Training, for your support and motivation.

Chat with Instructor