Big Data Masters Program | Big Data Architect Training in Bangalore

Big Data Architect Masters Program

The Big Data Architect Masters Program Training is designed to help you gain end-to-end coverage of Big Data technologies by learning the conceptual implementation of Hadoop + Apache Storm + Spark using Scala + Kafka + MongoDB + Zeppelin and Quartz Scheduler. The entire program is highly recommended for any working professional who intends to become a successful Big Data Developer/Architect.

  • About the courses
  • Curriculum
  • FAQs
  • Certification
  • Review

About the Course

The Big Data Architect Masters Program is designed to empower working professionals to develop relevant competencies and accelerate their career progression in Big Data technologies through complete Hands-on training.


Being a Big Data Architect requires you to be a master of multiple technologies, and this program will ensure you become an industry-ready Big Data Architect who can provide solutions to Big Data projects.


The program includes end-to-end coverage of Big Data Technologies like

  1. Hadoop - Distributed Storage + Batch Processing - Crash Course
  2. Python Programming - Interpreted + High-Level Programming - Crash Course
  3. Scala Programming - Multi-Paradigm: Functional, Object-Oriented
  4. Spark - Cluster Computing Framework
  5. Kafka - Distributed Messaging System
  6. Cassandra - NoSQL Database
  7. Quartz - Enterprise Job Scheduler
  8. Zeppelin - Web-based notebook for data visualization
  9. Apache Nifi - Tool to automate flow of data


At NPN Training we believe in the philosophy "Learn by doing", hence we provide complete hands-on training with real-time project development.



The course includes CCA175 Cloudera Certified Associate Spark and Hadoop Developer certification training.


Work on a real-life Industry-based project

This program comes with a portfolio of industry-relevant POCs, use cases and project work. Unlike other institutes, we don't pass off use cases as projects; we clearly distinguish between a use case and a project.


Social Media


Technologies used:  Hadoop, Hive (HQL)

Project #1: Discovering Insights from Social Bookmarking Websites +

We will be using data accumulated from popular social bookmarking websites such as Reddit ("the front page of the web") and StumbleUpon, which let you bookmark, rate, review and discover links.


Project Statement: Analyze the information or data in the Hadoop ecosystem to:


  1. Collect the information into HDFS and examine it with the assistance of MapReduce, Pig, and Hive to identify the highly rated links based on client remarks, likes, etc.
  2. Using MapReduce, change the semi-structured format (XML information) into a structured format and classify the client rating as positive or negative for each of the thousands of links.
  3. Move the result into HDFS and then feed it into Pig, which divides the data into two sections: category data and rating data.
  4. Write a Hive query to analyze the information further and export the result into a relational database (RDBMS) using Sqoop.
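As an illustration of step 2, here is a plain-Python sketch (not the MapReduce job itself) that turns a tiny semi-structured XML sample into structured records and tags each link's rating as positive or negative. The XML fields and the rating threshold are assumptions for the example:

```python
import xml.etree.ElementTree as ET

# Tiny sample standing in for the semi-structured bookmarking feed;
# the course does this transformation with MapReduce.
xml_data = """<links>
  <link url="http://example.com/a" category="tech" rating="4"/>
  <link url="http://example.com/b" category="news" rating="2"/>
</links>"""

rows = []
for link in ET.fromstring(xml_data):
    rating = int(link.get("rating"))
    rows.append({
        "url": link.get("url"),
        "category": link.get("category"),
        # Step 2's classification: tag each link positive or negative.
        "sentiment": "positive" if rating >= 3 else "negative",
    })

print([r["sentiment"] for r in rows])  # ['positive', 'negative']
```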




Social Media


Technologies used:  Java, Apache Storm, Google API

Project #2: Building Real time complex event processing +

Project Description:

In this project, you will build a real-time event processing system using Apache Storm, where even sub-second latencies matter for analysis, while still not fast enough for ultra-low-latency (picosecond or nanosecond) applications. A typical source is the CDR (Call Detail Record) from telecommunications, where you can expect millisecond response times. Sometimes, you'll see such systems use HBase, Apache Storm and HDFS together.


Project Statement: You will be creating an Apache Storm application where you process the data in real time and perform the tasks below:


  1. Create a Spout to read the real-time data generated by the network elements.
  2. Use the Google API to perform transformations such as converting latitude and longitude to region names.
  3. Perform computations to calculate important KPIs (Key Performance Indicators) on the real-time data.
  4. Make use of HDFSBolt to push the data to HDFS for batch processing.
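The four steps above can be sketched in plain Python. This is only an analogy for what a Storm bolt would compute: `region_of` is a made-up lookup standing in for the Google reverse-geocoding call, and a dropped-call count is used as the example KPI:

```python
from collections import defaultdict

# Hypothetical lookup standing in for the Google reverse-geocoding API.
def region_of(lat, lon):
    return "north" if lat >= 0 else "south"

def process_events(events):
    """Count dropped calls per region -- one example KPI a bolt could emit."""
    dropped_per_region = defaultdict(int)
    for e in events:  # in Storm, each event would arrive as a tuple from a spout
        region = region_of(e["lat"], e["lon"])
        if e["status"] == "DROPPED":
            dropped_per_region[region] += 1
    return dict(dropped_per_region)

events = [
    {"lat": 12.9, "lon": 77.5, "status": "DROPPED"},
    {"lat": 12.9, "lon": 77.5, "status": "OK"},
    {"lat": -33.8, "lon": 151.2, "status": "DROPPED"},
]
print(process_events(events))  # {'north': 1, 'south': 1}
```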


Telecom Industry


Technologies used:  Java, Hadoop, Hive (HQL), Quartz scheduler

Project #3: Analysis of Call Detail Records +

Telecom service providers like Huawei and Ericsson collect usage transactions, network performance data, cell-site data, device information and other information spread across the network, which can be used for analysis.


Project description: You will be given a CDR (Call Detail Record) which is a data record produced by a telephone exchange or other telecommunications equipment that documents the details of a telephone call or other telecommunications transaction (e.g., text message) that passes through that facility or device.




Retail industry


Technologies used:  Hadoop, Hive QL

Project #4: Customer complaint analysis about their products +
Project description: A publicly available dataset containing a few lakh observations with attributes such as CustomerId, Payment Mode, Product Details, Complaint, Location, Status of the complaint, etc. Analyze the data in the Hadoop ecosystem to:
  1. Get the number of complaints filed under each product
  2. Get the total number of complaints filed from a particular location
  3. Get the list of complaints grouped by location which has no timely response
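The three analyses map naturally onto group-and-count operations. A plain-Python sketch over toy rows (the field names are assumptions; the real project runs this as Hive queries on the full dataset):

```python
from collections import Counter

# Toy rows standing in for the public complaints dataset (field names assumed).
complaints = [
    {"product": "Credit card", "location": "Bangalore", "timely_response": False},
    {"product": "Credit card", "location": "Mumbai",    "timely_response": True},
    {"product": "Mortgage",    "location": "Bangalore", "timely_response": False},
]

# 1. Number of complaints filed under each product
per_product = Counter(c["product"] for c in complaints)

# 2. Total complaints filed from a particular location
from_bangalore = sum(1 for c in complaints if c["location"] == "Bangalore")

# 3. Complaints without a timely response, grouped by location
late_by_location = Counter(
    c["location"] for c in complaints if not c["timely_response"])

print(per_product["Credit card"], from_bangalore, late_by_location["Bangalore"])
# 2 2 2
```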




Education Industry


Technologies used:  Java, Hadoop, Hive (HQL), Quartz Scheduler

Project #5: Scholastic Assessment Analysis +

Project description: The data set is the SAT (College Board) 2010 School Level Results, which gives you information about how students from different schools performed in the tests. It consists of the fields DBN, School Name, Number of Test Takers, Critical Reading Mean, Mathematics Mean and Writing Mean, where DBN is the unique field for this data set. We analyze this data against the following problem statements:

  1. Find the total number of test takers.
  2. Find the highest mean/average of the Critical Reading section and the school name.
  3. Find the highest mean/average of the Mathematics section and the school name.
  4. Find the highest mean/average of the Writing section and the school name.
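The problem statements reduce to a sum and max-by-key queries. A plain-Python sketch on a two-row sample shaped like the SAT data set (the real project answers these with Hive; the sample values are illustrative only):

```python
import csv, io

# Tiny sample in the shape of the SAT 2010 school-level results.
data = """DBN,School Name,Number of Test Takers,Critical Reading Mean,Mathematics Mean,Writing Mean
01M292,Henry Street School,31,391,425,385
01M448,University Neighborhood HS,60,394,419,387
"""
rows = list(csv.DictReader(io.StringIO(data)))

# Problem 1: total number of test takers.
total_takers = sum(int(r["Number of Test Takers"]) for r in rows)

# Problem 3: school with the highest Mathematics mean.
best_math = max(rows, key=lambda r: int(r["Mathematics Mean"]))

print(total_takers, best_math["School Name"])  # 91 Henry Street School
```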


Common across Industry


Technologies used:  Python

Project #6: Smart Data Generator +

Project description: Creating a project which generates dynamic mock data based on a schema in real time, which can further be used by real-time processing systems like Apache Storm or Spark Streaming.
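A minimal sketch of the idea, assuming a hypothetical schema format mapping each field name to a generator spec (the real project's schema language may differ):

```python
import random, string

# Hypothetical schema format: field name -> generator spec.
SCHEMA = {
    "user_id": ("int", 1, 1000),
    "country": ("choice", ["IN", "US", "DE"]),
    "name":    ("str", 6),
}

def generate_record(schema, rng):
    """Produce one mock record whose shape follows the schema."""
    record = {}
    for field, spec in schema.items():
        kind = spec[0]
        if kind == "int":
            record[field] = rng.randint(spec[1], spec[2])
        elif kind == "choice":
            record[field] = rng.choice(spec[1])
        elif kind == "str":
            record[field] = "".join(rng.choices(string.ascii_lowercase, k=spec[1]))
    return record

rng = random.Random(42)  # seeded so the mock stream is reproducible
rec = generate_record(SCHEMA, rng)
print(sorted(rec))  # ['country', 'name', 'user_id']
```

A loop around `generate_record` can then feed the records to a socket or message queue for a streaming consumer to pick up.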


Banking and Finance Industry


Technologies used:  Java, Apache Storm

Project #7: Bank Credit Card Authorization Using Storm +

Authorization hold (also card authorization, preauthorization) is the practice within the banking industry of verifying electronic transactions initiated with a debit card or credit card and holding this balance as unavailable until either the merchant clears the transaction, also called a settlement, or the hold "falls off."



Hadoop 2.x - Distributed Storage + Batch Processing


Course description: This section of the training will help you understand how Hadoop solves the storage and processing of large data sets in a distributed environment.

Module 01 - Understanding Big Data & Hadoop 2.x +
Learning Objectives - In this module, you will understand Big Data, the limitations of the existing solutions for Big Data problem, how Hadoop solves the Big Data problem, the common Hadoop ecosystem components, Hadoop 2.x Architecture, HDFS, Anatomy of File Write and Read.

Topics -
  • Understanding what is Big Data
  • Combined storage + computation layer
  • Business Use case - Telecom
  • Challenges of Big Data
  • OLTP VS OLAP Applications
  • Limitations of existing Data Analytics
  • A combined storage compute layer
  • Introduction to Hadoop
  • Exploring Hadoop 2.x Core Components
  • Understanding Hadoop 2.x Daemon Services
    1. NameNode
    2. DataNode
    3. Secondary NameNode
    4. ResourceManager
    5. NodeManager
  • Understanding NameNode metadata
  • File Blocks in HDFS
  • Rack Awareness
  • Anatomy of File Read and File Write
  • Understanding HDFS Federation
  • Understanding High Availability Feature in Hadoop 2.x
  • Exploring Big Data ecosystem

View Module Presentation

Check E-Learning for more Assignments + Use cases + Project work + Materials + Case studies


Module 02 - Exploring Administration + File System + YARN Commands +
Learning Objectives - In this module, you will learn Formatting NameNode, HDFS File System Commands, MapReduce Commands, Different Data Loading Techniques, Cluster Maintenance, etc.

Topics -
  • Analyzing ResourceManager and NameNode UI
  • Exploring HDFS File System Commands - [Hands-on]
  • Exploring Hadoop Admin Commands - [Hands-on]
  • Printing Hadoop Distributed File System
  • Running Map Reduce Program - [Hands-on]
  • Killing Job
  • Data Loading in Hadoop - [Hands-on]
    1. Copying Files from DFS to Unix File System
    2. Copying Files from Unix File System to DFS
    3. Understanding Parallel copying of data to HDFS - [Hands-on]
  • Executing MapReduce Jobs
  • Different techniques to move data to HDFS - [Hands-on]
  • Backup and Recovery of Hadoop cluster - [Activity]
  • Commissioning and Decommissioning a node in Hadoop cluster. - [Activity]
  • Understanding Hadoop Safe Mode - Maintenance state of NameNode - [Hands-on]
  • Configuring Trash in HDFS - [POC]

Check E-Learning for more Assignments + Use cases + Project work + Materials + Case studies


Module 03 - MapReduce Programming +
Learning Objectives - In this module, you will understand how MapReduce framework works.

Topics -
  • Understanding different phases of MapReduce programs
  • Understanding Key/Value pair
  • Flow of Operations in MapReduce
  • Hadoop Data Types
  • Writing MapReduce programs using Java - [Hands-on]
    1. Creating Mapper class
    2. Creating Reducer class
    3. Creating Driver program
  • Deploying MapReduce programs in the cluster
  • [Performance Improvement] - Understanding and Implementing Combiner
  • Exploring HashPartitioner
  • Understanding and implementing Partitioner.
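The phases above can be simulated in a few lines of Python: a word-count job traced through map, shuffle/sort, and reduce. The course implements this in Java on Hadoop; this is only a conceptual model of what the framework does between your Mapper and Reducer classes:

```python
from collections import defaultdict

def mapper(line):
    # Map phase: emit a (word, 1) key/value pair for every word in the split.
    for word in line.split():
        yield word.lower(), 1

def shuffle(pairs):
    # Shuffle/sort phase: the framework groups all values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(key, values):
    # Reduce phase: aggregate the grouped values for one key.
    return key, sum(values)

lines = ["big data big ideas", "big wins"]
pairs = [kv for line in lines for kv in mapper(line)]
result = dict(reducer(k, vs) for k, vs in shuffle(pairs).items())
print(result["big"])  # 3
```

A Combiner is essentially this same `reducer` run on each mapper's local output before the shuffle, which is why it cuts network traffic.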

View Module Presentation


Module 04 - Hive and Hive QL +
Learning Objectives - In this module you will learn Hive and its similarity with SQL, understanding Hive concepts, Hive Data types, and Loading and Querying Data in Hive.

Topics -
  • A Walkthrough of Hive Architecture
  • Understanding Hive Query Patterns
  • Internal vs External tables
  • Different ways to describe Hive tables
  • [Use case] - Discussing where to use which type of table.
  • Different ways to load data into Hive tables - [Activity]
    1. Loading data from Local File System to hive Tables.
    2. Loading data from HDFS to Hive Tables.
  • Exploring Hive Complex Data types. - [Hands-on]
    1. Arrays
    2. Maps
    3. Structs
  • Exploring Hive built-in Functions.

For more assignments check E-Learning


Module 05 - Performance Tuning of Hive +
Learning Objectives - In this module, you will understand Advanced Hive concepts such as Partitioning, Bucketing, Dynamic Partitioning, different Storage formats etc.

Topics -
  • Understanding Hive Complex Data types
    1. Arrays
    2. Maps
    3. Structs
  • Partitioning
  • [Use case] - Using Telecom dataset and learn which fields to use for Partitioning.
  • Dynamic Partitioning
  • [Use case] - Using IOT dataset and learn Dynamic Partitioning.
  • Hive Bucketing
  • Bucketing VS Partitioning
  • Dynamic Partitioning with Bucketing
  • Exploring different Input Formats in Hive
    1. TextFile Format - [Activity]
    2. SequenceFile Format - [Activity]
    3. RC File Format - [Activity]
    4. ORC Files in Hive - [Activity]
  • Using different file formats and capturing Performance reports - [POC]
  • Map-side join - [Hands-on]
  • Reduce-side join - [Hands-on]
  • [Use case] - Looking at different problems to which Map-side and Reduce-side joins can be applied.
  • Map-side join VS Reduce-side join - [Hands-on]
  • Writing custom UDF - [Hands-on]
  • Accessing Hive with JDBC - [Hands-on]
For more Assignments + Use cases + Project work + Materials check E-Learning

Module 06 - Sqoop +
Learning Objectives - In this module you will learn how to import and export data between traditional relational databases (such as Oracle) and Hadoop using Sqoop to perform various operations.

Topics -

  • Sqoop Overview
  • How does Sqoop work
  • Sqoop JDBC Driver and Connectors
  • Sqoop Importing Data
  • Various Options to Import Data
    1. Table Import
    2. Binary Data Import
    3. Speeding up the Import
    4. Filtering Import
    5. Full Database Import
Check E-Learning for more Assignments + Use cases + Project work + Materials + Case studies
Understanding & Building Data pipeline Architecture using Pig and Hive +

Project Description - We will use the U.S. Department of Transportation's (DOT) Bureau of Transportation Statistics (BTS) data, which tracks the on-time performance of domestic flights operated by large air carriers. Summary information on the number of on-time, delayed, canceled, and diverted flights appears in DOT's monthly Air Travel Consumer Report, published about 30 days after the month's end, as well as in summary tables posted on this website.


Architecture diagram -

Module - Quartz Scheduler +
Learning Objectives - In this module you will learn about the Quartz job scheduler.

Topics -
  • What is Job Scheduling Framework
  • Role of Scheduling Framework in Hadoop
  • What is Quartz Job Scheduling Library
  • Using Quartz
  • Exploring Quartz API
    1. Jobs
    2. Triggers
  • Scheduling Hive Jobs using Quartz scheduler
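As a rough analogy for the scheduler/trigger/job idea, Python's stdlib `sched` module can fire a job on a one-shot trigger. The course itself schedules Hive jobs with Quartz in Java, so treat this only as a sketch of the concept:

```python
import sched, time

# Record of job executions; a real job would submit a Hive query instead.
runs = []

def hive_job():
    runs.append("ran")  # stand-in for submitting the scheduled Hive job

# Scheduler = Quartz's Scheduler; the enter() call = a one-shot trigger.
s = sched.scheduler(time.monotonic, time.sleep)
s.enter(0.01, priority=1, action=hive_job)  # fire once after 10 ms
s.run()                                     # blocks until all jobs have fired
print(len(runs))  # 1
```

Quartz adds what `sched` lacks: cron-style recurring triggers, persistence and clustering, which is why it suits production Hadoop pipelines.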


Python Programming - General-Purpose Programming


Course description: Learn the fundamentals and collections of Python, a general-purpose language used throughout the Big Data stack.

Module 07 - Python Language Fundamentals +
Learning Objectives - In this module you'll learn all the fundamentals required to quick start Python language.

Topics -
  • Introduction to Python
  • Installing Python in Windows using PyCharm
  • Exploring Datatypes
    1. Numbers
    2. Strings
    3. Booleans
  • Understanding Python Indentation
  • Exploring Decision Statements
    1. The if Statement
    2. The if-else statement
    3. The if-elif-else statement
  • Exploring Looping
    1. The while loop
    2. The for loop
    3. Using range() in for loops

 For more assignments check E-Learning


Module 08 - Collections +
Learning Objectives - In this module, you will learn the different collection types available in Python.

Topics - 
  • Understanding Python Collections
  • Lists 
    1. Creating Lists
    2. Accessing List Elements
    3. Iterating through list elements
    4. Searching elements within a list
    5. List slices
    6. Adding and deleting elements
    7. Adding, Multiplying and Copying Lists
  • Tuples
    1. Creating tuples
    2. Accessing tuple elements
    3. Counting tuple elements
    4. Iterating through tuple elements
    5. Searching elements within tuples
    6. Tuple slices
    7. Adding, Multiplying and Copying tuples
  • Sets
    1. Creating Sets
    2. Accessing Set elements
    3. Counting Set elements
    4. Iterating through Set elements
    5. Adding and Deleting elements
    6. Set Operations (Union, Intersection, Set difference)
  • Dictionaries
    1. Creating Dictionaries
    2. Accessing Dictionary elements
    3. Iterating through Dictionary elements
    4. Searching elements within Dictionary
    5. Adding and Deleting elements
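A quick tour of the four collection types covered above:

```python
# Lists: ordered and mutable.
langs = ["python", "scala"]
langs.append("java")

# Tuples: ordered and immutable.
point = (12.97, 77.59)

# Sets: unique elements with union and difference operations.
a, b = {1, 2, 3}, {3, 4}
union, diff = a | b, a - b

# Dictionaries: key -> value mappings.
fees = {"weekend": 30000}
fees["weekday"] = 30000

print(len(langs), point[0], sorted(union), sorted(diff), fees["weekend"])
# 3 12.97 [1, 2, 3, 4] [1, 2] 30000
```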
For more assignments check E-Learning


Scala Programming - Multi-Paradigm: Functional, Object-Oriented


Module 09 - Introduction to Scala for Apache Spark +
Learning Objectives - In this module, you will understand the basics of Scala that are required for programming Spark applications. You will learn about the basic constructs of Scala such as variable types, control structures, collections, and more.

Topics -
  • Introduction to Scala REPL
  • Basic Scala operations
  • Exploring different Variable Types
    1. Mutable Variables - [Hands-on]
    2. Immutable Variables - [Hands-on]
  • Type Inference in Scala - [Hands-on]
  • Block Expressions
  • Exploring Lazy evaluation in Scala
  • Control Structures in Scala
  • Exploring different variants of for loop
    1. Enhanced for loop. - [Hands-on]
    2. For loop with yield. - [Hands-on]
    3. For Loop with if conditions : Pattern Guards - [Hands-on]
  • Match Expressions - [Hands-on]
  • Exploring Functions in Scala
  • Exploring Procedures in Scala

  • Collections in Scala
    1. Array
    2. ArrayBuffer
    3. Map
    4. Tuples
    5. Lists
For more assignments check E-Learning

Module 10 - Object-Oriented Programming in Scala +
Learning Objectives - In this module, you will learn Object Oriented Programming in Scala.

Topics -
  • Class in Scala
  • Getters and Setters
  • Exploring Constructors
    1. Auxiliary
    2. Primary
  • Exploring Case classes
  • Singletons
  • Exploring Companion Objects
  • Inheritance in Scala
  • Exploring Traits
    1. Traits as Interfaces
    2. Layered Traits
For more assignments check E-Learning

Module 11 - Functional Programming using Scala +
Learning Objectives - In this module, you will learn Functional Programming in Scala.

Topics -
  • Imperative VS Declarative Style
  • Introduction to Functional Programming
  • Principles of Functional Programming
    1. Pure Functions
    2. Functions are First Class values
  • Exploring 'First Class Functions'
    1. Storing functions in variables
    2. Methods returning a function
  • Functions VS Methods
  • Converting methods to functions using eta expansion
  • Invoking Functions with Tuples as Parameters
  • Exploring Higher-Order Functions


Apache Spark - Cluster Computing Framework


Module 12 -  Overview of Apache Spark +
Learning Objectives - In this module, you will get an overview of Apache Spark, its architecture and its core abstraction, the RDD.

Topics -

  • Overview of Apache Spark
  • Features of Apache Spark
  • Exploring Data sharing in MapReduce
  • Exploring Data sharing in Apache Spark
  • Spark Ecosystem
  • Introduction to RDD
  • Exploring Properties of RDD
    1. Immutable
    2. Lazy evaluated
    3. Cacheable
    4. Type Inferred
  • Understanding Partitions in Spark
  • Characteristics of Partitions in Spark
  • Spark Architecture
  • Spark Modes
  • Hadoop VS Spark
  • Ecosystem of Hadoop VS Spark

Module 13 - Spark Common Operations +
Learning Objectives - In this module, you will learn common Spark operations: creating RDDs and applying transformations and actions.

Topics -
  • Installing Apache Spark on Windows - [Hands-on Activity]
  • Starting Spark Shell
  • Exploring different ways to start Spark
  • Understanding Spark UI - [Activity]
  • RDD Creations - [Hands-on]
    1. Loading a file
    2. Parallelize Collections
  • Exploring RDD Operations
    1. Transformations
    2. Actions
  • RDD Actions - [Hands-on]
    1. count()
    2. first()
    3. take(int)
    4. saveAsTextFile(path: String)
    5. reduce(func)
    6. collect()
  • RDD Transformations - [Hands-on]
    1. map(func)
    2. foreach(func)
    3. filter(func)
    4. coalesce(int)
  • Passing functions to Spark higher-order functions
    1. Anonymous function - [Industry Practices]
    2. Passing Named function - [Industry Practices]
    3. Static singleton function - [Industry Practices]
  • [Use case] - Analyzing Movie lens dataset and performing Actions and Transformation
  • Chaining Transformation and Actions in Spark
  • Running Spark programs in eclipse using Maven - [Industry Practices]
  • Understanding and initializing SparkSession, i.e. the Spark 2.0 entry point
  • [Performance Improvement] - Understanding Spark Caching
  • Loading RDD
    1. textFile
    2. wholeTextFiles
  • Saving RDD
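Spark's transformations are lazy: nothing runs until an action forces evaluation. A plain-Python analogy using generators (this is not the Spark API; in Spark the same pipeline would be `parallelize` / `map` / `filter` followed by `collect`):

```python
# Generators build a pipeline that does no work until "an action" consumes it,
# much like an RDD lineage.
nums = range(1, 11)                        # like sc.parallelize(range(1, 11))
doubled = (n * 2 for n in nums)            # transformation: map(lambda n: n * 2)
big = (n for n in doubled if n > 10)       # transformation: filter(lambda n: n > 10)

# Nothing has executed yet -- 'big' is just a recipe, like an RDD lineage graph.
result = list(big)                         # action: collect() forces evaluation
print(result)        # [12, 14, 16, 18, 20]
print(sum(result))   # 80 -- like reduce(lambda a, b: a + b)
```

One place the analogy breaks: a consumed generator is exhausted, whereas an RDD can be recomputed from its lineage (or served from cache) for every new action.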


For more assignments check E-Learning

Module 14 - Playing with RDDs +
Learning Objectives - In this module, you will work with RDDs in more depth: caching and persistence, RDD extensions and aggregate functions.

Topics -

  • RDD Caching and Persistence
  • reduce() vs fold()
  • Scala RDD Extensions
    1. DoubleRDDFunctions
    2. PairRDDFunctions
    3. OrderedRDDFunctions
    4. SequenceFileRDDFunctions
  • Exploring Aggregate Functions
  • groupByKey function
  • reduceByKey function

For more assignments check E-Learning

Module 15 - Spark SQL - DataFrames + DataSets +
Learning Objectives - In this module, you will learn about Spark SQL which is used to process structured data with SQL queries. You will learn about data-frames and datasets in Spark SQL and perform SQL operations on data-frames.

Topics -
  • Understanding different abstractions in Spark
  • Introduction to Spark SQL
  • Features of Spark SQL
  • Overview of DataFrames
  • Understanding org.apache.spark.sql.DataFrameReader class
  • Instantiating org.apache.spark.sql.DataFrameReader using SparkSession - [Hands-on]
  • Creating a DataFrame from JSON file - [Hands-on]
  • Creating a DataFrame from a CSV file - [Hands-on]
  • Understanding different options while reading CSV
  • Creating a custom schema and querying
  • Understanding DataFrame explain() function [Industry practices]
  • Registering DataFrame as a Table
  • Operations supported by DataFrames
  • Converting RDD to DataFrame
  • [Use case] Analysing Employee dataset
  • Exploring Pivots - [Industry practices]
  • Join Operations in DataFrame - [Hands-on]

E-Commerce Data Analysis - [Real-time industry use case]

Use case Description :

Given an organization's e-commerce customer data set and an item data set, we have to perform the analysis below.

1. Filter out all the customers belonging to group 400.
2. Find the number of customers from each country and the total item price of all the purchased items from each country.
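Both analyses are a simple filter plus group-by-country aggregations. A plain-Python sketch over toy rows (field names such as `group` and `item_price` are assumptions; the real use case runs on Spark DataFrames):

```python
from collections import Counter

# Toy customer/item rows (field names assumed from the use-case description).
orders = [
    {"customer": "c1", "group": 400, "country": "IN", "item_price": 250.0},
    {"customer": "c2", "group": 200, "country": "IN", "item_price": 100.0},
    {"customer": "c3", "group": 400, "country": "US", "item_price": 75.0},
]

# 1. Customers belonging to group 400.
group_400 = [o["customer"] for o in orders if o["group"] == 400]

# 2. Customer count and total item price per country.
customers_per_country = Counter(o["country"] for o in orders)
price_per_country = {}
for o in orders:
    price_per_country[o["country"]] = (
        price_per_country.get(o["country"], 0) + o["item_price"])

print(group_400, customers_per_country["IN"], price_per_country["IN"])
# ['c1', 'c3'] 2 350.0
```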


For more assignments check E-Learning

Module 16 - Spark Streaming +
Learning Objectives - In this module, you will learn real-time data processing using Spark Streaming.

Topics -
  • Understanding Data Streaming
  • Overview of Spark Streaming
  • Spark Streaming Use cases
  • Working of Spark Streaming
  • Understanding DStreams / Discretized Streams 
    1. Transformations
    2. Output operations
  • Spark Streaming - Entry point of Spark Streaming applications
  • Input Streams / Receiver
  • Real Time Streaming using socketTextStream - [Hands-on]
  • Exploring special transformations on the DStream API
    1. Window Operations
    2. UpdateStateByKey
  • Block Interval

E-Commerce Data Analysis - [Real-time industry use case]

Use case Description :

  • An e-commerce company wants to build a real-time analytics dashboard to optimize its inventory and operations.
  • This dashboard should show how many products are getting purchased, shipped, delivered and cancelled every minute.
  • This dashboard will be very useful for operational intelligence.
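The per-minute counts behind such a dashboard amount to tumbling-window aggregation, which Spark Streaming performs over micro-batched DStreams. A plain-Python sketch of just the windowing logic (timestamps and statuses are toy data, not the Spark API):

```python
from collections import Counter, defaultdict

# Toy (epoch-seconds, status) events standing in for the real-time order stream.
events = [(0, "purchased"), (20, "shipped"), (65, "purchased"), (70, "cancelled")]

# Tumbling one-minute windows: integer-divide each timestamp by 60 to pick
# its window, then count statuses within the window.
windows = defaultdict(Counter)
for ts, status in events:
    windows[ts // 60][status] += 1

print(windows[0]["purchased"], windows[1]["purchased"], windows[1]["cancelled"])
# 1 1 1
```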


Apache Kafka - Distributed Messaging System


Course description: Today's applications are built on a microservices architecture. Having many microservices that need to communicate with each other can be problematic, as they quickly become tightly coupled. Apache Kafka allows us to create services that are loosely coupled and operate in an event-driven way.

Module 01 - Understanding Apache Kafka +
Learning Objectives - In this module, you will understand Big Data, Kafka and Kafka Architecture.

Topics -
  • Integration without Messaging System
  • Integration with Messaging System
  • What is Kafka
  • Download and install Kafka
  • Components of Messaging System
  • Exploring Kafka components
    1. Producer
    2. Consumer
    3. Broker
    4. Cluster
    5. Topic
    6. Partitions
    7. Offset
    8. Consumer groups
  • Installing Kafka
  • Kafka concepts - [Hands-on]
    1. Starting Zookeeper
    2. Starting Kafka Server
    3. Creating a topic
    4. Start a console producer
    5. Start a console consumer
    6. Sending and receiving messages


Module 02 - Exploring Kafka Producer,Consumer API +
Learning Objectives - In this module, you will explore the Kafka Producer and Consumer APIs.

Topics -
  • Adding Kafka dependency to Maven - [Activity]
  • Exploring Kafka core APIs
  • Exploring Kafka producer API
  • Sending events to Kafka - Producer API - [Hands-on]
  • Exploring Kafka consumer API
  • Reading events from Kafka - Consumer API - [Hands-on]
  • Consumer Poll Loop - Offset Management
  • Changing the configuration of a topic - [Hands-on]

Check E-Learning for more Assignments + Use cases + Project work + Materials + Case studies



MongoDB - NoSQL Database


Module 01 - MongoDB Getting Started +
Learning Objectives -  In this module, you will get an understanding of basics of MongoDB.

Topics -
  • What is MongoDB
  • Difference between MongoDB and RDBMS
  • Installing MongoDB in Windows - [Activity]
  • Configuring MongoDB server with configuration file - [Activity]
  • Creating First Database - [Hands-on]
  • Creating Document and saving it to collection - [Hands-on]
  • Dropping a database - [Hands-on]
  • Creating a Collection - Using db.createCollection(name,options) - [Hands-on]
  • Understanding Capped Collections - [Industry Practices]
  • Dropping a Collection - [Hands-on]

For more Assignments + Use cases + Project work + Materials check E-Learning

Module 02 - MongoDB CRUD Operations - Create, Read, Update and Delete +
Learning Objectives -  In this module, you will perform CRUD operations.

Topics -
  • Reading/Inserting a document in a collection using a JavaScript file - [Hands-on]
  • Inserting Array of Documents - [Hands-on]
  • Reading a Document - Querying - [Hands-on]
  • Reading a Document with $lt, $gt operator - [Hands-on]
  • Other Query Operators - [Hands-on]
  • Updating Documents - [Hands-on]
  • Deleting documents - [Hands-on]

For more Assignments + Use cases + Project work + Materials check E-Learning

Module 03 - Connecting MongoDB with Java +
Learning Objectives -  In this module, you will connect to MongoDB using Java.

Topics -
  • Creating Maven Project & Adding dependencies for MongoDB-Java Driver - [Activity]
  • Connecting to MongoDB server - [Hands-on]
  • Displaying all databases - [Hands-on]
  • Creating a database and collection - [Hands-on]
  • Reading/Inserting a document in collection using Java - [Hands-on]

For more Assignments + Use cases + Project work + Materials check E-Learning

Apache NiFi - Automate Data Flow


Module - Apache NiFi +
Learning Objectives -  In this module, you will learn to automate the flow of data between software systems.

Topics -
  • Introduction to Apache NiFi
  • What is Apache NiFi?
  • The core concepts of NiFi
  • NiFi Architecture
  • Concepts of FlowFile
  • Concepts of Processor
  • Anatomy of a Processor
  • Types of Processors
  • NiFi Templates
  • NiFi User Interface
Hands-on -
  • Building a Fault Tolerant DataPipeline
  • Monitoring of Data Flow - [Hands-on]
  • Exploring Data Provenance - [Hands-on]

How will I execute the Practicals?

We will help you to setup NPN Training's Virtual Machine + Cloudera Virtual Machine in your System with local access. The detailed installation guides are provided in the E-Learning for setting up the environment.

Is Java a pre-requisite to learn Big Data and Hadoop?

No, you can start immediately. We will provide you the video tutorial for Java, and since Java is introduced in the course only from the third week (MapReduce), you will have enough time to clear your concepts in Java before then.

Earn your certificate!

Our Specialization is exhaustive, and the certificate awarded by us is proof that you have taken a big leap in the Big Data domain.


Certificate review

Once you are successfully through the project (reviewed by an NPN Training expert), you will be awarded NPN Training's Big Data and Hadoop certificate.


Certificate verification

Each certificate will be given a unique Certification ID which can be validated through the Verify Certificate link.

Shreyas Gowda

Best institute to learn the Big Data architect course. Naveen follows more of a hands-on, problem-solving approach for programming and also guides us to solve many use cases, which helps us understand the concepts well. I would definitely recommend NPN Training!

Surabhi KS

I have opted for the Big Data Architect course in this institute. I had enquired at many other institutes before, and one of my friends referred me here. After the first few classes I was very much convinced that I had made the right decision by joining here. The materials and assignments given are very helpful. He summarizes all the topics covered at the end and beginning of each class, which is very helpful for us to remember. I would definitely recommend this institute to my friends. It's totally worth the amount we pay.

Chat with instructor

+91 8095918383 | +91 9535584691

Upcoming batches



Big Data Architect Training

- (Weekend batch)
Fees 30,000 INR




Course Features

Big Data Architect Masters Program Training
4.8 stars - based on 150 reviews