info@npntraining.com   +91 8095918383 / +91 9535584691

Data Science Training in Bangalore

The Big Data Architect Masters Program Training is designed to help you gain end-to-end coverage of Big Data technologies by learning the conceptual implementation of Hadoop 2.x + Quartz Scheduler + Spark using Scala + Kafka + Cassandra and Zeppelin. The entire program is highly recommended for any working professional who intends to become a successful Big Data Developer/Architect.

Course Description

The Big Data Masters Program is designed to empower working professionals to develop relevant competencies and accelerate their career progression in Big Data technologies through complete Hands-on training.

Being a Big Data Architect requires you to be a master of multiple technologies, and this program will ensure you become an industry-ready Big Data Architect who can provide solutions to Big Data projects.

At NPN Training we believe in the philosophy of “learn by doing”, hence we provide complete hands-on training along with real-time project development.

Course Objectives

By the end of the course,  you will:

  1. Understand what Big Data is, the challenges associated with it, and how Hadoop solves the Big Data problem
  2. Understand Hadoop 2.x Architecture, Replication, Single Point of Failure, YARN
  3. Learn HDFS + YARN commands to work with the cluster.
  4. Understand how MapReduce can be used to analyze big data sets
  5. Perform Structured Data Analysis using Hive
  6. Learn different performance tuning techniques in Hive
  7. Learn Data Loading techniques using Sqoop
  8. Use Scala with an intermediate level of proficiency
  9. Use the REPL (the Scala Interactive Shell) for learning
  10. Learn Functional Programming using Scala
  11. Learn Apache Spark 2.x
  12. Use DataFrames and Structured Streaming in Spark 2.x
  13. Analyze and Visualize data using Zeppelin
  14. Learn the popular NoSQL database Cassandra.

Work on a real-time Big Data project

This program comes with a portfolio of industry-relevant POCs, use cases and project work. Unlike other institutes, we do not pass off use cases as projects; we clearly distinguish between a use case and a project.

Who is the target audience?

  • Software engineers and programmers who want to understand the larger Big Data ecosystem and use it to store and analyze data.
  • Project, program, or product managers who want to understand the high-level architecture and projects of Big Data.
  • Data analysts and database administrators who are curious about Hadoop and how it relates to their work.

Hadoop 2.x – Distributed Storage + Batch Processing

Course description: This section of the training will help you understand how Hadoop solves the storage and processing of large data sets in a distributed environment.

Module 01 - Understanding Big Data & Hadoop 2.x

Learning Objectives – In this module, you will understand Big Data, the limitations of the existing solutions for Big Data problem, how Hadoop solves the Big Data problem, the common Hadoop ecosystem components, Hadoop 2.x Architecture, HDFS, Anatomy of File Write and Read.

Topics –

  • Introduction to Big Data
  • Challenges of Big Data
  • OLTP VS OLAP Applications
  • Business Use case – Telecom
  • Limitations of existing Data Analytics
  • A combined storage compute layer
  • Introduction to Hadoop
  • Exploring Hadoop 2.x Core Components
  • Hadoop 2.x Daemon Services
    1. NameNode
    2. DataNode
    3. Secondary NameNode
    4. ResourceManager
    5. NodeManager
  • HDFS Architecture
  • Understanding NameNode metadata
  • File Blocks in HDFS
  • Rack Awareness
  • Anatomy of File Read and File Write
  • YARN
  • Understanding HDFS Federation
  • Understanding High Availability Feature in Hadoop 2.x
  • Exploring Big Data ecosystem

Check E-Learning for more Assignments + Use cases + Project work + Materials + Case studies

 

Module 02 - Exploring Administration + File System + YARN Commands

Learning Objectives – In this module, you will learn Formatting NameNode, HDFS File System Commands, MapReduce Commands, Different Data Loading Techniques, Cluster Maintenance, etc.

Topics –

  • Hadoop Distribution Modes
    1. Local Mode / Standalone Mode
    2. Pseudo-Distributed Mode
    3. Fully Distributed Mode
  • Starting and Stopping HDFS & YARN daemons
  • Exploring HDFS File System Commands – [Hands-on]
  • Data Loading in Hadoop – [Hands-on]
    1. Copying Files from DFS to Unix File System
    2. Copying Files from Unix File System to DFS
    3. Understanding Parallel copying of data to HDFS – [Hands-on]
  • Exploring Hadoop Admin Commands – [Hands-on]
  • Printing Hadoop Distributed File System
  • Running Map Reduce Program – [Hands-on]
  • Executing MapReduce Jobs
  • Killing Job
  • Backup and Recovery of Hadoop cluster – [Activity]
  • Commissioning and Decommissioning a node in Hadoop cluster. – [Activity]
  • Understanding Hadoop Safe Mode – Maintenance state of NameNode
  • Configuring Trash in HDFS – [POC]

Configuring HDFS Trash – [Proof of Concept]

Problem Statement :

Files or directories accidentally deleted from HDFS are lost permanently.

Solution:

This POC shows how to configure HDFS trash, which helps prevent accidental deletion of files and directories. When you delete a file in HDFS, the file is not immediately removed. Deleted files are first moved to the /user/<username>/.Trash/Current directory, with their original filesystem path preserved. After a user-configurable period of time (fs.trash.interval), a process known as trash checkpointing renames the Current directory to the current timestamp, that is, /user/<username>/.Trash/<timestamp>. The checkpointing process also checks the rest of the .Trash directory for any existing timestamp directories and removes them from HDFS permanently. You can restore files and directories in the trash simply by moving them to a location outside the .Trash directory.
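Below is a minimal sketch of the same idea from code, using Hadoop's client API to move a file to trash instead of deleting it outright. It assumes the Hadoop client libraries are on the classpath; the retention interval and the file path are hypothetical values chosen for illustration.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path, Trash}

object TrashDemo {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    // fs.trash.interval is the retention period in minutes; 1440 = 24 hours.
    // On a real cluster this is normally set in core-site.xml on the NameNode.
    conf.set("fs.trash.interval", "1440")

    val fs = FileSystem.get(conf)
    val file = new Path("/user/npn/sample.txt")   // hypothetical path

    // Move the file into /user/<username>/.Trash/Current instead of removing it outright.
    val moved = Trash.moveToAppropriateTrash(fs, file, conf)
    println(s"Moved to trash: $moved")

    // Restoring is simply a rename out of the .Trash directory, e.g.:
    // fs.rename(new Path("/user/npn/.Trash/Current/user/npn/sample.txt"), file)
  }
}
```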

Module 03 - Map Reduce Programming

Learning Objectives – In this module, you will learn how the MapReduce programming model works and how it can be used to analyze large data sets.


Module 04 - Hive & HiveQL

Learning Objectives – In this module, you will learn Hive and its similarity with SQL, understand Hive concepts and data types, and load and query data in Hive.

Topics –

  • Walk through of Hive Architecture
  • Understanding Hive Query Patterns
  • Hive Data warehouse directory
  • Database Creation
  • Switching to Database
  • Table Creation
  • Different ways to describe Hive tables
  • Different ways to load data into Hive tables – [Activity]
    1. Loading data from Local File System to hive Tables.
    2. Loading data from HDFS to Hive Tables.
  • Hive Table Types
    1. Internal tables
    2. External tables
  • [Use case] – Discussing where to use which type of table.
  • Exploring Hive Complex Data types. – [Hands-on]
    1. Arrays
    2. Maps
    3. Structs
  • Exploring Hive built-in Functions.

Check E-Learning for more Assignments + Use cases + Project work + Materials

Module 05 - Hive Optimization

Learning Objectives – In this module, you will understand Advanced Hive concepts such as Partitioning, Bucketing, Dynamic Partitioning, different Storage formats etc.

Topics –

  • Understanding Hive Complex Data types
    1. Arrays
    2. Map
    3. Struct
  • Understanding how select statement works in Hive
  • Partitioning
  • Dynamic Partitioning
  • Hive Bucketing
  • Bucketing VS Partitioning
  • Dynamic Partitioning with Bucketing
  • Exploring different Input Formats in Hive
    1. TextFile Format
    2. SequenceFile Format
    3. RC File Format
    4. ORC Files in Hive
  • Using different file formats and capturing Performance reports
  • Map Side Join
  • Reduce-side join
  • [Use case] – Looking at different problems where Map-side and Reduce-side joins can be used.
  • Map-side join VS Reduce-side join
  • Writing custom UDF
  • Starting Hive Thrift Server
  • Accessing Hive with JDBC
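For the last two topics above, the following is a minimal sketch of querying Hive over JDBC. It assumes HiveServer2 (the Hive Thrift server) is running on the default port 10000 and that the hive-jdbc driver is on the classpath; the employees table is hypothetical.

```scala
import java.sql.DriverManager

object HiveJdbcDemo {
  def main(args: Array[String]): Unit = {
    // HiveServer2 JDBC driver (hive-jdbc must be on the classpath).
    Class.forName("org.apache.hive.jdbc.HiveDriver")

    // Assumes HiveServer2 is listening on the default port 10000.
    val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "", "")
    val stmt = conn.createStatement()

    // 'employees' is a hypothetical table used only for illustration.
    val rs = stmt.executeQuery("SELECT name, salary FROM employees LIMIT 10")
    while (rs.next()) {
      println(s"${rs.getString(1)}\t${rs.getDouble(2)}")
    }

    rs.close(); stmt.close(); conn.close()
  }
}
```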

Check E-Learning for more Assignments + Use cases + Project work + Materials

Module 06 - Sqoop

Learning Objectives – In this module, you will learn how to import and export data between traditional SQL databases, such as Oracle, and Hadoop using Sqoop.

Topics –

  • Sqoop Overview
  • Sqoop JDBC Driver and Connectors
  • Sqoop Importing Data
  • Various Options to Import Data
  • Understanding Sqoop Jobs
    1. Table Import
    2. Filtering Import
  • Incremental Imports using Sqoop

 

[Capstone Project] - Hadoop


Quartz – Enterprise Job Scheduler

Course description: This course will help you learn one of the most popular job scheduling libraries, Quartz, which can be integrated into a wide variety of Java applications. Quartz is widely used in enterprise-class applications to support scheduling of jobs and to build process workflows.

Module - Quartz Scheduler

Learning Objectives – In this module, you will learn about the Quartz job scheduler.

Topics –

  • What is Job Scheduling Framework
  • Role of Scheduling Framework in Hadoop
  • What is Quartz Job Scheduling Library
  • Using Quartz
  • Exploring Quartz API
    1. Jobs
    2. Triggers
  • Scheduling Hive Jobs using Quartz scheduler
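A minimal sketch of the Quartz API covered above: a Job and a Trigger registered with a Scheduler. The job body is a placeholder; in the course's context it is where a Hive job would be launched. Names and intervals are illustrative only.

```scala
import org.quartz.{Job, JobBuilder, JobExecutionContext, SimpleScheduleBuilder, TriggerBuilder}
import org.quartz.impl.StdSchedulerFactory

// The unit of work; Quartz instantiates this class every time the trigger fires.
class HelloJob extends Job {
  override def execute(context: JobExecutionContext): Unit =
    println("Job fired at " + context.getFireTime)
}

object QuartzDemo {
  def main(args: Array[String]): Unit = {
    val scheduler = new StdSchedulerFactory().getScheduler

    val job = JobBuilder.newJob(classOf[HelloJob])
      .withIdentity("helloJob", "group1")
      .build()

    // Fire now and then every 30 seconds, forever.
    val trigger = TriggerBuilder.newTrigger()
      .withIdentity("helloTrigger", "group1")
      .startNow()
      .withSchedule(SimpleScheduleBuilder.simpleSchedule()
        .withIntervalInSeconds(30)
        .repeatForever())
      .build()

    scheduler.start()
    scheduler.scheduleJob(job, trigger)
  }
}
```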

Scala Programming – Multi Paradigm : Functional + Object Oriented

Course description: This course will help you learn one of the most powerful and expressive languages built on top of the JVM, which means it can interoperate with and take advantage of many existing Java APIs. Scala is both an object-oriented and a functional programming language. This course will teach enough of the basics of Scala to enable you to start writing less boilerplate code and focus more on business problems. It will get you started from the ground up and quickly familiarize you with some of the most powerful features of this modern language. Topics covered include the REPL, pattern matching, for comprehensions, recursion, (im)mutability, interoperability, and much more.

Module 01 - Introduction to Scala for Apache Spark

Learning Objectives –  In this module, you will understand the basics of Scala that are required for programming Spark applications. You will learn about the basic constructs of Scala such as variable types, control structures, collections, and more.

Topics –

  • Introduction to Scala Programming
  •   Exploring Scala REPL
  •   Basic Scala Operations
  •   Exploring different variable types
    1. Mutable Variables
    2. Immutable Variables
  • Type Inference
  • Block Expression
  • Lazy evaluation
  • Control structures
  • Exploring different variants of for loop
    1. Enhanced for loop
    2. For loop with yield
    3. For loop with if conditions – Pattern Guards
  • Match Expressions
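A short sketch pulling the constructs above together (val/var, type inference, lazy evaluation, block expressions, for with yield and guards, and match expressions):

```scala
object ScalaBasics extends App {
  // Immutable vs mutable variables; types are inferred.
  val pi = 3.14      // immutable (val)
  var count = 1      // mutable (var)

  // Lazy evaluation: the right-hand side runs only on first use.
  lazy val expensive = { println("computing..."); 42 }

  // Block expression: the value of the block is its last expression.
  val area = { val r = 2.0; pi * r * r }

  // For loop with yield and a pattern guard.
  val evens = for (i <- 1 to 10 if i % 2 == 0) yield i

  // Match expression.
  val label = count match {
    case 0          => "zero"
    case n if n > 0 => "positive"
    case _          => "negative"
  }

  println(s"$area | $evens | $label | $expensive")
}
```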

Module 02- Object Oriented Programming

Learning Objectives – In this module, you will learn object-oriented programming in Scala: classes, constructors, case classes, companion objects, and traits.

Topics –

  • Structure of Classes
  • Scala Classes vs Java Classes
  • Getters & Setters
  • Constructors
    1. Auxiliary Constructors
    2. Primary Constructors
  • Case Classes
  • Companion Objects
  • Creating Objects using Apply
  • Traits
    1. Traits as Interfaces
    2. Layered Traits
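A compact sketch of the object-oriented constructs listed above; the class, field and method names are illustrative only:

```scala
// Primary constructor parameters become fields; the auxiliary constructor delegates to it.
class Employee(val name: String, val salary: Double) {
  def this(name: String) = this(name, 0.0)   // auxiliary constructor
}

// Companion object: apply() lets callers create instances without 'new'.
object Employee {
  def apply(name: String): Employee = new Employee(name)
}

// Case class: getters, equals/hashCode/toString and apply() are generated automatically.
case class Point(x: Int, y: Int)

// Traits as interfaces, layered via mixins.
trait Greeter { def greet(msg: String): Unit = println(msg) }
trait LoudGreeter extends Greeter {
  override def greet(msg: String): Unit = super.greet(msg.toUpperCase)
}

object OopDemo extends App {
  val e = Employee("Asha")       // created through the companion object's apply
  val p = Point(1, 2)            // case-class apply
  val g = new LoudGreeter {}     // layered trait
  g.greet(s"${e.name} is at $p")
}
```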

Module 03 - Functional Programming

Learning Objectives – In this module, you will learn functional programming fundamentals and then move on to first-class functions and higher-order methods.

Topics –

  • Imperative vs Declarative Style of coding
  • Introduction to Functional Programming
  • Principles of Functional Programming
    1. Pure Functions
    2. Functions are ‘First Class Values’
  • Pure and Impure functions
  • Implementing Pure functions
  • Exploring ‘First Class’ functions
    1. Storing functions in variables
    2. Method returning a function
    3. Method taking function as arguments
  • Functions vs Methods
  • Converting methods to functions using eta expansion
  • Invoking functions with tuples as parameters
  • Exploring higher-order functions
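A small sketch of the ideas above (pure functions, functions as first-class values, higher-order methods, and eta expansion):

```scala
object FpDemo extends App {
  // A pure function: its output depends only on its input and it has no side effects.
  def square(x: Int): Int = x * x

  // Functions are first-class values: stored in a variable...
  val double: Int => Int = x => x * 2

  // ...returned from a method...
  def multiplier(factor: Int): Int => Int = x => x * factor

  // ...and passed as arguments (a higher-order method).
  def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

  // Eta expansion: converting the method 'square' into a function value.
  val squareFn: Int => Int = square _

  println(applyTwice(double, 3))         // 12
  println(applyTwice(multiplier(5), 2))  // 50
  println(List(1, 2, 3).map(squareFn))   // List(1, 4, 9)
}
```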

Apache Spark 2.x – Cluster Computing Framework

Course description: This course will help you learn Apache Spark, a fast, general-purpose cluster computing framework for large-scale data processing, using the Scala programming language.

Module 01 - Overview of Apache Spark

Learning Objectives – In this module, you will learn about the Spark cluster computing framework and its architecture in comparison with the Hadoop ecosystem.

Topics –

  • Introduction to Spark
  • Features of Spark
  • Flow of iterative operations on MapReduce vs Spark
  • Exploring Spark Eco-system
  • Spark Architecture
  • Spark Components
  • RDD’s Introduction
  • Partitions in Spark
  • Characteristics of Partitions
  • Data Sharing in Spark
  • Deeper look of Spark RDD’s with partitions
  • RDD Graphs
    1. Dataset Level View
    2. Partition Level View
  • Lineage Graphs
  • Lineage Graphs dependencies
    1. Narrow dependency
    2. Wide dependency
  • Spark stages
  • Hadoop Eco-system vs Spark Eco-system

Module 02 - Learning Spark Core : RDD's

Learning Objectives – In this module, you will learn one of the fundamental building blocks of Spark – RDDs – and related manipulations for implementing business logic (Transformations, Actions and Functions performed on RDDs).

Topics –

  • Setting up Spark Development environment on Windows
  • Launching Spark Shell
  • Implicit Objects in Spark Shell
  • SparkSession i.e Spark 2.x entry point
  • Understanding Spark UI
  • RDD Creations
    1. Creating RDD with external files
    2. Creating RDD with existing collections
  • RDD Operations
    1. Transformations
    2. Actions
  • RDD Actions
    1. count()
    2. first()
    3. take(int)
    4. saveAsTextFile(path: String)
    5. reduce(func)
    6. collect()
  • RDD Transformations
    1. map(func)
    2. foreach(func)
    3. filter(func)
    4. coalesce(numPartitions)
  • Passing functions to Spark High Order Functions
    1. Anonymous function
    2. Passing Named function
    3. Static singleton function
  • Chaining Transformations and Actions in Spark
  • Building Spark Project using Maven
  • RDD Caching and Persistence
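A minimal sketch of the RDD workflow covered above, assuming a local-mode SparkSession as used on the classroom VM; the HDFS path in the comment is hypothetical:

```scala
import org.apache.spark.sql.SparkSession

object RddDemo {
  def main(args: Array[String]): Unit = {
    // SparkSession is the single entry point in Spark 2.x.
    val spark = SparkSession.builder()
      .appName("RddDemo")
      .master("local[*]")       // local mode for development
      .getOrCreate()
    val sc = spark.sparkContext

    // Creating an RDD from an existing collection (and, commented, from an external file).
    val numbers = sc.parallelize(1 to 100)
    // val lines = sc.textFile("hdfs:///data/input.txt")   // hypothetical path

    // Transformations are lazy; the actions (count/reduce/take) trigger execution.
    val evenSquares = numbers.filter(_ % 2 == 0).map(n => n * n)
    evenSquares.cache()   // keep the result in memory across multiple actions
    println(s"count = ${evenSquares.count()}, sum = ${evenSquares.reduce(_ + _)}")
    println(evenSquares.take(5).mkString(", "))

    spark.stop()
  }
}
```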

Module 03 - Deploying Spark in Production

Learning Objectives – In this module, you will learn how to deploy Spark applications in production.


Module 04 - Going Deeper into Spark Core

Learning Objectives – In this module, you will learn about advanced RDDs.

Topics –

  • RDD Extensions
    1. DoubleRDD
    2. PairRDD
    3. CoGroupedRDD
  • Aggregate functions
  • groupByKey function
  • reduceByKey function
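A short sketch contrasting reduceByKey and groupByKey on a pair RDD; the sample data is made up for illustration:

```scala
import org.apache.spark.sql.SparkSession

object PairRddDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("PairRddDemo").master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    // (city, amount) pairs.
    val sales = sc.parallelize(Seq(("blr", 10), ("del", 5), ("blr", 7), ("del", 3)))

    // reduceByKey combines values within each partition before shuffling (usually preferred).
    val totals = sales.reduceByKey(_ + _)

    // groupByKey shuffles every value and aggregates afterwards (more expensive).
    val totalsViaGroup = sales.groupByKey().mapValues(_.sum)

    println(totals.collect().toList)          // e.g. List((blr,17), (del,8))
    println(totalsViaGroup.collect().toList)

    spark.stop()
  }
}
```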

Module 05 - Spark SQL - DataFrame + DataSet API

Learning Objectives – In this module, you will learn about Spark SQL which is used to process structured data with SQL queries. You will learn about data-frames and datasets in Spark SQL and perform SQL operations on data-frames.

Topics –

  • Abstractions in Spark 2.x
  • Introduction to SparkSQL
  • Features of Spark SQL
  • Overview of DataFrames
  • Understanding org.apache.spark.sql.DataFrameReader class
  • Instantiating org.apache.spark.sql.DataFrameReader using SparkSession – [Hands-on]
  • Creating a DataFrame from JSON file – [Hands-on]
  • Creating a DataFrame from CSV file – [Hands-on]
  • Understanding and using inferSchema
  • Creating a custom schema and querying
  • Understanding DataFrame explain() function
  • Registering DataFrame as a Table
  • Operations supported by DataFrames
  • Converting RDD to DataFrame
  • [Use case] Analysing Employee dataset
  • Exploring Pivots
  • Join Operations in DataFrame
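A minimal sketch of the DataFrameReader flow above; the employees.csv file and its columns are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

object DataFrameDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("DataFrameDemo").master("local[*]").getOrCreate()

    // spark.read returns a DataFrameReader; 'employees.csv' is a hypothetical file.
    val employees = spark.read
      .option("header", "true")
      .option("inferSchema", "true")   // let Spark infer the column types
      .csv("employees.csv")

    employees.printSchema()
    employees.createOrReplaceTempView("employees")   // register the DataFrame as a table

    // The same aggregation through the DataFrame API and through SQL.
    employees.groupBy("dept").avg("salary").show()
    spark.sql("SELECT dept, AVG(salary) FROM employees GROUP BY dept").show()

    spark.stop()
  }
}
```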

Module 06 - Spark Streaming

Learning Objectives – In this module, you will learn Spark Streaming, which is used to build scalable, fault-tolerant streaming applications. You will learn about DStreams and the various Transformations performed on them. You will get to know the main streaming operators, Sliding Window Operators and Stateful Operators.

Topics –

  • Understanding Data Streaming
  • Overview of Spark Streaming
  • Spark Streaming Use cases
  • Working of Spark Streaming
  • Understanding DStreams / Discretized Streams
    1. Transformations
    2. Output operations
  • StreamingContext – entry point of Spark Streaming applications
  • Input Streams / Receiver
  • Real Time Streaming using socketTextStream – [Hands-on]
  • Exploring special transformations on DStream API
    1. Window Operations
    2. UpdateStateByKey
  • Window Operations
  • Block Interval
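A minimal DStream sketch using socketTextStream and a sliding window, assuming a local socket source such as `nc -lk 9999`:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StreamingWordCount").setMaster("local[2]")
    // Batch interval of 5 seconds: micro-batches are created at this rate.
    val ssc = new StreamingContext(conf, Seconds(5))

    // Receiver that reads lines from a socket source.
    val lines = ssc.socketTextStream("localhost", 9999)

    val counts = lines
      .flatMap(_.split("\\s+"))
      .map((_, 1))
      // Sliding window: aggregate the last 30 seconds of data, every 10 seconds.
      .reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(30), Seconds(10))

    counts.print()   // output operation
    ssc.start()
    ssc.awaitTermination()
  }
}
```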


Module 07 - Structured Streaming

Learning Objectives – In this module, you will learn Structured Streaming which is a scalable and fault-tolerant stream processing engine built on the Spark SQL engine.

Topics –

  • Drawbacks of DStream API
  • Understanding Structured Streaming
  • Input Source
  • Output Modes
  • Handling Event-time and Late Data
  • Window Operations on Event Time
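A minimal Structured Streaming sketch using a socket input source, the console sink and the complete output mode (the socket source is meant for testing only):

```scala
import org.apache.spark.sql.SparkSession

object StructuredWordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("StructuredWordCount").master("local[*]").getOrCreate()
    import spark.implicits._

    // Unbounded table of lines arriving from a socket, e.g. `nc -lk 9999`.
    val lines = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()

    // Running word counts over the stream.
    val counts = lines.as[String].flatMap(_.split("\\s+")).groupBy("value").count()

    // 'complete' mode rewrites the full aggregation result on every trigger.
    val query = counts.writeStream
      .outputMode("complete")
      .format("console")
      .start()

    query.awaitTermination()
  }
}
```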

[Capstone Project] - Spark Streaming

E-Commerce Data Analysis – [Real-time industry use case]

Use case Description :

  • An e-commerce company wants to build a real-time analytics dashboard to optimize its inventory and operations.
  • This dashboard should show how many products are getting purchased, shipped, delivered and cancelled every minute.
  • This dashboard will be very useful for operational intelligence.

Zeppelin – Web Based Notebook

Module - Getting Started

Learning Objectives – In this module, you will learn the basic functionality of Zeppelin and create your own data analysis applications or import existing Zeppelin notebooks; additionally, you will learn advanced features of Zeppelin such as creating and binding interpreters and importing external libraries.

Topics –

  • Getting Started
    1. Installation
    2. Configuration
    3. Exploring Zeppelin
  • The Zeppelin Interpreter
  • Installing and launching Zeppelin
  • Building Use cases in Zeppelin
  • Building Dynamic forms
  • Publish your Paragraph

Course description: In this course, you will learn one of the most popular web-based notebooks, which enables interactive data analytics.

Kafka – Distributed Messaging System

Course description: Today’s applications are built on a microservices architecture. Having many microservices that need to communicate with each other can be problematic, as they quickly become tightly coupled. Apache Kafka allows us to create services that are loosely coupled and operate in an event-driven way.

Module 01 - Getting Started

Learning Objectives – In this module, you will learn the fundamentals of messaging systems and the core concepts of Kafka: producers, consumers, brokers, topics, partitions, offsets and consumer groups.

Topics –

  • Introduction to Messaging System
  • Integration without Messaging System
  • What is Kafka
  • Kafka Architecture
  • Kafka Core Concepts
    1. Producer
    2. Consumer
    3. Broker
    4. Cluster
    5. Topic
    6. Partitions
    7. Offset
    8. Consumer Groups

Hands-on –

  • Starting Zookeeper
  • Starting Kafka Server
  • Creating a Kafka Topic
  • Kafka Console Producer
  • Kafka Console Consumer
  • Publishing message to a Kafka Topic

Module 02 - Kafka Producer + Consumer API

Learning Objectives – In this module, you will learn the Kafka Producer and Consumer APIs and use them to publish and consume messages programmatically.

Topics –

  • Kafka Core API’s
  • Creating Maven project
  • Exploring Producer API
    1. Exploring org.apache.kafka.clients.producer.ProducerConfig
    2. Exploring org.apache.kafka.clients.producer.KafkaProducer
    3. Exploring org.apache.kafka.clients.producer.ProducerRecord
  • Exploring Consumer API
    1. Exploring org.apache.kafka.clients.consumer.ConsumerConfig
    2. Exploring org.apache.kafka.clients.consumer.KafkaConsumer
    3. Exploring org.apache.kafka.clients.consumer.ConsumerRecord
  • Producer Partitioning Mechanism
    1. Direct Partitioning
    2. Round Robin
    3. Key Hashing
  • Consumer Group
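A minimal sketch of the Producer and Consumer APIs listed above, assuming a broker on localhost:9092 and a pre-3.0 Kafka client; the orders topic and the group id are hypothetical:

```scala
import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer}
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}
import org.apache.kafka.common.serialization.{StringDeserializer, StringSerializer}

object ProducerDemo {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)

    val producer = new KafkaProducer[String, String](props)
    // Records with the same key hash to the same partition (key hashing).
    producer.send(new ProducerRecord[String, String]("orders", "order-1", "purchased"))
    producer.close()
  }
}

object ConsumerDemo {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "dashboard-group")   // consumer group id
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, classOf[StringDeserializer].getName)
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, classOf[StringDeserializer].getName)

    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(Collections.singletonList("orders"))
    while (true) {
      val records = consumer.poll(1000).iterator()
      while (records.hasNext) {
        val r = records.next()
        println(s"partition=${r.partition()} offset=${r.offset()} ${r.key()} -> ${r.value()}")
      }
    }
  }
}
```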

Apache NiFi – Automate Data Flow

Course description: This course will help you to learn Apache NiFi which is designed to automate the flow of data between software systems.

Module 01 - Getting Started with NiFi

Learning Objectives – In this module, you will get started with Apache NiFi.


Module 02 - Apache NiFi in depth

Learning Objectives – In this module, you will learn more about Apache NiFi.

Topics – 

  • NiFi templates
  • Building a NiFi Data Flow
  • Monitoring a Data Flow
  • Exploring Data Provenance

Cassandra – NoSQL Database

Course description: This course will help you learn Cassandra, a popular NoSQL database, and how to perform CRUD operations on it from an application.

Module 01 - Getting started with Cassandra

Learning Objectives – In this module, you will learn what Cassandra is and where it is used.


Module 02 - CRUD Operations

Learning Objectives – In this module, you will learn how to create keyspaces and tables in Cassandra and perform CRUD operations using CQL.

Topics –

  • Creating a Database
    1. Understanding a Cassandra Database
    2. Defining a Keyspace
    3. Deleting a Keyspace
  • Table Creation
    1. Creating a Table
    2. Defining columns and Datatypes
    3. Defining a primary key
    4. Specifying a descending clustering order
  • Data Insertions
    1. Different ways to write data
    2. Insert into command
    3. Using the COPY command
  • Modeling Data
    1. Understanding Data Modeling in Cassandra
    2. Using a where clause
    3. Understanding a Secondary Index
    4. Defining a Composite Partition Key

Module 03 - Creating an Application

Learning Objectives – In this module, you will learn how to perform CRUD operations using Java

Topics –

  • Understanding Cassandra Drivers
  • Exploring the DataStax Java Driver
  • Setting up Maven Project
  • Connecting to a Cassandra cluster
  • Executing a Query
  • Displaying Query results
  • Updating Data
  • Deleting Data
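A minimal CRUD sketch with the DataStax Java driver (3.x-style API), assuming a Cassandra node on 127.0.0.1; the demo keyspace and users table are hypothetical:

```scala
import com.datastax.driver.core.Cluster

object CassandraDemo {
  def main(args: Array[String]): Unit = {
    // Connect to a local Cassandra node.
    val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
    val session = cluster.connect()

    // 'demo' keyspace and 'users' table are used only for illustration.
    session.execute(
      "CREATE KEYSPACE IF NOT EXISTS demo WITH replication = " +
        "{'class': 'SimpleStrategy', 'replication_factor': 1}")
    session.execute("CREATE TABLE IF NOT EXISTS demo.users (id int PRIMARY KEY, name text)")

    session.execute("INSERT INTO demo.users (id, name) VALUES (1, 'Asha')")        // create
    val row = session.execute("SELECT name FROM demo.users WHERE id = 1").one()    // read
    println(row.getString("name"))
    session.execute("UPDATE demo.users SET name = 'Asha K' WHERE id = 1")          // update
    session.execute("DELETE FROM demo.users WHERE id = 1")                         // delete

    cluster.close()
  }
}
```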

This program comes with a portfolio of industry-relevant POCs, use cases and project work. Unlike other institutes, we do not pass off use cases as projects; we clearly distinguish between a use case and a project.

Process we follow for project development

We follow the Agile methodology for project development:

  1. Each batch will be divided into scrum teams of size 4-5 members.
  2. We will start with a Feature Study before implementing a project.
  3. The Feature will be broken down into User Stories and Tasks.
  4. For each user story a proper Definition Of Done will be defined
  5. A Test plan will be defined for testing the user story

Real Time Data Simulator

Project description: Create a project which generates dynamic mock data in real time based on a schema; the data can then be consumed by real-time processing systems like Apache Storm or Spark Streaming.

Building Complex Real time Event Processing

Project Description:

In this project, you will build a real-time event processing system using Spark Streaming, where even sub-seconds matter for analysis, though it is still not fast enough for ultra-low-latency (picosecond or nanosecond) applications. A typical example is processing CDRs (Call Detail Records) from telecommunications, where you can expect millisecond response times.

User Story 01 – As a developer, we should be able to simulate real-time network data

  1. Task 01 – Use Java Socket programming to generate and publish data to a port
  2. Task 02 – Publish the data with different scenarios

User Story 02 – As a developer we should be able to consume data using Spark Streaming

User Story 03 – As a developer, we should be able to use the Google API to convert latitude and longitude to the corresponding region names.

User Story 04 – Perform computations to calculate some important KPIs (Key Performance Indicators) on the real-time data.

More detailed split up will be shared once you start the project.

Technologies Used :

  • Java Socket Programming
  • Google API
  • Scala Programming
  • Spark Streaming
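A rough sketch of User Story 01: a socket-based simulator that publishes mock records to a port so a Spark Streaming job (User Story 02) can consume them via socketTextStream("localhost", 9999). The record format and event types below are hypothetical.

```scala
import java.io.PrintWriter
import java.net.ServerSocket
import scala.util.Random

object NetworkDataSimulator {
  def main(args: Array[String]): Unit = {
    val server = new ServerSocket(9999)
    println("Waiting for a streaming consumer on port 9999 ...")
    val socket = server.accept()
    val out = new PrintWriter(socket.getOutputStream, true)   // auto-flush each line

    val events = Seq("CALL_START", "CALL_END", "SMS")   // hypothetical event types

    while (true) {
      // Hypothetical CDR-like record: timestamp, event type, latitude, longitude.
      val line = Seq(
        System.currentTimeMillis(),
        events(Random.nextInt(events.length)),
        12 + Random.nextDouble(),
        77 + Random.nextDouble()
      ).mkString(",")

      out.println(line)
      Thread.sleep(500)   // publish a record every half second
    }
  }
}
```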

Data Model Development Kit

Project Description :

This project helps data model developers manage Hive tables with different storage types, column types and column properties required for different use case development.

Roles & Responsibility

  1. Building .xml files to define the structure of Hive tables used for storing generated process data.
  2. Actively involved in development to read .xml files, create data models and load data into Hive.

Technologies Used

Java, JAXB, JDBC, Hadoop, Hive

Sample User Stories

[Study User Story 01] – Come up with a design to represent the data model required to handle the following scenarios

  • To Handle different operations like “CREATE”, “UPDATE”,”DELETE”
  • Way to define partition table
  • To Store columns in Orders
  • To Store column Name
  • To Handle Update of Column type and Name

[User Story 02] – HQL Generator – As a developer, we have to provide functionality to create a table

**Tasks**
– [ ] Build a Maven project and add dependencies
– [ ] Integrate loggers
– [ ] Code commit
– [ ] Create a standard package structure
– [ ] Utility to read .xml and create a Java object
– [ ] Utility code to communicate with the Hive DB
– [ ] Check for the Hive service before executing queries
– [ ] Code to construct the HQL query for create
– [ ] Exception handling

Definition of Done
– [ ] Package structure should be created.
– [ ] Table has to be created in Hive
– [ ] Validate all required schema is created
– [ ] Validation of Hadoop + Hive Services

**Test Cases**
1. If the table already exists, we need to print “Table already exists”
2. Verify the schema against the .xml
3. If services are not up and running, it should be handled and logged.

 

Course hours

90 hours of extensive classroom training
30 sessions of 3 hours each
Course Duration: 5 Months

Assignments

For each module, multiple hands-on exercises, assignments and quizzes are provided in E-Learning.

Real time project

We follow Agile Methodology for the project development. Each project will have Feature Study followed by User stories and Tasks.

Mock Interview

There will be a dedicated one-to-one interview call between you and a Big Data Architect, so you experience a real mock interview.

Forum

We have a community forum for all our students wherein you can enrich your learning through peer interaction and knowledge sharing.

Certification

From the beginning of the course, you will be working on a project. On completion of the project, NPN Training certifies you as a “Big Data Architect” based on the project.

Nov 10th

Batch: Weekend Sat & Sun
Duration: 5 Months
 30,000

Enroll Now

Dec 15th

Batch: Weekend Sat & Sun
Duration: 5 Months
 30,000

Enroll Now

Jan 12th

Batch: Weekend Sat & Sun
Duration: 5 Months
 25,000

Enroll Now

Batches not available

Is Java a pre-requisite to learn Big Data Masters Program?

Yes, Java is a prerequisite. There are institutes that say Java is not required; that is false information.

Can I attend a demo session before enrollment?

Yes, you will sit in an actual live class to experience the quality of the training.

How will I execute the Practicals?

We will help you to setup NPN Training’s Virtual Machine + Cloudera Virtual Machine in your System with local access. The detailed installation guides are provided in the E-Learning for setting up the environment.

Who is the instructor at NPN Training?

All the Big Data classes will be driven by Naveen sir who is a working professional with more than 12 years of experience in IT as well as teaching.

How do I access the E-Learning content for the course?

Once you have registered and paid for the course, you will have 24/7 access to the E-Learning content

What If I miss a session?

The course validity is one year, so you can attend any missed session in other batches.

Can I avail the EMI option?

The total fee can be paid in 2 installments.

Are there any group discounts for classroom training programs?

Yes, we have group discount options for our training programs. Contact us using the form “Drop Us a Query” on the right of any page on the NPN Training website, or select the Live Chat link. Our customer service representatives will give you more details.

Reviews

Anindya Banerjee
Cognizant
Linkedin

After searching extensively in internet about big data courses,I came to know about NPN training and Naveen sir…it was a tough call for me to pick the right training institute which would provide me the right blend of practical exposure with theoretical knowledge on big data and Hadoop technologies..After attending few classes I am mesmerized by Naven Sir’s way of teaching ,his command over various topics and his study materials.unlike other training institutes ,Naveen sir believes in extensive learning and he believes in hands on training which makes ‘NPN training’far apart from other institutes.I would highly recommend any one to join NPN training if he/she wants to make his career in big data technologies .

Sarbartha Paul
HCL Technologies
Linkedin

The best thing I liked about this institute is the way Naveen sir teaches and also is his way of taking care of individual person’s doubt and interest and his tendency to make others learn big data with complete hands-on experience. The theory he teaches is compact and crunchy enough to get a good hold of the basics.
Other thing that always keeps this institute apart is the way Naveen sir has designed it’s Big Data Architecture Program course which covers nearly everything that other institutes lack. The course materials are also very to the point.
In one word, Naveen sir’s way of teaching is a class apart!
I am greatly moved by his ideology and teaching and this is probably one of the finest institute in town as far as big data courses are concerned. It is worth joining his classroom in all aspect. Thank you for all your efforts sir!

Sai Venkata Krishna
Capgemini
Linkedin

Naveen is an excellent trainer. Naveen mainly focus on HANDS ON and REAL TIME SCENARIOS which helps one to understand the concepts easily.I feel that NPN training curriculum is best in market for Big Data.
Naveen is very honest in his approach and he delivers additional concepts which are not present in the syllabus of particular topics.E learning and assignments are very informative and helpful.The amount you pay for the Big data course is worth every penny.
Thank you NPN Training for your support and motivation.
