# Data Science Training in Bangalore using Python + R

Learn and master data analytics using the Python and R programming languages. In this program you will learn statistics, data analysis libraries, visualization libraries, machine learning, and deep learning using TensorFlow. The program is highly recommended for professionals who intend to become successful Data Analysts.

## Course Description

At NPN Training we believe in the philosophy **“Learn by doing”**, hence we provide complete **hands-on training** with **real-time project development**.

#### Course Structure

The program includes the following courses:

- **Python Programming:** This course covers Python language fundamentals, collections, functions and lambda expressions, the programming foundation required for the data analysis libraries used later in the program.
- **Flask – Building Rest Service:** This course introduces Flask, a lightweight Python web framework, and shows how to build REST services with it.
- **Robot Framework – Test Automation Framework:** This course introduces Robot Framework, a generic open-source framework for test automation.

#### Work on a real time project

This program comes with a portfolio of industry-relevant POCs, use cases and project work. Unlike other institutes, we do not pass off use cases as projects; we clearly distinguish between a use case and a project.

#### Evaluation

A multi-pronged assessment approach ensures that learning is maximized. The assessments test the students’ ability to grasp the terminology, understand the concepts, and apply those concepts in business problem scenarios. Assessments are conducted throughout the program to gauge progress continuously, with the help of Recall Output Tests (ROTes), Conceptual Understanding Tests (CUTes), Hands-on Lab Tests (HoTs), a Mid-term Hackathon (MiTH), a Project Hackathon and Defense (PHD), feedback and attendance. Students also write a project report and defend it. Exams, assignments and projects must be completed within the stipulated timeframes to be eligible for a grade.

#### Course Objectives

By the end of the course, you will:

- Understand what Big Data is, the challenges associated with Big Data, and how Hadoop solves the Big Data problem.

## Curriculum

## Statistics

#### Module 01 - Introduction to Data Science

**Learning Objectives –** In this module, you will learn the fundamentals of statistics. After this module you will understand different statistical concepts which will help in data analysis and machine learning.

**Topics – **

- Introduction to Statistics
- Different types of Statistics
- Descriptive statistics
- Inferential statistics
- Types of data
  - Numerical data
    - Discrete data
    - Continuous data
  - Categorical data
    - Ordinal data
- Deep dive into Descriptive statistics
- Uni-variate Analysis
- Bi-variate Analysis
- Multivariate Analysis
- Function Models
- Significance in Data Science

- Deep dive into Inferential statistics
- Sampling Distributions & Estimation
- Hypothesis Testing (One and Two Group Means)
- Hypothesis Testing (Categorical Data)
- Hypothesis Testing (More Than Two Group Means)
- Quantitative Data (Correlation & Regression)
- Significance in Data Science

- Numerical Parameters to represent data
- Mean
- Mode
- Median
- Sensitivity
- Information Gain
- Entropy

- Population and Sampling
- Sampling techniques
- Covariance
- Point Estimation


## Statistics & R – Programming

**Course description:** This section of the training will help you learn R programming fundamentals and how to apply statistics in R for data analysis.

#### Module 01 - Getting started with R

**Learning Objectives –** In this module, you will learn about R fundamentals, understand different types of R Data Structures, Flow control statements and Functions. After this module you will be able to create/extract data from different R Data Structures and write your own R functions.

**Topics – **

- Introduction to R – Overview and Features
- Environment Setup
- Understanding of different Arithmetic operation in R
- Variables
- Understanding of different R Data structures
- Exploring Data Structure
- Vector
- List
- Matrices
- Arrays
- Factors
- DataFrames

- Introduction to Vector
- Vector creation, data extraction and manipulation

- Data Structure – List
- Introduction to List
- List creation and manipulation

- Data Structure – Matrices
- Introduction to Matrices
- Matrices creation, data extraction and computations

- Data Structure – Arrays
- Introduction to Arrays
- Arrays creation and manipulation

- Data Structure – Factors
- Introduction to Factors
- Generating different Factor Levels

- Data Structure – Data Frames
- Introduction to Data Frames
- Data Frame creation, data extraction and computations
- Data Reshaping

- Flow Control statements in R
- If statement
- If…else statement
- switch statement
- while loop
- for loop
- repeat loop
- break and next

- Exploring built-in functions in R
- Generating Sequence
- Generating Random Numbers
- Column Bind : cbind()
- Row Bind : rbind()
- Merge Functions

- Exploring user defined functions in R
- Declaring Function
- Calling a function with/without arguments
- Lazy Evaluation of Function in R

#### Module 02 - Data Importing Techniques

**Learning Objectives –** In this module, you will learn different techniques for importing data into R from external sources. After this module you will be able to load external data sets into R for analysis.


#### Module 03 - Exploratory Data Analysis

**Learning Objectives –** In this module, you will learn how to perform exploratory data analysis in R to summarize and understand data sets.


#### Module 04 - Data Visualization using R

**Learning Objectives –** In this module, you will learn how to visualize data in R using different types of charts and plots.


#### Module 05 - Exploring R Package

**Learning Objectives –** In this module, you will learn how to find, install and use R packages to extend R’s functionality.


## Python 3.0 – Preparatory Course

**Course description:** This section of the training will help you learn Python fundamentals as a foundation for the data analysis libraries covered later in the program.

#### Module 01 - Language Fundamentals

**Learning Objectives –** In this module, you will learn Python language fundamentals: data types, decision statements and looping constructs.

**Topics –**

- Introduction to Python
- Installing Python in Windows using PyCharm
- Data Types
- Numbers
- Strings
- Booleans

- Control Flow Statements
- Understanding Python Indentation
- Decisions
- The if Statement
- The if-else Statement
- The if-elif-else Statement

- Looping
- The while loop
- The for loop
- Using range() in for loops
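
The decision and looping constructs above can be sketched as follows; the marks value and grade bands are invented for illustration:

```python
marks = 72

# if-elif-else decision
if marks >= 80:
    grade = "A"
elif marks >= 60:
    grade = "B"
else:
    grade = "C"

# while loop: sum the numbers 1..5
total = 0
n = 1
while n <= 5:
    total += n
    n += 1

# for loop using range()
squares = []
for i in range(1, 4):
    squares.append(i * i)

print(grade, total, squares)  # B 15 [1, 4, 9]
```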

Check E-Learning for more Assignments + Use cases + Project work + Materials + Case studies

#### Module 02 - Collections

**Learning Objectives –** In this module, you will learn Python’s built-in collections: lists, tuples, sets and dictionaries, along with creating, accessing, searching and modifying them.

**Topics –**

- Exploring Python Collections
- Lists
- Creating Lists
- Accessing List Elements
- Iterating through list elements
- Searching elements within Lists
- Check for existence
- Counting occurrences
- Locating elements

- List slices
- Adding and deleting elements
- Adding, Multiplying and Copying Lists

- Tuples
- Creating tuples
- Creating Tuples from Lists using tuple()
- Creating empty tuples using tuple()
- Creating Singleton Tuples

- Accessing Tuple elements
- Counting Tuple elements
- Iterating through tuple elements
- Searching elements within tuples
- Tuple slices
- Adding, Multiplying and Copying Tuples
- Adding Tuples
- Multiplying Tuples
- Assigning and Copying Tuples

- Sets
- Creating Sets
- Accessing Set elements
- Counting Set elements
- Iterating through Set elements
- Adding and Deleting elements
- Set Operations
- Set Union
- Set Intersection
- Set Difference

- Dictionaries
- Creating Dictionaries
- Accessing Dictionary elements
- Iterating through Dictionary elements
- Iterating through the keys of a Dictionary
- Iterating through the values of Dictionary
- Iterating through the key-value pairs of a Dictionary

- Searching elements within Dictionaries
- Checking for the existence of a key in a Dictionary
- Extracting the value of a key using []
- Extracting the value of a key using dict.get()

- Adding and Deleting elements
- Adding elements using []
- Adding elements using setdefault()

- Deleting Elements
- Using del to Delete an Element
- Using popitem() to Delete Elements
- Using pop() to Delete Elements
- Using clear() to Delete all elements of a Dictionary
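
A compact sketch of the four collection types covered above; all names and values are invented for illustration:

```python
# Lists: creation, search, slices
fruits = ["apple", "banana", "cherry", "banana"]
has_banana = "banana" in fruits        # check for existence
count = fruits.count("banana")         # counting occurrences
first_two = fruits[:2]                 # list slice

# Tuples: immutable sequences
point = (3, 4)
singleton = (5,)                       # a singleton tuple needs the trailing comma

# Sets: unique elements and set operations
a, b = {1, 2, 3}, {2, 3, 4}
union = a | b
inter = a & b
diff = a - b

# Dictionaries: key-value pairs
ages = {"asha": 30}
ages.setdefault("ravi", 25)            # add only if the key is absent
ravi = ages.get("ravi")                # safe lookup by key
del ages["asha"]                       # delete an element
```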

#### Module 03 - Functions & Lambdas Expressions

**Learning Objectives –** In this module, you will learn how to define and call functions in Python and how to use lambda expressions.

**Topics –**

- Introduction to Functions
- Function Definition
- Function call
- Positional Arguments
- Default arguments
- Keyword arguments
- Variable arguments
- Variable arguments with positional parameters
- Variable arguments with default arguments
- Variable arguments followed by default arguments
- Variable arguments followed by keyword arguments

- Returning From Functions
- Returning tuples from functions
- Returning Lists from functions
- Returning Dictionaries from functions

- Returning single values from functions
- Returning Collection from functions
- Global variables
- Exploring Lambda Expressions
- Introduction to Lambda expressions
- Declaring Lambda expressions
- What is an expression
- Understanding when to use Lambda expressions
- Defaults in Lambda expressions

- Lambdas with built in functions
- The map() function
- The filter() function
- The reduce() function
- Practical use map(), filter() and reduce() using Lambda expressions
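
The three built-in functions above can be sketched with lambda expressions as follows; note that in Python 3, `reduce()` must be imported from `functools`:

```python
from functools import reduce

nums = [1, 2, 3, 4, 5]

doubled = list(map(lambda x: x * 2, nums))        # map: transform each element
evens = list(filter(lambda x: x % 2 == 0, nums))  # filter: keep matching elements
total = reduce(lambda acc, x: acc + x, nums, 0)   # reduce: fold into a single value

# a default value inside a lambda expression
greet = lambda name="world": "hello " + name

print(doubled, evens, total, greet())
```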

## Data Analysis – Python Libraries for Analysis

**Course description:** This course will help you learn the most popular Python libraries for data analysis, including NumPy and Pandas.

#### Module 01 - NumPy

**Learning Objectives –** In this module, you will learn NumPy, which is one of the fundamental packages for scientific computing with Python.

**Topics – **

- Introduction to NumPy
- Exploring NumPy Arrays
- Python Lists vs NumPy Arrays
- Exploring NumPy Operations
- Looping through List and NumPy Arrays
- Multiplying each elements in Lists and NumPy Arrays
- Creating multi-dimensional array using NumPy library
- Squaring the number of each element
- Exploring NumPy Built in Methods
- ndim
- itemsize
- dtype
- shape
- reshape
- arange
- linspace
- eye

- Advantages of NumPy library
- Less Memory
- Fast
- Convenient
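
A minimal sketch of the NumPy attributes and methods listed above, using invented toy arrays:

```python
import numpy as np

# Python list vs NumPy array: arithmetic is vectorised, no explicit loop
lst = [1, 2, 3, 4]
arr = np.array(lst)
squared = arr ** 2                  # element-wise square

# Built-in attributes and methods from the topics above
grid = np.arange(6).reshape(2, 3)   # values 0..5 as a 2x3 matrix
print(grid.ndim, grid.shape, grid.dtype, grid.itemsize)

identity = np.eye(2)                # 2x2 identity matrix
points = np.linspace(0, 1, 5)       # 5 evenly spaced values in [0, 1]
```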

#### Module 02 - [Pandas] - Series

**Learning Objectives –** In this module, you will understand the Pandas Series data structure, its attributes and its methods.

**Topics – **

- Introduction to Pandas
- Exploring Pandas fundamental data structure
- Series Data Structure
- DataFrame Data Structure

- Different ways to create series data structure
- Parameters and Arguments for series object
- Understanding usecols parameters in Series object
- Modifying the squeeze parameters
- Exploring inplace parameter

- Exploring Series attributes
- The .values attribute
- The .index attribute
- The .dtype attribute

- Exploring Series methods
- The head() and .tail() method
- The .sort_values() method
- The .sort_index() method
- Extracting Series values by Index position
- Extracting Series values by index label
- The .get() Method
- Math methods and Series objects
- The .idxmax() and .idxmin() method
- The .value_counts() method
- The .apply() method
- The .map() method
- Applying Python Built-In Functions to Series
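
A minimal sketch of a few of the Series attributes and methods above, using an invented marks Series:

```python
import pandas as pd

marks = pd.Series([88, 92, 75], index=["asha", "ravi", "kiran"])

print(marks.values, marks.index.tolist(), marks.dtype)

top = marks.sort_values(ascending=False).head(1)  # highest mark
best_student = marks.idxmax()                     # index label of the maximum
curved = marks.apply(lambda m: m + 5)             # element-wise transformation
ravi = marks.get("ravi")                          # safe lookup by index label
```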

#### Module 03 - [Pandas] - DataFrame - I

**Learning Objectives –** In this module, you will understand the Pandas DataFrame data structure and how to select, filter and transform tabular data.



## Data Visualization – Python Libraries for Visualization

**Course description:** This course will help you learn popular Python libraries for data visualization, including Matplotlib and Seaborn.

#### Module 01 - Matplotlib

**Learning Objectives –** In this module, you will learn Matplotlib, one of the most popular data visualization libraries, which makes it easy to build various types of plots and to customize them to make them more visually appealing and interpretable.

**Topics – **

- Introduction to Matplotlib
- Plotting Line Chart
- Functional Method
- Object Oriented Method

- Plotting Scatter Plot
- Histograms
- Customization
- Colors attributes
- Understanding linewidth
- Line Style attributes
- Exploring alpha attributes
- Markers
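
A minimal sketch of the object-oriented plotting method with some of the customization attributes above; the data is invented, and the `Agg` backend is used so no display is required:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, no window needed
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

# Object-oriented method: create a Figure and Axes explicitly
fig, ax = plt.subplots()
ax.plot(x, y, color="green", linewidth=2, linestyle="--", alpha=0.8, marker="o")
ax.set_title("Sample Line Chart")
ax.set_xlabel("x")
ax.set_ylabel("x squared")
```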

#### Module 02 - Seaborn

**Learning Objectives –** In this module, you will learn Seaborn, a statistical data visualization library built on top of Matplotlib that provides a high-level interface for drawing attractive statistical graphics.


#### Module 03 - Geographical Plotting

**Learning Objectives –** In this module, you will learn how to create geographical plots to visualize data on maps.


## Machine Learning – Python Libraries for Machine Learning

**Course description:** This course will help you learn machine learning concepts and algorithms, and how to implement them in Python using scikit-learn.

#### Module 01 - Introduction to Machine Learning

**Learning Objectives –** This module is an introduction to Machine Learning.

**Topics – **

- What is Machine Learning
- Traditional Learning vs Machine Learning
- Real life applications of Machine Learning
- Types of Machine Learning
- Supervised Machine Learning
- Unsupervised machine Learning
- Reinforcement Learning

- Supervised Learning
- Overview of Supervised Learning
- Walk through of Supervised Learning algorithms
- Real time Applications of Supervised Learning
- Pros / Cons of Supervised Learning

- Unsupervised Learning
- Overview of Unsupervised Learning
- Walk through of Unsupervised Learning algorithms
- Real time Applications of Unsupervised Learning
- Pros / Cons of Unsupervised Learning

- Reinforcement Learning
- Overview of Reinforcement Learning
- Walk through of Reinforcement Learning algorithms
- Real time Applications of Reinforcement Learning
- Pros / Cons of Reinforcement Learning

#### Module 02 - Introduction to Scikit-Learn

**Learning Objectives –** In this module, you will learn scikit-learn, one of the most popular Python libraries for machine learning.

**Topics – **

- Introduction to Scikit Learn
- Features of Scikit Learn
- Exploring popular groups of models provided by scikit-learn

#### Module 03 - Linear Regression [Supervised Learning]

**Learning Objectives –** In this module, you will learn one of the most well-known algorithms in statistics. Linear Regression predicts a continuous dependent variable based on a given set of independent variables.

**Topics – **

- Introduction to Linear Regression
- Understanding of gradient descent and cost function
- Implementation of linear regression model using scikit learn
- Different ways to validate the linear regression models
- Assumptions in linear regression
- Multicollinearity
- Heteroscedasticity
- Autocorrelation
- Normal distribution of errors

- Introduction to cross validation
- Advantages and Drawbacks of Linear Regression
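
A minimal scikit-learn sketch of fitting and cross-validating a linear regression model, on invented noise-free data where y = 2x + 1, so the fit is exact:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# toy data: y = 2x + 1 with no noise
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([3, 5, 7, 9, 11])

model = LinearRegression().fit(X, y)
print(model.coef_[0], model.intercept_)   # slope ~2, intercept ~1

# cross-validation sketch: R^2 score per fold
scores = cross_val_score(LinearRegression(), X, y, cv=2)
```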

#### Module 04 - Logistic Regression [Supervised Learning]

**Learning Objectives –** In this module, you will learn about the Logistic Regression algorithm. Logistic Regression predicts a categorical dependent variable based on a given set of independent variables.

**Topics – **

- Introduction to Logistic Regression
- Assumptions of Logistic Regression
- Understanding of odds and odds ratio
- Implementation of Logistic Regression using scikit learn
- Understanding of TPR, TNR, Precision, Recall and the Confusion Matrix
- Validation of the model using ROC curve
- Advantages and Drawbacks of Logistic Regression
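
A minimal scikit-learn sketch of logistic regression with the confusion-matrix metrics mentioned above, on an invented, perfectly separable toy set:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

# toy binary data: class 1 when the feature is large
X = np.array([[1], [2], [3], [10], [11], [12]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)
pred = clf.predict(X)

cm = confusion_matrix(y, pred)   # rows: actual, columns: predicted
tn, fp, fn, tp = cm.ravel()
precision = tp / (tp + fp)
recall = tp / (tp + fn)          # recall is also called TPR
```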

#### Module 05 - K-Nearest Neighbours [Supervised Learning]

**Learning Objectives –** In this module, you will learn about the popular KNN algorithm. It is one of the simplest yet most powerful supervised machine learning algorithms.

**Topics – **

- Introduction to KNN Algorithm
- Introduction to different distance measures
- Implementation of KNN using scikit learn
- Validation of KNN Model
- Advantages of KNN
- Drawbacks of KNN
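
A minimal scikit-learn sketch of KNN classification (Euclidean distance is the default metric); the two clusters are invented:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# two well-separated clusters
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
y = np.array([0, 0, 0, 1, 1, 1])

# classify a point by majority vote among its 3 nearest neighbours
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)

pred = knn.predict(np.array([[0.5, 0.5], [10.5, 10.5]]))
```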

#### Module 06 - Support Vector Machine [Supervised Learning]

**Learning Objectives –** In this module, you will learn about SVM algorithm. SVM is a powerful algorithm which can be used for both classification and regression use cases.

**Topics – **

- Introduction to SVM
- Understanding of hyperplanes
- Benefits of SVM as compared to other algorithms
- Kernel Trick in SVM
- Implementation of SVM using scikit learn
- Hyperparameter tuning in SVM
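
A minimal scikit-learn sketch of an SVM classifier with the RBF kernel (`C` and `gamma` are the usual hyperparameters to tune); the data is invented:

```python
import numpy as np
from sklearn.svm import SVC

# two invented, well-separated classes
X = np.array([[0, 0], [1, 1], [1, 0], [4, 4], [5, 5], [4, 5]])
y = np.array([0, 0, 0, 1, 1, 1])

# RBF kernel (the "kernel trick" maps data to a higher-dimensional space)
svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)

pred = svm.predict(np.array([[0.5, 0.5], [4.5, 4.5]]))
```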

#### Module 07 - K-Means Clustering [Unsupervised Learning]

**Learning Objectives –** In this module, you will learn different clustering algorithms. We will also go through the popular K-Means clustering algorithm.

**Topics – **

- Introduction to different clustering techniques
- Understanding of hierarchical clustering
- Introduction to K-Means clustering
- Understanding of Euclidean distance
- Implementation of K-Means using scikit Learn
- Optimization of K-Means clustering
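
A minimal scikit-learn sketch of K-Means on two invented, well-separated clusters:

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 1], [1.5, 2], [1, 1.5], [8, 8], [8.5, 8], [8, 8.5]])

# K-Means groups points by Euclidean distance to the cluster centroids
km = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)

labels = km.labels_   # points in the same cluster share a label
```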

#### Module 08 - Principal Component Analysis [Unsupervised Learning]

**Learning Objectives –** In this module, you will learn about Principal Component Analysis (PCA). PCA is a dimensionality reduction technique. You will also learn its implementation using scikit-learn.

**Topics – **

- Introduction to PCA
- Introduction to Factor Analysis
- Implementation using scikit learn
- Advantages of PCA
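
A minimal scikit-learn sketch of PCA reducing invented 2-D data that lies almost on a line down to a single component:

```python
import numpy as np
from sklearn.decomposition import PCA

# 2-D data almost on the line y = x, so one component explains nearly all variance
X = np.array([[1, 1.1], [2, 2.0], [3, 3.1], [4, 4.0], [5, 5.1]])

pca = PCA(n_components=1)
reduced = pca.fit_transform(X)        # shape (5, 2) -> (5, 1)

print(pca.explained_variance_ratio_)  # close to 1.0
```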

#### Module 09 - Decision Trees [Supervised Learning]

**Learning Objectives –** In this module, you will learn about the Decision Tree algorithm, which can be used for both classification and regression use cases. Decision Trees are widely used because of their high interpretability.

**Topics – **

- Introduction to Decision Tree
- Understanding of CART algorithm
- Understanding of Entropy and Gini Index
- Implementation of Decision Tree using scikit learn
- Parameter tuning in Decision Trees
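
A minimal scikit-learn sketch of a decision tree using the entropy criterion on an invented toy set; `criterion` and `max_depth` are typical parameters to tune:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

X = np.array([[1], [2], [3], [10], [11], [12]])
y = np.array([0, 0, 0, 1, 1, 1])

# criterion can be "gini" (Gini index) or "entropy"
tree = DecisionTreeClassifier(criterion="entropy", max_depth=2,
                              random_state=0).fit(X, y)

pred = tree.predict(np.array([[2.5], [11.5]]))
```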

#### Module 10 - Random Forest Algorithm [Supervised Learning]

**Learning Objectives –** In this module, you will learn about the Random Forest ensemble algorithm, which can be used for both classification and regression use cases. Random Forest is widely used because of its high accuracy and ease of use.

**Topics – **

- Introduction to Random Forest
- Understanding of ensemble Modelling
- Bagging and Boosting
- Implementation of Random Forest using scikit learn
- Hyperparameter tuning in Random Forest
- Feature Importance in Random Forest
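
A minimal scikit-learn sketch of a random forest (a bagging ensemble of decision trees) with feature importances, on invented data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

X = np.array([[1, 0], [2, 0], [3, 0], [10, 1], [11, 1], [12, 1]])
y = np.array([0, 0, 0, 1, 1, 1])

# n_estimators (number of trees) is the usual hyperparameter to tune
rf = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

pred = rf.predict(np.array([[2, 0], [11, 1]]))
importance = rf.feature_importances_   # relative importance of each feature
```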

#### Module 11 - XGBoost [Supervised Learning]

**Learning Objectives –** XGBoost is one of the most popular algorithms in machine learning. It can be used for both classification and regression. In this session we will go through how to get high accuracy with this boosting algorithm.

**Topics – **

- Introduction to XGBoost
- Benefits of XGBoost as compared to other algorithms
- Implementation of XGBoost using its scikit-learn compatible API
- Hyperparameter tuning in XGBoost

#### Module 12 - Time series Forecasting [Supervised Learning]

**Learning Objectives –** In this module, you will learn about various time series forecasting methods. We will do a deep dive into ARIMA model.

**Topics – **

- Introduction to Time series Forecasting
- Understanding of different Time Series forecasting methods
- Understanding of ARIMA
- Implementation of ARIMA

#### Module 13 - Natural Language Processing

**Learning Objectives –** In this module, you will learn about Natural Language Processing. We will go through the popular NLTK packages and sentiment analysis using Python.

**Topics – **

- Introduction to Natural Language Processing
- Deep Dive to NLTK package
- Tokenizing words and sentences
- Stop words
- Stemming words
- Lemmatization
- Word Net
- Sentiment Analysis using NLTK
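
The class itself uses the NLTK package; as a library-free sketch of tokenization, stop-word removal and a crude lexicon-based sentiment score, with an invented sentence and hand-picked word lists:

```python
text = "The movie was really good and the acting was great"

stop_words = {"the", "was", "and"}   # tiny illustrative stop-word list

tokens = text.lower().split()                          # word tokenization
filtered = [w for w in tokens if w not in stop_words]  # stop-word removal

# crude lexicon-based sentiment: count positive words
positive = {"good", "great"}
score = sum(1 for w in filtered if w in positive)
```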

## TensorFlow – Deep Learning

**Course description:** In this course you will learn the fundamentals of deep learning and how to build neural networks using TensorFlow.

#### Module 01 - Neural Network

**Learning Objectives –** In this module, you will learn about Neural Network. We will go through the different concepts of neural networks. We will do deep dive into RNN and CNN models.

**Topics – **

- Introduction to Neural Network
- Understanding of gradient descent
- Understanding of forward and backward propagation
- Understanding of RNN and CNN
- Implementation of neural network using scikit learn

#### Module 02 - TensorFlow

**Learning Objectives –** In this module, you will learn the basics of deep learning. We will go through the TensorFlow API and implement a neural network using TensorFlow.

**Topics – **

- Introduction to the TensorFlow API
- Implementation of a neural network using TensorFlow
- Benefits of the TensorFlow API

## Projects

This program comes with a portfolio of industry-relevant **POCs, use cases and project work.** Unlike other institutes, we do not pass off use cases as projects; we clearly distinguish between a use case and a project.

**Process we follow for project development**

We follow the Agile methodology for project development:

- Each batch will be divided into scrum teams of 4-5 members.
- We will start with a **Feature Study** before implementing a project.
- The feature will be broken down into **User Stories and Tasks**.
- For each user story a proper **Definition of Done** will be defined.
- A **Test Plan** will be defined for testing the user story.

#### Real Time Data Simulator

**Project description:** A project that generates dynamic mock data from a schema in real time, which can then feed real-time processing systems such as Apache Storm or Spark Streaming.

#### Building Complex Real time Event Processing

**Project Description:**

In this project, you will build a real-time event processing system using Spark Streaming for workloads where even sub-second delays matter for analysis, while still not targeting ultra-low-latency (nanosecond or picosecond) applications. A typical example is processing CDRs (Call Detail Records) from telecommunications, where millisecond response times are expected.

**User Story 01 –** As a developer, I should be able to simulate real-time network data.

- Task 01 – Use Java Socket programming to generate and publish data to a port
- Task 02 – Publish the data with different scenarios

**User Story 02 –** As a developer, I should be able to consume the data using Spark Streaming.

**User Story 03 –** As a developer, I should be able to call the Google API to convert latitude and longitude into the corresponding region names.

**User Story 04 –** Perform computations to calculate important KPIs (Key Performance Indicators) on the real-time data.

More detailed split up will be shared once you start the project.

**Technologies Used:**

- Java Socket Programming
- Google API
- Scala Programming
- Spark Streaming

#### Data Model Development Kit

**Project Description :**

This project helps data model developers manage Hive tables with different storage types, column types and column properties required for different use cases.

**Roles & Responsibilities**

- Building .xml files to define the structure of the Hive tables used for storing generated process data.
- Actively involved in development to read the .xml files, create data models and load data into Hive.

**Technologies Used**

Java, JAXB, JDBC, Hadoop, Hive

**Sample User Stories**

**[Study User Story 01] –** Come up with a design to represent the data model required to handle the following scenarios:

- Handling different operations like “CREATE”, “UPDATE”, “DELETE”
- A way to define a partitioned table
- Storing column order
- Storing column names
- Handling updates of column type and name

**[User Story 02] –** HQL Generator – As a developer, we have to provide functionality to create a table.

**Tasks**

- [ ] Build a Maven project and add dependencies
- [ ] Integrate loggers
- [ ] Commit code
- [ ] Create a standard package structure
- [ ] Utility to read the XML and create Java objects
- [ ] Utility code to communicate with the Hive DB
- [ ] Check for the Hive service before executing queries
- [ ] Code to construct the HQL query for CREATE
- [ ] Exception handling

**Definition of Done**

- [ ] Package structure should be created
- [ ] Table has to be created in Hive
- [ ] Validate that all required schemas are created
- [ ] Validation of Hadoop + Hive services

**Test Cases**

1. If the table already exists, print “Table already exists”.
2. Verify the schema against the XML.
3. If the services are not up and running, handle and log it.

## Training Features

## Upcoming Batches

## FAQs

**Is Java a prerequisite to learn the Big Data Masters Program?**

Yes, Java is a prerequisite. Some institutes claim that Java is not required; that information is false.

**Can I attend a demo session before enrollment?**

Yes. You will sit in an actual live class to experience the quality of the training.

**How will I execute the practicals?**

We will help you set up NPN Training’s Virtual Machine + Cloudera Virtual Machine on your system with local access. Detailed installation guides for setting up the environment are provided in the E-Learning portal.

**Who are the instructors at NPN Training?**

All Big Data classes are taught by Naveen, a working professional with more than 12 years of experience in IT and teaching.

**How do I access the eLearning content for the course?**

Once you have registered and paid for the course, you will have 24/7 access to the E-Learning content.

**What if I miss a session?**

The course is valid for one year, so you can attend the missed session in another batch.

**Is an EMI option available?**

Yes, the total fee can be paid in 2 installments.

**Are there any group discounts for classroom training programs?**

Yes, we have group discount options for our training programs. Contact us using the form “**Drop Us a Query**” on the right of any page on the NPN Training website, or select the Live Chat link. Our customer service representatives will give you more details.

**Do I need to bring my own laptop?**

NPN Training will provide students with all the course material in hard copies. However, students should carry their individual laptops for the program. Please find the minimum configuration required:

- Windows 7 / Mac OS
- 8 GB RAM is highly preferred
- 100 GB HDD
- 64 bit OS

## Sample Videos


## Course Description

The Big Data Masters Program is designed to empower working professionals to develop relevant competencies and accelerate their career progression in Big Data technologies through complete Hands-on training.

Being a Big Data Architect requires you to master multiple technologies, and this program will ensure you become an industry-ready Big Data Architect who can provide solutions to Big Data projects.

At NPN Training we believe in the philosophy **“Learn by doing”**, hence we provide complete **hands-on training** with **real-time project development**.

#### Course Objectives

By the end of the course, you will:

- Understand what Big Data is, the challenges of Big Data, and how Hadoop solves the Big Data problem
- Understand Hadoop 2.x Architecture, Replication, Single Point of Failure, YARN
- Learn HDFS + YARN commands to work with a cluster
- Understand how MapReduce can be used to analyze big data sets
- Perform structured data analysis using Hive
- Learn different performance tuning techniques in Hive
- Learn data loading techniques using Sqoop
- Use Scala with an intermediate level of proficiency
- Use the REPL (the Scala interactive shell) for learning
- Learn functional programming using Scala
- Learn Apache Spark 2.x
- Use DataFrames and Structured Streaming in Spark 2.x
- Analyze and visualize data using Zeppelin
- Learn the popular NoSQL database Cassandra

#### Work on a real time project on Big Data

This program comes with a portfolio of industry-relevant POCs, use cases and project work. Unlike other institutes, we do not pass off use cases as projects; we clearly distinguish between a use case and a project.

#### Who is the target audience?

- Software engineers and programmers who want to understand the larger Big Data ecosystem and use it to store and analyze data.
- Project, program, or product managers who want to understand the high-level architecture and projects of Big Data.
- Data analysts and database administrators who are curious about Hadoop and how it relates to their work.

## Statistics

**Course description:** This section of the training will help you understand the statistical concepts that underpin data analysis and machine learning.

#### Module 01 - Introduction to Data Science

**Learning Objectives –** In this module, you will learn the fundamentals of statistics. After this module you will understand different statistical concepts that will help in data analysis and machine learning.

**Topics –**

- Introduction to Statistics
- Different types of Statistics
- Descriptive statistics
- Inferential statistics
- Types of data
- Numerical data
- Discrete data
- Continuous data

- Categorical data
- Ordinal data

- Numerical data
- Deep dive into Descriptive statistics
- Uni-variate Analysis
- Bi-variate Analysis
- Multivariate Analysis
- Function Models
- Significance in Data Science

- Deep dive into Inferential statistics
- Sampling Distributions & Estimation
- Hypothesis Testing (One and Two Group Means)
- Hypothesis Testing (Categorical Data)
- Hypothesis Testing (More Than Two Group Means)
- Quantitative Data (Correlation & Regression)
- Significance in Data Science

- Numerical Parameters to represent data
- Mean
- Mode
- Median
- Sensitivity
- Information Gain
- Entropy

- Population and Sampling
- Sampling techniques
- Covariance
- Point Estimation
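To make the numerical parameters above concrete, here is a small sketch using Python’s built-in `statistics` module (the sample data is illustrative):

```python
import statistics

data = [2, 3, 3, 5, 7, 10]

print(statistics.mean(data))    # arithmetic mean -> 5.0
print(statistics.median(data))  # middle value of the sorted data -> 4.0
print(statistics.mode(data))    # most frequent value -> 3
```

Mean, median and mode each summarize the "center" of a data set, but react differently to outliers, which is why all three are covered.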

#### Module 02 - Introduction to Statistics

**Learning Objectives –** In this module, you will learn the fundamentals of statistics. After this module you will understand different statistical concepts that will help in data analysis and machine learning.

**Topics –**

- Introduction to Statistics
- Different types of Statistics
- Descriptive statistics
- Inferential statistics
- Types of data
- Numerical data
- Discrete data
- Continuous data

- Categorical data
- Ordinal data

- Numerical data
- Deep dive into Descriptive statistics
- Uni-variate Analysis
- Bi-variate Analysis
- Multivariate Analysis
- Function Models
- Significance in Data Science

- Deep dive into Inferential statistics
- Sampling Distributions & Estimation
- Hypothesis Testing (One and Two Group Means)
- Hypothesis Testing (Categorical Data)
- Hypothesis Testing (More Than Two Group Means)
- Quantitative Data (Correlation & Regression)
- Significance in Data Science

- Numerical Parameters to represent data
- Mean
- Mode
- Median
- Sensitivity
- Information Gain
- Entropy

- Population and Sampling
- Sampling techniques
- Covariance
- Point Estimation

## R – Programming

**Course description:** This course will help you learn R, one of the most popular programming languages for statistical computing and data analysis.

#### Module 01 - Getting started with R

**Learning Objectives –** In this module, you will learn about R fundamentals, understand different types of R Data Structures, Flow control statements and Functions. After this module you will be able to create/extract data from different R Data Structures and write your own R functions.

**Topics –**

- Introduction to R – Overview and Features
- Environment Setup
- Understanding of different Arithmetic operation in R
- Variables
- Understanding of different R Data structures
- Exploring Data Structure
- Vector
- List
- Matrices
- Arrays
- Factors
- DataFrames

- Introduction to Vector
- Vector creation, data extraction and manipulation

- Data Structure – List
- Introduction to List
- List creation and manipulation

- Data Structure – Matrices
- Introduction to Matrices
- Matrices creation, data extraction and computations

- Data Structure – Arrays
- Introduction to Arrays
- Arrays creation and manipulation

- Data Structure – Factors
- Introduction to Factors
- Generating different Factor Levels

- Data Structure – Data Frames
- Introduction to Data Frames
- Data Frame creation, data extraction and computations
- Data Reshaping

- Flow Control statements in R
- If statement
- If…else statement
- switch statement
- while loop
- for loop
- repeat loop
- break and next

- Exploring built-in functions in R
- Generating Sequence
- Generating Random Numbers
- Column Bind : cbind()
- Row Bind : rbind()
- Merge Functions

- Exploring user defined functions in R
- Declaring Function
- Calling a function with/without arguments
- Lazy Evaluation of Function in R

#### Module 02 - Data Importing Techniques

**Learning Objectives –** In this module, you will learn about R fundamentals, understand different types of R Data Structures, Flow control statements and Functions. After this module you will be able to create/extract data from different R Data Structures and write your own R functions.

**Topics –**

- Introduction to R – Overview and Features
- Environment Setup
- Understanding of different Arithmetic operation in R
- Variables
- Understanding of different R Data structures
- Exploring Data Structure
- Vector
- List
- Matrices
- Arrays
- Factors
- DataFrames

- Introduction to Vector
- Vector creation, data extraction and manipulation

- Data Structure – List
- Introduction to List
- List creation and manipulation

- Data Structure – Matrices
- Introduction to Matrices
- Matrices creation, data extraction and computations

- Data Structure – Arrays
- Introduction to Arrays
- Arrays creation and manipulation

- Data Structure – Factors
- Introduction to Factors
- Generating different Factor Levels

- Data Structure – Data Frames
- Introduction to Data Frames
- Data Frame creation, data extraction and computations
- Data Reshaping

- Flow Control statements in R
- If statement
- If…else statement
- switch statement
- while loop
- for loop
- repeat loop
- break and next

- Exploring built-in functions in R
- Generating Sequence
- Generating Random Numbers
- Column Bind : cbind()
- Row Bind : rbind()
- Merge Functions

- Exploring user defined functions in R
- Declaring Function
- Calling a function with/without arguments
- Lazy Evaluation of Function in R

#### Module 03 - Exploratory Data Analysis

**Learning Objectives –** In this module, you will learn about R fundamentals, understand different types of R Data Structures, Flow control statements and Functions. After this module you will be able to create/extract data from different R Data Structures and write your own R functions.

**Topics –**

- Introduction to R – Overview and Features
- Environment Setup
- Understanding of different Arithmetic operation in R
- Variables
- Understanding of different R Data structures
- Exploring Data Structure
- Vector
- List
- Matrices
- Arrays
- Factors
- DataFrames

- Introduction to Vector
- Vector creation, data extraction and manipulation

- Data Structure – List
- Introduction to List
- List creation and manipulation

- Data Structure – Matrices
- Introduction to Matrices
- Matrices creation, data extraction and computations

- Data Structure – Arrays
- Introduction to Arrays
- Arrays creation and manipulation

- Data Structure – Factors
- Introduction to Factors
- Generating different Factor Levels

- Data Structure – Data Frames
- Introduction to Data Frames
- Data Frame creation, data extraction and computations
- Data Reshaping

- Flow Control statements in R
- If statement
- If…else statement
- switch statement
- while loop
- for loop
- repeat loop
- break and next

- Exploring built-in functions in R
- Generating Sequence
- Generating Random Numbers
- Column Bind : cbind()
- Row Bind : rbind()
- Merge Functions

- Exploring user defined functions in R
- Declaring Function
- Calling a function with/without arguments
- Lazy Evaluation of Function in R

#### Module 04 - Data Visualization using R

**Learning Objectives –** In this module, you will learn about R fundamentals, understand different types of R Data Structures, Flow control statements and Functions. After this module you will be able to create/extract data from different R Data Structures and write your own R functions.

**Topics –**

- Introduction to R – Overview and Features
- Environment Setup
- Understanding of different Arithmetic operation in R
- Variables
- Understanding of different R Data structures
- Exploring Data Structure
- Vector
- List
- Matrices
- Arrays
- Factors
- DataFrames

- Introduction to Vector
- Vector creation, data extraction and manipulation

- Data Structure – List
- Introduction to List
- List creation and manipulation

- Data Structure – Matrices
- Introduction to Matrices
- Matrices creation, data extraction and computations

- Data Structure – Arrays
- Introduction to Arrays
- Arrays creation and manipulation

- Data Structure – Factors
- Introduction to Factors
- Generating different Factor Levels

- Data Structure – Data Frames
- Introduction to Data Frames
- Data Frame creation, data extraction and computations
- Data Reshaping

- Flow Control statements in R
- If statement
- If…else statement
- switch statement
- while loop
- for loop
- repeat loop
- break and next

- Exploring built-in functions in R
- Generating Sequence
- Generating Random Numbers
- Column Bind : cbind()
- Row Bind : rbind()
- Merge Functions

- Exploring user defined functions in R
- Declaring Function
- Calling a function with/without arguments
- Lazy Evaluation of Function in R

#### Module 05 - Exploring R Package

**Learning Objectives –** In this module, you will learn about R fundamentals, understand different types of R Data Structures, Flow control statements and Functions. After this module you will be able to create/extract data from different R Data Structures and write your own R functions.

**Topics –**

- Introduction to R – Overview and Features
- Environment Setup
- Understanding of different Arithmetic operation in R
- Variables
- Understanding of different R Data structures
- Exploring Data Structure
- Vector
- List
- Matrices
- Arrays
- Factors
- DataFrames

- Introduction to Vector
- Vector creation, data extraction and manipulation

- Data Structure – List
- Introduction to List
- List creation and manipulation

- Data Structure – Matrices
- Introduction to Matrices
- Matrices creation, data extraction and computations

- Data Structure – Arrays
- Introduction to Arrays
- Arrays creation and manipulation

- Data Structure – Factors
- Introduction to Factors
- Generating different Factor Levels

- Data Structure – Data Frames
- Introduction to Data Frames
- Data Frame creation, data extraction and computations
- Data Reshaping

- Flow Control statements in R
- If statement
- If…else statement
- switch statement
- while loop
- for loop
- repeat loop
- break and next

- Exploring built-in functions in R
- Generating Sequence
- Generating Random Numbers
- Column Bind : cbind()
- Row Bind : rbind()
- Merge Functions

- Exploring user defined functions in R
- Declaring Function
- Calling a function with/without arguments
- Lazy Evaluation of Function in R

## Python 3 – Preparatory Course

#### Module 01 - Language Fundamentals

**Learning Objectives –** In this module, you will learn Python language fundamentals: setting up Python, working with the core data types, and writing control flow statements such as decisions and loops.

**Topics –**

- Introduction to Python
- Installing Python in Windows using PyCharm
- Data Types
- Numbers
- Strings
- Booleans

- Control Flow Statements
- Understanding Python Indentation
- Decisions
- The if Statement
- The if-else Statement
- The if-elif-else Statement

- Looping
- The while loop
- The for loop
- Using range() in for loops
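The decision and looping constructs above can be sketched together in a few lines (the values are illustrative):

```python
# if-elif-else decisions combined with a for loop over range().
results = []
for n in range(1, 6):
    if n % 2 == 0:
        label = "even"
    elif n == 5:
        label = "five"
    else:
        label = "odd"
    results.append((n, label))
    print(n, label)
```

Note that Python uses indentation, not braces, to mark the body of each `if` branch and of the loop.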

Check E-Learning for more Assignments + Use cases + Project work + Materials + Case studies

#### Module 02 - Collections

**Learning Objectives –** In this module, you will learn Python’s built-in collections – lists, tuples, sets and dictionaries – including how to create them, access and iterate over their elements, search them, and add or delete elements.

**Topics –**

- Exploring Python Collections
- Lists
- Creating Lists
- Accessing List Elements
- Iterating through list elements
- Searching elements within Lists
- Check for existence
- Counting occurrences
- Locating elements

- List slices
- Adding and deleting elements
- Adding, Multiplying and Copying Lists

- Tuples
- Creating tuples
- Creating Tuples from Lists using tuple()
- Creating empty tuples using tuple()
- Creating Singleton Tuples

- Accessing Tuple elements
- Counting Tuple elements
- Iterating through tuple elements
- Searching elements within tuples
- Tuple slices
- Adding, Multiplying and Copying Tuples
- Adding Tuples
- Multiplying Tuples
- Assigning and Copying Tuples

- Sets
- Creating Sets
- Accessing Set elements
- Counting Set elements
- Iterating through Set elements
- Adding and Deleting elements
- Set Operations
- Set Union
- Set Intersection
- Set Difference

- Dictionaries
- Creating Dictionaries
- Accessing Dictionary elements
- Iterating through Dictionary elements
- Iterating through the keys of a Dictionary
- Iterating through the values of Dictionary
- Iterating through the key-value pairs of a Dictionary

- Searching elements within Dictionaries
- Checking for the existence of a key in a Dictionary
- Extracting the value of a key using []
- Extracting the value of a key using dict.get()

- Adding and Deleting elements
- Adding elements using []
- Adding elements using setdefault()

- Deleting Elements
- Using del to Delete an Element
- Using popitem() to Delete Elements
- Using pop() to Delete Elements
- Using clear() to Delete all elements of a Dictionary
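A compact sketch of the four collection types covered in this module (the sample values are illustrative):

```python
# Lists: ordered and mutable
fruits = ["apple", "banana", "apple"]
fruits.append("cherry")
print(fruits.count("apple"))   # counting occurrences -> 2

# Tuples: ordered and immutable
point = (3, 4)
print(point[0])                # access by index -> 3

# Sets: unordered, unique elements
a, b = {1, 2, 3}, {2, 3, 4}
print(a & b)                   # set intersection -> {2, 3}

# Dictionaries: key-value pairs
ages = {"alice": 30}
ages.setdefault("bob", 25)     # add only if the key is absent
print(ages.get("bob"))         # safe lookup -> 25
```

Choosing the right collection is mostly about whether you need ordering, mutability, uniqueness, or key-based lookup.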

#### Module 03 - Functions & Lambdas Expressions

**Learning Objectives –** In this module, you will learn how to define and call functions in Python, work with the different kinds of arguments, and write lambda expressions.

**Topics –**

- Introduction to Functions
- Function Definition
- Function call
- Positional Arguments
- Default arguments
- Keyword arguments
- Variable arguments
- Variable arguments with positional parameters
- Variable arguments with default arguments
- Variable arguments followed by default arguments
- Variable arguments followed by keyword arguments

- Returning From Functions
- Returning tuples from functions
- Returning Lists from functions
- Returning Dictionaries from functions

- Returning single values from functions
- Returning Collection from functions
- Global variables
- Exploring Lambda Expressions
- Introduction to Lambda expressions
- Declaring Lambda expressions
- What is an expression
- Understanding when to use Lambda expressions
- Defaults in Lambda expressions

- Lambdas with built in functions
- The map() function
- The filter() function
- The reduce() function
- Practical use of map(), filter() and reduce() with Lambda expressions
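The built-in functions above can be combined with lambda expressions in one short sketch (note that in Python 3, reduce() must be imported from functools):

```python
from functools import reduce  # reduce() lives in functools in Python 3

nums = [1, 2, 3, 4, 5]

squares = list(map(lambda x: x * x, nums))        # [1, 4, 9, 16, 25]
evens = list(filter(lambda x: x % 2 == 0, nums))  # [2, 4]
total = reduce(lambda acc, x: acc + x, nums)      # 15

print(squares, evens, total)
```

Each lambda is a small anonymous function: map() transforms every element, filter() keeps only those for which the lambda is true, and reduce() folds the sequence down to a single value.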

#### [Capstone Project] - Spark Streaming

**E-Commerce Data Analysis – [Real-time industry use case]**

**Use case Description :**

- An e-commerce company wants to build a real-time analytics dashboard to optimize its inventory and operations.
- The dashboard should show how many products are purchased, shipped, delivered and cancelled every minute.
- The dashboard will be very useful for operational intelligence.

## Data Analysis – Python Libraries for Data Analysis

#### Module 01 - NumPy

**Learning Objectives –** In this module, you will learn NumPy, one of the fundamental packages for scientific computing with Python.

**Topics –**

- Introduction to NumPy
- Exploring NumPy Arrays
- Python Lists vs NumPy Arrays
- Exploring NumPy Operations
- Looping through Lists and NumPy Arrays
- Multiplying each element in Lists and NumPy Arrays
- Creating multi-dimensional arrays using the NumPy library
- Squaring each element
- Exploring NumPy Built in Methods
- ndim
- itemsize
- dtype
- shape
- reshape
- arange
- linspace
- eye

- Advantages of NumPy library
- Less Memory
- Fast
- Convenient
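Several of the NumPy attributes and methods listed above can be sketched in a few lines:

```python
import numpy as np

a = np.arange(6)     # array([0, 1, 2, 3, 4, 5])
m = a.reshape(2, 3)  # reshape into a 2x3 matrix

print(m.ndim)        # number of dimensions -> 2
print(m.shape)       # (2, 3)
print(m.dtype)       # element type, e.g. int64 (platform-dependent)
print(m.itemsize)    # bytes per element

# Vectorized operation: square every element without a Python loop
print(m ** 2)

# linspace: 5 evenly spaced values between 0 and 1
print(np.linspace(0, 1, 5))
```

The vectorized `m ** 2` is where the "less memory, fast, convenient" advantages come from: the loop runs in compiled code rather than in the Python interpreter.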

#### Module 02 - [Pandas] - Series

**Learning Objectives –** In this module, you will learn the Pandas Series data structure, its attributes, and its most commonly used methods.

**Topics –**

- Introduction to Pandas
- Exploring Pandas fundamental data structure
- Series Data Structure
- DataFrame Data Structure

- Different ways to create series data structure
- Parameters and Arguments for series object
- Understanding usecols parameters in Series object
- Modifying the squeeze parameters
- Exploring inplace parameter

- Exploring Series attributes
- The .values attribute
- The .index attribute
- The .dtype attribute

- Exploring Series methods
- The head() and .tail() method
- The .sort_values() method
- The .sort_index() method
- Extracting Series values by Index position
- Extracting Series values by index label
- The .get() Method
- Math methods and Series objects
- The .idxmax() and .idxmin() method
- The .value_counts() method
- The .apply() method
- The .map() method
- Applying Python Built-In Functions to Series
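A minimal sketch of some of the Series attributes and methods above (the data is illustrative; assumes pandas is installed):

```python
import pandas as pd

s = pd.Series([3, 1, 4, 1, 5], index=["a", "b", "c", "d", "e"])

print(s.sort_values().head(2))    # the two smallest values
print(s.idxmax())                 # index label of the largest value -> 'e'
print(s.value_counts()[1])        # how many times 1 appears -> 2
print(s.apply(lambda x: x * 10))  # element-wise transformation
print(s.get("z", 0))              # safe lookup with a default -> 0
```

A Series is essentially a one-dimensional labeled array: the index labels travel with the values through sorting, slicing and transformations.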

#### Module 03 - [Pandas] - DataFrame - I

**Learning Objectives –** In this module, you will learn the Pandas DataFrame data structure for working with tabular data.

**Topics –**

- Introduction to Pandas
- Exploring Pandas fundamental data structure
- Series Data Structure
- DataFrame Data Structure

- Different ways to create series data structure
- Parameters and Arguments for series object
- Understanding usecols parameters in Series object
- Modifying the squeeze parameters
- Exploring inplace parameter

- Exploring Series attributes
- The .values attribute
- The .index attribute
- The .dtype attribute

- Exploring Series methods
- The head() and .tail() method
- The .sort_values() method
- The .sort_index() method
- Extracting Series values by Index position
- Extracting Series values by index label
- The .get() Method
- Math methods and Series objects
- The .idxmax() and .idxmin() method
- The .value_counts() method
- The .apply() method
- The .map() method
- Applying Python Built-In Functions to Series

#### Module 04 - [Pandas] - DataFrame - II

**Learning Objectives –** In this module, you will continue exploring the Pandas DataFrame data structure.

**Topics –**

- Introduction to Pandas
- Exploring Pandas fundamental data structure
- Series Data Structure
- DataFrame Data Structure

- Different ways to create series data structure
- Parameters and Arguments for series object
- Understanding usecols parameters in Series object
- Modifying the squeeze parameters
- Exploring inplace parameter

- Exploring Series attributes
- The .values attribute
- The .index attribute
- The .dtype attribute

- Exploring Series methods
- The head() and .tail() method
- The .sort_values() method
- The .sort_index() method
- Extracting Series values by Index position
- Extracting Series values by index label
- The .get() Method
- Math methods and Series objects
- The .idxmax() and .idxmin() method
- The .value_counts() method
- The .apply() method
- The .map() method
- Applying Python Built-In Functions to Series


## Data Visualization – Python Libraries for Data Visualization

**Course description:** In this course you will learn popular Python libraries for data visualization and how to build plots that are visually appealing and easy to interpret.

#### Module 01 - Matplotlib

**Learning Objectives –** In this module, you will learn Matplotlib, a popular data visualization library that makes it easy to build various types of plots and to customize them to make them more visually appealing and interpretable.

**Topics –**

- Introduction to Matplotlib
- Plotting Line Chart
- Functional Method
- Object Oriented Method

- Plotting Scatter Plot
- Histograms
- Customization
- Colors attributes
- Understanding linewidth
- Line Style attributes
- Exploring alpha attributes
- Markers
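The customization attributes above (color, linewidth, linestyle, alpha, markers) can be sketched with Matplotlib’s object-oriented method; the data and styling choices are illustrative, and the Agg backend keeps the script headless:

```python
import io
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no display required
import matplotlib.pyplot as plt

# Object-oriented method: create the figure and axes explicitly.
fig, ax = plt.subplots()
x = [1, 2, 3, 4]
ax.plot(x, [v ** 2 for v in x],
        color="green", linewidth=2, linestyle="--",  # customization attributes
        alpha=0.8, marker="o", label="y = x^2")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()

buf = io.BytesIO()
fig.savefig(buf, format="png")  # render the chart into an in-memory buffer
```

The functional method (`plt.plot(...)`) is quicker for one-off charts, but the object-oriented method scales better to figures with multiple axes.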

#### Module 02 - Seaborn

**Learning Objectives –** In this module, you will learn Seaborn, a statistical data visualization library built on top of Matplotlib.

**Topics –**

- Introduction to Matplotlib
- Plotting Line Chart
- Functional Method
- Object Oriented Method

- Plotting Scatter Plot
- Histograms
- Customization
- Colors attributes
- Understanding linewidth
- Line Style attributes
- Exploring alpha attributes
- Markers

#### Module 03 - Geographical Plotting

**Learning Objectives –** In this module, you will learn how to create geographical plots to visualize location-based data.

**Topics –**

- Introduction to Matplotlib
- Plotting Line Chart
- Functional Method
- Object Oriented Method

- Plotting Scatter Plot
- Histograms
- Customization
- Colors attributes
- Understanding linewidth
- Line Style attributes
- Exploring alpha attributes
- Markers

## Machine Learning

**Course description:** This course will help you learn the fundamentals of machine learning and implement the most widely used algorithms using scikit-learn.

#### Module 01 - Introduction to Machine Learning

**Learning Objectives –** This module is an introduction to Machine Learning.

**Topics –**

- What is Machine Learning
- Traditional Learning vs Machine Learning
- Real life applications of Machine Learning
- Types of Machine Learning
- Supervised Machine Learning
- Unsupervised machine Learning
- Reinforcement Learning

- Supervised Learning
- Overview of Supervised Learning
- Walk through of Supervised Learning algorithms
- Real time Applications of Supervised Learning
- Pros / Cons of Supervised Learning

- Unsupervised Learning
- Overview of Unsupervised Learning
- Walk through of Unsupervised Learning algorithms
- Real time Applications of Unsupervised Learning
- Pros / Cons of Unsupervised Learning

- Reinforcement Learning
- Overview of Reinforcement Learning
- Walk through of Reinforcement Learning algorithms
- Real time Applications of Reinforcement Learning
- Pros / Cons of Reinforcement Learning

#### Module 02 - Introduction to Scikit-Learn

**Learning Objectives –** In this module, you will learn one of the most popular Python libraries for machine learning.

**Topics –**

- Introduction to Scikit Learn
- Features of Scikit Learn
- Exploring popular groups of models provided by scikit-learn

#### Module 03 - Linear Regression [Supervised Learning]

**Learning Objectives –** In this module, you will learn one of the best-known algorithms in statistics. Linear Regression predicts a continuous dependent variable from a given set of independent variables.

**Topics –**

- Introduction to Linear Regression
- Understanding of gradient descent and cost function
- Implementation of linear regression model using scikit learn
- Different ways to validate the linear regression models
- Assumptions in linear regression
- Multicollinearity
- Heteroscedasticity
- Autocorrelation
- Normal distribution of errors

- Introduction to cross validation
- Advantages and Drawbacks of Linear Regression
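A minimal scikit-learn sketch of fitting a linear regression (the data follows y = 2x + 1 exactly and is purely illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative data generated from y = 2x + 1.
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([3, 5, 7, 9, 11])

model = LinearRegression()
model.fit(X, y)               # least-squares fit of slope and intercept

print(model.coef_[0])         # slope, close to 2.0
print(model.intercept_)       # intercept, close to 1.0
print(model.predict([[6]]))   # prediction for a new x, close to 13.0
```

On real data the fit will not be exact, which is where the validation techniques and assumption checks listed above come in.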

#### Module 04 - Logistic Regression [Supervised Learning]

**Learning Objectives –** In this module, you will learn about the Logistic Regression algorithm. Logistic Regression predicts a categorical dependent variable from a given set of independent variables.

**Topics –**

- Introduction to Logistic Regression
- Assumptions of Logistic Regression
- Understanding of odds and odds ratio
- Implementation of Logistic Regression using scikit learn
- Understanding of TPR, TNR, Precision, Recall and the Confusion Matrix
- Validation of the model using ROC curve
- Advantages and Drawbacks of Logistic Regression
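A minimal sketch of logistic regression and a confusion matrix using scikit-learn (the toy data is illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

# Illustrative 1-D data: small values belong to class 0, large to class 1.
X = np.array([[1], [2], [3], [10], [11], [12]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression()
clf.fit(X, y)

pred = clf.predict(X)
print(confusion_matrix(y, pred))  # rows: actual class, columns: predicted class
```

The diagonal of the confusion matrix counts correct predictions; TPR, TNR, precision and recall are all derived from its four cells.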

#### Module 05 - K-Nearest Neighbours [Supervised Learning]

**Learning Objectives –** In this module, you will learn the popular KNN algorithm, one of the simplest yet most powerful supervised machine learning algorithms.

**Topics –**

- Introduction to KNN Algorithm
- Introduction to different distance measures
- Implementation of KNN using scikit learn
- Validation of KNN Model
- Advantages of KNN
- Drawbacks of KNN
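A minimal KNN sketch using scikit-learn (the two toy clusters are illustrative):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Two well-separated clusters of points, labeled 0 and 1.
X = np.array([[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]])
y = np.array([0, 0, 0, 1, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3)  # vote among the 3 nearest points
knn.fit(X, y)

print(knn.predict([[2, 2]]))  # near the first cluster -> class 0
print(knn.predict([[9, 9]]))  # near the second cluster -> class 1
```

KNN has no real training step: prediction is simply a distance computation against the stored data, which is both its simplicity and its main drawback on large datasets.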

#### Module 06 - Support Vector Machine [Supervised Learning]

**Learning Objectives –** In this module, you will learn about the SVM algorithm. SVM is a powerful algorithm that can be used for both classification and regression use cases.

**Topics –**

- Introduction to SVM
- Understanding of hyperplanes
- Benefits of SVM as compared to other algorithms
- Kernel Trick in SVM
- Implementation of SVM using scikit-learn
- Hyperparameter tuning in SVM
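The kernel trick and hyperparameter tuning can be previewed together with scikit-learn's `SVC` and `GridSearchCV` (dataset and grid values are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Kernel trick: the RBF kernel implicitly maps data into a higher-dimensional
# space where a separating hyperplane is easier to find.
param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=3).fit(X_train, y_train)
acc = search.score(X_test, y_test)
print(search.best_params_, acc)
```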

#### Module 07 - K-Means Clustering [Unsupervised Learning]

**Learning Objectives –** In this module, you will learn about different clustering algorithms. We will also go through the popular K-Means clustering algorithm.

**Topics –**

- Introduction to different clustering techniques
- Understanding of hierarchical clustering
- Introduction to K-Means clustering
- Understanding of Euclidean distance
- Implementation of K-Means using scikit-learn
- Optimization of K-Means clustering
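A small sketch of K-Means on synthetic blobs, including the elbow heuristic often used to choose the number of clusters (data and cluster counts are illustrative):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data: 4 well-separated clusters in 2 dimensions
X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

# Elbow heuristic: inertia (within-cluster sum of squared Euclidean distances)
# falls sharply until k reaches the true number of clusters, then flattens.
inertias = {k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in range(2, 7)}

km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
print(inertias, km.cluster_centers_.shape)
```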

#### Module 08 - Principal Component Analysis [Unsupervised Learning]

**Learning Objectives –** In this module, you will learn about Principal Component Analysis (PCA), a dimensionality reduction technique. You will also learn its implementation using scikit-learn.

**Topics –**

- Introduction to PCA
- Introduction to Factor Analysis
- Implementation using scikit-learn
- Advantages of PCA
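Dimensionality reduction with PCA in scikit-learn, shown here on the 4-dimensional Iris dataset (illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)  # 150 samples, 4 features

pca = PCA(n_components=2)          # reduce 4 dimensions to 2
X_2d = pca.fit_transform(X)
explained = pca.explained_variance_ratio_.sum()  # variance kept by 2 components
print(X_2d.shape, explained)
```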

#### Module 09 - Decision Trees [Supervised Learning]

**Learning Objectives –** In this module, you will learn about the Decision Tree algorithm, which can be used for both classification and regression use cases. Decision Trees are widely used because of their high interpretability.

**Topics –**

- Introduction to Decision Tree
- Understanding of CART algorithm
- Understanding of Entropy and Gini Index
- Implementation of Decision Trees using scikit-learn
- Parameter tuning in Decision Trees
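A short scikit-learn sketch of a CART-style tree with the entropy criterion and a depth limit (dataset and parameter values are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# CART builds binary splits; the split criterion can be "gini" or "entropy",
# and max_depth is a key parameter to tune against overfitting.
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree.fit(X_train, y_train)
acc = tree.score(X_test, y_test)
print(acc, tree.get_depth())
```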

#### Module 10 - Random Forest Algorithm [Supervised Learning]

**Learning Objectives –** In this module, you will learn about the Random Forest ensemble algorithm, which can be used for both classification and regression use cases. Random Forest is widely used because of its high accuracy and ease of use.

**Topics –**

- Introduction to Random Forest
- Understanding of ensemble Modelling
- Bagging and Boosting
- Implementation of Random Forest using scikit-learn
- Hyperparameter tuning in Random Forest
- Feature Importance in Random Forest
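Bagging and feature importance in a few lines of scikit-learn (the Iris dataset stands in for any tabular problem):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bagging: each of the 100 trees is trained on a bootstrap sample of the data,
# and their votes are aggregated into the final prediction.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
acc = rf.score(X_test, y_test)
importances = rf.feature_importances_  # per-feature contribution, sums to 1
print(acc, importances)
```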

#### Module 11 - XGBoost [Supervised Learning]

**Learning Objectives –** XGBoost is one of the most popular algorithms in Machine Learning. It can be used for both classification and regression. In this session we will go through how to achieve high accuracy with this boosting algorithm.

**Topics –**

- Introduction to XGBoost
- Benefits of XGBoost as compared to other algorithms
- Implementation of XGBoost using its scikit-learn API
- Hyperparameter tuning in XGBoost
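XGBoost ships as the separate `xgboost` package; if it is not installed, scikit-learn's `GradientBoostingClassifier` illustrates the same gradient boosting idea (data and parameter values are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Boosting: each new shallow tree is fit to correct the residual errors of the
# ensemble built so far; n_estimators, learning_rate and max_depth are the
# usual hyperparameters to tune.
gbm = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                 max_depth=3, random_state=0)
gbm.fit(X_train, y_train)
acc = gbm.score(X_test, y_test)
print(acc)
```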

#### Module 12 - Time series Forecasting [Supervised Learning]

**Learning Objectives –** In this module, you will learn about various time series forecasting methods. We will do a deep dive into the ARIMA model.

**Topics –**

- Introduction to Time series Forecasting
- Understanding of different Time Series forecasting methods
- Understanding of ARIMA
- Implementation of ARIMA
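In practice the module uses a full ARIMA implementation (e.g. from `statsmodels`); as a dependency-light illustration of the autoregressive idea at ARIMA's core, here is an AR(1) model fit by least squares on a simulated series:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate an AR(1) series: y_t = c + phi * y_{t-1} + noise
phi_true, c_true = 0.7, 1.0
y = np.zeros(500)
for t in range(1, 500):
    y[t] = c_true + phi_true * y[t - 1] + rng.normal(0, 0.5)

# Fit AR(1) by ordinary least squares on (y_{t-1}, y_t) pairs
design = np.column_stack([np.ones(499), y[:-1]])
c_hat, phi_hat = np.linalg.lstsq(design, y[1:], rcond=None)[0]

forecast = c_hat + phi_hat * y[-1]  # one-step-ahead forecast
print(phi_hat, c_hat, forecast)
```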

#### Module 13 - Natural Language Processing

**Learning Objectives –** In this module, you will learn about Natural Language Processing. We will go through the popular NLTK package and sentiment analysis using Python.

**Topics –**

- Introduction to Natural Language Processing
- Deep Dive to NLTK package
- Tokenizing words and sentences
- Stop words
- Stemming words
- Lemmatization
- Word Net
- Sentiment Analysis using NLTK
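NLTK's tokenizers and stemmers require downloaded corpora; as a self-contained preview of the same pipeline steps (tokenizing, stop-word removal, stemming), here is a deliberately naive pure-Python sketch:

```python
import re

# Tiny illustrative stop-word list; nltk.corpus.stopwords ships a far larger one
STOP_WORDS = {"the", "is", "a", "an", "and", "of", "to", "in", "were"}

def tokenize(sentence):
    """Split a sentence into lowercase word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())

def naive_stem(word):
    """Crude suffix stripping; NLTK's PorterStemmer is much more careful."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

text = "The runners were running in the morning races"
tokens = [t for t in tokenize(text) if t not in STOP_WORDS]
stems = [naive_stem(t) for t in tokens]
print(tokens)
print(stems)
```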

## TensorFlow – Deep Learning

#### Module 01 - Neural Network

**Learning Objectives –** In this module, you will learn about Neural Networks. We will go through the different concepts of neural networks and do a deep dive into RNN and CNN models.

**Topics –**

- Introduction to Neural Network
- Understanding of gradient descent
- Understanding of forward and backward propagation
- Understanding of RNN and CNN
- Implementation of a neural network using scikit-learn
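Forward propagation, backward propagation and gradient descent can all be seen in a tiny network built from scratch with NumPy; this sketch trains on the XOR function, which needs a hidden layer (sizes and learning rate are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR is not linearly separable, so a hidden layer is required
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros((1, 8))
W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros((1, 1))
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for _ in range(5000):
    # Forward propagation
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward propagation (gradient of the cross-entropy loss)
    d_out = out - y
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Gradient descent update
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0, keepdims=True)

pred = (out > 0.5).astype(int)
print(pred.ravel())
```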

#### Module 02 - TensorFlow

**Learning Objectives –** In this module, you will learn the basics of deep learning. We will go through the TensorFlow API and implement a neural network using TensorFlow.

**Topics –**

- Introduction to the TensorFlow API
- Implementation of a neural network using TensorFlow
- Benefits of the TensorFlow API

**POCs, use cases and project work –** Unlike other institutes, we do not pass off use cases as projects; we clearly distinguish between a use case and a project.

**Process we follow for project development**

We follow the Agile methodology for project development:

- Each batch will be divided into scrum teams of 4-5 members.
- We will start with a **Feature Study** before implementing a project.
- The feature will be broken down into **User Stories and Tasks**.
- For each user story, a proper **Definition of Done** will be defined.
- A **Test Plan** will be defined for testing the user story.

#### Real Time Data Simulator

**Project description:** Create a project that generates dynamic mock data from a schema in real time, which can then feed real-time processing systems such as Apache Storm or Spark Streaming.
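A schema-driven mock-data generator can be sketched in a few lines of Python; the schema and field names below are hypothetical, not the project's actual format:

```python
import json
import random
import time

# Hypothetical schema: field name -> value generator
SCHEMA = {
    "user_id": lambda: random.randint(1, 10000),
    "event": lambda: random.choice(["click", "view", "purchase"]),
    "amount": lambda: round(random.uniform(1.0, 500.0), 2),
}

def generate_record(schema):
    """Build one mock record with a timestamp, ready for a streaming consumer."""
    record = {field: gen() for field, gen in schema.items()}
    record["ts"] = time.time()
    return record

records = [generate_record(SCHEMA) for _ in range(5)]
print(json.dumps(records[0]))
```

In the real project such records would be serialized and published continuously to a socket or message queue for Storm or Spark Streaming to consume.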

#### Building Complex Real time Event Processing

**Project Description:**

In this project, you will build a real-time event processing system using Spark Streaming, where even sub-second delays matter for analysis, although ultra-low latency (nanosecond-scale) is not required. A typical example is processing CDRs (Call Detail Records) from telecommunications, where millisecond response times are expected.

**User Story 01 –** As a developer, we should be able to simulate real-time network data.

- Task 01 – Use Java socket programming to generate and publish data to a port
- Task 02 – Publish the data under different scenarios

**User Story 02 –** As a developer, we should be able to consume the data using Spark Streaming.

**User Story 03 –** As a developer, we should be able to consume the Google API to convert latitude and longitude to the corresponding region names.

**User Story 04 –** Perform computations to calculate important KPIs (Key Performance Indicators) on the real-time data.

More detailed split up will be shared once you start the project.

**Technologies Used :**

- Java Socket Programming
- Google API
- Scala Programming
- Spark Streaming

#### Data Model Development Kit

**Project Description :**

This project helps data model developers manage Hive tables with the different storage types, column types and column properties required for different use case development.

**Roles & Responsibility**

- Building .xml files to define the structure of the Hive tables used to store the processed data.
- Actively involved in development to read .xml files, create data models and load data into Hive.

**Technologies Used**

Java, JAXB, JDBC, Hadoop, Hive

**Sample User Stories**

**[Study User Story 01] –** Come up with a design to represent the data model required to handle the following scenarios:

- Handle different operations: "CREATE", "UPDATE", "DELETE"
- A way to define a partitioned table
- Store columns in order
- Store column names
- Handle updates to a column's type and name

**[User Story 02] –** HQL Generator – As a developer, we have to provide functionality to create a table.

***Tasks***

- [ ] Build a Maven project and add dependencies
- [ ] Integrate loggers
- [ ] Commit the code
- [ ] Create a standard package structure
- [ ] Write a utility to read the XML and create Java objects
- [ ] Write utility code to communicate with the Hive DB
- [ ] Check that the Hive service is running before executing queries
- [ ] Write code to construct the HQL query for CREATE
- [ ] Add exception handling

*Definition of Done*

- [ ] The package structure is created
- [ ] The table is created in Hive
- [ ] All required schemas are validated
- [ ] The Hadoop and Hive services are validated

***Test Cases***

1. If the table already exists, print "Table already exists".
2. Verify the schema against the XML.
3. If the services are not up and running, handle the error and log it.
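The project itself is built in Java with JAXB, but the core idea of the HQL Generator story, reading a table definition from XML and constructing a CREATE query, can be sketched in Python; the XML format below is hypothetical:

```python
import xml.etree.ElementTree as ET

# Hypothetical table definition; the real project defines its own XML format
XML = """
<table name="orders" operation="CREATE">
  <column name="order_id" type="INT"/>
  <column name="amount" type="DOUBLE"/>
  <partition name="order_date" type="STRING"/>
</table>
"""

def build_create_hql(xml_text):
    """Read a table definition from XML and construct a Hive CREATE TABLE query."""
    table = ET.fromstring(xml_text)
    cols = ", ".join(f"{c.get('name')} {c.get('type')}"
                     for c in table.findall("column"))
    parts = ", ".join(f"{p.get('name')} {p.get('type')}"
                      for p in table.findall("partition"))
    hql = f"CREATE TABLE IF NOT EXISTS {table.get('name')} ({cols})"
    if parts:
        hql += f" PARTITIONED BY ({parts})"
    return hql

hql = build_create_hql(XML)
print(hql)
```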


**Is Java a pre-requisite to learn Big Data Masters Program?**

Yes, Java is a prerequisite. Some institutes claim that Java is not required; that information is false.

**Can I attend a demo session before enrollment?**

Yes. You will sit in on an actual live class to experience the quality of the training.

**How will I execute the Practicals?**

We will help you set up NPN Training's Virtual Machine and the Cloudera Virtual Machine on your system with local access. Detailed installation guides for setting up the environment are provided in the E-Learning content.

**Who are the instructors at NPN Training?**

All Big Data classes are taught by Naveen sir, a working professional with more than 12 years of experience in IT as well as teaching.

**How do I access the E-Learning content for the course?**

Once you have registered and paid for the course, you will have 24/7 access to the E-Learning content.

**What If I miss a session?**

The course validity is one year, so you can attend any missed session in another batch.

**Is an EMI option available?**

The total fee can be paid in two installments.

**Are there any group discounts for classroom training programs?**

Yes, we have group discount options for our training programs. Contact us using the form “**Drop Us a Query**” on the right of any page on the NPN Training website, or select the Live Chat link. Our customer service representatives will give you more details.


## Reviews

**Anindya Banerjee**

**Cognizant**

LinkedIn

After searching extensively on the internet for big data courses, I came to know about NPN Training and Naveen sir. It was a tough call for me to pick the right training institute that would provide the right blend of practical exposure and theoretical knowledge on big data and Hadoop technologies. After attending a few classes, I was mesmerized by Naveen sir's way of teaching, his command over various topics and his study materials. Unlike other training institutes, Naveen sir believes in extensive learning and hands-on training, which sets NPN Training far apart from other institutes. I would highly recommend anyone to join NPN Training if he/she wants to make a career in big data technologies.

**Sarbartha Paul**

**HCL Technologies**

LinkedIn

The best thing I liked about this institute is the way Naveen sir teaches, his way of taking care of each individual's doubts and interests, and his tendency to make others learn big data with complete hands-on experience. The theory he teaches is compact and crunchy enough to get a good hold of the basics.

Another thing that sets this institute apart is the way Naveen sir has designed its Big Data Architecture Program course, which covers nearly everything that other institutes lack. The course materials are also very much to the point.

In one word, Naveen sir’s way of teaching is a class apart!

I am greatly moved by his ideology and teaching and this is probably one of the finest institute in town as far as big data courses are concerned. It is worth joining his classroom in all aspect. Thank you for all your efforts sir!

**Sai Venkata Krishna**

**Capgemini**

LinkedIn

Naveen is an excellent trainer. Naveen mainly focuses on HANDS ON and REAL TIME SCENARIOS, which helps one to understand the concepts easily. I feel that the NPN Training curriculum is the best in the market for Big Data.

Naveen is very honest in his approach and he delivers additional concepts which are not present in the syllabus of particular topics. The E-Learning content and assignments are very informative and helpful. The amount you pay for the Big Data course is worth every penny.

Thank you NPN Training for your support and motivation.
