Apache Spark And Scala Training In Bangalore

With NPN Training’s Apache Spark and Scala certification training you will advance your expertise in the Big Data Hadoop ecosystem. With this Apache Spark and Scala certification you will master essential skills such as Scala programming, Spark Streaming, Spark SQL, machine learning programming, GraphX programming and Spark Shell scripting.


  • About the Course
  • Curriculum
  • FAQs
  • Certification
  • Review

About the Course

Be the expert in Big Data processing by learning the conceptual implementation of Apache Storm and Apache Spark using Scala Programming

This course includes Apache Storm, Spark and Scala, and is designed keeping in mind the industry requirements for high-speed data processing.

 

At NPN Training we believe in the philosophy "Learn by doing", hence we provide complete hands-on training with real-time project development.

 

Course Objectives

After completing the Apache Spark & Scala course, you will be able to:

1. Understand Apache Spark

2. Understand Scala and its implementation

3. Understand functional programming in Scala

4. Understand control structures, loops, collections and more in Scala

5. Master the concepts of traits and OOP in Scala

6. Compare Spark with Hadoop MapReduce

7. Install Spark and implement Spark operations on the Spark Shell

8. Understand the role of RDDs

9. Implement Spark applications on YARN (Hadoop)

10. Stream data using the Spark Streaming API

11. Implement machine learning algorithms in Spark using the MLlib API

12. Analyze Hive and Spark SQL architecture

13. Implement Spark SQL queries to perform several computations

14. Understand the GraphX API and implement graph algorithms

15. Implement broadcast variables and accumulators for performance tuning

 

As part of the course work, you will work on the below mentioned projects,

 

Project #1 : Smart Data Generator

Industry : General

You will create a project that generates dynamic mock data from a schema in real time, which can then be used for real-time stream processing with Apache Storm or Spark Streaming.

 

Project #2 : Analysis of Call Detail Record (CDR)

Industry : Telecom

You will be given a CDR (Call Detail Record), a data record produced by a telephone exchange or other telecommunications equipment that documents the details of a telephone call or other telecommunications transaction (e.g., a text message) passing through that facility or device.

 

Scala Programming - Multi-Paradigm: Functional, Object-Oriented

 

Module 01 - Introduction to Scala for Apache Spark +
Learning Objectives - In this module, you will understand the basics of Scala that are required for programming Spark applications. You will learn about the basic constructs of Scala such as variable types, control structures, collections, and more.

Topics -
  • Introduction to Scala REPL
  • Basic Scala operations
  • Exploring different Variable Types
    1. Mutable Variables - [Hands-on]
    2. Immutable Variables - [Hands-on]
  • Type Inference in Scala - [Hands-on]
  • Block Expressions
  • Exploring Lazy evaluation in Scala
  • Control Structures in Scala
  • Exploring different variants of for loop
    1. Enhanced for loop. - [Hands-on]
    2. For loop with yield. - [Hands-on]
    3. For Loop with if conditions : Pattern Guards - [Hands-on]
  • Match Expressions - [Hands-on]
  • Exploring Functions in Scala
  • Exploring Procedures in Scala

  • Collections in Scala
    1. Array
    2. ArrayBuffer
    3. Map
    4. Tuples
    5. Lists
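The constructs listed above can all be tried directly in the Scala REPL; a minimal sketch (the names are illustrative, not from the course material):

```scala
object ScalaBasics extends App {
  // Immutable vs mutable variables; types are inferred
  val greeting = "Hello, Spark"   // immutable: reassignment is a compile error
  var counter  = 0                // mutable
  counter += 1

  // Lazy evaluation: the right-hand side runs only on first access
  lazy val expensive = { println("computed once"); 42 }

  // Enhanced for loop with yield and a pattern guard (if condition)
  val evens = for (n <- 1 to 10 if n % 2 == 0) yield n

  // Match expression
  def describe(x: Any): String = x match {
    case 0         => "zero"
    case n: Int    => s"int: $n"
    case s: String => s"string: $s"
    case _         => "other"
  }

  // Common collections
  val arr     = Array(1, 2, 3)
  val buf     = scala.collection.mutable.ArrayBuffer(1, 2)
  val capital = Map("India" -> "New Delhi")
  val tuple   = ("Spark", 2, true)
  val list    = List(10, 20, 30)

  println(evens.mkString(","))   // 2,4,6,8,10
  println(describe("scala"))     // string: scala
}
```

Pasting these lines one at a time into the REPL shows the inferred type of each expression.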

For more assignments check E-Learning
Module 02 - Object Oriented Programming in Scala +
Learning Objectives - In this module, you will learn Object Oriented Programming in Scala.

Topics -
  • Class in Scala
  • Getters and Setters
  • Exploring Constructors
    1. Auxiliary
    2. Primary
  • Exploring Case classes
  • Singletons
  • Exploring Companion Objects
  • Inheritance in Scala
  • Exploring Traits
    1. Traits as Interfaces
    2. Layered Traits
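A short sketch of these OOP constructs (class and member names are illustrative, not from the course material):

```scala
// Primary constructor with an auxiliary constructor
class Employee(val name: String, var salary: Double) {
  def this(name: String) = this(name, 0.0)   // auxiliary constructor
}

// Companion object acting as a factory (also a singleton)
object Employee {
  def apply(name: String): Employee = new Employee(name)
}

// Case class: equals, hashCode, toString and copy are generated for free
case class Point(x: Int, y: Int)

// Traits as interfaces, layered with `with`/`extends`
trait Greeter { def greet(n: String): String = s"Hello, $n" }
trait Loud extends Greeter {
  override def greet(n: String): String = super.greet(n).toUpperCase
}

object OopDemo extends App {
  val e = Employee("Asha")     // created via the companion's apply
  e.salary = 50000             // setter generated for the `var` field
  println(Point(1, 2).copy(y = 5))        // Point(1,5)
  println((new Loud {}).greet("spark"))   // HELLO, SPARK
}
```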
For more assignments check E-Learning
Module 03 - Functional Programming using Scala +
Learning Objectives - In this module, you will learn Functional Programming in Scala.

Topics -
  • Imperative vs Declarative style
  • Introduction to Functional Programming
  • Principles of Functional Programming
    1. Pure Functions
    2. Functions are first-class values
  • Exploring 'First-Class Functions'
    1. Storing functions in variables
    2. Passing functions as arguments
    3. Methods returning a function
  • Functions vs Methods
  • Converting methods to functions using eta expansion
  • Invoking functions with tuples as parameters
  • Exploring Higher-Order Functions
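The functional-programming ideas above can be condensed into a few lines (names are illustrative):

```scala
object FpDemo extends App {
  // Pure function: output depends only on input, no side effects
  def square(n: Int): Int = n * n

  // Functions are first-class values: store one in a variable
  val inc: Int => Int = _ + 1

  // A method returning a function
  def multiplier(factor: Int): Int => Int = (n: Int) => n * factor

  // Eta expansion: convert the method `square` into a function value
  val squareFn: Int => Int = square _

  // Higher-order function: map takes a function as an argument
  val result = List(1, 2, 3).map(multiplier(10))   // List(10, 20, 30)

  // Invoking a function with a tuple as the parameter
  val add: (Int, Int) => Int = _ + _
  val added = add.tupled((2, 3))                   // 5

  println(result)
  println(added)
}
```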

 

Apache Spark - Cluster Computing Framework

 

Module 04 -  Overview of Apache Spark +
Learning Objectives - In this module, you will learn about Spark as a cluster computing framework and study the Spark architecture in comparison with the Hadoop eco-system.

Topics -
  • Overview of Apache Spark
  • Features of Spark
  • Understanding Data sharing in MapReduce
  • Understanding Data sharing in Spark
  • Spark Eco-system
  • Overview of RDD
  • Exploring RDD properties
    1. Immutable
    2. Lazy Evaluated
    3. Cacheable
    4. Type Inferred
  • Understanding Partitions in Spark
  • Characteristics of Partitions in Spark
  • Spark Architecture
  • Eco-system of Hadoop vs Spark
 
 
Module 05 - Spark Common Operations +
Learning Objectives - In this module, you will learn one of the fundamental building blocks of Spark – RDDs and related manipulations for implementing business logics (Transformations, Actions and Functions performed on RDD).

Topics -
  • Installing Apache Spark on Windows - [Hands-on Activity]
  • Starting Spark Shell
  • Exploring different ways to start Spark
  • Understanding Spark UI - [Activity]
  • RDD Creations - [Hands-on]
    1. Loading a file
    2. Parallelize Collections
  • Exploring RDD Operations
    1. Transformations
    2. Actions
  • RDD Actions - [Hands-on]
    1. count()
    2. first()
    3. take(int)
    4. saveAsTextFile(path:String)
    5. reduce(func)
    6. collect()
    7. foreach(func)
  • RDD Transformations - [Hands-on]
    1. map(func)
    2. filter(func)
    3. coalesce(int)
  • Passing functions to Spark High Order functions
    1. Anonymous function - [Industry Practices]
    2. Passing Named function - [Industry Practices]
    3. Static singleton function - [Industry Practices]
  • [Use case] - Analyzing Movie lens dataset and performing Actions and Transformation
  • Chaining Transformation and Actions in Spark
  • Running Spark programs in eclipse using Maven - [Industry Practices]
  • Understanding and initializing SparkSession, i.e. the Spark 2.0 entry point
  • [Performance Improvement] - Understanding Spark Caching
  • Loading RDD
    1. textFile
    2. wholeTextFiles
  • Saving RDD
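The operations above can be sketched as a small local-mode program (assumes Spark 2.x is on the classpath; names are illustrative):

```scala
import org.apache.spark.sql.SparkSession

object RddDemo extends App {
  // SparkSession is the Spark 2.0 entry point; local[*] uses all local cores
  val spark = SparkSession.builder()
    .appName("RddDemo")
    .master("local[*]")
    .getOrCreate()
  val sc = spark.sparkContext

  // RDD creation by parallelizing a collection (loading a file would use sc.textFile)
  val nums = sc.parallelize(Seq(1, 2, 3, 4, 5))

  // Transformations are lazy: nothing runs until an action is called
  val evens = nums.filter(_ % 2 == 0).map(_ * 10)

  // Caching avoids recomputation when several actions reuse the same RDD
  evens.cache()

  // Actions trigger computation
  println(evens.count())           // 2
  println(evens.collect().toList)  // List(20, 40)
  println(nums.reduce(_ + _))      // 15

  spark.stop()
}
```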

 

For more assignments check E-Learning
 
Module 06 - Playing with RDD's  +
Learning Objectives - In this module, you will learn about RDDs in more detail with industry-standard use cases.

Topics -

  • RDD Caching and Persistence
  • reduce() vs fold()
  • Scala RDD Extensions
    1. DoubleRDDFunctions
    2. PairRDDFunctions
    3. OrderedRDDFunctions
    4. SequenceFileRDDFunctions
  • Exploring Aggregate Functions
    1. groupByKey function
    2. reduceByKey function
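The groupByKey/reduceByKey distinction can be sketched with a word count (assumes Spark on the classpath; names are illustrative):

```scala
import org.apache.spark.sql.SparkSession

object PairRddDemo extends App {
  val spark = SparkSession.builder()
    .appName("PairRddDemo").master("local[*]").getOrCreate()
  val sc = spark.sparkContext

  val words = sc.parallelize(Seq("spark", "scala", "spark", "kafka", "spark"))
  // Mapping to (key, value) pairs enables the PairRDDFunctions extensions
  val pairs = words.map(w => (w, 1))

  // reduceByKey combines values on each partition before shuffling (preferred)
  val counts = pairs.reduceByKey(_ + _)

  // groupByKey shuffles every value first, then we sum: more network traffic
  val countsViaGroup = pairs.groupByKey().mapValues(_.sum)

  println(counts.collect().toMap)   // Map(spark -> 3, scala -> 1, kafka -> 1)
  spark.stop()
}
```

For large datasets, preferring reduceByKey over groupByKey is a common performance practice because the partial aggregation shrinks the shuffle.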

For more assignments check E-Learning
Module 07 - Spark SQL - DataFrame API + DataSet API +
Learning Objectives - In this module, you will learn about Spark SQL which is used to process structured data with SQL queries. You will learn about data-frames and datasets in Spark SQL and perform SQL operations on data-frames.

Topics -
  • Discussing Abstractions in Spark 2.x
  • Introduction to SparkSQL
  • Features of Spark SQL
  • Overview of DataFrames
  • Understanding org.apache.spark.sql.DataFrameReader class
  • Instantiating org.apache.spark.sql.DataFrameReader using SparkSession - [Hands-on]
  • Creating a DataFrame from JSON file - [Hands-on]
  • Creating a DataFrame from CSV file - [Hands-on]
  • Understanding and using inferSchema
  • Creating a custom schema and querying
  • Understanding DataFrame explain() function [Industry practices]
  • Registering DataFrame as a Table
  • Operations supported by DataFrames
  • Converting RDD to DataFrame
  • [Use case] Analysing Employee dataset
  • Exploring Pivots - [Industry practices]
  • Join Operations in DataFrame

E-Commerce Data Analysis - [Real-time industry use case]

Use case Description :

Given an e-commerce dataset of an organization's customer information and an item dataset, we have to perform the analysis below.

1. Filter out all customers belonging to group 400.
2. Find the number of customers from each country and the total price of all items purchased from each country.
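One way this use case might be sketched with the DataFrame API (the schema and column names here are assumptions for illustration; assumes Spark 2.x on the classpath):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object EcommerceDemo extends App {
  val spark = SparkSession.builder()
    .appName("Ecommerce").master("local[*]").getOrCreate()
  import spark.implicits._

  // Hypothetical rows standing in for the real customer/item datasets
  val customers = Seq(
    (1, "IN", 400, 250.0),
    (2, "US", 100, 120.0),
    (3, "IN", 400, 80.0)
  ).toDF("id", "country", "group", "itemPrice")

  // 1. Customers belonging to group 400
  val group400 = customers.filter($"group" === 400)

  // 2. Customers per country and total item price per country
  val byCountry = customers.groupBy($"country")
    .agg(count("*").as("customers"), sum($"itemPrice").as("totalPrice"))

  group400.show()
  byCountry.show()
  spark.stop()
}
```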


 

For more assignments check E-Learning
Module 08 - Spark Streaming +
Learning Objectives - In this module, you will learn Spark Streaming, which is used to build fault-tolerant streaming applications. You will learn about DStreams and the various transformations performed on them. You will get to know the main streaming operators, sliding-window operators and stateful operators.

Topics -
  • Understanding Data Streaming
  • Overview of Spark Streaming
  • Spark Streaming Use cases
  • Working of Spark Streaming
  • Understanding DStreams / Discretized Streams 
    1. Transformations
    2. Output operations
  • StreamingContext - the entry point of Spark Streaming applications
  • Input Streams / Receiver
  • Real Time Streaming using socketTextStream - [Hands-on]
  • Exploring special transformations on the DStream API
    1. Window Operations
    2. UpdateStateByKey
  • Block Interval

E-Commerce Data Analysis - [Real-time industry use case]

Use case Description :

  • An e-commerce company wants to build a real-time analytics dashboard to optimize its inventory and operations.
  • This dashboard should show how many products are purchased, shipped, delivered and cancelled every minute.
  • This dashboard will be very useful for operational intelligence.
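A minimal DStream sketch of the ideas in this module: a socket word count plus a sliding window (assumes Spark Streaming on the classpath; host, port and paths are illustrative):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingDemo extends App {
  // At least 2 local threads: one for the receiver, one for processing
  val conf = new SparkConf().setAppName("StreamingDemo").setMaster("local[2]")
  // StreamingContext is the entry point; 10-second batch interval
  val ssc = new StreamingContext(conf, Seconds(10))
  ssc.checkpoint("checkpoint")   // required for stateful/window operations

  // Receive lines from a socket (feed it with e.g. `nc -lk 9999`)
  val lines  = ssc.socketTextStream("localhost", 9999)
  val counts = lines.flatMap(_.split("\\s+")).map((_, 1)).reduceByKey(_ + _)

  // Sliding window: line count over the last 60s, recomputed every 20s
  val windowed = lines.countByWindow(Seconds(60), Seconds(20))

  counts.print()
  windowed.print()
  ssc.start()
  ssc.awaitTermination()
}
```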

Module 09 - Structured Streaming +
Learning Objectives - In this module, you will learn Structured Streaming which is a scalable and fault-tolerant stream processing engine built on the Spark SQL engine.

Topics -
  • Drawbacks of DStream API
  • Understanding Structured Streaming
  • Input Source
  • Output Modes
  • Handling Event-time and Late Data
  • Window Operations on Event Time
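The same socket word count, rewritten against the Structured Streaming API to contrast with the DStream version (a sketch; host, port and output mode are illustrative):

```scala
import org.apache.spark.sql.SparkSession

object StructuredDemo extends App {
  val spark = SparkSession.builder()
    .appName("StructuredDemo").master("local[*]").getOrCreate()
  import spark.implicits._

  // The socket source presents the stream as an unbounded table of lines
  val lines = spark.readStream.format("socket")
    .option("host", "localhost")
    .option("port", 9999)
    .load()

  // Plain DataFrame/Dataset operations work on streaming data too
  val counts = lines.as[String]
    .flatMap(_.split("\\s+"))
    .groupBy("value")
    .count()

  // "complete" mode re-emits the full aggregate table on every trigger
  val query = counts.writeStream
    .outputMode("complete")
    .format("console")
    .start()
  query.awaitTermination()
}
```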

 

Apache Kafka - Distributed Messaging System

 

Course description: Today's applications are built on a microservices architecture. Having many microservices that need to communicate with each other can be problematic, as they quickly become tightly coupled. Apache Kafka allows us to create services that are loosely coupled and operate in an event-driven way.

Module 10 - Understanding Apache Kafka +
Learning Objectives - In this module, you will understand Big Data, Kafka and Kafka Architecture.

Topics -
  • Integration without Messaging System
  • Integration with Messaging System
  • What is Kafka
  • Download and install Kafka
  • Components of Messaging System
  • Exploring Kafka components
    1. Producer
    2. Consumer
    3. Broker
    4. Cluster
    5. Topic
    6. Partitions
    7. Offset
    8. Consumer groups
  • Installing Kafka
  • Kafka concepts - [Hands-on]
    1. Starting Zookeeper
    2. Starting Kafka Server
    3. Creating a topic
    4. Start a console producer
    5. Start a console consumer
    6. Sending and receiving messages

 

Module 11 - Exploring the Kafka Producer and Consumer APIs +
Learning Objectives - In this module, you will learn to send and receive events using the Kafka Producer and Consumer APIs.

Topics -
  • Adding Kafka dependency to Maven - [Activity]
  • Exploring Kafka core APIs
  • Exploring Kafka producer API
  • Sending events to Kafka - Producer API - [Hands-on]
  • Exploring Kafka consumer API
  • Reading events from Kafka - Consumer API - [Hands-on]
  • Consumer Poll Loop - Offset Management
  • Changing the configuration of a topic - [Hands-on]
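A minimal Scala sketch of the Producer and Consumer APIs (assumes the kafka-clients library on the classpath and a broker on localhost:9092; the "orders" topic and group id are hypothetical):

```scala
import java.time.Duration
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.kafka.clients.consumer.KafkaConsumer
import scala.jdk.CollectionConverters._

object KafkaDemo extends App {
  // Producer: serialize keys and values as strings
  val pProps = new Properties()
  pProps.put("bootstrap.servers", "localhost:9092")
  pProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  pProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  val producer = new KafkaProducer[String, String](pProps)
  producer.send(new ProducerRecord("orders", "order-1", "purchased"))
  producer.close()

  // Consumer: subscribe, then poll in a loop; offsets auto-commit by default
  val cProps = new Properties()
  cProps.put("bootstrap.servers", "localhost:9092")
  cProps.put("group.id", "orders-dashboard")
  cProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
  cProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
  val consumer = new KafkaConsumer[String, String](cProps)
  consumer.subscribe(List("orders").asJava)
  val records = consumer.poll(Duration.ofSeconds(1))
  records.asScala.foreach(r => println(s"${r.key} -> ${r.value}"))
  consumer.close()
}
```

Consumers in the same group id split a topic's partitions among themselves, which is how consumer groups scale reads.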

Check E-Learning for more Assignments + Use cases + Project work + Materials + Case studies

 

Contact us


+91-9535584691 | +91-8095918383

Upcoming batches

Apr

28

Apache Spark & Scala

Timings
- (Weekend Saturday batch)
Fees 19,000 INR

May

12

Apache Spark & Scala

Timings
- (Weekend Saturday batch)
Fees 19,000 INR
