Components of Spark


The following are some important components of Spark:

  1. Cluster Manager
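    1. An external service that allocates cluster resources (for example Standalone, YARN, or Kubernetes); it is used to run the Spark Application in cluster mode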
  2. Application
    • User program built on Spark. Consists of:
    • Driver Program
      • The program that creates the SparkContext. Acts as the coordinator for the Application
    • Executors
      • Run computations and store Application data
      • Are typically launched at the beginning of an Application and run for its entire lifetime
      • Each Application gets its own Executors
      • An Application can have multiple Executors
      • An Executor is not shared by multiple Applications
      • Provide in-memory storage for RDDs
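      • In standalone mode, an Application by default runs at most one Executor per worker node; other cluster managers such as YARN can place several Executors of the same Application on one node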
  3. Task
    1. Represents a unit of work in Spark
    2. Is executed in an Executor
  4. Job
    1. A parallel computation consisting of multiple Tasks that is spawned in response to a Spark action (for example, collect or save)
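
To make the relationship between these components concrete, here is a minimal toy model in plain Python. It does not use the Spark API: the `Driver`, `Executor`, and `Task` classes and the `run_job` method are hypothetical names invented for this sketch, and a thread pool stands in for executor processes running on cluster nodes. It only illustrates the idea that a driver turns one job into several tasks (one per partition) and hands them to executors that belong to a single application.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Task:
    """A unit of work: apply a function to one partition of the data."""
    partition_id: int
    data: list

    def run(self, func):
        return [func(x) for x in self.data]


@dataclass
class Executor:
    """Toy executor: owned by exactly one application, runs tasks, caches results."""
    app_name: str
    executor_id: int
    cache: dict = field(default_factory=dict)

    def execute(self, task, func):
        result = task.run(func)
        # Stands in for the in-memory storage an executor provides for RDDs.
        self.cache[task.partition_id] = result
        return result


class Driver:
    """Toy driver: coordinates the application and turns one request into a job."""

    def __init__(self, app_name, num_executors):
        # Each application gets its own executors; none are shared.
        self.executors = [Executor(app_name, i) for i in range(num_executors)]

    def run_job(self, data, func, num_partitions):
        # Split the data into partitions and create one task per partition.
        partitions = [data[i::num_partitions] for i in range(num_partitions)]
        tasks = [Task(i, part) for i, part in enumerate(partitions)]
        # Assign tasks to executors round-robin and run them in parallel.
        with ThreadPoolExecutor(max_workers=len(self.executors)) as pool:
            futures = [
                pool.submit(
                    self.executors[t.partition_id % len(self.executors)].execute,
                    t,
                    func,
                )
                for t in tasks
            ]
            results = [f.result() for f in futures]
        # Gather the per-task results back on the driver.
        return [x for part in results for x in part]


driver = Driver("demo-app", num_executors=2)
squares = driver.run_job(list(range(8)), lambda x: x * x, num_partitions=4)
print(sorted(squares))
```

In this sketch a single call to `run_job` plays the role of a job, the four partitions become four tasks, and the two executors each process two of them. Real Spark adds scheduling, stages, fault tolerance, and a cluster manager on top of this picture.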

Naveen P.N

12+ years of experience in IT, with extensive experience executing complex projects using Java, microservices, Big Data, and cloud platforms. I founded NPN Training Pvt Ltd, an India-based startup, to provide high-quality training for IT professionals. I have trained more than 3000 IT professionals and helped them succeed in their careers across different technologies. I am very passionate about technology and training. I have spent 12 years at Siemens, Yahoo, Amazon and Cisco, developing and managing technology.