Executor, Job, Stage, Task & RDD Persistence

Let us start an application. For this demo, the Scala shell (spark-shell) acts as the driver of the application.

$ spark-shell

Open the web UI (localhost:4040) and explore all the tabs. At this point every tab except Environment and Executors is empty.



The entry in the Executors tab clearly indicates that we have an Executor running in the background to support our application.
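The same information can also be read from the shell itself. The following is a minimal sketch, assuming it is run inside spark-shell where `sc` (the SparkContext) is already defined; the printed labels are only illustrative.

```scala
// Run inside spark-shell, where `sc` is predefined.

// Basic facts about the running application.
println(s"Application id: ${sc.applicationId}")
println(s"Master:         ${sc.master}")

// Address of the web UI; None when the UI is disabled.
println(s"Web UI:         ${sc.uiWebUrl.getOrElse("UI disabled")}")

// One entry per block manager (host:port), with the maximum memory
// available for caching and the memory still remaining, both in bytes.
sc.getExecutorMemoryStatus.foreach { case (address, (maxMem, remaining)) =>
  println(s"$address: max=$maxMem bytes, remaining=$remaining bytes")
}
```

If port 4040 is already taken by another application, the address printed by `sc.uiWebUrl` is the one to open in the browser.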


The First Run

scala> val data = sc.parallelize(1 to 10)
data: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>

scala> data.count
res0: Long = 10

Let us check all the tabs.

Jobs Tab


We can see how the run of our action (count) is broken down into sub-components (Job -> Stages -> Tasks). Every action is converted into a Job, which in turn is divided into Stages, and each Stage has its own set of Tasks.
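The Job -> Stages -> Tasks breakdown can also be observed from code. The sketch below assumes a spark-shell session where `sc` is available; `hierarchyListener` and `numbers` are names made up for this illustration. It registers a listener that prints the stages of each job and the number of tasks in each stage. A stage runs one task per partition, so the task count should match the partition count printed at the end.

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerJobStart, SparkListenerStageCompleted}

// Hypothetical listener for this demo: logs jobs and their stages.
val hierarchyListener = new SparkListener {
  override def onJobStart(jobStart: SparkListenerJobStart): Unit =
    println(s"Job ${jobStart.jobId} started with stages ${jobStart.stageIds.mkString(", ")}")

  override def onStageCompleted(stageCompleted: SparkListenerStageCompleted): Unit = {
    val info = stageCompleted.stageInfo
    println(s"Stage ${info.stageId} completed with ${info.numTasks} tasks")
  }
}

sc.addSparkListener(hierarchyListener)

val numbers = sc.parallelize(1 to 10)

// The action triggers one job; listener messages arrive asynchronously.
numbers.count()

// One task is launched per partition of the stage's RDD.
println(s"Partitions: ${numbers.getNumPartitions}")
```

Listener events are delivered on a separate thread, so the messages may appear slightly after the result of the action. The listener can later be detached with `sc.removeSparkListener(hierarchyListener)`.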

Given below are snapshots of the other tabs, which are self-explanatory.

Stages Tab


Executors Tab


The Second Run

Let us run the action ‘count’ once again and see what happens. This time we get a new Job (Job Id 1) with its own Stages and Tasks.

scala> data.count
res1: Long = 10
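That each action produces its own job can be checked from the shell as well. This is a rough sketch, assuming a spark-shell session with `sc` available; the group id "count-demo" and the name `sample` are invented for the example. It tags two count actions with a job group and then asks the status tracker which jobs belong to that group.

```scala
// Tag the jobs that follow so they can be looked up later.
// "count-demo" is an arbitrary group id chosen for this sketch.
sc.setJobGroup("count-demo", "two count actions on the same RDD")

val sample = sc.parallelize(1 to 10)
sample.count() // first action  -> one job
sample.count() // second action -> another job

// The status tracker gives a best-effort view of recent jobs.
val tracker = sc.statusTracker
tracker.getJobIdsForGroup("count-demo").sorted.foreach { jobId =>
  tracker.getJobInfo(jobId).foreach { info =>
    println(s"Job $jobId: status=${info.status()}, stages=${info.stageIds().mkString(", ")}")
  }
}

// Stop tagging subsequent jobs.
sc.clearJobGroup()
```

The job ids printed here depend on how many jobs the session has already run, so they will not necessarily be 0 and 1.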

Jobs Tab

Note: Clicking on the Description of a Job takes us to the Stages specific to that Job.

Stages Tab

Clicking on the Description of a Stage shows information on the Tasks related to that Stage.

Executors Tab

Adding Persistence

Let us explore the concept of persistence using the web UI. We persist our RDD ‘data’ and run the action ‘count’ again. This time we are interested in the Storage tab, which has been empty so far.

scala> import org.apache.spark.storage.StorageLevel
import org.apache.spark.storage.StorageLevel

scala> data.persist(StorageLevel.MEMORY_ONLY)
res3: data.type = ParallelCollectionRDD[0] at parallelize at <console>:21

scala> data.count
res4: Long = 10

The Storage tab now lists the RDDs that are persisted. The link in the RDD Name column leads to more details about that RDD.
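The persistence state of an RDD can also be inspected and undone from the shell. The sketch below assumes a spark-shell session with `sc` available and uses a separate RDD, `squares`, invented for this illustration so that it does not interfere with ‘data’.

```scala
import org.apache.spark.storage.StorageLevel

// Hypothetical RDD for this sketch.
val squares = sc.parallelize(1 to 10).map(x => x * x)

// Mark the RDD for persistence. Nothing is stored yet: persist is lazy.
squares.persist(StorageLevel.MEMORY_ONLY)
println(s"Requested level: ${squares.getStorageLevel.description}")

// The first action computes the partitions and stores them.
squares.count()

// RDDs that are currently marked as persistent in this application.
sc.getPersistentRDDs.foreach { case (id, rdd) =>
  println(s"RDD $id: $rdd, level=${rdd.getStorageLevel.description}")
}

// Release the stored partitions; the storage level returns to NONE.
squares.unpersist()
```

Calling `cache()` on an RDD is shorthand for `persist(StorageLevel.MEMORY_ONLY)`.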

The Executors tab also shows how much memory is used to persist the data.
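Roughly the same figures can be printed from the shell. This is a minimal sketch, assuming a spark-shell session with `sc` available; `cached` is a name invented for the example, and `getRDDStorageInfo` is marked as a developer API, so its output may differ between Spark versions.

```scala
import org.apache.spark.storage.StorageLevel

// Hypothetical RDD for this sketch, persisted and materialised by an action.
val cached = sc.parallelize(1 to 10).persist(StorageLevel.MEMORY_ONLY)
cached.count()

// Per-RDD view: cached partitions and the space they occupy.
sc.getRDDStorageInfo.foreach { info =>
  println(s"RDD ${info.id} (${info.name}): " +
    s"${info.numCachedPartitions}/${info.numPartitions} partitions cached, " +
    s"memory=${info.memSize} bytes, disk=${info.diskSize} bytes")
}

// Per-executor view: maximum and remaining memory for caching, in bytes.
sc.getExecutorMemoryStatus.foreach { case (address, (maxMem, remaining)) =>
  println(s"$address: used for storage=${maxMem - remaining} bytes of $maxMem")
}
```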

Naveen P.N

12+ years of experience in IT, with extensive experience in executing complex projects using Java, microservices, big data and cloud platforms. I founded NPN Training Pvt Ltd, an India-based startup, to provide high-quality training for IT professionals. I have trained more than 3000 IT professionals and helped them succeed in their careers across different technologies. I am very passionate about technology and training. I have spent 12 years at Siemens, Yahoo, Amazon and Cisco, developing and managing technology.