RDD : Lazy Evaluation


Lazy evaluation helps Spark optimize disk and memory usage. Consider this example:


$ cat words.txt
line1 word1
line2 word2 word1
line3 word3 word4
line4 word1

scala> val lines = sc.textFile("words.txt")                        // Transformation (1)
scala> val filtered = lines.filter(line => line.contains("word1")) // Transformation
scala> filtered.first()                                            // Action (2)
res0: String = line1 word1
Looking at the code above, you might infer that the file 'words.txt' is read when transformation (1) executes. That never happens in Spark. The file is read only when action (2) executes. The benefit of this lazy evaluation is that, since first() needs just one matching line, Spark can stop after reading the first line instead of scanning the whole file, and it never needs to hold the complete file contents in memory.

Thus we can say that transformations in Spark are lazily evaluated: Spark does not evaluate any transformation until it sees an action.
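The same principle can be demonstrated with plain Scala, no Spark required: a sketch (the function name and counter are illustrative, not part of any Spark API) using Scala's lazy Iterator to show that "read the file, filter, take the first match" consumes only one line when evaluation is deferred.

```scala
// Sketch: lazy pipeline over the example file's lines.
// Returns the first matching line and how many lines were actually consumed.
def firstMatchLazily(input: Seq[String], word: String): (String, Int) = {
  var linesRead = 0
  val lines = input.iterator.map { l => linesRead += 1; l } // "reading" is deferred
  val filtered = lines.filter(_.contains(word))             // transformation: nothing consumed yet
  val first = filtered.next()                               // "action": pulls lines only until a match
  (first, linesRead)
}

val data = Seq("line1 word1", "line2 word2 word1", "line3 word3 word4", "line4 word1")
println(firstMatchLazily(data, "word1")) // (line1 word1,1) -- only one line consumed
```

Because the map and filter are lazy, the counter stays at 1: the pipeline stopped as soon as the first matching line appeared, just as Spark stops reading once first() is satisfied.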

Naveen P.N

12+ years of experience in IT, with vast experience executing complex projects using Java, Microservices, Big Data and Cloud Platforms. I founded NPN Training Pvt Ltd, an India-based startup, to provide high-quality training for IT professionals. I have trained more than 3000 IT professionals and helped them succeed in their careers across different technologies. I am very passionate about technology and training. I have spent 12 years at Siemens, Yahoo, Amazon and Cisco, developing and managing technology.