RDD: Lazy Evaluation

Lazy evaluation helps Spark optimize disk and memory usage. Consider this example:


$ cat words.txt
line1 word1
line2 word2 word1
line3 word3 word4
line4 word1

scala> val lines = sc.textFile("words.txt") // Transformation (1)
scala> val filtered = lines.filter(line => line.contains("word1")) // Transformation
scala> filtered.first() // Action (2)
res0: String = line1 word1

Reading the code above, we might assume that the file ‘words.txt’ is read when transformation (1) executes. That is not what happens in Spark: the file is read only when action (2) runs. The benefit of lazy evaluation here is that Spark knows the whole chain of operations before it touches the data, so first() can stop as soon as it finds a matching line (in this example, the very first line) instead of reading the whole file, and the complete file contents never need to be held in memory.

Thus, transformations in Spark are lazily evaluated: Spark does not evaluate a transformation until it encounters an action.
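
One way to observe this behaviour is to count how many lines pass through the pipeline before and after the action. The following is a minimal sketch, assuming it is pasted into spark-shell (where `sc` is predefined) and that words.txt is in the working directory. The accumulator `linesRead` and the `traced` RDD are illustrative names added for this demonstration and are not part of the original example. Accumulator updates made inside transformations are not guaranteed to be exact in general, so treat the second count as indicative only.

```scala
// Assumes spark-shell, where `sc` (the SparkContext) already exists.
// `linesRead` is a hypothetical counter added only for this demonstration.
val linesRead = sc.longAccumulator("linesRead")

// Transformations: these only describe the computation, nothing is read yet.
val lines = sc.textFile("words.txt")
val traced = lines.map { line => linesRead.add(1); line } // count each line actually read
val filtered = traced.filter(line => line.contains("word1"))

// No action has run so far, so the counter is still 0.
val before = linesRead.value

// Action: only now does Spark open the file and run the pipeline.
val firstMatch = filtered.first()

// Expected to be at least 1 once the action has completed.
val after = linesRead.value

println(s"lines read before action: $before, after action: $after")
```

If the counter is still zero before `first()` and non-zero afterwards, that is consistent with the file being read only when the action runs.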