Author: Naveen P.N

12+ years of experience in IT, with extensive experience executing complex projects using Java, microservices, Big Data, and cloud platforms. I founded NPN Training Pvt Ltd, an India-based startup, to provide high-quality training for IT professionals. I have trained more than 3000 IT professionals and helped them succeed in their careers across different technologies. I am very passionate about technology and training. I have spent 12 years at Siemens, Yahoo, Amazon and Cisco, developing and managing technology.

Spark textFile() vs wholeTextFiles()

textFile()

def textFile(path: String, minPartitions: Int = defaultMinPartitions): RDD[String]

Reads a text file from HDFS, a local file system (available on all nodes), or any Hadoop-supported file system URI, and returns it as an RDD of Strings. For example, sc.textFile("/home/hdadmin/wc-data.txt") creates an RDD in which each individual line is an element.

wholeTextFiles()

def wholeTextFiles(path: String, minPartitions: Int = defaultMinPartitions):...
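To make the difference concrete, here is a minimal sketch that calls both methods on the same directory. The object name, the temporary directory, the two sample files, and the local[2] master are all assumptions made for this demo and are not part of the original example. It relies on wholeTextFiles() returning one (file path, file content) pair per file, while textFile() returns one element per line.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical demo object; the name is chosen only for illustration.
object TextFileVsWholeTextFiles {

  // Returns (number of textFile elements, number of wholeTextFiles elements).
  def run(): (Long, Long) = {
    // App name and local master are assumptions for a standalone run.
    val conf = new SparkConf()
      .setAppName("textfile-vs-wholetextfiles")
      .setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical sample data: two small files in a temporary directory.
      val dir = Files.createTempDirectory("wc-data")
      Files.write(dir.resolve("a.txt"),
        "hello spark\nhello rdd\n".getBytes(StandardCharsets.UTF_8))
      Files.write(dir.resolve("b.txt"),
        "one line only\n".getBytes(StandardCharsets.UTF_8))

      // textFile: every line of every file under the path becomes one element.
      val lines = sc.textFile(dir.toString)

      // wholeTextFiles: every file becomes one (path, content) pair.
      val files = sc.wholeTextFiles(dir.toString)
      files.collect().foreach { case (path, content) =>
        println(s"$path holds ${content.length} characters")
      }

      (lines.count(), files.count())
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (lineElements, fileElements) = run()
    println(s"textFile elements: $lineElements")
    println(s"wholeTextFiles elements: $fileElements")
  }
}
```

As a rough rule, textFile() suits line-oriented processing such as word counts, while wholeTextFiles() is usually the better fit when each file must be handled as a unit, for example many small JSON or XML documents. Since each file's content is loaded as a single string, it works best with small files.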