Category: Apache Spark

Spark textFile() vs wholeTextFiles()

textFile()

def textFile(path: String, minPartitions: Int = defaultMinPartitions): RDD[String]

Reads a text file from HDFS, a local file system (available on all nodes), or any Hadoop-supported file system URI, and returns it as an RDD of Strings.

For example, sc.textFile("/home/hdadmin/wc-data.txt") creates an RDD in which each line of the file is one element.

wholeTextFiles()

def wholeTextFiles(path: String, minPartitions: Int = defaultMinPartitions):...
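To make the difference concrete, here is a minimal sketch in Scala that runs both calls against the same directory. It assumes Spark is on the classpath and runs in local mode. The object name, the temporary directory, and the two sample files (a.txt, b.txt) are made up for illustration and are not from the original post. In the Spark API, textFile() yields one record per line, while wholeTextFiles() yields one (file path, file content) pair per file.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical demo object; names and sample data are illustrative only.
object TextFileVsWholeTextFiles {

  // Returns (record count from textFile, record count from wholeTextFiles).
  def run(): (Long, Long) = {
    val conf = new SparkConf()
      .setMaster("local[*]") // assumption: local mode, no cluster needed
      .setAppName("textFile-vs-wholeTextFiles")
    val sc = new SparkContext(conf)

    try {
      // Create two small sample files with two lines each.
      val dir = Files.createTempDirectory("spark-text-demo")
      Files.write(dir.resolve("a.txt"), "alpha\nbeta".getBytes(StandardCharsets.UTF_8))
      Files.write(dir.resolve("b.txt"), "gamma\ndelta".getBytes(StandardCharsets.UTF_8))
      val input = dir.toUri.toString

      // textFile: RDD[String], one element per line across all files.
      val lines = sc.textFile(input)
      lines.collect().foreach(line => println(s"line: $line"))

      // wholeTextFiles: RDD[(String, String)], one (path, content) pair per file.
      val files = sc.wholeTextFiles(input)
      files.collect().foreach { case (path, content) =>
        println(s"file: $path (${content.length} characters)")
      }

      (lines.count(), files.count())
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (lineRecords, fileRecords) = run()
    println(s"textFile records: $lineRecords, wholeTextFiles records: $fileRecords")
  }
}
```

With the two sample files above, textFile() produces four records (one per line) and wholeTextFiles() produces two (one per file). As a rule of thumb, textFile() suits line-oriented data such as logs, while wholeTextFiles() suits many small files that must each be processed as a unit, since every file's content is loaded as a single string.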