Spark textFile() vs wholeTextFiles()

textFile()

def textFile(path: String, minPartitions: Int = defaultMinPartitions): RDD[String]

Reads a text file from HDFS, a local file system (available on all nodes), or any Hadoop-supported file system URI, and returns it as an RDD of Strings. For example, sc.textFile("/home/hdadmin/wc-data.txt") creates an RDD in which each individual line is an element.

wholeTextFiles()

def wholeTextFiles(path:...
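The difference can be sketched as follows. This is a minimal example, assuming a local Spark installation; the file and directory paths are hypothetical placeholders.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object TextFileVsWholeTextFiles {
  def main(args: Array[String]): Unit = {
    // Assumes a local Spark setup; paths below are hypothetical.
    val conf = new SparkConf().setAppName("textFile-vs-wholeTextFiles").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // textFile: one RDD element per LINE across all matched files.
    val lines = sc.textFile("/home/hdadmin/wc-data.txt")

    // wholeTextFiles: one RDD element per FILE, as a (path, fullContent) pair.
    val files = sc.wholeTextFiles("/home/hdadmin/wc-data-dir")

    // So for word counting, lines are split per line, while
    // wholeTextFiles hands you each file's entire content at once.
    val wordCounts = lines.flatMap(_.split("\\s+")).map((_, 1)).reduceByKey(_ + _)

    sc.stop()
  }
}
```

In practice, wholeTextFiles() is preferred for many small files whose contents must be processed as a unit (e.g. per-file JSON or XML), while textFile() suits large line-oriented data.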

Big Data Frameworks every programmer should know

Frameworks covered: Hadoop, Spark, Mahout, HBase, Hive, Pig, and Logstash.

Introduction: Big Data is a major buzzword at the current technological forefront. Big Data technologies have brought cutting-edge research into practical applications. Machine...

Hive Optimization Techniques in Hadoop 2.x

Enable the following properties in Hive SQL for large volumes of data:

SET hive.execution.engine=tez;
SET mapreduce.framework.name=yarn-tez;
SET tez.queue.name=SIU;
SET hive.vectorized.execution.enabled=true;
SET hive.auto.convert.join=true;
SET hive.compute.query.using.stats=true;
SET hive.stats.fetch.column.stats=true;
SET hive.stats.fetch.partition.stats=true;
SET hive.cbo.enable=true;
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.parallel=true;
SET hive.exec.mode.local.auto=true;
SET hive.exec.reducers.bytes.per.reducer=1000000000; (depends on the total size of your tables...
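To illustrate how the last setting influences parallelism, here is a hedged sketch: the table and query below are hypothetical, and the reducer count shown is an estimate of how Hive plans reducers from input size, not a guaranteed figure.

```sql
-- With hive.exec.reducers.bytes.per.reducer = 1000000000 (~1 GB),
-- a query scanning roughly 10 GB of input is planned with about
-- 10 GB / 1 GB = 10 reducers (subject to the hive.exec.reducers.max cap).
SET hive.exec.reducers.bytes.per.reducer=1000000000;

-- Hypothetical table and query for illustration:
SELECT dept, COUNT(*) AS emp_count
FROM employees
GROUP BY dept;
```

Lowering bytes.per.reducer increases the number of reducers (more parallelism, more small output files); raising it does the opposite, so tune it against the total size of the tables your queries scan.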