Category: Hadoop

Changing the Output Delimiter in t

Introduction: Hadoop’s default output delimiter (the character separating the output key and value) is a tab (“\t”). This post explains how to change the default Hadoop output delimiter. Output Delimiter Configuration Property: The output delimiter of a Hadoop job can be changed by setting the mapred.textoutputformat.separator configuration property. This property can be set from the code itself or from the...
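As a minimal sketch of setting the property "from the code itself", the snippet below builds a Hadoop Configuration with a comma as the separator. The class name DelimiterConfig and the helper withSeparator are hypothetical names chosen for illustration, and the sketch assumes the Hadoop client libraries are on the classpath. The property name is the one given in the excerpt; Hadoop 2.x releases list it as a deprecated alias of mapreduce.output.textoutputformat.separator, so it may be worth checking which name your version expects.

```java
import org.apache.hadoop.conf.Configuration;

public class DelimiterConfig {
    // Property name as given in the post. Hadoop 2.x documents
    // mapreduce.output.textoutputformat.separator as the newer name.
    static final String SEPARATOR_KEY = "mapred.textoutputformat.separator";

    // Hypothetical helper: returns a Configuration whose text output
    // separator is set to the given string instead of the default tab.
    static Configuration withSeparator(String separator) {
        Configuration conf = new Configuration();
        conf.set(SEPARATOR_KEY, separator);
        return conf;
    }

    public static void main(String[] args) {
        Configuration conf = withSeparator(",");
        // Print the value back to confirm that it was stored.
        System.out.println(conf.get(SEPARATOR_KEY));
    }
}
```

In a job driver, a configuration built this way would typically be passed to Job.getInstance(conf) before the job is submitted, so that TextOutputFormat writes each key and value separated by the chosen string.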

Hive Optimization Techniques in Hadoop 2.x

Enable the following properties in Hive SQL for large volumes of data:
SET hive.execution.engine=tez;
SET mapreduce.framework.name=yarn-tez;
SET tez.queue.name=SIU;
SET hive.vectorized.execution.enabled=true;
SET hive.auto.convert.join=true;
SET hive.compute.query.using.stats=true;
SET hive.stats.fetch.column.stats=true;
SET hive.stats.fetch.partition.stats=true;
SET hive.cbo.enable=true;
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.parallel=true;
SET hive.exec.mode.local.auto=true;
SET hive.exec.reducers.bytes.per.reducer=1000000000; (Depends on your total size of all tables...