Changing the Output Delimiter in t

Table of Contents
1 Introduction
2 Output Delimiter Configuration Property
3 Example

Introduction

Hadoop's default output delimiter (the character separating the output key and value) is a tab ("\t"). This post explains how to change the default Hadoop output delimiter.

Output Delimiter Configuration Property

The output delimiter of a Hadoop job can be changed by setting the mapred.textoutputformat.separator configuration property. This property can...
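
To make the effect of the separator property concrete, here is a minimal sketch in plain Python. It is not Hadoop code: write_records and the dictionary standing in for the job configuration are hypothetical, and the sketch only mimics how a text output format joins each key and value with the configured separator, falling back to a tab when the property is unset.

```python
import io

# Property name taken from the post; the default separator is a tab.
SEPARATOR_PROPERTY = "mapred.textoutputformat.separator"
DEFAULT_SEPARATOR = "\t"


def write_records(records, conf, out):
    """Write each (key, value) pair as one line, joined by the configured separator.

    Hypothetical helper: `conf` is a plain dict standing in for a job configuration.
    """
    separator = conf.get(SEPARATOR_PROPERTY, DEFAULT_SEPARATOR)
    for key, value in records:
        out.write(f"{key}{separator}{value}\n")


records = [("apple", 3), ("banana", 5)]

# No property set: keys and values are separated by a tab.
default_out = io.StringIO()
write_records(records, {}, default_out)

# Property set to a comma: the same records come out comma-separated.
csv_out = io.StringIO()
write_records(records, {SEPARATOR_PROPERTY: ","}, csv_out)
```

In a real job the property would be set on the job's configuration rather than in a dict. Newer Hadoop releases also expose it under the name mapreduce.output.textoutputformat.separator, so it is worth checking which name your version expects.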

Working with Nested JSON in Spark

JSON is a very common way to store data. However, JSON can get messy, and parsing it can get tricky. Here are a few examples of parsing nested data structures in JSON using Spark DataFrames (the examples here were done with Spark 1.6.0).

Sample JSON File:

{ "user": "gT35Hhhre9m", "dates": ["2016-01-29", "2016-01-28"], "status": "OK", "reason": "some reason", "content": [{ "foo": 123, "bar": "val1"...
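
As a rough illustration of what flattening such a record involves, the sketch below uses only Python's standard json module, not Spark. The record is an abbreviated stand-in containing just the fields visible in the sample; the closing brackets and the explode_content helper are assumptions made for illustration. Spark DataFrames offer a comparable operation through the explode function, which turns each element of an array column into its own row.

```python
import json

# Abbreviated stand-in for the sample record: only the fields visible in the
# excerpt are included, and the closing brackets are assumed.
raw = """
{
  "user": "gT35Hhhre9m",
  "dates": ["2016-01-29", "2016-01-28"],
  "status": "OK",
  "reason": "some reason",
  "content": [{"foo": 123, "bar": "val1"}]
}
"""


def explode_content(record):
    """Return one flat row per element of the nested "content" array.

    Hypothetical helper: top-level "user" and "status" are copied into every row.
    """
    rows = []
    for item in record["content"]:
        row = {"user": record["user"], "status": record["status"]}
        row.update(item)  # lift the nested fields to the top level
        rows.append(row)
    return rows


record = json.loads(raw)
rows = explode_content(record)

# Array elements can also be addressed by position.
first_date = record["dates"][0]
```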