Category: Apache Spark

Apache Spark: Loading a CSV File Using a Custom Timestamp Format

In this blog post, we will see how to load a CSV file that contains a timestamp as one of its columns.

Creating DataFrame from CSV file

The data set below contains two columns, EVENT_ID and EVENT_DATE. The EVENT_DATE column is a timestamp in the format "DD-MM-YYYY HH MM SS".

EVENT_ID,EVENT_DATE
AUTUMN-L001,20-01-2019 15 40 23
AUTUMN-L002,21-01-2019 01 20 12
AUTUMN-L003,22-01-2019...

Working with Nested JSON in Spark

JSON is a very common way to store data, but JSON can get messy and parsing it can get tricky. Here are some examples of parsing nested data structures in JSON with Spark DataFrames (the examples here were done with Spark 1.6.0).

Sample JSON file:
{ "user": "gT35Hhhre9m", "dates": ["2016-01-29", "2016-01-28"], "status": "OK", "reason": "some reason", "content": [{ "foo": 123, "bar": "val1"...