Hive SerDe – RegexSerDe

SerDe is short for Serializer/Deserializer. Hive uses the SerDe interface for IO. A SerDe allows Hive to read in data from a table, and write it back out to HDFS in any custom format. Anyone can write their own SerDe for their own data formats.

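The SerDe interface lets you tell Hive how a record should be processed. A SerDe is a combination of a Serializer and a Deserializer (hence, Ser-De): the Deserializer turns a stored record into a row that Hive can query, and the Serializer turns a row back into the stored format.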

 

What is RegexSerDe

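Regex stands for regular expression. RegexSerDe is useful when every line of the input follows a pattern: the regular expression is matched against each line, and each capture group becomes one column of the table. RegexSerDe is available in the contrib package as org.apache.hadoop.hive.contrib.serde2.RegexSerDe, and Hive also ships a built-in org.apache.hadoop.hive.serde2.RegexSerDe.

In SERDEPROPERTIES, you define the input pattern (input.regex) and, for writing rows back out, the output format (output.format.string).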

Let’s consider an employee dataset:

employee.csv

1$Naveen|Cisco

2$Praveen|Infosys

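If you want to extract the column values from the above dataset, you can use the pattern below. The $ and | delimiters are regex metacharacters, so they must be escaped (with a doubled backslash inside a HiveQL string):

'input.regex' = '(.*)\\$(.*)\\|(.*)'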
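To see why the escaping matters, here is a minimal Java sketch that applies both the unescaped and the escaped pattern to the first sample line. It assumes whole-line matching with java.util.regex; the class and method names (InputRegexDemo, extract) are hypothetical and not part of Hive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not a Hive class.
class InputRegexDemo {

    // Returns the capture groups when the pattern matches the whole line,
    // or null when it does not match.
    static List<String> extract(String regex, String line) {
        Matcher m = Pattern.compile(regex).matcher(line);
        if (!m.matches()) {
            return null;
        }
        List<String> groups = new ArrayList<>();
        for (int i = 1; i <= m.groupCount(); i++) {
            groups.add(m.group(i)); // null if the group took no part in the match
        }
        return groups;
    }

    public static void main(String[] args) {
        String line = "1$Naveen|Cisco";

        // Unescaped: $ is an end-of-line anchor and | is alternation,
        // so the first group swallows the entire line.
        System.out.println(extract("(.*)$(.*)|(.*)", line));
        // prints [1$Naveen|Cisco, , null]

        // Escaped: $ and | are treated as literal delimiters.
        System.out.println(extract("(.*)\\$(.*)\\|(.*)", line));
        // prints [1, Naveen, Cisco]
    }
}
```

The doubled backslashes in the Java string literal happen to look the same as in the HiveQL string, because both languages treat a backslash as an escape character inside string literals.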

To specify how the columns are written back out, you can use the property below, where each %N$s refers to the Nth column:

'output.format.string' = '%1$s%2$s%3$s'
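The format string uses Java's positional format syntax, so its effect can be tried out with String.format. The sketch below is only an illustration under that assumption; OutputFormatDemo and formatRow are hypothetical names, not Hive APIs.

```java
// Hypothetical helper for illustration only; not a Hive class.
class OutputFormatDemo {

    // Applies an output.format.string style pattern to the column values.
    static String formatRow(String outputFormat, Object... columns) {
        return String.format(outputFormat, columns);
    }

    public static void main(String[] args) {
        // The format from the text: columns are concatenated with no separator.
        System.out.println(formatRow("%1$s%2$s%3$s", "1", "Naveen", "Cisco"));
        // prints 1NaveenCisco

        // A variant with spaces between the positional arguments.
        System.out.println(formatRow("%1$s %2$s %3$s", "1", "Naveen", "Cisco"));
        // prints 1 Naveen Cisco
    }
}
```

Since '%1$s%2$s%3$s' writes the fields with nothing between them, a format that includes separators is usually easier to read back.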

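Create a table

Note that the pattern in this statement expects a numeric third field (exp), so it applies to lines such as id$name|experience rather than to the company names in the sample above.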

CREATE TABLE employee(id INT,name STRING,exp INT)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' 
WITH SERDEPROPERTIES ('input.regex' = '^(\\d+)\\$(.*)\\|(\\d+)');
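The table's pattern can be checked outside Hive with a short Java sketch. The input line "1$Ram|2" is an assumption inferred from the id, name and exp columns, and EmployeeRowDemo and parse are hypothetical names. A line that does not match the pattern yields null here; in Hive, such rows typically come back with NULL in every column.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not a Hive class.
class EmployeeRowDemo {

    // Same pattern as the input.regex in the CREATE TABLE statement.
    private static final Pattern ROW = Pattern.compile("^(\\d+)\\$(.*)\\|(\\d+)");

    // Returns {id, name, exp}, or null when the line does not match.
    static Object[] parse(String line) {
        Matcher m = ROW.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new Object[] {
            Integer.valueOf(m.group(1)), // id INT
            m.group(2),                  // name STRING
            Integer.valueOf(m.group(3))  // exp INT
        };
    }

    public static void main(String[] args) {
        // Assumed input line with a numeric third field.
        System.out.println(Arrays.toString(parse("1$Ram|2")));
        // prints [1, Ram, 2]

        // The third field is not numeric, so the pattern does not match.
        System.out.println(Arrays.toString(parse("1$Naveen|Cisco")));
        // prints null
    }
}
```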

Load data into a table

load data local inpath '/home/naveen/Desktop/employee.txt' into table employee;

Output

hive> select * from employee;
employee.id	employee.name	employee.exp
1	Ram	2
2	Pavan	6
2	Girish	10




Naveen P.N

I have 12+ years of experience in IT, including executing complex projects using Java, microservices, Big Data and cloud platforms. I founded NPN Training Pvt Ltd, an India-based startup, to provide high-quality training for IT professionals. I have trained more than 3000 IT professionals and helped them succeed in their careers across different technologies. I am very passionate about technology and training. I have spent 12 years at Siemens, Yahoo, Amazon and Cisco, developing and managing technology.