Wednesday, June 22, 2016

Hive - Working with JSON files

Hive has two built-in functions, get_json_object and json_tuple for dealing with JSON. There are also a couple of JSON SerDe's (Serializer/Deserializers)- OPENX, Cloudera and Hcatalog for Hive.

Handling JSON using Build in Functions { "Product": "Laptop", "ID": "123456", "Address": { "BlockNo": 12, "Place": "Bangalore" } } CREATE TABLE json_test ( json string ); LOAD DATA LOCAL INPATH '/test.json' INTO TABLE json_test; Using get_json_Object select get_json_object(json_test, '$.Product') as Product, get_json_object(json_test, '$.ID') as ID, get_json_object(json_test, '$.Address.BlockNo') as BlockNo, get_json_object(json_test, '$.Place.Bangalore') as Place from json_test; Using json_tuple select v.Product, v.ID, v.Address, v.BlockNo from json_test test LATERAL VIEW json_tuple('Product', 'ID', 'Address', 'Address.BlockNo') v as Product, ID, Address, BlockNo;

Handling JSON using SerDe's Converting JSON schema to Hive table Create Schema Step 1: Open a file in notepad++ Replace all " with empty Replace all :{ with :struct< Replace all :[ with :array< Replace all } with > Replace all { with struct< Replace all ] with > Replace all Null with STRING Replace all hive keyword ( function, group) with 'function' or 'group' Replace all field values to STRING Step 2: Example JSON file { "Product": "Laptop", "ID": "123456", "Address": { "BlockNo": 12, "Place": "Bangalore" } } Step 3:Creating Hive Serde table CREATE TABLE json_serde ( Product string, ID string, Address struct ) ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'; Step 4 : LOAD DATA LOCAL INPATH '/test.json' INTO TABLE json_serde; Step 5: SELECT Product,ID,Address.BlockNo as BlockNo, Address.Place as location from json_serde;