Friday, July 1, 2016

Apache Kafka

Installing Apache Kafka Steps to Install Apache Kafka and Testing 1. Download "kafka_2.11-0.10.0.0.tgz" from Apache. 2. Extract files from tgz file tar -xvzf kafka_2.11-0.10.0.0.tgz 3. Starting zookeeper server bin/zookeeper-server-start.sh config/zookeeper.properties 4. Starting Kafka Broker server bin/kafka-server-start.sh config/server.properties 5. Creating a Topic bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partition 1 --topic topictest 6. Starting Producer console to publish messages into Topic bin/kafka-console-producer.sh --broker-list localhost:9092 --topic topictest 7. Starting Consumer console to consume messages from Topic bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic topictest --from-beginning

Wednesday, June 22, 2016

Hive - Working with JSON files

Hive has two built-in functions, get_json_object and json_tuple for dealing with JSON. There are also a couple of JSON SerDe's (Serializer/Deserializers)- OPENX, Cloudera and Hcatalog for Hive.

Handling JSON using Build in Functions { "Product": "Laptop", "ID": "123456", "Address": { "BlockNo": 12, "Place": "Bangalore" } } CREATE TABLE json_test ( json string ); LOAD DATA LOCAL INPATH '/test.json' INTO TABLE json_test; Using get_json_Object select get_json_object(json_test, '$.Product') as Product, get_json_object(json_test, '$.ID') as ID, get_json_object(json_test, '$.Address.BlockNo') as BlockNo, get_json_object(json_test, '$.Place.Bangalore') as Place from json_test; Using json_tuple select v.Product, v.ID, v.Address, v.BlockNo from json_test test LATERAL VIEW json_tuple('Product', 'ID', 'Address', 'Address.BlockNo') v as Product, ID, Address, BlockNo;

Handling JSON using SerDe's Converting JSON schema to Hive table Create Schema Step 1: Open a file in notepad++ Replace all " with empty Replace all :{ with :struct< Replace all :[ with :array< Replace all } with > Replace all { with struct< Replace all ] with > Replace all Null with STRING Replace all hive keyword ( function, group) with 'function' or 'group' Replace all field values to STRING Step 2: Example JSON file { "Product": "Laptop", "ID": "123456", "Address": { "BlockNo": 12, "Place": "Bangalore" } } Step 3:Creating Hive Serde table CREATE TABLE json_serde ( Product string, ID string, Address struct ) ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'; Step 4 : LOAD DATA LOCAL INPATH '/test.json' INTO TABLE json_serde; Step 5: SELECT Product,ID,Address.BlockNo as BlockNo, Address.Place as location from json_serde;