Handling JSON files with Hive is not always an easy task.
If you need to extract some specific fields from a structured JSON, we have some alternatives:
There are two UDF functions that are usually helpful on this cases: ‘get_json_object’ and ‘json_tuple’. These functions allows you to access json fields from Hive without installing additional libraries.
get_json_object:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-get_json_object
json_tuple:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-json_tuple
To navigate JSON structure with get_json_object, the entire JSON file has to be mapped as string.
As example:
1) Create the external table as string:
CREATE EXTERNAL TABLE json_table (str string) LOCATION 's3://mybucket/input/jsonserde' ;
2) select field[0] from store.fruit field:
select get_json_object(json_table.str, '$.store.fruit\[0]') as MyField from json_table;