hive的Json解析大集合(好文收藏)

需求背景:
日常生产环境中,对于埋点上报数据中有json格式的数据,需要解析json格式的字符串里面的字段和数值.对于这种需求,hive系统内置解析json格式的函数.

1.get_json_object(jsonstr,’$.key’)

说明:str表示待解析的map,key抽取指定字段的值;这个函数每次只能返回一个key对应的值;若想要解析多个key的值只能多写几次这个函数才行

测试:
select get_json_object('{"id":"c1","address":"shenzheng"}','$.id') id;
结果:
id:c1

2.json_tuple(jsonstr,key1,key2,key3,…)

说明:解析json格式的字符串,之前的get_json_object,不能一次性解析,那么就衍生出了
json_tuple这个函数,可以一次性解析多个字段

测试1:
select 
json_tuple('{"id":"001","name":"zhangsan","address":"shenzheng"}','id','name','address') as (id,name,address)
结果:
id name address
001 zhangsan shenzheng

测试2:
select
b.id,
b.name
from a lateral view 
json_tuple('{"id":"001","name":"zhangsan","address":"shenzheng"}','id','name') b as id,name;
结果:
id name
001 zhangsan

总结:以上两个函数是解析json格式的字符串,但是无法解析json数组.

hive内置解析json数组的函数

1.explode(map/array) :炸裂函数专门处理map和array格式的数据,一行转多行输出

测试map:
select explode(map('001',10,'002',20,'003',30));
结果:
key  value
001  10
002  20
002  30

2.regexp_replace(string str,string str1,string str2) :替换函数
说明:将字符串str中满足java正则表达式str1的替换成str2,注意某些情况下
需要加上转义字符
解析的数据: [{“id”:“001”,“name”:“zhangsan”,“address”:“shenzheng”},{“id”:“002”,“name”:“zhangsan”,“address”:“shenzheng”}]
需要结合以上两个函数做解析

2.1
测试:
select REGEXP_replace('[{"id":"001","name":"zhangsan","address":"shenzheng"},{"id":"002","name":"zhangsan","address":"shenzheng"}]','\\[|\\]','');
结果:
{"id":"001","name":"zhangsan","address":"shenzheng"},{"id":"002","name":"zhangsan","address":"shenzheng"}
去掉[]
2.2
测试:把数组内每个元素的分隔符改成&,便于后面切分处理
select REGEXP_replace(REGEXP_replace('[{"id":"001","name":"zhangsan","address":"shenzheng"},{"id":"002","name":"zhangsan","address":"shenzheng"}]','\\[|\\]',''),'\\}\\,\\{','\\}\\&\\{');
结果:
{"id":"001","name":"zhangsan","address":"shenzheng"}&{"id":"002","name":"zhangsan","address":"shenzheng"}
2.3
测试:使用split()函数按照'&'切分形成数组,就可以用explode函数炸裂成json格式的字符串

select split(REGEXP_replace(REGEXP_replace('[{"id":"001","name":"zhangsan","address":"shenzheng"},{"id":"002","name":"zhangsan","address":"shenzheng"}]','\\[|\\]',''),'\\}\\,\\{','\\}\\&\\{'),'\\&')

结果:
["{\"id\":\"001\",\"name\":\"zhangsan\",\"address\":\"shenzheng\"}","{\"id\":\"002\",\"name\":\"zhangsan\",\"address\":\"shenzheng\"}"]

2.4
测试:使用explode函数炸开数组
select explode(split(REGEXP_replace(REGEXP_replace('[{"id":"001","name":"zhangsan","address":"shenzheng"},{"id":"002","name":"zhangsan","address":"shenzheng"}]','\\[|\\]',''),'\\}\\,\\{','\\}\\&\\{'),'\\&')) 
结果:
{"id":"001","name":"zhangsan","address":"shenzheng"}
{"id":"002","name":"zhangsan","address":"shenzheng"}
2.5
测试:使用前面介绍的json_tuple解析出里面的字段和值(注意此处需要使用子查询才行,不然会报错)
    select 
    json_tuple(b.json_str,'id','name','address') as (id,name,address)
    from 
    (select explode(split(REGEXP_replace(REGEXP_replace('[{"id":"001","name":"zhangsan","address":"shenzheng"},{"id":"002","name":"zhangsan","address":"shenzheng"}]','\\[|\\]',''),'\\}\\,\\{','\\}\\&\\{'),'\\&')) as json_str) b
结果:
id name address 
001 zhangsan shenzheng
002 zhangsan shenzheng

写在最后,如果本文对你有所帮助,原创不易,欢迎点赞一波哈。


版权声明:本文为weixin_45857425原创文章,遵循CC 4.0 BY-SA版权协议,转载请附上原文出处链接和本声明。