需求背景:
日常生产环境中,对于埋点上报数据中有json格式的数据,需要解析json格式的字符串里面的字段和数值.对于这种需求,hive系统内置解析json格式的函数.
1.get_json_object(jsonstr,’$.key’)
说明:str表示待解析的map,key抽取指定字段的值;这个函数每次只能返回一个key对应的值;若想要解析多个key的值只能多写几次这个函数才行
测试:
select get_json_object('{"id":"c1","address":"shenzheng"}','$.id') id;
结果:
id:c1
2.json_tuple(jsonstr,key1,key2,key3,…)
说明:解析json格式的字符串,之前的get_json_object,不能一次性解析,那么就衍生出了
json_tuple这个函数,可以一次性解析多个字段
测试1:
select
json_tuple('{"id":"001","name":"zhangsan","address":"shenzheng"}','id','name','address') as (id,name,address)
结果:
id name address
001 zhangsan shenzheng
测试2:
select
b.id,
b.name
from a lateral view
json_tuple('{"id":"001","name":"zhangsan","address":"shenzheng"}','id','name') b as id,name;
结果:
id name
001 zhangsan
总结:以上两个函数是解析json格式的字符串,但是无法解析json数组.
hive内置解析json数组的函数
1.explode(map/array) :炸裂函数专门处理map和array格式的数据,一行转多行输出
测试map:
select explode(map('001',10,'002',20,'003',30));
结果:
key value
001 10
002 20
002 30
2.regexp_replace(string str,string str1,string str2) :替换函数
说明:将字符串str中满足java正则表达式str1的替换成str2,注意某些情况下
需要加上转义字符
解析的数据: [{“id”:“001”,“name”:“zhangsan”,“address”:“shenzheng”},{“id”:“002”,“name”:“zhangsan”,“address”:“shenzheng”}]
需要结合以上两个函数做解析
2.1
测试:
select REGEXP_replace('[{"id":"001","name":"zhangsan","address":"shenzheng"},{"id":"002","name":"zhangsan","address":"shenzheng"}]','\\[|\\]','');
结果:
{"id":"001","name":"zhangsan","address":"shenzheng"},{"id":"002","name":"zhangsan","address":"shenzheng"}
去掉[ 与 ]
2.2
测试:把数组内每个元素的分隔符改成&,便于后面切分处理
select REGEXP_replace(REGEXP_replace('[{"id":"001","name":"zhangsan","address":"shenzheng"},{"id":"002","name":"zhangsan","address":"shenzheng"}]','\\[|\\]',''),'\\}\\,\\{','\\}\\&\\{');
结果:
{"id":"001","name":"zhangsan","address":"shenzheng"}&{"id":"002","name":"zhangsan","address":"shenzheng"}
2.3
测试:使用split()函数按照'&'切分形成数组,就可以用explode函数炸裂成json格式的字符串
select split(REGEXP_replace(REGEXP_replace('[{"id":"001","name":"zhangsan","address":"shenzheng"},{"id":"002","name":"zhangsan","address":"shenzheng"}]','\\[|\\]',''),'\\}\\,\\{','\\}\\&\\{'),'\\&')
结果:
["{\"id\":\"001\",\"name\":\"zhangsan\",\"address\":\"shenzheng\"}","{\"id\":\"002\",\"name\":\"zhangsan\",\"address\":\"shenzheng\"}"]
2.4
测试:使用explode函数炸开数组
select explode(split(REGEXP_replace(REGEXP_replace('[{"id":"001","name":"zhangsan","address":"shenzheng"},{"id":"002","name":"zhangsan","address":"shenzheng"}]','\\[|\\]',''),'\\}\\,\\{','\\}\\&\\{'),'\\&'))
结果:
{"id":"001","name":"zhangsan","address":"shenzheng"}
{"id":"002","name":"zhangsan","address":"shenzheng"}
2.5
测试:使用前面介绍的json_tuple解析出里面的字段和值(注意此处需要使用子查询才行,不然会报错)
select
json_tuple(b.json_str,'id','name','address') as (id,name,address)
from
(select explode(split(REGEXP_replace(REGEXP_replace('[{"id":"001","name":"zhangsan","address":"shenzheng"},{"id":"002","name":"zhangsan","address":"shenzheng"}]','\\[|\\]',''),'\\}\\,\\{','\\}\\&\\{'),'\\&')) as json_str) b
结果:
id name address
001 zhangsan shenzheng
002 zhangsan shenzheng
写在最后,如果本文对你有所帮助,原创不易,欢迎点赞一波哈。
版权声明:本文为weixin_45857425原创文章,遵循CC 4.0 BY-SA版权协议,转载请附上原文出处链接和本声明。