在日常工作中,我们在处理数据时,会遇到某个列存在多个数据的情况,如果想拿到这个列里面每个数据进行后续处理的话,这种情况下有两种处理方式:
第一种:如果这个列多个数据是固定数目,可以使用split切分
第二种:如果是不定数目,下面我们就来看看这种情况要怎么处理。
先创建一张临时表供测试,sql如下:
-- 执行引擎:spark引擎
-- 1.创建表
create table tmp.tmp1(book_name string comment '书名',character_name string comment '人名');
-- 2.插入数据
insert into tmp.tmp1
select
"《凡人修仙传》" as book_name,"韩立" as character_name
union all
select
"《凡人修仙传》" as book_name,"厉飞雨" as character_name
union all
select
"《凡人修仙传》" as book_name,"南宫婉" as character_name
union all
select
"《凡人修仙传》" as book_name,"紫灵仙子" as character_name
union all
select
"《凡人修仙传》" as book_name,"银月" as character_name
union all
select
"《凡人修仙传》" as book_name,"元瑶" as character_name
union all
select
"《凡人修仙传》" as book_name,"董萱儿" as character_name
union all
select
"《凡人修仙传》" as book_name,"墨彩环" as character_name
union all
select
"《凡人修仙传》" as book_name,"冰凤" as character_name
;
-- 3.查看结果
select * from tmp.tmp1;
sql执行结果:
第一步:
先把需要转换的各列拼到一起,想要的结果数据样式如下:
下面两个sql执行的结果是一样的,具体sql如下:
sql1:
select book_name,concat_ws(',',collect_set(cast(character_name as string))) as character_name
from tmp.tmp1 group by book_name;
sql2:
select
concat_ws(',',collect_set(cast(book_name as string))) as book_name
,concat_ws(',',collect_list(cast(character_name as string))) as character_name
from tmp.tmp1;
注意:使用concat_ws函数,需将字段转成string格式,collect_set会对该列进行去重,如果不需要去重,可使用collect_list参数代替。
第二步:
用split函数把上面拼接的字符串转换为分割数组,类型为array。sql为:
select
split(concat_ws(',',collect_set(cast(book_name as string))),',') as book_name
,split(concat_ws(',',collect_set(cast(character_name as string))),',') as character_name
from tmp.tmp1;
sql执行结果:
第三步:
第二步的返回结果为array数组类型,如果想取上面数组的第5个数的值也就是"《凡人修仙传》"、”紫灵仙子“,加个下标就能实现了。(注意:这跟行转列无关)具体sql为:
select book_name,split(concat_ws(',',collect_set(cast(character_name as string))),',')[4] as character_name
from tmp.tmp1
group by book_name;
第四步:
用explode函数实现行转列,任选其中一个字段,sql为:
select
explode(split(concat_ws(',',character_name),',')) as character_name
from tmp.tmp1;
sql执行结果为:
第五步:
上面方法已经实现了行转列,不过仅仅实现了一列,如果表中还有其他字段也需要一起展示出来,这个时候就要用到lateral view了,不过写法跟上面有所不同,sql为:
select t1.*,t2 from tmp.tmp1 t1 lateral view explode(split(concat_ws(',',character_name),',')) cn as t2;
sql执行结果:
备注:解释下,from后面是你的表名,在表名后面加lateral view explode(也就是你的行转列sql语句) ,同时还必须起一个别名,我这个字段的别名为t2,然后再看看select后面的 t1.*,就是原表的字段,我这里有2个字段book_name、character_name,都有值,所以最后的效果就是上述图片的效果了。
版权声明:本文为qq_35684452原创文章,遵循CC 4.0 BY-SA版权协议,转载请附上原文出处链接和本声明。