行式存储和列式存储的比较

行式存储的优点:

同一行数据存放在同一个block块里面,select * from table_name;数据能直接获取出来;

 INSERT/UPDATE比较方便

行式存储的缺点:

不同类型数据存放在同一个block块里面,压缩性能不好;

select id,name from table_name;这种类型的列查询,所有数据都要读取,而不能跳过。


列式存储的优点:

同类型数据存放在同一个block块里面,压缩性能好;

任何列都能作为索引。

列式存储的缺点:

select * from table_name;这类全表查询,需要数据重组;

INSERT/UPDATE比较麻烦。


create table page_views_orc_zlib
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
STORED AS ORC 
TBLPROPERTIES("orc.compress"="ZLIB")
as select * from page_views;
#默认是zlib,写不写都一样

create table page_views_orc_snappy
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
STORED AS ORC 
TBLPROPERTIES("orc.compress"="SNAPPY")
as select * from page_views;



create table page_views_parquet
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
STORED AS PARQUET 
as select * from page_views;


set parquet.compression=gzip;
create table page_views_parquet_gzip
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
STORED AS PARQUET 
as select * from page_views;


【来自@若泽大数据】


版权声明:本文为weixin_39182877原创文章,遵循CC 4.0 BY-SA版权协议,转载请附上原文出处链接和本声明。