Hive 分区--静态分区、动态分区

Hive静态分区

1、Hive 分区 partition:必须在表定义的时候指定对应的partition字段

     单分区建表语句:create table day_table(idint, content string) partitioned by (dtstring);

                                  单分区表,按天分区,在表结构中存在idcontentdt三列

                                   以dt为文件夹区分

      双分区建表语句:

      create tableday_hour_table(idint, content string) partitioned by (dtstring, hour string);

      双分区表,按天和小时分区,在表结构中新增加了dthour两列

      先以dt为文件夹,再以hour子文件夹区分

2、Hive查询执行分区语法

      SELECT day_table.* FROM day_table WHERE day_table.dt>= '2008-08-08'; 

      分区表的意义在于优化查询。查询时尽量利用分区字段。如果不使用分区字段,就会全部扫描

3、预先导入分区数据,但是无法识别怎么办
      msck repair table tablename
      直接添加分区

1、创建静态单分区表:
hive> 
    > create table student_static_partion1
    > (
    > id int,
    > name String,
    > likes array<String>,
    > address map<String, String>
    > )
    > partitioned by (age int)
    > row format delimited
    > fields terminated by ','
    > collection items terminated by '-'
    > map keys terminated by ':';
OK
Time taken: 1.963 seconds


2、导入本地数据:当数据被加载至表中时,不会对数据进行任何转换。
Load操作只是将数据复制至Hive表对应的位置。数据加载时在表下自动创建一个目录

hive> load data local inpath '/opt/software/data/student' into table student_static_partion1 partition(age=16);  
Loading data to table default.student_static_partion1 partition (age=16)
OK
Time taken: 2.432 seconds
hive> 


3、查询导入数据:
hive> select * from student_static_partion1;
OK
1	小红1	["王者","book","movie"]	{"modu":"renminglu","shenzheng":"futian"}	16
2	小红2	["王者","book","movie"]	{"modu":"renminglu","xizhang":"lasha"}	16
3	小红3	["吃鸡","book","movie"]	{"chongqing":"renminglu","shenzheng":"futian"}	16
4	小红4	["王者","book","movie"]	{"modu":"renminglu","dongguang":"changan"}	16
5	小红5	["walking","book","movie"]	{"modu":"renminglu","shenzheng":"futian"}	16
6	小红6	["王者","book","movie"]	{"nanchang":"renminglu","shenzheng":"futian"}	16
7	小红7	["王者","book","movie"]	{"modu":"renminglu","shenzheng":"futian"}	16
8	小红8	["walking","book","movie"]	{"nanchang":"renminglu","guangzhou":"niwan"}	16
9	小红9	["王者","book","movie"]	{"modu":"renminglu","shenzheng":"futian"}	16
10	小红10	["王者","book","movie"]	{"shanghai":"renminglu","shenzheng":"futian"}	16
Time taken: 1.858 seconds, Fetched: 10 row(s)
hive> 


=========================================================
创建静态双分区:
hive> create table student_static_partition2
    > (
    > id int,
    > name String,
    > likes array<String>,
    > address map<String, String>
    > )
    > partitioned by (age int,sex String)
    > row format delimited
    > fields terminated by ','
    > collection items terminated by '-'
    > map keys terminated by ':';
OK
Time taken: 0.159 seconds

查看分区信息:
hive> desc formatted student_static_partition2;
OK
# col_name            	data_type           	comment             
	 	 
id                  	int                 	                    
name                	string              	                    
likes               	array<string>       	                    
address             	map<string,string>  	                    
	 	 
# Partition Information	 	 
# col_name            	data_type           	comment             
	 	 
age                 	int                 	                    
sex                 	string              	                    
	 	 
# Detailed Table Information	 	 
Database:           	default             	 
Owner:              	root                	 
CreateTime:         	Sat Jun 26 10:35:20 CST 2021	 
LastAccessTime:     	UNKNOWN             	 
Retention:          	0                   	 
Location:           	hdfs://mycluster/user/hive/warehouse/student_static_partition2	 
Table Type:         	MANAGED_TABLE       	 
Table Parameters:	 	 
	COLUMN_STATS_ACCURATE	{\"BASIC_STATS\":\"true\"}
	numFiles            	0                   
	numPartitions       	0                   
	numRows             	0                   
	rawDataSize         	0                   
	totalSize           	0                   
	transient_lastDdlTime	1624674920          
	 	 
# Storage Information	 	 
SerDe Library:      	org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe	 
InputFormat:        	org.apache.hadoop.mapred.TextInputFormat	 
OutputFormat:       	org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat	 
Compressed:         	No                  	 
Num Buckets:        	-1                  	 
Bucket Columns:     	[]                  	 
Sort Columns:       	[]                  	 
Storage Desc Params:	 	 
	colelction.delim    	-                   
	field.delim         	,                   
	mapkey.delim        	:                   
	serialization.format	,                   
Time taken: 0.113 seconds, Fetched: 43 row(s)

导入数据:partition(age=26,sex='man') age 和 sex 可交换位置
hive> load data local inpath '/opt/software/data/student' into table student_static_partition2 partition(age=26,sex='man');  
Loading data to table default.student_static_partition2 partition (age=26, sex=man)
OK
Time taken: 0.736 seconds
hive> 

查看数据信息:
hive> select * from student_static_partition2;
OK
1	小红1	["王者","book","movie"]	{"modu":"renminglu","shenzheng":"futian"}	26	man
2	小红2	["王者","book","movie"]	{"modu":"renminglu","xizhang":"lasha"}	26	man
3	小红3	["吃鸡","book","movie"]	{"chongqing":"renminglu","shenzheng":"futian"}	26	man
4	小红4	["王者","book","movie"]	{"modu":"renminglu","dongguang":"changan"}	26	man
5	小红5	["walking","book","movie"]	{"modu":"renminglu","shenzheng":"futian"}	26	man
6	小红6	["王者","book","movie"]	{"nanchang":"renminglu","shenzheng":"futian"}	26	man
7	小红7	["王者","book","movie"]	{"modu":"renminglu","shenzheng":"futian"}	26	man
8	小红8	["walking","book","movie"]	{"nanchang":"renminglu","guangzhou":"niwan"}	26	man
9	小红9	["王者","book","movie"]	{"modu":"renminglu","shenzheng":"futian"}	26	man
10	小红10	["王者","book","movie"]	{"shanghai":"renminglu","shenzheng":"futian"}	26	man
Time taken: 0.303 seconds, Fetched: 10 row(s)

2、Hive改变分区 

新增分区语法:当数据被加载至表中时,不会对数据进行任何转换。Load操作只是将数据复制至Hive表对应的位置。数据加载时在表下自动创建一个目录

​​​​​​​ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION partition_spec [LOCATION 'location'][, PARTITION partition_spec [LOCATION 'location'], ...];
 
partition_spec:
  : (partition_column = partition_col_value, partition_column = partition_col_value, ...)

删除分区语法, 内部表中、对应分区的元数据和数据将被一并删除

ALTER TABLE table_name DROP [IF EXISTS] PARTITION partition_spec[, PARTITION partition_spec, ...]
  [IGNORE PROTECTION] [PURGE];            -- (Note: PURGE available in Hive 1.2.0 and later, IGNORE PROTECTION not available 2.0.0 and later)

1、新增分区:只新建表中已有列的值,不新增列;只新增目录
hive> alter table student_static_partition2 add partition(age=19,name='female');
FAILED: ValidationFailureSemanticException Partition spec {age=19, name=female} contains non-partition columns

hive> alter table student_static_partition2 add partition(age=19,sex='female');
OK
Time taken: 0.195 seconds
hive> 

此时age=19并没有数据,只是新增了目录:
hive> select * from student_static_partition2 where age = 19;
OK
Time taken: 1.188 seconds



2、删除分区
hive> alter table student_static_partition2 drop partition(age=19);
Dropped the partition age=19/sex=female
OK
Time taken: 0.352 seconds
hive> 


3、MySQL中也可查看分区信息:
mysql> select * from PARTITIONS;
+---------+-------------+------------------+----------------+-------+--------+
| PART_ID | CREATE_TIME | LAST_ACCESS_TIME | PART_NAME      | SD_ID | TBL_ID |
+---------+-------------+------------------+----------------+-------+--------+
|       1 |  1624673853 |                0 | age=16         |    12 |     11 |
|       2 |  1624675035 |                0 | age=26/sex=man |    14 |     12 |
+---------+-------------+------------------+----------------+-------+--------+
2 rows in set (0.00 sec)

恢复分区信息--如果分区信息是直接通过对HDFS命令操作新增或者删除,hive不会感知需要手动修复分区信息(分区信息属于元数据)

MSCK [REPAIR] TABLE table_name [ADD/DROP/SYNC PARTITIONS];

A、使用HDFS创建目录

[root@node04 ~]# hdfs dfs -mkdir /user/age=10
21/06/26 12:01:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[root@node04 ~]# hdfs dfs -mkdir /user/age=20
21/06/26 12:01:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[root@node04 ~]# 

B、使用HDFS上传数据

[root@node04 data]# hdfs dfs -put /opt/software/data/student /user/age=10
21/06/26 12:09:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[root@node04 data]# hdfs dfs -put /opt/software/data/student /user/age=20
21/06/26 12:09:22 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[root@node04 data]# 

C、建立分区对应的表,并查询无数据,此时就需要修复分区信息

hive> create external table student_ex
    > (
    > id int,
    > name String,
    > likes array<String>,
    > address map<String, String>
    > )
    > partitioned by (age int)
    > row format delimited
    > fields terminated by ','
    > collection items terminated by '-'
    > map keys terminated by ':'
    > location '/user';
OK
Time taken: 0.101 seconds


==========查询表中数据==============

hive> select * from student_ex;
OK
Time taken: 0.234 seconds
hive> 

============MYSQL中查询无分区信息=============
mysql> select * from PARTITIONS;
+---------+-------------+------------------+----------------+-------+--------+
| PART_ID | CREATE_TIME | LAST_ACCESS_TIME | PART_NAME      | SD_ID | TBL_ID |
+---------+-------------+------------------+----------------+-------+--------+
|       1 |  1624673853 |                0 | age=16         |    12 |     11 |
|       2 |  1624675035 |                0 | age=26/sex=man |    14 |     12 |
+---------+-------------+------------------+----------------+-------+--------+
2 rows in set (0.00 sec)

mysql> 

D、修复分区,查询数据

hive> msck repair table  student_ex;
OK
Partitions not in metastore:	student_ex:age=10	student_ex:age=20
Repair: Added partition to metastore student_ex:age=10
Repair: Added partition to metastore student_ex:age=20
Time taken: 0.212 seconds, Fetched: 3 row(s)
hive> 


==============MySQL中查询分区信息=================
mysql> select * from PARTITIONS;
+---------+-------------+------------------+----------------+-------+--------+
| PART_ID | CREATE_TIME | LAST_ACCESS_TIME | PART_NAME      | SD_ID | TBL_ID |
+---------+-------------+------------------+----------------+-------+--------+
|       1 |  1624673853 |                0 | age=16         |    12 |     11 |
|       2 |  1624675035 |                0 | age=26/sex=man |    14 |     12 |
|       4 |  1624681398 |                0 | age=10         |    21 |     17 |
|       5 |  1624681398 |                0 | age=20         |    22 |     17 |
+---------+-------------+------------------+----------------+-------+--------+
4 rows in set (0.00 sec)

mysql> 


===============查询数据======================
hive> select * from student_ex;
OK
1	小红1	["王者","book","movie"]	{"modu":"renminglu","shenzheng":"futian"}	10
2	小红2	["王者","book","movie"]	{"modu":"renminglu","xizhang":"lasha"}	10
3	小红3	["吃鸡","book","movie"]	{"chongqing":"renminglu","shenzheng":"futian"}	10
4	小红4	["王者","book","movie"]	{"modu":"renminglu","dongguang":"changan"}	10
5	小红5	["walking","book","movie"]	{"modu":"renminglu","shenzheng":"futian"}	10
6	小红6	["王者","book","movie"]	{"nanchang":"renminglu","shenzheng":"futian"}	10
7	小红7	["王者","book","movie"]	{"modu":"renminglu","shenzheng":"futian"}	10
8	小红8	["walking","book","movie"]	{"nanchang":"renminglu","guangzhou":"niwan"}	10
9	小红9	["王者","book","movie"]	{"modu":"renminglu","shenzheng":"futian"}	10
10	小红10	["王者","book","movie"]	{"shanghai":"renminglu","shenzheng":"futian"}	10
1	小红1	["王者","book","movie"]	{"modu":"renminglu","shenzheng":"futian"}	20
2	小红2	["王者","book","movie"]	{"modu":"renminglu","xizhang":"lasha"}	20
3	小红3	["吃鸡","book","movie"]	{"chongqing":"renminglu","shenzheng":"futian"}	20
4	小红4	["王者","book","movie"]	{"modu":"renminglu","dongguang":"changan"}	20
5	小红5	["walking","book","movie"]	{"modu":"renminglu","shenzheng":"futian"}	20
6	小红6	["王者","book","movie"]	{"nanchang":"renminglu","shenzheng":"futian"}	20
7	小红7	["王者","book","movie"]	{"modu":"renminglu","shenzheng":"futian"}	20
8	小红8	["walking","book","movie"]	{"nanchang":"renminglu","guangzhou":"niwan"}	20
9	小红9	["王者","book","movie"]	{"modu":"renminglu","shenzheng":"futian"}	20
10	小红10	["王者","book","movie"]	{"shanghai":"renminglu","shenzheng":"futian"}	20
Time taken: 0.235 seconds, Fetched: 20 row(s)
hive> 

Hive动态分区

开启支持动态分区:

     set hive.exec.dynamic.partition=true; (默认:true) 

     set hive.exec.dynamic.partition.mode=nostrict; 【默认:strict(至少有一个分区列是静态分区)】

其他相关参数:

      set hive.exec.max.dynamic.partitions.pernode; 【每一个执行mr节点上,允许创建的动态分区的最大数量(100)】

      set hive.exec.max.dynamic.partitions;【所有执行mr节点上,允许创建的所有动态分区的最大数量(1000)】

      set hive.exec.max.created.files;【所有的mr job允许创建的文件的最大数量(100000)】

1、设置支持分区、非严格模式
hive> set hive.exec.dynamic.partition;
hive.exec.dynamic.partition=true
hive> set hive.exec.dynamic.partition.mode;
hive.exec.dynamic.partition.mode=strict
hive> set hive.exec.dynamic.partition.mode=nostrict;
hive> 

2、准备测试数据
[root@node04 ~]# cd /opt/software/data/
[root@node04 data]# vi student_for_dp

1,22,female,小红,王者-book-movie,modu:renminglu-shenzheng:futian
2,22,man,明明,王者-book-movie,modu:renminglu-xizhang:lasha
3,21,female,小兰,吃鸡-book-movie,chongqing:renminglu-shenzheng:futian
4,28,female,花花,王者-book-movie,modu:renminglu-dongguang:changan
5,22,man,悟空,walking-book-movie,modu:renminglu-shenzheng:futian
6,26,female,和尚,王者-book-movie,nanchang:renminglu-shenzheng:futian
7,22,female,蓝宝,王者-book-movie,modu:renminglu-shenzheng:futian
8,21,man,肚皮,walking-book-movie,nanchang:renminglu-guangzhou:niwan
9,22,female,狗蛋,王者-book-movie,modu:renminglu-shenzheng:futian
10,28,female,赵二,王者-book-movie,shanghai:renminglu-shenzheng:futian

3、创建数据表
create table student_for_dp
(
id int,
age int,
gender string,
name string,
likes array<String>,
address map<string,string>
)
row format delimited
fields terminated by ','
collection items terminated by '-'
map keys terminated by ':';

4、导入数据:
hive> load data local inpath '/opt/software/data/student_for_dp' into table student_for_dp;
Loading data to table default.student_for_dp
OK
Time taken: 1.467 seconds
hive> select * from student_for_dp;
OK
student_for_dp.id	student_for_dp.age	student_for_dp.gender	student_for_dp.name	student_for_dp.likes	student_for_dp.address
1	22	female	小红	["王者","book","movie"]	{"modu":"renminglu","shenzheng":"futian"}
2	22	man	明明	["王者","book","movie"]	{"modu":"renminglu","xizhang":"lasha"}
3	21	female	小兰	["吃鸡","book","movie"]	{"chongqing":"renminglu","shenzheng":"futian"}
4	28	female	花花	["王者","book","movie"]	{"modu":"renminglu","dongguang":"changan"}
5	22	man	悟空	["walking","book","movie"]	{"modu":"renminglu","shenzheng":"futian"}
6	26	female	和尚	["王者","book","movie"]	{"nanchang":"renminglu","shenzheng":"futian"}
7	22	female	蓝宝	["王者","book","movie"]	{"modu":"renminglu","shenzheng":"futian"}
8	21	man	肚皮	["walking","book","movie"]	{"nanchang":"renminglu","guangzhou":"niwan"}
9	22	female	狗蛋	["王者","book","movie"]	{"modu":"renminglu","shenzheng":"futian"}
10	28	female	赵二	["王者","book","movie"]	{"shanghai":"renminglu","shenzheng":"futian"}
Time taken: 0.329 seconds, Fetched: 10 row(s)
hive> 

5、创建动态分区表:以年龄和性别作为分区
create table student_dynamic_partition
(
id int,
name string,
likes array<String>,
address map<string,string>
)
partitioned by (age int,gender string)
row format delimited
fields terminated by ','
collection items terminated by '-'
map keys terminated by ':';

6、查询数据插入动态分区表
from student_for_dp
insert into student_dynamic_partition partition(age,gender)
select id,name,likes,address,age,gender;

hive> from student_for_dp
    > insert into student_dynamic_partition partition(age,gender)
    > select id,name,likes,address,age,gender;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20210628223131_3e0bf5fb-c36a-44ef-80a8-b6f272d6ca80
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1624888384456_0001, Tracking URL = http://node04:8088/proxy/application_1624888384456_0001/
Kill Command = /opt/software/hadoop-2.10.1/bin/hadoop job  -kill job_1624888384456_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2021-06-28 22:31:49,693 Stage-1 map = 0%,  reduce = 0%
2021-06-28 22:32:01,310 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.69 sec
MapReduce Total cumulative CPU time: 1 seconds 690 msec
Ended Job = job_1624888384456_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://mycluster/user/hive/warehouse/student_dynamic_partition/.hive-staging_hive_2021-06-28_22-31-31_801_1811947806510288557-1/-ext-10000
Loading data to table default.student_dynamic_partition partition (age=null, gender=null)


Loaded : 1/6 partitions.
Loaded : 2/6 partitions.
Loaded : 3/6 partitions.
Loaded : 4/6 partitions.
Loaded : 5/6 partitions.
Loaded : 6/6 partitions.
	 Time taken to load dynamic partitions: 0.899 seconds
	 Time taken for adding to write entity : 0.002 seconds
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 1.69 sec   HDFS Read: 5893 HDFS Write: 1079 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 690 msec
OK
id	name	likes	address	age	gender
Time taken: 32.429 seconds
hive> 

7、查询数据:
hive> select * from student_dynamic_partition where age = 22;
OK
student_dynamic_partition.id	student_dynamic_partition.name	student_dynamic_partition.likes	student_dynamic_partition.address	student_dynamic_partition.age	student_dynamic_partition.gender
1	小红	["王者","book","movie"]	{"modu":"renminglu","shenzheng":"futian"}	22	female
7	蓝宝	["王者","book","movie"]	{"modu":"renminglu","shenzheng":"futian"}	22	female
9	狗蛋	["王者","book","movie"]	{"modu":"renminglu","shenzheng":"futian"}	22	female
2	明明	["王者","book","movie"]	{"modu":"renminglu","xizhang":"lasha"}	22	man
5	悟空	["walking","book","movie"]	{"modu":"renminglu","shenzheng":"futian"}	22	man
Time taken: 0.943 seconds, Fetched: 5 row(s)
hive> 


版权声明:本文为chooser007原创文章,遵循CC 4.0 BY-SA版权协议,转载请附上原文出处链接和本声明。