1、Hive 分区 partition:必须在表定义的时候指定对应的partition字段
单分区建表语句:create table day_table(idint, content string) partitioned by (dtstring);
单分区表,按天分区,在表结构中存在id,content,dt三列
以dt为文件夹区分
双分区建表语句:
create tableday_hour_table(idint, content string) partitioned by (dtstring, hour string);
双分区表,按天和小时分区,在表结构中新增加了dt和hour两列
先以dt为文件夹,再以hour子文件夹区分
2、Hive查询执行分区语法
SELECT day_table.* FROM day_table WHERE day_table.dt>= '2008-08-08';
分区表的意义在于优化查询。查询时尽量利用分区字段。如果不使用分区字段,就会全部扫描
3、预先导入分区数据,但是无法识别怎么办
msck repair table tablename
直接添加分区
1、创建静态单分区表:
hive>
> create table student_static_partion1
> (
> id int,
> name String,
> likes array<String>,
> address map<String, String>
> )
> partitioned by (age int)
> row format delimited
> fields terminated by ','
> collection items terminated by '-'
> map keys terminated by ':';
OK
Time taken: 1.963 seconds
2、导入本地数据:当数据被加载至表中时,不会对数据进行任何转换。
Load操作只是将数据复制至Hive表对应的位置。数据加载时在表下自动创建一个目录
hive> load data local inpath '/opt/software/data/student' into table student_static_partion1 partition(age=16);
Loading data to table default.student_static_partion1 partition (age=16)
OK
Time taken: 2.432 seconds
hive>
3、查询导入数据:
hive> select * from student_static_partion1;
OK
1 小红1 ["王者","book","movie"] {"modu":"renminglu","shenzheng":"futian"} 16
2 小红2 ["王者","book","movie"] {"modu":"renminglu","xizhang":"lasha"} 16
3 小红3 ["吃鸡","book","movie"] {"chongqing":"renminglu","shenzheng":"futian"} 16
4 小红4 ["王者","book","movie"] {"modu":"renminglu","dongguang":"changan"} 16
5 小红5 ["walking","book","movie"] {"modu":"renminglu","shenzheng":"futian"} 16
6 小红6 ["王者","book","movie"] {"nanchang":"renminglu","shenzheng":"futian"} 16
7 小红7 ["王者","book","movie"] {"modu":"renminglu","shenzheng":"futian"} 16
8 小红8 ["walking","book","movie"] {"nanchang":"renminglu","guangzhou":"niwan"} 16
9 小红9 ["王者","book","movie"] {"modu":"renminglu","shenzheng":"futian"} 16
10 小红10 ["王者","book","movie"] {"shanghai":"renminglu","shenzheng":"futian"} 16
Time taken: 1.858 seconds, Fetched: 10 row(s)
hive>
=========================================================
创建静态双分区:
hive> create table student_static_partition2
> (
> id int,
> name String,
> likes array<String>,
> address map<String, String>
> )
> partitioned by (age int,sex String)
> row format delimited
> fields terminated by ','
> collection items terminated by '-'
> map keys terminated by ':';
OK
Time taken: 0.159 seconds
查看分区信息:
hive> desc formatted student_static_partition2;
OK
# col_name data_type comment
id int
name string
likes array<string>
address map<string,string>
# Partition Information
# col_name data_type comment
age int
sex string
# Detailed Table Information
Database: default
Owner: root
CreateTime: Sat Jun 26 10:35:20 CST 2021
LastAccessTime: UNKNOWN
Retention: 0
Location: hdfs://mycluster/user/hive/warehouse/student_static_partition2
Table Type: MANAGED_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\"}
numFiles 0
numPartitions 0
numRows 0
rawDataSize 0
totalSize 0
transient_lastDdlTime 1624674920
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
colelction.delim -
field.delim ,
mapkey.delim :
serialization.format ,
Time taken: 0.113 seconds, Fetched: 43 row(s)
导入数据:partition(age=26,sex='man') age 和 sex 可交换位置
hive> load data local inpath '/opt/software/data/student' into table student_static_partition2 partition(age=26,sex='man');
Loading data to table default.student_static_partition2 partition (age=26, sex=man)
OK
Time taken: 0.736 seconds
hive>
查看数据信息:
hive> select * from student_static_partition2;
OK
1 小红1 ["王者","book","movie"] {"modu":"renminglu","shenzheng":"futian"} 26 man
2 小红2 ["王者","book","movie"] {"modu":"renminglu","xizhang":"lasha"} 26 man
3 小红3 ["吃鸡","book","movie"] {"chongqing":"renminglu","shenzheng":"futian"} 26 man
4 小红4 ["王者","book","movie"] {"modu":"renminglu","dongguang":"changan"} 26 man
5 小红5 ["walking","book","movie"] {"modu":"renminglu","shenzheng":"futian"} 26 man
6 小红6 ["王者","book","movie"] {"nanchang":"renminglu","shenzheng":"futian"} 26 man
7 小红7 ["王者","book","movie"] {"modu":"renminglu","shenzheng":"futian"} 26 man
8 小红8 ["walking","book","movie"] {"nanchang":"renminglu","guangzhou":"niwan"} 26 man
9 小红9 ["王者","book","movie"] {"modu":"renminglu","shenzheng":"futian"} 26 man
10 小红10 ["王者","book","movie"] {"shanghai":"renminglu","shenzheng":"futian"} 26 man
Time taken: 0.303 seconds, Fetched: 10 row(s)
2、Hive改变分区
新增分区语法:当数据被加载至表中时,不会对数据进行任何转换。Load操作只是将数据复制至Hive表对应的位置。数据加载时在表下自动创建一个目录
ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION partition_spec [LOCATION 'location'][, PARTITION partition_spec [LOCATION 'location'], ...];
partition_spec:
: (partition_column = partition_col_value, partition_column = partition_col_value, ...)删除分区语法, 内部表中、对应分区的元数据和数据将被一并删除
ALTER TABLE table_name DROP [IF EXISTS] PARTITION partition_spec[, PARTITION partition_spec, ...]
[IGNORE PROTECTION] [PURGE]; -- (Note: PURGE available in Hive 1.2.0 and later, IGNORE PROTECTION not available 2.0.0 and later)1、新增分区:只新建表中已有列的值,不新增列;只新增目录
hive> alter table student_static_partition2 add partition(age=19,name='female');
FAILED: ValidationFailureSemanticException Partition spec {age=19, name=female} contains non-partition columns
hive> alter table student_static_partition2 add partition(age=19,sex='female');
OK
Time taken: 0.195 seconds
hive>
此时age=19并没有数据,只是新增了目录:
hive> select * from student_static_partition2 where age = 19;
OK
Time taken: 1.188 seconds
2、删除分区
hive> alter table student_static_partition2 drop partition(age=19);
Dropped the partition age=19/sex=female
OK
Time taken: 0.352 seconds
hive>
3、MySQL中也可查看分区信息:
mysql> select * from PARTITIONS;
+---------+-------------+------------------+----------------+-------+--------+
| PART_ID | CREATE_TIME | LAST_ACCESS_TIME | PART_NAME | SD_ID | TBL_ID |
+---------+-------------+------------------+----------------+-------+--------+
| 1 | 1624673853 | 0 | age=16 | 12 | 11 |
| 2 | 1624675035 | 0 | age=26/sex=man | 14 | 12 |
+---------+-------------+------------------+----------------+-------+--------+
2 rows in set (0.00 sec)
恢复分区信息--如果分区信息是直接通过对HDFS命令操作新增或者删除,hive不会感知需要手动修复分区信息(分区信息属于元数据)
MSCK [REPAIR] TABLE table_name [ADD/DROP/SYNC PARTITIONS];
A、使用HDFS创建目录
[root@node04 ~]# hdfs dfs -mkdir /user/age=10
21/06/26 12:01:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[root@node04 ~]# hdfs dfs -mkdir /user/age=20
21/06/26 12:01:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[root@node04 ~]#
B、使用HDFS上传数据
[root@node04 data]# hdfs dfs -put /opt/software/data/student /user/age=10
21/06/26 12:09:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[root@node04 data]# hdfs dfs -put /opt/software/data/student /user/age=20
21/06/26 12:09:22 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[root@node04 data]#
C、建立分区对应的表,并查询无数据,此时就需要修复分区信息
hive> create external table student_ex
> (
> id int,
> name String,
> likes array<String>,
> address map<String, String>
> )
> partitioned by (age int)
> row format delimited
> fields terminated by ','
> collection items terminated by '-'
> map keys terminated by ':'
> location '/user';
OK
Time taken: 0.101 seconds
==========查询表中数据==============
hive> select * from student_ex;
OK
Time taken: 0.234 seconds
hive>
============MYSQL中查询无分区信息=============
mysql> select * from PARTITIONS;
+---------+-------------+------------------+----------------+-------+--------+
| PART_ID | CREATE_TIME | LAST_ACCESS_TIME | PART_NAME | SD_ID | TBL_ID |
+---------+-------------+------------------+----------------+-------+--------+
| 1 | 1624673853 | 0 | age=16 | 12 | 11 |
| 2 | 1624675035 | 0 | age=26/sex=man | 14 | 12 |
+---------+-------------+------------------+----------------+-------+--------+
2 rows in set (0.00 sec)
mysql>
D、修复分区,查询数据
hive> msck repair table student_ex;
OK
Partitions not in metastore: student_ex:age=10 student_ex:age=20
Repair: Added partition to metastore student_ex:age=10
Repair: Added partition to metastore student_ex:age=20
Time taken: 0.212 seconds, Fetched: 3 row(s)
hive>
==============MySQL中查询分区信息=================
mysql> select * from PARTITIONS;
+---------+-------------+------------------+----------------+-------+--------+
| PART_ID | CREATE_TIME | LAST_ACCESS_TIME | PART_NAME | SD_ID | TBL_ID |
+---------+-------------+------------------+----------------+-------+--------+
| 1 | 1624673853 | 0 | age=16 | 12 | 11 |
| 2 | 1624675035 | 0 | age=26/sex=man | 14 | 12 |
| 4 | 1624681398 | 0 | age=10 | 21 | 17 |
| 5 | 1624681398 | 0 | age=20 | 22 | 17 |
+---------+-------------+------------------+----------------+-------+--------+
4 rows in set (0.00 sec)
mysql>
===============查询数据======================
hive> select * from student_ex;
OK
1 小红1 ["王者","book","movie"] {"modu":"renminglu","shenzheng":"futian"} 10
2 小红2 ["王者","book","movie"] {"modu":"renminglu","xizhang":"lasha"} 10
3 小红3 ["吃鸡","book","movie"] {"chongqing":"renminglu","shenzheng":"futian"} 10
4 小红4 ["王者","book","movie"] {"modu":"renminglu","dongguang":"changan"} 10
5 小红5 ["walking","book","movie"] {"modu":"renminglu","shenzheng":"futian"} 10
6 小红6 ["王者","book","movie"] {"nanchang":"renminglu","shenzheng":"futian"} 10
7 小红7 ["王者","book","movie"] {"modu":"renminglu","shenzheng":"futian"} 10
8 小红8 ["walking","book","movie"] {"nanchang":"renminglu","guangzhou":"niwan"} 10
9 小红9 ["王者","book","movie"] {"modu":"renminglu","shenzheng":"futian"} 10
10 小红10 ["王者","book","movie"] {"shanghai":"renminglu","shenzheng":"futian"} 10
1 小红1 ["王者","book","movie"] {"modu":"renminglu","shenzheng":"futian"} 20
2 小红2 ["王者","book","movie"] {"modu":"renminglu","xizhang":"lasha"} 20
3 小红3 ["吃鸡","book","movie"] {"chongqing":"renminglu","shenzheng":"futian"} 20
4 小红4 ["王者","book","movie"] {"modu":"renminglu","dongguang":"changan"} 20
5 小红5 ["walking","book","movie"] {"modu":"renminglu","shenzheng":"futian"} 20
6 小红6 ["王者","book","movie"] {"nanchang":"renminglu","shenzheng":"futian"} 20
7 小红7 ["王者","book","movie"] {"modu":"renminglu","shenzheng":"futian"} 20
8 小红8 ["walking","book","movie"] {"nanchang":"renminglu","guangzhou":"niwan"} 20
9 小红9 ["王者","book","movie"] {"modu":"renminglu","shenzheng":"futian"} 20
10 小红10 ["王者","book","movie"] {"shanghai":"renminglu","shenzheng":"futian"} 20
Time taken: 0.235 seconds, Fetched: 20 row(s)
hive>
开启支持动态分区:
set hive.exec.dynamic.partition=true; (默认:true)
set hive.exec.dynamic.partition.mode=nostrict; 【默认:strict(至少有一个分区列是静态分区)】
其他相关参数:
set hive.exec.max.dynamic.partitions.pernode; 【每一个执行mr节点上,允许创建的动态分区的最大数量(100)】
set hive.exec.max.dynamic.partitions;【所有执行mr节点上,允许创建的所有动态分区的最大数量(1000)】
set hive.exec.max.created.files;【所有的mr job允许创建的文件的最大数量(100000)】
1、设置支持分区、非严格模式
hive> set hive.exec.dynamic.partition;
hive.exec.dynamic.partition=true
hive> set hive.exec.dynamic.partition.mode;
hive.exec.dynamic.partition.mode=strict
hive> set hive.exec.dynamic.partition.mode=nostrict;
hive>
2、准备测试数据
[root@node04 ~]# cd /opt/software/data/
[root@node04 data]# vi student_for_dp
1,22,female,小红,王者-book-movie,modu:renminglu-shenzheng:futian
2,22,man,明明,王者-book-movie,modu:renminglu-xizhang:lasha
3,21,female,小兰,吃鸡-book-movie,chongqing:renminglu-shenzheng:futian
4,28,female,花花,王者-book-movie,modu:renminglu-dongguang:changan
5,22,man,悟空,walking-book-movie,modu:renminglu-shenzheng:futian
6,26,female,和尚,王者-book-movie,nanchang:renminglu-shenzheng:futian
7,22,female,蓝宝,王者-book-movie,modu:renminglu-shenzheng:futian
8,21,man,肚皮,walking-book-movie,nanchang:renminglu-guangzhou:niwan
9,22,female,狗蛋,王者-book-movie,modu:renminglu-shenzheng:futian
10,28,female,赵二,王者-book-movie,shanghai:renminglu-shenzheng:futian
3、创建数据表
create table student_for_dp
(
id int,
age int,
gender string,
name string,
likes array<String>,
address map<string,string>
)
row format delimited
fields terminated by ','
collection items terminated by '-'
map keys terminated by ':';
4、导入数据:
hive> load data local inpath '/opt/software/data/student_for_dp' into table student_for_dp;
Loading data to table default.student_for_dp
OK
Time taken: 1.467 seconds
hive> select * from student_for_dp;
OK
student_for_dp.id student_for_dp.age student_for_dp.gender student_for_dp.name student_for_dp.likes student_for_dp.address
1 22 female 小红 ["王者","book","movie"] {"modu":"renminglu","shenzheng":"futian"}
2 22 man 明明 ["王者","book","movie"] {"modu":"renminglu","xizhang":"lasha"}
3 21 female 小兰 ["吃鸡","book","movie"] {"chongqing":"renminglu","shenzheng":"futian"}
4 28 female 花花 ["王者","book","movie"] {"modu":"renminglu","dongguang":"changan"}
5 22 man 悟空 ["walking","book","movie"] {"modu":"renminglu","shenzheng":"futian"}
6 26 female 和尚 ["王者","book","movie"] {"nanchang":"renminglu","shenzheng":"futian"}
7 22 female 蓝宝 ["王者","book","movie"] {"modu":"renminglu","shenzheng":"futian"}
8 21 man 肚皮 ["walking","book","movie"] {"nanchang":"renminglu","guangzhou":"niwan"}
9 22 female 狗蛋 ["王者","book","movie"] {"modu":"renminglu","shenzheng":"futian"}
10 28 female 赵二 ["王者","book","movie"] {"shanghai":"renminglu","shenzheng":"futian"}
Time taken: 0.329 seconds, Fetched: 10 row(s)
hive>
5、创建动态分区表:以年龄和性别作为分区
create table student_dynamic_partition
(
id int,
name string,
likes array<String>,
address map<string,string>
)
partitioned by (age int,gender string)
row format delimited
fields terminated by ','
collection items terminated by '-'
map keys terminated by ':';
6、查询数据插入动态分区表
from student_for_dp
insert into student_dynamic_partition partition(age,gender)
select id,name,likes,address,age,gender;
hive> from student_for_dp
> insert into student_dynamic_partition partition(age,gender)
> select id,name,likes,address,age,gender;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20210628223131_3e0bf5fb-c36a-44ef-80a8-b6f272d6ca80
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1624888384456_0001, Tracking URL = http://node04:8088/proxy/application_1624888384456_0001/
Kill Command = /opt/software/hadoop-2.10.1/bin/hadoop job -kill job_1624888384456_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2021-06-28 22:31:49,693 Stage-1 map = 0%, reduce = 0%
2021-06-28 22:32:01,310 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.69 sec
MapReduce Total cumulative CPU time: 1 seconds 690 msec
Ended Job = job_1624888384456_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://mycluster/user/hive/warehouse/student_dynamic_partition/.hive-staging_hive_2021-06-28_22-31-31_801_1811947806510288557-1/-ext-10000
Loading data to table default.student_dynamic_partition partition (age=null, gender=null)
Loaded : 1/6 partitions.
Loaded : 2/6 partitions.
Loaded : 3/6 partitions.
Loaded : 4/6 partitions.
Loaded : 5/6 partitions.
Loaded : 6/6 partitions.
Time taken to load dynamic partitions: 0.899 seconds
Time taken for adding to write entity : 0.002 seconds
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 1.69 sec HDFS Read: 5893 HDFS Write: 1079 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 690 msec
OK
id name likes address age gender
Time taken: 32.429 seconds
hive>
7、查询数据:
hive> select * from student_dynamic_partition where age = 22;
OK
student_dynamic_partition.id student_dynamic_partition.name student_dynamic_partition.likes student_dynamic_partition.address student_dynamic_partition.age student_dynamic_partition.gender
1 小红 ["王者","book","movie"] {"modu":"renminglu","shenzheng":"futian"} 22 female
7 蓝宝 ["王者","book","movie"] {"modu":"renminglu","shenzheng":"futian"} 22 female
9 狗蛋 ["王者","book","movie"] {"modu":"renminglu","shenzheng":"futian"} 22 female
2 明明 ["王者","book","movie"] {"modu":"renminglu","xizhang":"lasha"} 22 man
5 悟空 ["walking","book","movie"] {"modu":"renminglu","shenzheng":"futian"} 22 man
Time taken: 0.943 seconds, Fetched: 5 row(s)
hive>