Viewing the Execution Plan in Hive

You can use EXPLAIN to view the execution plan of a query.
For example:

explain select deptno `dept`,
       year(hiredate) `year`,
       sum(sal)
from tb_emp
group by deptno, year(hiredate);

1 First, check how many stages there are

This example has two:

+------------------------------------+
|Explain                             |
+------------------------------------+
|STAGE DEPENDENCIES:                 |
|  Stage-1 is a root stage           |
|  Stage-0 depends on stages: Stage-1|
+------------------------------------+

Stage-0 depends on Stage-1, which means Stage-1 runs first and Stage-0 runs after it.
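For plans with more stages, the same reading can be automated. The following is a minimal sketch, not part of Hive itself: the helper name `stage_order` is hypothetical, and it assumes the dependency lines use only the two phrasings shown above ("is a root stage" and "depends on stages:"). Any other line is ignored.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def stage_order(explain_lines):
    """Return stage names in an order that respects the dependencies."""
    deps = {}
    for line in explain_lines:
        # Drop the ASCII-table borders and the padding around the text.
        text = line.strip().strip("|").strip()
        root = re.fullmatch(r"(Stage-\d+) is a root stage", text)
        if root:
            deps[root.group(1)] = set()
            continue
        child = re.fullmatch(r"(Stage-\d+) depends on stages: (.+)", text)
        if child:
            parents = {p.strip() for p in child.group(2).split(",")}
            deps[child.group(1)] = parents
    # A stage appears only after every stage it depends on.
    return list(TopologicalSorter(deps).static_order())


plan = [
    "|STAGE DEPENDENCIES:                 |",
    "|  Stage-1 is a root stage           |",
    "|  Stage-0 depends on stages: Stage-1|",
]
print(stage_order(plan))  # ['Stage-1', 'Stage-0']
```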

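2 Look at the map phase of Stage-1

The plan shows that the map phase mainly does the following:

  • scans the table (TableScan)
  • reports table size statistics (the Statistics lines: row count and data size)
  • selects the columns the query needs (the expressions line of the Select Operator)
  • runs a map-side aggregation (Group By Operator, mode: hash) and passes the partial results to the reducers, keyed and partitioned by the group-by columns (Reduce Output Operator)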
+-------------------------------------------------------------------------------------------------+
|Explain                                                                                          |
+-------------------------------------------------------------------------------------------------+
|    Map Reduce                                                                                   |
|      Map Operator Tree:                                                                         |
|          TableScan                                                                              |
|            alias: tb_emp                                                                        |
|            Statistics: Num rows: 6 Data size: 718 Basic stats: COMPLETE Column stats: NONE      |
|            Select Operator                                                                      |
|              expressions: deptno (type: int), year(hiredate) (type: int), sal (type: float)     |
|              outputColumnNames: _col0, _col1, _col2                                             |
|              Statistics: Num rows: 6 Data size: 718 Basic stats: COMPLETE Column stats: NONE    |
|              Group By Operator                                                                  |
|                aggregations: sum(_col2)                                                         |
|                keys: _col0 (type: int), _col1 (type: int)                                       |
|                mode: hash                                                                       |
|                outputColumnNames: _col0, _col1, _col2                                           |
|                Statistics: Num rows: 6 Data size: 718 Basic stats: COMPLETE Column stats: NONE  |
|                Reduce Output Operator                                                           |
|                  key expressions: _col0 (type: int), _col1 (type: int)                          |
|                  sort order: ++                                                                 |
|                  Map-reduce partition columns: _col0 (type: int), _col1 (type: int)             |
|                  Statistics: Num rows: 6 Data size: 718 Basic stats: COMPLETE Column stats: NONE|
|                  value expressions: _col2 (type: double)                                        |
+-------------------------------------------------------------------------------------------------+
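To make the Group By Operator with mode: hash more concrete, here is a rough sketch of what one mapper does conceptually. It is an illustration in Python, not Hive's implementation. The function name `map_side_partial_sum` and the sample rows are made up for this example and are not data from tb_emp.

```python
from collections import defaultdict
from datetime import date


def map_side_partial_sum(rows):
    """Sum sal per (deptno, year(hiredate)) within one mapper's input."""
    partial = defaultdict(float)  # the in-memory hash table
    for deptno, hiredate, sal in rows:
        partial[(deptno, hiredate.year)] += sal
    # Sorting ascending on both key columns mirrors "sort order: ++".
    return sorted(partial.items())


# Hypothetical rows seen by a single mapper: (deptno, hiredate, sal).
split = [
    (10, date(1981, 6, 9), 2450.0),
    (10, date(1981, 11, 17), 5000.0),
    (20, date(1980, 12, 17), 800.0),
]
print(map_side_partial_sum(split))
# [((10, 1981), 7450.0), ((20, 1980), 800.0)]
```

Each mapper emits one partial sum per key it has seen, so less data is shuffled to the reducers than if every row were sent.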

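3 Look at the reduce phase

  • merges the partial aggregates from the map phase (Group By Operator, mode: mergepartial)
  • writes the result (File Output Operator), which also shows the input format, output format, and serde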
+-------------------------------------------------------------------------------------------+
|Explain                                                                                    |
+-------------------------------------------------------------------------------------------+
|      Reduce Operator Tree:                                                                |
|        Group By Operator                                                                  |
|          aggregations: sum(VALUE._col0)                                                   |
|          keys: KEY._col0 (type: int), KEY._col1 (type: int)                               |
|          mode: mergepartial                                                               |
|          outputColumnNames: _col0, _col1, _col2                                           |
|          Statistics: Num rows: 3 Data size: 359 Basic stats: COMPLETE Column stats: NONE  |
|          File Output Operator                                                             |
|            compressed: false                                                              |
|            Statistics: Num rows: 3 Data size: 359 Basic stats: COMPLETE Column stats: NONE|
|            table:                                                                         |
|                input format: org.apache.hadoop.mapred.SequenceFileInputFormat             |
|                output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat   |
|                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe                  |
+-------------------------------------------------------------------------------------------+
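The mode: mergepartial step can be pictured as adding up the partial sums that arrive from different mappers for the same key. The sketch below is only a conceptual model in Python. The function name `merge_partials` and the two mapper outputs are hypothetical values chosen for illustration.

```python
from collections import defaultdict


def merge_partials(*mapper_outputs):
    """Combine per-mapper partial sums into one final sum per key."""
    merged = defaultdict(float)
    for output in mapper_outputs:
        for key, partial_sum in output:
            merged[key] += partial_sum
    return sorted(merged.items())


# Hypothetical partial results: ((deptno, year), partial sum of sal).
mapper_a = [((10, 1981), 7450.0), ((20, 1980), 800.0)]
mapper_b = [((10, 1981), 1300.0), ((30, 1981), 2850.0)]
print(merge_partials(mapper_a, mapper_b))
# [((10, 1981), 8750.0), ((20, 1980), 800.0), ((30, 1981), 2850.0)]
```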

References

Hive Experiment 5: Viewing the HQL Execution Plan and Key Steps Explained (heroicpoem's column, CSDN blog)

LanguageManual Explain - Apache Hive - Apache Software Foundation


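Copyright notice: This is an original article by u010711495, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.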