HIVE调优MapJoin

2024-05-12 10:20

文章标签 hive 调优 mapjoin

本文主要是介绍HIVE调优MapJoin，希望对大家解决编程问题提供一定的参考价值，需要的开发者们随着小编来一起学习吧！

HIVE调优MapJoin

1.mapjoin （1.2以后自动默认启动mapjoin）

2.创建表格

3.查询建表

4.通过 explain 展示执行计划

5.Map JOIN 相关设置：

1.mapjoin （1.2以后自动默认启动mapjoin）

select /*+mapjoin(b)*/ a.xx,b.xxx from a left outer join b on a.id=b.id

2.创建表格


CREATE EXTERNAL TABLE IF NOT EXISTS learn4.student1(
id STRING COMMENT "学生ID",
name STRING COMMENT "学生姓名",
age int COMMENT "年龄",
gender STRING COMMENT "性别",
clazz STRING COMMENT "班级"
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";load data local inpath "/usr/local/soft/hive-3.1.2/data/students.txt" INTO TABLE learn4.student1;CREATE EXTERNAL TABLE IF NOT EXISTS learn4.score1(
id STRING COMMENT "学生ID",
subject_id STRING COMMENT "科目ID",
score int COMMENT "成绩"
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";
load data local inpath "/usr/local/soft/hive-3.1.2/data/score.txt" INTO TABLE learn4.score1;

3.查询建表


CREATE TABLE mapJonTest AS SELECT max.name,min.subject_id,min.score FROM learn4.student1 max JOIN learn4.score1 min ON max.id = min.id;

建表所需时间：

INFO : Total MapReduce CPU Time Spent: 2 seconds 510 msec
INFO : Completed executing command(queryId=root_20240511090524_3a34bdda-4247-4af4-b686-d681856af110); Time taken: 19.199 seconds
INFO : OK
INFO : Concurrency mode is disabled, not creating a lock manager
No rows affected (20.707 seconds)

4.通过 explain 展示执行计划

explain SELECT max.name,min.subject_id,min.score FROM learn4.student1 max JOIN learn4.score1 min ON max.id = min.id;查看详细信息：explain extended SELECT max.name,min.subject_id,min.score FROM learn4.student1 max JOIN learn4.score1 min ON max.id = min.id;

| STAGE DEPENDENCIES: | -- 执行stage的依赖
| Stage-4 is a root stage |   Stage-4 表示根流程 --表示最先执行的流程
| Stage-3 depends on stages: Stage-4 |   Stage-3 依赖 Stage-4
| Stage-0 depends on stages: Stage-3 |   Stage-0 依赖 Stage-3 依赖 Stage-4
| |
| STAGE PLANS: |
| Stage: Stage-4 |
| Map Reduce Local Work |
| Alias -> Map Local Tables: |
| $hdt$_1:min |
| Fetch Operator |
| limit: -1 |
| Alias -> Map Local Operator Tree: |
| $hdt$_1:min |
| TableScan |   TableScan 扫描的表
| alias: min |
| Statistics: Num rows: 1 Data size: 1385400 Basic stats: COMPLETE Column stats: NONE |
| Filter Operator |
| predicate: id is not null (type: boolean) |
| Statistics: Num rows: 1 Data size: 1385400 Basic stats: COMPLETE Column stats: NONE |
| Select Operator |
| expressions: id (type: string), subject_id (type: string), score (type: int) |
| outputColumnNames: _col0, _col1, _col2 |
| Statistics: Num rows: 1 Data size: 1385400 Basic stats: COMPLETE Column stats: NONE |
| HashTable Sink Operator |
| keys: |
| 0 _col0 (type: string) |
| 1 _col0 (type: string) |
| |
| Stage: Stage-3 |
| Map Reduce |
| Map Operator Tree: |
| TableScan |
| alias: max |
| Statistics: Num rows: 1 Data size: 388080000 Basic stats: COMPLETE Column stats: NONE |
| Filter Operator |
| predicate: id is not null (type: boolean) |
| Statistics: Num rows: 1 Data size: 388080000 Basic stats: COMPLETE Column stats: NONE |
| Select Operator |
| expressions: id (type: string), name (type: string) |
| outputColumnNames: _col0, _col1 |
| Statistics: Num rows: 1 Data size: 388080000 Basic stats: COMPLETE Column stats: NONE |
| Map Join Operator | -- 不需要做任何操作默认开启 Map JOIN 操作
| condition map: |
| Inner Join 0 to 1 |
| keys: |
| 0 _col0 (type: string) |
| 1 _col0 (type: string) |
| outputColumnNames: _col1, _col3, _col4 |
| Statistics: Num rows: 1 Data size: 426888009 Basic stats: COMPLETE Column stats: NONE |
| Select Operator |
| expressions: _col1 (type: string), _col3 (type: string), _col4 (type: int) |
| outputColumnNames: _col0, _col1, _col2 |
| Statistics: Num rows: 1 Data size: 426888009 Basic stats: COMPLETE Column stats: NONE |
| File Output Operator |
| compressed: false |
| Statistics: Num rows: 1 Data size: 426888009 Basic stats: COMPLETE Column stats: NONE |
| table: |
| input format: org.apache.hadoop.mapred.SequenceFileInputFormat |
| output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat |
| serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| Execution mode: vectorized |
| Local Work: |
| Map Reduce Local Work |
| |
| Stage: Stage-0 |
| Fetch Operator |
| limit: -1 |
| Processor Tree: |
| ListSink |

5.Map JOIN 相关设置：

1）设置自动选择Mapjoin

set hive.auto.convert.join = true; 默认为trueset hive.auto.convert.join = false; 默认为true

2）大表小表的阈值设置（默认25M以下认为是小表）：

set hive.mapjoin.smalltable.filesize = 25000000;set hive.mapjoin.smalltable.filesize = 10000000;

这篇关于HIVE调优MapJoin的文章就介绍到这儿，希望我们推荐的文章对编程师们有所帮助！

HIVE调优MapJoin

HIVE调优MapJoin

1.mapjoin （1.2以后自动默认启动mapjoin）

2.创建表格

3.查询建表

4.通过 explain 展示执行计划

5.Map JOIN 相关设置：

相关文章

jvm调优常用命令行工具详解

mysql线上查询之前要性能调优的技巧及示例

java如何通过Kerberos认证方式连接hive

Hadoop企业开发案例调优场景

JVM内存调优原则及几种JVM内存调优方法

Hive和Hbase的区别

Linux系统性能调优详解

高性能计算应用优化之代码实现调优(一)

掌握Hive函数[2]：从基础到高级应用

经验笔记：SQL调优