SSo*_*lid 6 hadoop hive mapreduce
I'm new to this and have run into a problem.
I have a Hive table like this:
CREATE TABLE td (id INT, time STRING, ip STRING, v1 BIGINT, v2 INT, v3 INT,
  v4 INT, v5 BIGINT, v6 INT)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n';
I run a query like this:
from td
INSERT OVERWRITE DIRECTORY '/tmp/total.out' select count(v1)
INSERT OVERWRITE DIRECTORY '/tmp/totaldistinct.out' select count(distinct v1)
INSERT OVERWRITE DIRECTORY '/tmp/distinctuin.out' select distinct v1
INSERT OVERWRITE DIRECTORY '/tmp/v4.out' select v4 , count(v1), count(distinct v1) group by v4
INSERT OVERWRITE DIRECTORY '/tmp/v3v4.out' select v3, v4 , count(v1), count(distinct v1) group by v3, v4
INSERT OVERWRITE DIRECTORY '/tmp/v426.out' select count(v1), count(distinct v1) where v4=2 or v4=6
INSERT OVERWRITE DIRECTORY '/tmp/v3v426.out' select v3, count(v1), count(distinct v1) where v4=2 or v4=6 group by v3
INSERT OVERWRITE DIRECTORY '/tmp/v415.out' select count(v1), count(distinct v1) where v4=1 or v4=5
INSERT OVERWRITE DIRECTORY '/tmp/v3v415.out' select v3, count(v1), count(distinct v1) where v4=1 or v4=5 group by v3
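It works, and the output is exactly what I want.
The problem is that Hive generates 9 MapReduce jobs and runs them one after another.
I ran EXPLAIN on this query and got the following output: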
STAGE DEPENDENCIES:
Stage-9 is a root stage
Stage-0 depends on stages: Stage-9
Stage-10 depends on stages: Stage-9
Stage-1 depends on stages: Stage-10
Stage-11 depends on stages: Stage-9
Stage-2 depends on stages: Stage-11
Stage-12 depends on stages: Stage-9
Stage-3 depends on stages: Stage-12
Stage-13 depends on stages: Stage-9
Stage-4 depends on stages: Stage-13
Stage-14 depends on stages: Stage-9
Stage-5 depends on stages: Stage-14
Stage-15 depends on stages: Stage-9
Stage-6 depends on stages: Stage-15
Stage-16 depends on stages: Stage-9
Stage-7 depends on stages: Stage-16
Stage-17 depends on stages: Stage-9
Stage-8 depends on stages: Stage-17
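It seems that stages 9-17 correspond to MapReduce jobs 0-8.
However, according to the EXPLAIN output above, stages 10-17 depend only on stage 9.
So my question is: why can't jobs 1-8 run at the same time?
Or, how can I make jobs 1-8 run concurrently?
Thanks a lot for your help!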
小智 5
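hive-default.xml lists a property called "hive.exec.parallel" that lets Hive execute independent jobs in parallel. Its default value is "false"; set it to "true" to enable this feature. A second property, "hive.exec.parallel.thread.number", controls the maximum number of jobs that can be executed in parallel.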
For details, see: https://issues.apache.org/jira/browse/HIVE-549
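As a minimal sketch, assuming the query is submitted from the Hive CLI or Beeline, both properties can be set for the current session just before running the multi-insert query from the question. To make the change permanent, the same properties would normally go in hive-site.xml, because hive-default.xml only documents the defaults. The value 8 below is simply the default thread number, not a tuned recommendation. Even with parallel execution enabled, stages can only overlap if the cluster has enough free capacity to run several jobs at once.

```sql
-- Session-level settings (assumed: Hive CLI or Beeline session)
-- Allow independent stages (here stages 10-17, which depend only on stage 9) to run concurrently
SET hive.exec.parallel=true;
-- Upper bound on how many stages Hive submits at the same time (8 is the default)
SET hive.exec.parallel.thread.number=8;

-- Then run the "FROM td INSERT OVERWRITE DIRECTORY ..." multi-insert query unchanged
```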