有几个地方说Hadoop作业中默认的减少器数是1.您可以使用mapred.reduce.tasks符号手动设置减速器的数量.
当我运行Hive作业时(在Amazon EMR,AMI 2.3.3上),它有一些大于1的减速器.看看工作设置,有些东西已经设置了mapred.reduce.tasks,我认为是Hive.它如何选择这个数字?
注意:这是运行Hive作业时的一些消息,应该是一个线索:
...
Number of reduce tasks not specified. Estimated from input data size: 500
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
...
Run Code Online (Sandbox Code Playgroud)