I am running 2 Spark jobs, one with 1 executor and the other with 8 executors.
A (1 executor)
spark-submit --class com.SmallfilesResearchProcess --master yarn-cluster --queue xxx --executor-memory 1g --driver-memory 6g --num-executors 1 --conf spark.executor.cores=8 --conf spark.logLineage=true /tmp/hadoop-tools-1.2-SNAPSHOT-spark.jar coalesce.num.partitions=32
Here are the results for case A in the Spark UI:
Executor ID  Address  Task Time  Total Tasks  Failed Tasks  Succeeded Tasks  Input Size / Records
1 machine:xxx 2.0 min 32 0 32 404.5 MB / 525572
Total Time Across All Tasks: 2.0 min
Locality Level Summary: Node local: 1; Rack local: 31
Input Size / Records: 404.5 MB / 525572
B (8 executors)
spark-submit --class com.SmallfilesResearchProcess --master yarn-cluster --queue xxx --executor-memory 1g --driver-memory 6g --num-executors 8 --conf spark.executor.cores=8 --conf spark.logLineage=true /tmp/hadoop-tools-1.2-SNAPSHOT-spark.jar coalesce.num.partitions=32
Here are the results for case B in the Spark UI:
Executor ID  Address  Task Time  Total Tasks  Failed Tasks  Succeeded Tasks  Input Size / Records
1 machine:xxxx 22 s 4 0 4 37.0 MB / 63106
2 machine:xxxx 25 s 4 0 4 0.0 B / 64068
3 machine:xxxx 27 s 4 0 4 0.0 B / 65045
4 machine:xxxx 22 s 4 0 4 38.1 MB / 64255
5 machine:xxxx 27 s 5 0 5 52.3 MB / 82091
6 machine:xxxx 22 s 5 0 5 49.1 MB / 79232
7 machine:xxxx 19 s 3 0 3 0.0 B / 48337
8 machine:xxxx 22 s 3 0 3 0.0 B / 59438
Total Time Across All Tasks: 2.8 min
Locality Level Summary: Node local: 4; Rack local: 28
Input Size / Records: 176.5 MB / 525572
Question 1: How is Input Size computed? Since both cases read the same records, I would expect it to be the same, yet it is 176.5 MB in case B and 405 MB in case A.
Question 2: Why do some tasks in case B show an input size of 0 B?