Spark: Why do some tasks have an input size of zero, and how is the input size calculated?

I am running two Spark jobs: one with 1 executor and the other with 8 executors.

A (1 executor)

spark-submit --class com.SmallfilesResearchProcess --master yarn-cluster --queue xxx --executor-memory 1g --driver-memory 6g --num-executors 1 --conf spark.executor.cores=8 --conf spark.logLineage=true /tmp/hadoop-tools-1.2-SNAPSHOT-spark.jar coalesce.num.partitions=32

These are the results for case A in the Spark UI:

Executor ID   Address        Task Time   Total Tasks   Failed Tasks   Succeeded Tasks   Input Size / Records
1             machine:xxx    2.0 min     32            0              32                404.5 MB / 525572

Total Time Across All Tasks: 2.0 min
Locality Level Summary: Node local: 1; Rack local: 31
Input Size / Records: 404.5 MB / 525572

B (8 executors)

spark-submit --class com.SmallfilesResearchProcess --master yarn-cluster --queue xxx --executor-memory 1g --driver-memory 6g --num-executors 8 --conf spark.executor.cores=8 --conf spark.logLineage=true /tmp/hadoop-tools-1.2-SNAPSHOT-spark.jar coalesce.num.partitions=32

These are the results for case B in the Spark UI:

Executor ID   Address        Task Time   Total Tasks   Failed Tasks   Succeeded Tasks   Input Size / Records
1             machine:xxxx   22 s        4             0              4                 37.0 MB / 63106
2             machine:xxxx   25 s        4             0              4                 0.0 B / 64068
3             machine:xxxx   27 s        4             0              4                 0.0 B / 65045
4             machine:xxxx   22 s        4             0              4                 38.1 MB / 64255
5             machine:xxxx   27 s        5             0              5                 52.3 MB / 82091
6             machine:xxxx   22 s        5             0              5                 49.1 MB / 79232
7             machine:xxxx   19 s        3             0              3                 0.0 B / 48337
8             machine:xxxx   22 s        3             0              3                 0.0 B / 59438

Total Time Across All Tasks: 2.8 min
Locality Level Summary: Node local: 4; Rack local: 28
Input Size / Records: 176.5 MB / 525572
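
One way to check how the case B summary line relates to the per-executor rows is simply to add the columns up. The sketch below does that in Python; the row values are copied by hand from the case B table, the tuple layout is only an illustration, and sizes are kept in the unit the UI prints (MB, with "0.0 B" stored as 0.0), so the sums are only as precise as the UI's rounding.

```python
# Per-executor rows copied by hand from the case B Spark UI table.
# Illustrative layout: (executor_id, total_tasks, input_mb, records).
# "0.0 B" in the UI is stored here as 0.0 MB.
rows_b = [
    (1, 4, 37.0, 63106),
    (2, 4, 0.0, 64068),
    (3, 4, 0.0, 65045),
    (4, 4, 38.1, 64255),
    (5, 5, 52.3, 82091),
    (6, 5, 49.1, 79232),
    (7, 3, 0.0, 48337),
    (8, 3, 0.0, 59438),
]

total_tasks = sum(r[1] for r in rows_b)
# Round to one decimal to strip floating-point noise from the MB sum
total_mb = round(sum(r[2] for r in rows_b), 1)
total_records = sum(r[3] for r in rows_b)

# Executors that report 0.0 B of input while still having processed records
zero_byte = [r[0] for r in rows_b if r[2] == 0.0 and r[3] > 0]
zero_byte_records = sum(r[3] for r in rows_b if r[2] == 0.0)

print(total_tasks, total_mb, total_records)  # 32 176.5 525572
print(zero_byte, zero_byte_records)          # [2, 3, 7, 8] 236888
```

The per-executor values add up exactly to the summary line (32 tasks, 176.5 MB, 525572 records), and the four executors reporting 0.0 B account for 236888 of the 525572 records.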

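Question 1: How is the input size calculated? Since both cases process the same number of records, I would expect the input size to be the same as well, yet case B reports 176.5 MB while case A reports 404.5 MB.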
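
To put a number on the discrepancy, the small Python sketch below compares the two summary lines as printed by the UI (values copied by hand; the variable names are illustrative only). It shows that the record counts match while case B reports well under half of case A's input bytes.

```python
# Summary lines from the Spark UI, copied by hand: (input_mb, records)
case_a = (404.5, 525572)
case_b = (176.5, 525572)

same_records = case_a[1] == case_b[1]
# Fraction of case A's reported input size that case B reports
ratio = case_b[0] / case_a[0]

print(same_records, f"{ratio:.1%}")  # True 43.6%
```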

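Question 2: Why do some tasks in case B report an input size of 0 B? Executors 2, 3, 7 and 8 show 0.0 B even though each of them processed tens of thousands of records.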