Why does spark-submit take longer than spark-shell to run the same job?

I ran the same job first with spark-shell and then with spark-submit. However, spark-submit takes longer. I am running this in client mode on a 16-node cluster (>180 vcores).

spark-submit configuration:

spark-submit --class tool \
    --master yarn \
    --deploy-mode client \
    --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
    --conf "spark.kryo.classesToRegister=com.fastdtw.timeseries.TimeSeriesBase" \
    --executor-memory 14g \
    --driver-memory 16g \
    --conf "spark.driver.maxResultSize=16g" \
    --conf "spark.kryoserializer.buffer.max=512" \
    --num-executors 30 \
    --conf "spark.executor.cores=6" \
    /home/target/scala-2.10/tool_2.10-0.1-SNAPSHOT.jar

spark-shell configuration:

spark-shell \
  --master yarn \
  --deploy-mode client \
  --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
  --conf "spark.kryo.classesToRegister=com.fastdtw.timeseries.TimeSeriesBase" \
  --executor-memory 12g \
  --driver-memory 16g \
  --conf "spark.driver.maxResultSize=16g" \
  --conf "spark.kryoserializer.buffer.max=512" \
  --conf "spark.executor.cores=6" \
  --conf "spark.executor.instances=30"

Why is there a difference in runtime?
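
As a sanity check before comparing timings, it may help to confirm that the two commands above really request the same resources. The sketch below is not part of the job: it is plain Python with the settings transcribed by hand from the two commands, and the helper name `diff_conf` is made up for illustration. It assumes that `--num-executors`, `--executor-memory` and `--driver-memory` map to `spark.executor.instances`, `spark.executor.memory` and `spark.driver.memory`.

```python
# Settings transcribed by hand from the two commands in the question.
# Command-line flags are rewritten as the Spark properties they set
# (assumption: --num-executors -> spark.executor.instances, and so on).
submit_conf = {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.classesToRegister": "com.fastdtw.timeseries.TimeSeriesBase",
    "spark.executor.memory": "14g",
    "spark.driver.memory": "16g",
    "spark.driver.maxResultSize": "16g",
    "spark.kryoserializer.buffer.max": "512",
    "spark.executor.instances": "30",
    "spark.executor.cores": "6",
}

shell_conf = {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.classesToRegister": "com.fastdtw.timeseries.TimeSeriesBase",
    "spark.executor.memory": "12g",
    "spark.driver.memory": "16g",
    "spark.driver.maxResultSize": "16g",
    "spark.kryoserializer.buffer.max": "512",
    "spark.executor.instances": "30",
    "spark.executor.cores": "6",
}


def diff_conf(left, right):
    """Return {key: (left_value, right_value)} for every key whose values differ."""
    keys = sorted(set(left) | set(right))
    return {k: (left.get(k), right.get(k)) for k in keys if left.get(k) != right.get(k)}


# Print one line per setting that is not identical in both runs.
for key, (submit_value, shell_value) in diff_conf(submit_conf, shell_conf).items():
    print(f"{key}: spark-submit={submit_value} spark-shell={shell_value}")
```

With the values as posted, the only differing property is the executor memory (14g for spark-submit, 12g for spark-shell). That alone does not explain the timing gap, but it is one variable worth ruling out. On a running application, the Environment tab of the Spark UI shows the properties that were actually applied.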

Archived:	9 years, 1 month ago
Viewed:	1288 times
Last activity:	9 years, 1 month ago