Spark on YARN: Cannot allocate a page with more than 17179869176 bytes

bob*_*bob 6 hadoop-yarn apache-spark apache-spark-sql

I am joining 11 million records. The job runs on an EMR cluster with Spark 2.2.1 and 5 worker nodes.

When I run the job, I get the following error:

executor 3): java.lang.IllegalArgumentException: Cannot allocate a page with more than 17179869176 bytes
        at org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:277)
        at org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:90)
        at org.apache.spark.shuffle.sort.ShuffleExternalSorter.growPointerArrayIfNecessary(ShuffleExternalSorter.java:328)
        at org.apache.spark.shuffle.sort.ShuffleExternalSorter.insertRecord(ShuffleExternalSorter.java:379)
        at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.insertRecordIntoSorter(UnsafeShuffleWriter.java:246)
        at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:167)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
        at org.apache.spark.scheduler.Task.run(Task.scala:108)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

I can't work out what is causing this. Which parameters should I set?

I am currently running with the following parameters: --num-executors 5 --conf spark.eventLog.enabled=true --executor-memory 70g --driver-memory 30g --executor-cores 16 --conf spark.shuffle.memoryFraction=0.5
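
For context on the number in the error message: 17179869176 is exactly (2^31 - 1) * 8, i.e. the largest array of 8-byte words that a single page can hold. The sketch below only reproduces that arithmetic. The assumption that the shuffle sorter's pointer array doubles in size when it grows is my reading of the Spark 2.2 source (`growPointerArrayIfNecessary` in the stack trace), and the names `MAX_PAGE_WORDS`, `WORD_BYTES`, and `can_double` are made up for illustration; they are not Spark APIs.

```python
# Hypothetical names for illustration; these are not Spark identifiers.
WORD_BYTES = 8                    # each slot in the pointer array is one 8-byte long
MAX_PAGE_WORDS = (1 << 31) - 1    # largest number of 8-byte words in one page
MAX_PAGE_BYTES = MAX_PAGE_WORDS * WORD_BYTES


def can_double(current_entries: int) -> bool:
    """Return True if an array with `current_entries` 8-byte slots can
    double in size without exceeding the single-page byte limit.

    Assumption: the array grows by doubling, as in my reading of Spark 2.2.
    """
    return current_entries * 2 * WORD_BYTES <= MAX_PAGE_BYTES


if __name__ == "__main__":
    print(MAX_PAGE_BYTES)               # 17179869176, the value in the error
    print(can_double((1 << 30) - 1))    # True: doubled size still fits
    print(can_double(1 << 30))          # False: doubled size is 8 bytes over
```

If that reading is right, the error would mean a single shuffle task tried to buffer on the order of a billion records, which is far more than the 11 million input rows and might point to the join multiplying rows on duplicate or skewed keys rather than to a memory setting.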