AWS Glue - can't set spark.yarn.executor.memoryOverhead

Question

KDi*_*lla 6 apache-spark pyspark aws-glue

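When running a Python job in AWS Glue, I get the error:

Reason: Container killed by YARN for exceeding memory limits. 5.6 GB of 5.5 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.

When I run this at the beginning of the script: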

print '--- Before Conf --'
print 'spark.yarn.driver.memory', sc._conf.get('spark.yarn.driver.memory')
print 'spark.yarn.driver.cores', sc._conf.get('spark.yarn.driver.cores')
print 'spark.yarn.executor.memory', sc._conf.get('spark.yarn.executor.memory')
print 'spark.yarn.executor.cores', sc._conf.get('spark.yarn.executor.cores')
print "spark.yarn.executor.memoryOverhead", sc._conf.get("spark.yarn.executor.memoryOverhead")

print '--- Conf --'
sc._conf.setAll([
    ('spark.yarn.executor.memory', '15G'),
    ('spark.yarn.executor.memoryOverhead', '10G'),
    ('spark.yarn.driver.cores', '5'),
    ('spark.yarn.executor.cores', '5'),
    ('spark.yarn.cores.max', '5'),
    ('spark.yarn.driver.memory', '15G'),
])

print '--- After Conf ---'
# Read back the same spark.yarn.* keys that were set above, matching the output shown below
print 'spark.yarn.driver.memory', sc._conf.get('spark.yarn.driver.memory')
print 'spark.yarn.driver.cores', sc._conf.get('spark.yarn.driver.cores')
print 'spark.yarn.executor.memory', sc._conf.get('spark.yarn.executor.memory')
print 'spark.yarn.executor.cores', sc._conf.get('spark.yarn.executor.cores')
print "spark.yarn.executor.memoryOverhead", sc._conf.get("spark.yarn.executor.memoryOverhead")

I get the following output:

--- Before Conf --
spark.yarn.driver.memory None
spark.yarn.driver.cores None
spark.yarn.executor.memory None
spark.yarn.executor.cores None
spark.yarn.executor.memoryOverhead None
--- Conf --
--- After Conf ---
spark.yarn.driver.memory 15G
spark.yarn.driver.cores 5
spark.yarn.executor.memory 15G
spark.yarn.executor.cores 5
spark.yarn.executor.memoryOverhead 10G
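
For context on the 5.5 GB figure in the error message, here is a rough sketch of how the per-container memory limit on YARN is usually derived: executor memory plus memory overhead, where the overhead defaults to the larger of 384 MiB and 10% of executor memory when it is not set explicitly. The 5 GiB executor size below is an assumption chosen because it reproduces the 5.5 GB limit; the question does not state the actual executor memory. The function name `container_limit_mib` is made up for this illustration, and YARN's own rounding to its allocation increment is ignored.

```python
# Sketch: per-container memory limit = executor memory + memory overhead.
# Assumption: when no overhead is configured, Spark on YARN uses
# max(384 MiB, 10% of executor memory).

MIN_OVERHEAD_MIB = 384
OVERHEAD_FACTOR = 0.10


def container_limit_mib(executor_memory_mib, overhead_mib=None):
    """Return executor memory plus overhead, in MiB (hypothetical helper)."""
    if overhead_mib is None:
        overhead_mib = max(MIN_OVERHEAD_MIB,
                           int(executor_memory_mib * OVERHEAD_FACTOR))
    return executor_memory_mib + overhead_mib


# Assumed 5 GiB executor with default overhead (not stated in the question).
default_limit = container_limit_mib(5 * 1024)
print(default_limit / 1024.0)    # 5.5

# The values passed to setAll in the question: 15G memory plus 10G overhead.
requested_limit = container_limit_mib(15 * 1024, 10 * 1024)
print(requested_limit / 1024.0)  # 25.0
```

If the 15G and 10G values had been applied to the executors, the limit in the error would be expected to be around 25 GB rather than 5.5 GB. That gap may indicate that values written to `sc._conf` after the SparkContext already exists are stored in the configuration object but are not used to size the running executors.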