Dav*_*d H | Tags: hive, hadoop-yarn, hortonworks-data-platform, apache-spark
I am using HDP 2.5, running spark-submit in yarn-cluster mode.
I am trying to generate data using a DataFrame cross join, i.e.
val generatedData = df1.join(df2).join(df3).join(df4)
generatedData.saveAsTable(...)....
df1's storage level is MEMORY_AND_DISK.
df2, df3 and df4's storage level is MEMORY_ONLY.
df1 has far more records, about 5 million, while df2 through df4 have at most 100 records each. Built this way, my explain plan shows a BroadcastNestedLoopJoin, which should give better performance.
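For reference, here is a minimal self-contained sketch of the setup. The table names, the SparkSession boilerplate and the explicit broadcast() hints are placeholders for illustration; only the persist levels and the join/saveAsTable structure are from my actual job:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast
import org.apache.spark.storage.StorageLevel

val spark = SparkSession.builder()
  .appName("cross-join-sketch")                   // placeholder name
  .config("spark.sql.crossJoin.enabled", "true")  // Spark 2.x rejects implicit cartesian products otherwise
  .getOrCreate()

// df1 is the large frame (~5 million rows); df2..df4 are tiny (<= 100 rows each)
val df1 = spark.table("big_table").persist(StorageLevel.MEMORY_AND_DISK)
val df2 = spark.table("small_table_2").persist(StorageLevel.MEMORY_ONLY)
val df3 = spark.table("small_table_3").persist(StorageLevel.MEMORY_ONLY)
val df4 = spark.table("small_table_4").persist(StorageLevel.MEMORY_ONLY)

// broadcast() makes the BroadcastNestedLoopJoin choice explicit instead of
// relying on the optimizer's size estimates
val generatedData = df1
  .join(broadcast(df2))
  .join(broadcast(df3))
  .join(broadcast(df4))

generatedData.write.saveAsTable("generated_data")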
For some reason it always fails. I don't know how to debug it or where the memory blows up.
Error log output:
16/12/06 19:44:08 WARN YarnAllocator: Container marked as failed: container_e33_1480922439133_0845_02_000002 on host: hdp4. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal
16/12/06 19:44:08 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_e33_1480922439133_0845_02_000002 on host: hdp4. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal
16/12/06 19:44:08 ERROR YarnClusterScheduler: Lost executor 1 on hdp4: Container marked as failed: container_e33_1480922439133_0845_02_000002 on host: hdp4. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal
16/12/06 19:44:08 WARN TaskSetManager: Lost task 1.0 in stage 12.0 (TID 19, hdp4): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Container marked as failed: container_e33_1480922439133_0845_02_000002 on host: hdp4. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal
I don't see any warning or error logs before this failure. What is the problem? Where should I look for the memory consumption? I can't see anything on the Storage tab of the Spark UI. The log was taken from the YARN Resource Manager UI on HDP 2.5.
EDIT
Looking at the container logs, it seems to be a java.lang.OutOfMemoryError: GC overhead limit exceeded.
I know how to increase the memory, but I don't have any memory left. How can I do a cartesian product / join of 4 DataFrames without getting this error?
I also ran into this problem and tried to solve it by referring to some blogs.
1. Run spark with the conf below:
--conf 'spark.driver.extraJavaOptions=-XX:+UseCompressedOops -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps' \
--conf 'spark.executor.extraJavaOptions=-XX:+UseCompressedOops -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC' \
With -XX:+PrintHeapAtGC enabled, the executor GC log showed:
Heap after GC invocations=157 (full 98):
 PSYoungGen      total 940544K, used 853456K [0x0000000781800000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 860160K, 99% used [0x0000000781800000,0x00000007b5974118,0x00000007b6000000)
  from space 80384K, 0% used [0x00000007b6000000,0x00000007b6000000,0x00000007bae80000)
  to   space 77824K, 0% used [0x00000007bb400000,0x00000007bb400000,0x00000007c0000000)
 ParOldGen       total 2048000K, used 2047964K [0x0000000704800000, 0x0000000781800000, 0x0000000781800000)
  object space 2048000K, 99% used [0x0000000704800000,0x00000007817f7148,0x0000000781800000)
 Metaspace       used 43044K, capacity 43310K, committed 44288K, reserved 1087488K
  class space    used 6618K, capacity 6701K, committed 6912K, reserved 1048576K
}
Both PSYoungGen and ParOldGen are 99% used, so java.lang.OutOfMemoryError: GC overhead limit exceeded is thrown as soon as more objects are created.
If more memory resources are available, try adding more memory for the executors or the driver:
--executor-memory 10000m \
--driver-memory 10000m \
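The executor setting can also be applied programmatically; a minimal sketch (the app name is a placeholder):
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("cross-join-job")                  // placeholder name
  .config("spark.executor.memory", "10000m")
  // spark.driver.memory must be set before the driver JVM starts, so in
  // yarn-cluster mode pass it via spark-submit --driver-memory instead
  .getOrCreate()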
In my case the PSYoungGen region was smaller than ParOldGen, so many young objects were promoted into the ParOldGen region, and once ParOldGen filled up, java.lang.OutOfMemoryError: Java heap space appeared.
Add this conf for the executors:
'spark.executor.extraJavaOptions=-XX:NewRatio=1 -XX:+UseCompressedOops -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps'
-XX:NewRatio=rate, where rate = ParOldGen / PSYoungGen
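For example, in the heap dump above ParOldGen totals 2048000K against PSYoungGen's 940544K, i.e. a ratio of roughly 2.2:1; -XX:NewRatio=1 would instead size the old and young generations equally, giving young objects more room before they are promoted.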
Depending on your case, you can also try a different GC strategy, e.g. (an example conf follows the list):
-XX:+UseSerialGC :Serial Collector
-XX:+UseParallelGC :Parallel Collector
-XX:+UseParallelOldGC :Parallel Old collector
-XX:+UseConcMarkSweepGC :Concurrent Mark Sweep
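For instance, to try the CMS collector on the executors while keeping the logging flags from above (these are standard HotSpot flags, but which collector helps is workload-dependent, so measure rather than assume):
--conf 'spark.executor.extraJavaOptions=-XX:+UseConcMarkSweepGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps' \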
The log files of all containers and the AM are available with:
yarn logs -applicationId application_1480922439133_0845_02
If you only want the AM logs:
yarn logs -am -applicationId application_1480922439133_0845_02
If you want to find the containers that ran for this application:
yarn logs -applicationId application_1480922439133_0845_02|grep container_e33_1480922439133_0845_02
If you only need a single container's log:
yarn logs -containerId container_e33_1480922439133_0845_02_000002
For these commands to work, log aggregation must be turned on (yarn.log-aggregation-enable set to true); otherwise you will have to fetch the logs from each server's local log directory.
UPDATE
There is not much you can do beyond trying with swap, but that will degrade performance considerably.
The GC overhead limit means that the GC has been running non-stop while recovering hardly any memory. The only reasons for that are either that the code is badly written with an enormous number of back references (which is doubtful, since you are doing a simple join), or the memory capacity has been reached.