Spark error: executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM

Question

I am using the following Spark configuration:

maxCores=5
driverMemory=2g
executorMemory=17g
executorInstances=100

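Problem: out of 100 executors, my job ends up with only 10 active executors, even though there is still enough memory available. Even when I set the number of executors to 250, only 10 remain active. All I am trying to do is load a multi-partition Hive table and run df.count on it.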

Please help me understand what is causing the executors to be killed:
17/12/20 11:08:21 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
17/12/20 11:08:21 INFO storage.DiskBlockManager: Shutdown hook called
17/12/20 11:08:21 INFO util.ShutdownHookManager: Shutdown hook called

I don't know why YARN is killing my executors.

Answer 1

maf*_*ffe 4

I ran into a similar problem, and investigating the NodeManager logs led me to the root cause. You can access them through the web interface at:

nodeManagerAddress:PORT/logs

PORT is specified in yarn-site.xml under yarn.nodemanager.webapp.address (default: 8042).
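
If you would rather not read the port out of the file by hand, here is a minimal Python sketch of the lookup. It assumes you have the contents of yarn-site.xml as a string in the usual Hadoop configuration/property/name/value layout. The function names and the http:// scheme are my own assumptions for illustration (a secured cluster may use https), not anything YARN provides.

```python
import xml.etree.ElementTree as ET

# Default NodeManager web UI port, as stated above.
DEFAULT_NM_WEBAPP_PORT = 8042


def nodemanager_webapp_port(yarn_site_xml: str) -> int:
    """Return the port set in yarn.nodemanager.webapp.address, else the default."""
    root = ET.fromstring(yarn_site_xml)
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name == "yarn.nodemanager.webapp.address":
            value = (prop.findtext("value") or "").strip()
            # The value has the form "host:port"; keep what follows the last colon.
            _, sep, port = value.rpartition(":")
            if sep and port.isdigit():
                return int(port)
    return DEFAULT_NM_WEBAPP_PORT


def nodemanager_logs_url(host: str, port: int) -> str:
    """Build the /logs URL for one NodeManager (hypothetical helper, assumes http)."""
    return f"http://{host}:{port}/logs"
```

If the property is missing from the file, the sketch falls back to 8042.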

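My investigation workflow:

1. Collect the logs (yarn logs ... command)

2. Identify the node and container that emitted the error (in those logs)

3. Search the NodeManager logs by the timestamp of the error to find the root cause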
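
The last two steps of that workflow can be roughly sketched in Python. This assumes you have already saved the executor log and the NodeManager log as lists of lines. The executor timestamp format is taken from the log lines in the question. The NodeManager format (yyyy-MM-dd HH:mm:ss,SSS at the start of each line) is an assumption based on the usual log4j ISO8601 layout and may differ on your cluster. The function names and the 30-second window are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Executor log lines in the question start with "yy/MM/dd HH:mm:ss".
EXECUTOR_TS = re.compile(r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) ")
# Assumption: NodeManager log lines start with "yyyy-MM-dd HH:mm:ss,SSS".
NM_TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} ")


def sigterm_times(executor_log_lines):
    """Return the timestamps of all 'RECEIVED SIGNAL TERM' lines."""
    times = []
    for line in executor_log_lines:
        if "RECEIVED SIGNAL TERM" in line:
            match = EXECUTOR_TS.match(line)
            if match:
                times.append(datetime.strptime(match.group(1), "%y/%m/%d %H:%M:%S"))
    return times


def nodemanager_lines_near(nm_log_lines, when, window_seconds=30):
    """Return NodeManager lines logged within window_seconds of 'when'."""
    window = timedelta(seconds=window_seconds)
    hits = []
    for line in nm_log_lines:
        match = NM_TS.match(line)
        if not match:
            continue  # continuation lines such as stack traces carry no timestamp
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        if abs(stamp - when) <= window:
            hits.append(line)
    return hits
```

Feed each timestamp from sigterm_times into nodemanager_lines_near for the node that hosted the killed container, then read the surrounding NodeManager lines for the reason the container was stopped.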

By the way: you can access an aggregated collection (XML) of all configuration affecting a node on the same port at:

nodeManagerAddress:PORT/conf