YARN UNHEALTHY nodes

roy*_*roy 7 hadoop distributed-computing cloudera hadoop-yarn cloudera-cdh

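In a YARN cluster that is 80% full, we are seeing some YARN NodeManagers marked as UNHEALTHY. After digging through the logs, I found that this happens because the data directories are 90% full on disk. The errors look like this: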

2015-02-21 08:33:51,590 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Node hdp009.abc.com:8041 reported UNHEALTHY with details: 4/4 local-dirs are bad: /data3/yarn/nm,/data2/yarn/nm,/data4/yarn/nm,/data1/yarn/nm;
2015-02-21 08:33:51,590 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: hdp009.abc.com:8041 Node Transitioned from RUNNING to UNHEALTHY

I am trying to understand how YARN decides to mark a node as unhealthy. Is there a way to change the threshold?

Thanks

小智 14

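Try adding the property yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage to yarn-site.xml. This property specifies the maximum percentage of disk space utilization allowed, after which a disk is marked as bad. Values range from 0.0 to 100.0. The default is 90.0, which matches the 90% disk usage you are seeing.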

Reference: yarn-default.xml

To force the node into a healthy state, for example:

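<?xml version="1.0"?>
<configuration>
  <property>
     <!-- Minimum fraction of disks that must be healthy for the NodeManager to launch new containers -->
     <name>yarn.nodemanager.disk-health-checker.min-healthy-disks</name>
     <value>0.0</value>
  </property>
  <property>
     <!-- Maximum disk utilization (percent) allowed before a disk is marked as bad -->
     <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
     <value>100.0</value>
  </property>
</configuration>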
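To see how the two properties interact, here is a rough Python sketch of the check. It is a simplified model, not the NodeManager's actual code: the class and function names are made up for illustration, the per-directory utilization figures are hypothetical, and it assumes a directory counts as bad when its utilization is strictly above the threshold. The defaults of 90.0 and 0.25 are the values I recall from yarn-default.xml, so verify them against your Hadoop version.

```python
from dataclasses import dataclass


@dataclass
class DiskCheckConfig:
    """Hypothetical holder for the two yarn-site.xml settings discussed above."""
    # Models max-disk-utilization-per-disk-percentage (assumed default: 90.0)
    max_utilization_pct: float = 90.0
    # Models min-healthy-disks (assumed default: 0.25)
    min_healthy_fraction: float = 0.25


def bad_dirs(utilization_by_dir, cfg):
    """Return the directories whose utilization exceeds the threshold."""
    return sorted(
        d for d, pct in utilization_by_dir.items()
        if pct > cfg.max_utilization_pct
    )


def node_is_healthy(utilization_by_dir, cfg):
    """A node is healthy if a large enough fraction of its dirs is good."""
    total = len(utilization_by_dir)
    if total == 0:
        return False
    good = total - len(bad_dirs(utilization_by_dir, cfg))
    return good / total >= cfg.min_healthy_fraction


# Hypothetical utilization figures for the four local-dirs in the log above.
usage = {
    "/data1/yarn/nm": 91.0,
    "/data2/yarn/nm": 92.5,
    "/data3/yarn/nm": 90.3,
    "/data4/yarn/nm": 93.1,
}

default_cfg = DiskCheckConfig()
forced_cfg = DiskCheckConfig(max_utilization_pct=100.0, min_healthy_fraction=0.0)

print(bad_dirs(usage, default_cfg))         # all four dirs exceed 90.0
print(node_is_healthy(usage, default_cfg))  # 0 of 4 good, below 0.25
print(node_is_healthy(usage, forced_cfg))   # thresholds relaxed
```

With the default-style settings, all four directories are over the limit and the node is reported unhealthy, which is the situation in the log. A less drastic option than the forced settings is to raise only the utilization threshold (say, to 95.0) and leave min-healthy-disks alone.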