HDFS DataNode denied communication with the NameNode because the hostname cannot be resolved

MrE*_*MrE 9 hadoop hdfs

I have 3 DataNodes and 1 NameNode, on machines running inside LXC containers. The DataNode that sits on the same node as the NameNode works fine, but the other 2 do not and get:

 Initialization failed for Block pool BP-232943349-10.0.3.112-1417116665984 
(Datanode Uuid null) service to hadoop12.domain.local/10.0.3.112:8022 
Datanode denied communication with namenode because hostname cannot be resolved 
(ip=10.0.3.233, hostname=10.0.3.233): DatanodeRegistration(10.0.3.114, 
datanodeUuid=49a6dc47-c988-4cb8-bd84-9fabf87807bf, infoPort=50075, ipcPort=50020, 
storageInfo=lv=-56;cid=cluster24;nsid=11020533;c=0)

Note in the log that my NameNode is at IP 10.0.3.112 and the failing DataNode in this case is at 10.0.3.114. All node FQDNs are defined in the hosts file on every node, and I can ping every node from every node.

What confuses me here is that the DataNode is trying to reach the NameNode at 10.0.3.233, which is not an IP on my list, so where does it get that as the NameNode's IP? Where is this setting? The second DataNode that fails is 10.0.3.113, and it likewise looks up a different IP (10.0.3.158) that it cannot resolve, because that address is not defined and does not exist in my setup.

The node that works, 10.0.3.112, is the same machine as the NameNode, but in its logs I see src: / and dest: / entries whose IPs are outside the range I use. Like this:

    src: /10.0.3.112:50010, dest: /10.0.3.180:53246, bytes: 60, op: HDFS_READ, 
cliID: DFSClient_NONMAPREDUCE_-939581249_2253, offset: 0, srvID: a83af9ba-4e1a-47b3-a5d4-
f437ef60c287, blockid: BP-232943349-10.0.3.112-1417116665984:blk_1073742468_1644, 
duration: 1685666

So what exactly is going on, and how come the DataNodes cannot reach the NameNode when all my nodes can see and resolve each other?

Thanks for the help.

PS: the /etc/hosts file looks like this:

127.0.0.1   localhost

# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

10.0.3.1    bigdata.domain.local
192.168.10.33   bigdata.domain.local
10.0.3.111  hadoop11.domain.local
10.0.3.112  hadoop12.domain.local
10.0.3.113  hadoop13.domain.local
10.0.3.114  hadoop14.domain.local
10.0.3.115  hadoop15.domain.local
10.0.3.116  hadoop16.domain.local
10.0.3.117  hadoop17.domain.local
10.0.3.118  hadoop18.domain.local
10.0.3.119  hadoop19.domain.local
10.0.3.121  hadoop21.domain.local
10.0.3.122  hadoop22.domain.local
10.0.3.123  hadoop23.domain.local
10.0.3.124  hadoop24.domain.local
10.0.3.125  hadoop25.domain.local
10.0.3.126  hadoop26.domain.local
10.0.3.127  hadoop27.domain.local
10.0.3.128  hadoop28.domain.local
10.0.3.129  hadoop29.domain.local

core-site.xml:

<?xml version="1.0" encoding="UTF-8"?>

<!--Autogenerated by Cloudera Manager-->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://nameservice1</value>
  </property>
  <property>
    <name>fs.trash.interval</name>
    <value>1</value>
  </property>
  <property>
    <name>io.compression.codecs</name>
        <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.DeflateCodec,org.apache.hadoop.io.compress.SnappyCodec,org.apache.hadoop.io.compress.Lz4Codec</value>
      </property>
      <property>
        <name>hadoop.security.authentication</name>
        <value>simple</value>
      </property>
      <property>
        <name>hadoop.security.authorization</name>
        <value>false</value>
      </property>
      <property>
        <name>hadoop.rpc.protection</name>
        <value>authentication</value>
      </property>
      <property>
        <name>hadoop.ssl.require.client.cert</name>
        <value>false</value>
        <final>true</final>
      </property>
      <property>
        <name>hadoop.ssl.keystores.factory.class</name>
        <value>org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory</value>
        <final>true</final>
      </property>
      <property>
        <name>hadoop.ssl.server.conf</name>
        <value>ssl-server.xml</value>
        <final>true</final>
      </property>
      <property>
        <name>hadoop.ssl.client.conf</name>
        <value>ssl-client.xml</value>
        <final>true</final>
      </property>
      <property>
        <name>hadoop.security.auth_to_local</name>
        <value>DEFAULT</value>
      </property>
      <property>
        <name>hadoop.proxyuser.oozie.hosts</name>
        <value>*</value>
      </property>
      <property>
        <name>hadoop.proxyuser.oozie.groups</name>
        <value>*</value>
      </property>
      <property>
        <name>hadoop.proxyuser.mapred.hosts</name>
        <value>*</value>
      </property>
      <property>
        <name>hadoop.proxyuser.mapred.groups</name>
        <value>*</value>
      </property>
      <property>
        <name>hadoop.proxyuser.flume.hosts</name>
        <value>*</value>
      </property>
      <property>
        <name>hadoop.proxyuser.flume.groups</name>
        <value>*</value>
      </property>
      <property>
        <name>hadoop.proxyuser.HTTP.hosts</name>
        <value>*</value>
      </property>
      <property>
        <name>hadoop.proxyuser.HTTP.groups</name>
        <value>*</value>
      </property>
      <property>
        <name>hadoop.proxyuser.hive.hosts</name>
        <value>*</value>
      </property>
      <property>
        <name>hadoop.proxyuser.hive.groups</name>
        <value>*</value>
      </property>
      <property>
        <name>hadoop.proxyuser.hue.hosts</name>
        <value>*</value>
      </property>
      <property>
        <name>hadoop.proxyuser.hue.groups</name>
        <value>*</value>
      </property>
      <property>
        <name>hadoop.proxyuser.httpfs.hosts</name>
        <value>*</value>
      </property>
      <property>
        <name>hadoop.proxyuser.httpfs.groups</name>
        <value>*</value>
      </property>
      <property>
        <name>hadoop.proxyuser.hdfs.groups</name>
        <value>*</value>
      </property>
      <property>
        <name>hadoop.proxyuser.hdfs.hosts</name>
        <value>*</value>
      </property>
      <property>
        <name>hadoop.security.group.mapping</name>
        <value>org.apache.hadoop.security.ShellBasedUnixGroupsMapping</value>
      </property>
      <property>
        <name>hadoop.security.instrumentation.requires.admin</name>
        <value>false</value>
      </property>
    </configuration>

hdfs-site.xml:

<!--Autogenerated by Cloudera Manager-->
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>nameservice1</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.nameservice1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled.nameservice1</name>
    <value>true</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>hadoop12.domain.local:2181,hadoop13.domain.local:2181,hadoop14.domain.local:2181</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.nameservice1</name>
    <value>namenode114,namenode137</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.nameservice1.namenode114</name>
    <value>hadoop12.domain.local:8020</value>
  </property>
  <property>
    <name>dfs.namenode.servicerpc-address.nameservice1.namenode114</name>
    <value>hadoop12.domain.local:8022</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.nameservice1.namenode114</name>
    <value>hadoop12.domain.local:50070</value>
  </property>
  <property>
    <name>dfs.namenode.https-address.nameservice1.namenode114</name>
    <value>hadoop12.domain.local:50470</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.nameservice1.namenode137</name>
    <value>hadoop14.domain.local:8020</value>
  </property>
  <property>
    <name>dfs.namenode.servicerpc-address.nameservice1.namenode137</name>
    <value>hadoop14.domain.local:8022</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.nameservice1.namenode137</name>
    <value>hadoop14.domain.local:50070</value>
  </property>
  <property>
    <name>dfs.namenode.https-address.nameservice1.namenode137</name>
    <value>hadoop14.domain.local:50470</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value>
  </property>
  <property>
    <name>dfs.client.use.datanode.hostname</name>
    <value>true</value>
  </property>
  <property>
    <name>fs.permissions.umask-mode</name>
    <value>022</value>
  </property>
  <property>
    <name>dfs.namenode.acls.enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.domain.socket.path</name>
    <value>/var/run/hdfs-sockets/dn</value>
  </property>
  <property>
    <name>dfs.client.read.shortcircuit.skip.checksum</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.client.domain.socket.data.traffic</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
    <value>true</value>
  </property>
</configuration>

sha*_*ane 14

You just need to change the dfs.namenode.datanode.registration.ip-hostname-check setting in the NameNode's hdfs-site.xml.
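For reference, a minimal sketch of the property as it would sit inside the <configuration> block of hdfs-site.xml (the value false is what the comment below reports as working):

<property>
  <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
  <value>false</value>
</property>

Setting it to false stops the NameNode from requiring that a registering DataNode's IP resolves to a hostname; the alternative is to fix forward and reverse resolution for the DataNode addresses.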

  • To be clear, adding `<property> <name>dfs.namenode.datanode.registration.ip-hostname-check</name> <value>false</value> </property>` to hdfs-site.xml on both the namenode and the datanodes (inside the `<configuration>` tag, of course) fixed this issue for me: *Datanode denied communication with namenode because hostname cannot be resolved*. (3 upvotes)

MrE*_*MrE 6

After a lot of trouble with this setup I finally figured out what was wrong... Even though my configuration was correct when I set it up, resolvconf (the program) tends to reset the /etc/resolv.conf configuration file and overwrite my "search domain.local" setting.
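For context, the /etc/resolv.conf I wanted to keep looks roughly like this (the nameserver IP is only a placeholder for whatever resolver your setup actually uses):

# /etc/resolv.conf -- nameserver IP is a placeholder, adjust to your network
search domain.local
nameserver 10.0.3.1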

Cloudera and Hadoop also use various methods to determine IP addresses, and unfortunately they are not consistent: Cloudera first uses SSH to look up the IP, which, like PING and other programs, goes through the GLIBC resolver, but later on it uses HOST, which bypasses the GLIBC resolver and queries DNS directly, skipping the /etc/hosts file and relying only on /etc/resolv.conf.
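A quick way to see whether the two resolution paths disagree on a given node is to compare them directly (a rough sketch; hadoop14.domain.local is just one of the hostnames from my hosts file above):

# NSS/GLIBC resolver path (honours /etc/hosts), the same path ping and ssh take:
getent hosts hadoop14.domain.local

# Pure DNS lookup (skips /etc/hosts), the same path the host utility takes:
host hadoop14.domain.local

# The name the machine reports for itself should be the FQDN, not a bare IP:
hostname -f

If the first two commands return different addresses, you are hitting exactly the inconsistency described above.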

So at first everything works, but then RESOLVCONF automatically overwrites my domain and search settings and breaks things.

I ended up removing resolvconf from my setup, and with the proper files in place (hosts, resolv.conf) and HOST resolving to the FQDN, it all works fine. So the trick is to remove RESOLVCONF, which is installed by default since Ubuntu 10.04, I believe. That of course applies to a local setup like mine; for a real cluster setup on a network that uses DNS, just make sure DNS resolves the nodes correctly.