nameservice1 和 nameservice2 之间的 distcp

roy*_*roy 2 hadoop bigdata cloudera cloudera-manager cloudera-cdh

我们有带有 Cloudera Manager 5 的 CDH 5.2。

我们想将数据从 nameservice2 复制到 nameservice1

两个集群都在相同的 CDH 版本上

当我尝试 hadoop distcp hdfs://nameservice2/foo/bar hdfs://nameservice1/bar/foo

我有错误

java.lang.IllegalArgumentException: java.net.UnknownHostException: nameservice2

所以我将以下配置从 Nameservice2 添加到 Nameservice1

Cloudera 管理器(网关默认组)中 hdfs-site.xml 的 HDFS 客户端高级配置片段(安全阀)

<property>
<name>dfs.nameservices</name>
<value>nameservices2</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.nameservices2</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.namenodes.nameservices2</name>
<value>namenode36,namenode405</value>
</property>
<property>
<name>dfs.namenode.rpc-address.nameservices2.namenode36</name>
<value>hnn001.prod.cc:8020</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.nameservices2.namenode36</name>
<value>hnn001.prod.com:54321</value>
</property>
<property>
<name>dfs.namenode.http-address.nameservices2.namenode36</name>
<value>hnn001.prod.com:50070</value>
</property>
<property>
<name>dfs.namenode.https-address.nameservices2.namenode36</name>
<value>hnn001.prod.com:50470</value>
</property>
<property>
<name>dfs.namenode.rpc-address.nameservices2.namenode405</name>
<value>hnn002.prod.com:8020</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.nameservices2.namenode405</name>
<value>hnn002.prod.com:54321</value>
</property>
<property>
<name>dfs.namenode.http-address.nameservices2.namenode405</name>
<value>hnn002.prod.com:50070</value>
</property>
<property>
<name>dfs.namenode.https-address.nameservices2.namenode405</name>
<value>hnn002.prod.com:50470</value>
</property>
Run Code Online (Sandbox Code Playgroud)

但我仍然遇到同样的错误。

任何解决方法?

谢谢

Sac*_*inJ 6

在启用 HA 的 HDFS namenode nameservice1,nameservice2 是逻辑名称,您不能将端口与该逻辑名称一起使用。

你有两种方法。

简单的方法是找到活动的名称节点并使用 distcp 命令中的活动名称节点:端口,如下所示。Namenode Web UI 可用于查找两个集群的活动名称节点。

hadoop distcp hdfs://hnn001.prod.cc:8020:8020/foo/bar hdfs://<dest-cluster-active-nn-hostname>:8020/bar/foo
Run Code Online (Sandbox Code Playgroud)

另一种方法是使用两个集群的逻辑名称如下,但在尝试以下命令之前,请确保您已在客户端 hdfs-site.xml 中正确配置了 nameservice1 和 nameservice2。

hadoop distcp hdfs://nameservice2/foo/bar hdfs://nameservice1/bar/foo
Run Code Online (Sandbox Code Playgroud)

在本地集群中配置远程集群的名称服务。

看起来 nameservice2 是您的本地,而 nameservice1 是您的远程。您需要将 nameservice1 和 nameservice2 的所有关联属性保留在本地集群中,即。您本地集群的客户端 hdfs-site.xml 文件应如下所示。

<configuration>
<!-- Available nameservices -->
<property>
<name>dfs.nameservices</name>
<value>nameservices1,nameservices2</value>
</property>

<!-- Local nameservice2 properties -->
<property>
<name>dfs.client.failover.proxy.provider.nameservices2</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.namenodes.nameservices2</name>
<value>namenode36,namenode405</value>
</property>
<property>
<name>dfs.namenode.rpc-address.nameservices2.namenode36</name>
<value>hnn001.prod.cc:8020</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.nameservices2.namenode36</name>
<value>hnn001.prod.com:54321</value>
</property>
<property>
<name>dfs.namenode.http-address.nameservices2.namenode36</name>
<value>hnn001.prod.com:50070</value>
</property>
<property>
<name>dfs.namenode.https-address.nameservices2.namenode36</name>
<value>hnn001.prod.com:50470</value>
</property>
<property>
<name>dfs.namenode.rpc-address.nameservices2.namenode405</name>
<value>hnn002.prod.com:8020</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.nameservices2.namenode405</name>
<value>hnn002.prod.com:54321</value>
</property>
<property>
<name>dfs.namenode.http-address.nameservices2.namenode405</name>
<value>hnn002.prod.com:50070</value>
</property>
<property>
<name>dfs.namenode.https-address.nameservices2.namenode405</name>
<value>hnn002.prod.com:50470</value>
</property>

<!-- Remote nameservice1 properties -->
<!-- You can find these properties in the remote machine's hdfs-site.xml file -->

<property>
<name>dfs.client.failover.proxy.provider.nameservices1</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.namenodes.nameservices1</name>
<value>namenodeXX,namenodeYY</value>
</property>
<property>
<name>dfs.namenode.rpc-address.nameservices1.namenodeXX</name>
<value><Remote-nn1>:8020</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.nameservices1.namenodeXX</name>
<value><Remote-nn1>:54321</value>
</property>
<property>
<name>dfs.namenode.http-address.nameservices1.namenode**XX**</name>
<value><Remote-nn1>:50070</value>
</property>
<property>
<name>dfs.namenode.https-address.nameservices1.namenodeXX</name>
<value><Remote-nn1>:50470</value>
</property>
<property>
<name>dfs.namenode.rpc-address.nameservices1.namenodeYY</name>
<value><Remote-nn2>:8020</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.nameservices1.namenodeYY</name>
<value><Remote-nn2>:54321</value>
</property>
<property>
<name>dfs.namenode.http-address.nameservices1.namenodeYY</name>
<value><Remote-nn2>:50070</value>
</property>
<property>
<name>dfs.namenode.https-address.nameservices1.namenodeYY</name>
<value><Remote-nn2>:50470</value>
</property>

<!-- Other properties --> 

</configuration>
Run Code Online (Sandbox Code Playgroud)

在上面的配置文件中,用远程机器的 hdfs site.xml 中的相应值替换所有占位符,如 YY XX。