Hadoop DataNode slave is not connecting to my master

fsi*_*fsi 6 hadoop hdfs

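With so many errors, I can't figure out why the DataNode slave VM is not connecting to my master VM. Any suggestion is welcome so I can try it out. To start with, this is one of the errors in my slave VM log: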

WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: ubuntu-378e53c1-3e1f-4f6e-904d-00ef078fe3f8:9000

Because of this, I can't run the job I want on my master VM:

hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2 5

which gives me this error:

org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/ubuntu/QuasiMonteCarlo_1386793331690_1605707775/in/part0 could only be replicated to 0 nodes instead of minReplication (=1).  There are 0 datanode(s) running and no node(s) are excluded in this operation.

On top of that, hdfs dfsadmin -report (on the master VM) gives me all zeros:

Configured Capacity: 0 (0 B)
Present Capacity: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used: 0 (0 B)
DFS Used%: NaN%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Datanodes available: 0 (0 total, 0 dead)

For this setup, I created 3 Ubuntu VMs on OpenStack, one for the master and the others as slaves. On the master, /etc/hosts is set up as:

127.0.0.1 localhost
50.50.1.9 ubuntu-378e53c1-3e1f-4f6e-904d-00ef078fe3f8
50.50.1.8 slave1
50.50.1.4 slave2

core-site.xml

<name>fs.default.name</name>
<value>hdfs://ubuntu-378e53c1-3e1f-4f6e-904d-00ef078fe3f8:9000</value>
<name>hadoop.tmp.dir</name>
<value>/home/ubuntu/hadoop-2.2.0/tmp</value>

hdfs-site.xml

<name>dfs.replication</name>
<value>3</value>
<name>dfs.namenode.name.dir</name>
<value>file:/home/ubuntu/hadoop-2.2.0/etc/hdfs/namenode</value>
<name>dfs.datanode.data.dir</name>
<value>file:/home/ubuntu/hadoop-2.2.0/etc/hdfs/datanode</value>
<name>dfs.permissions</name>
<value>false</value>

mapred-site.xml

<name>mapreduce.framework.name</name>
<value>yarn</value>

My slaves file contains one entry per line: slave1 and slave2.

None of the logs on the master VM show any errors, but the logs on the slave VM show the connection error. The NodeManager log also reports an error:

Error starting NodeManager org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.ConnectException: Call From ubuntu-e6df65dc-bf95-45ca-bad5-f8ddcc272b76/50.50.1.8 to 0.0.0.0:8031 failed on connection exception: java.net.ConnectException: Connection refused;

From my slave machine, core-site.xml:

<name>fs.default.name</name>
<value>hdfs://ubuntu-378e53c1-3e1f-4f6e-904d-00ef078fe3f8:9000</value>
<name>hadoop.tmp.dir</name>
<value>/home/ubuntu/hadoop-2.2.0/tmp</value>

hdfs-site.xml

<name>dfs.namenode.name.dir</name>
<value>file:/home/ubuntu/hadoop-2.2.0/etc/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/ubuntu/hadoop-2.2.0/etc/hdfs/datanode</value>

And my /etc/hosts:

127.0.0.1 localhost
50.50.1.8 ubuntu-e6df65dc-bf95-45ca-bad5-f8ddcc272b76
50.50.1.9 ubuntu-378e53c1-3e1f-4f6e-904d-00ef078fe3f8

jps on the master:

15863 ResourceManager
15205 SecondaryNameNode
14967 NameNode
16194 Jps

jps on the slave:

1988 Jps
1365 DataNode
1894 NodeManager

fsi*_*fsi 4

Of all the errors shown, the following one turned out to be the main reason the master could not connect to the slave:

Error starting NodeManager org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.ConnectException: Call From ubuntu-e6df65dc-bf95-45ca-bad5-f8ddcc272b76/50.50.1.8 to 0.0.0.0:8031 failed on connection exception: java.net.ConnectException: Connection refused;

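Basically, 0.0.0.0:8031 is the address of yarn.resourcemanager.resource-tracker.address, so I checked it with lsof -i :8031 and found that the port was not enabled/open/allowed. Since I am using OpenStack (cloud), I added 8031 and the other ports that showed up in the errors, and voilà, it worked as expected.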
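
As a complement to lsof, which only shows whether something is listening locally, the sketch below tries a TCP connection from one machine to another, so it also catches ports blocked along the way. It is a minimal illustration, not part of the original fix. The function names port_is_reachable and check_ports are hypothetical. The ports to test (9000 from core-site.xml, 8031 from the NodeManager error) are taken from the post.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures
        return False


def check_ports(host, ports, timeout=3.0):
    """Map each port to True/False depending on whether it accepts connections."""
    return {port: port_is_reachable(host, port, timeout) for port in ports}
```

Run on a slave, a call such as check_ports("ubuntu-378e53c1-3e1f-4f6e-904d-00ef078fe3f8", [9000, 8031]) would show which master ports the slave can actually reach. A False entry means either that nothing is listening on that port or that something in between is blocking it.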

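
The post does not show yarn-site.xml, so the following is only an assumption worth ruling out. The 0.0.0.0 host in the error is what the NodeManager uses when the ResourceManager address is left at its default. A fragment like this in yarn-site.xml on each slave would point the resource tracker at the master explicitly. The hostname is the master's name from the question, and 8031 is the port from the error message.

```xml
<!-- yarn-site.xml on the slave: illustrative fragment, not taken from the post -->
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>ubuntu-378e53c1-3e1f-4f6e-904d-00ef078fe3f8:8031</value>
</property>
```

The port still has to be reachable from the slaves, which is the part the answer above addresses.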