cephadm: Unable to add nodes to the ceph cluster (Error EINVAL: Failed to connect to host)

Sho*_*nak 4 ceph cephfs

I followed the steps below from https://docs.ceph.com/en/latest/cephadm/install/ to set up a Ceph cluster on CentOS 8.1:

curl --silent --remote-name --location https://github.com/ceph/ceph/raw/octopus/src/cephadm/cephadm
chmod +x cephadm
./cephadm add-repo --release octopus
./cephadm install

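After running the commands above, I found that Ceph needs docker or podman to run. So I installed the community edition of Docker from https://docs.docker.com/engine/install/centos/ and continued with the following steps: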

./cephadm install
mkdir -p /etc/ceph
cephadm bootstrap --mon-ip <ip_of_the_current_machine (host1)>
cephadm install ceph-common
ssh-copy-id -f -i /etc/ceph/ceph.pub root@host2
ceph orch host add host2

The last command failed with the following error:

[root@host1 home]# ceph orch host add host2
INFO:cephadm:Inferring fsid 12345678-2345-6789-1011-000129110013
INFO:cephadm:Inferring config /var/lib/ceph/12345678-2345-6789-1011-000129110013/mon.host1/config
INFO:cephadm:Using recent ceph image ceph/ceph:v15
Error EINVAL: Failed to connect to host2 (host2).
Check that the host is reachable and accepts connections using the cephadm SSH key
 
you may want to run:
> ceph cephadm get-ssh-config > ssh_config
> ceph config-key get mgr/cephadm/ssh_identity_key > key
> ssh -F ssh_config -i key root@host2

I can log in to host2 using the steps suggested above. Can someone tell me whether I am doing something wrong, and how I can fix this?

Sho*_*nak 6

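After a few days of debugging, I found that python3 was missing on the node I wanted to add. All I had to do was check the most recent cephadm log entries with this command: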

ceph log last cephadm

This produced the following log message:

Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1036, in _remote_connection
    raise execnet.gateway_bootstrap.HostNotFound(msg)
execnet.gateway_bootstrap.HostNotFound: Can't communicate with remote host `host2`, possibly because python3 is not installed there: cannot send (already closed?)
 
The above exception was the direct cause of the following exception:
 
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 295, in _finalize
    next_result = self._on_complete(self._value)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 103, in <lambda>
    return CephadmCompletion(on_complete=lambda _: f(*args, **kwargs))
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1201, in add_host
    return self._add_host(spec)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1187, in _add_host
    error_ok=True, no_fsid=True)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1104, in _run_cephadm
    with self._remote_connection(host, addr) as tpl:
  File "/lib64/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1055, in _remote_connection
    raise OrchestratorError(msg) from e
orchestrator._interface.OrchestratorError: Failed to connect to host2 (host2).
Check that the host is reachable and accepts connections using the cephadm SSH key
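The key line in that traceback is the `HostNotFound` message. As a rough sketch, the check can be scripted. `diagnose_cephadm_log` is a hypothetical helper name, not part of Ceph, and the pattern it searches for is simply the wording from the log above, which may differ in other Ceph releases.

```shell
# Hypothetical helper (not part of Ceph): reads cephadm log text on stdin and
# reports whether it contains the python3 hint seen in the traceback above.
diagnose_cephadm_log() {
    if grep -q 'python3 is not installed'; then
        echo "python3-missing"
    else
        echo "no-python3-hint"
    fi
}

# Intended usage on the admin node (host1):
#   ceph log last cephadm | diagnose_cephadm_log
```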

Next, with python3 installed on host2, I added the node by running:

ceph orch host add host2 ip_address
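Because a plain SSH login succeeds even when python3 is absent, it can help to test for the interpreter explicitly before adding a host. Below is a minimal sketch that reuses the `ssh_config` and `key` files produced by the commands cephadm suggests in its error message. `remote_has_python3` and the `SSH_BIN` override are hypothetical names introduced here, and the `dnf` line assumes a CentOS 8 target.

```shell
# Hypothetical helper: succeeds only if python3 is on root's PATH on the remote host.
# Assumes ssh_config and key exist in the current directory, created with:
#   ceph cephadm get-ssh-config > ssh_config
#   ceph config-key get mgr/cephadm/ssh_identity_key > key
#   chmod 600 key
# SSH_BIN is an assumption of this sketch so the ssh command can be swapped out.
remote_has_python3() {
    "${SSH_BIN:-ssh}" -F ssh_config -i key "root@$1" 'command -v python3' >/dev/null 2>&1
}

# Intended usage on host1:
#   remote_has_python3 host2 || echo "python3 missing on host2"
# If it is missing, one way to install it on CentOS 8 (run on host2):
#   dnf install -y python3
```

Using the same config and key that the cephadm manager module uses keeps the check close to what `ceph orch host add` actually attempts.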