MPI: a process or daemon was unable to complete a TCP connection

NoD*_*ion 5 mpi openmpi

Open MPI: 4.0.1a

Hostfile:

  • 34bb0519eAAA
  • a2935f150BBB

I am on machine 34bb0519eAAA. From there I can ssh to a2935f150BBB successfully, and from a2935f150BBB I can ssh back to 34bb0519eAAA successfully.

But when I run the mpiexec command, I get this error message:

Warning: Permanently added '[XX.XX.XX.XX]:XX' (a2935f150BBB's IP address) to the list of known hosts.
--------------------------------------------------------------
A process or daemon was unable to complete a TCP connection
to another process:
  Local host:    a2935f150BBB
  Remote host:   34bb0519eAAA
This is usually caused by a firewall on the remote host. Please
check that any firewall (e.g., iptables) has been disabled and

ORTE was unable to reliably start one or more daemons.
This usually is caused by:

* not finding the required libraries and/or binaries on
  one or more nodes. Please check your PATH and LD_LIBRARY_PATH
  settings, or configure OMPI with --enable-orterun-prefix-by-default

* lack of authority to execute on one or more specified nodes.
  Please verify your allocation and authorities.

* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
  Please check with your sys admin to determine the correct location to use.

*  compilation of the orted with dynamic libraries when static are required
  (e.g., on Cray). Please check your configure cmd line and consider using
  one of the contrib/platform definitions for your system type.

* an inability to create a connection back to mpirun due to a
  lack of common network interfaces and/or no route found between
  them. Please check network connectivity (including firewalls
  and network routing requirements).
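
Two of the causes listed in this message (the daemon binary not being found via PATH, and startup files not being writable in the temporary directory) can be checked directly on each node. The following is a minimal sketch, assuming Python 3 is installed on both hosts; the function name `check_node_prereqs` and the returned dictionary keys are hypothetical and not part of Open MPI. The binary name `orted` is taken from the message above.

```python
import os
import shutil
import tempfile


def check_node_prereqs(binary="orted", tmpdir=None):
    """Report whether `binary` is on PATH and whether the temp dir is writable.

    Hypothetical helper for illustration only; not part of Open MPI.
    """
    tmpdir = tmpdir or tempfile.gettempdir()

    # shutil.which returns the full path of the executable, or None if absent.
    binary_path = shutil.which(binary)

    # Try to create (and automatically delete) a file in the temp directory.
    try:
        with tempfile.NamedTemporaryFile(dir=tmpdir):
            tmpdir_writable = True
    except OSError:
        tmpdir_writable = False

    return {
        "binary_path": binary_path,
        "tmpdir": tmpdir,
        "tmpdir_writable": tmpdir_writable,
        "LD_LIBRARY_PATH": os.environ.get("LD_LIBRARY_PATH", ""),
    }


if __name__ == "__main__":
    for key, value in check_node_prereqs().items():
        print(f"{key}: {value}")
```

Saved under a hypothetical name such as check_prereqs.py, it could be run as `ssh a2935f150BBB python3 check_prereqs.py`. Running it through a non-interactive ssh command matters, because such a shell may set up a different PATH than an interactive login does.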

I am very confused by this, because ssh works in both directions between the two machines. How can the connection fail?
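
One thing a successful ssh login does not establish is that other TCP ports are reachable: ssh only exercises the single port that sshd listens on, while the error above reports a failed connection from a2935f150BBB back to 34bb0519eAAA. A minimal sketch for probing an arbitrary port, assuming Python 3 is available on both hosts; the function names `open_listener` and `can_connect` are hypothetical helpers, not part of Open MPI.

```python
import socket


def open_listener(port=0, bind_addr="0.0.0.0"):
    """Open a listening TCP socket; port=0 lets the OS pick a free port.

    Returns the socket and the port number it is actually bound to.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((bind_addr, port))
    srv.listen(1)
    return srv, srv.getsockname()[1]


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

To use it, call `open_listener()` on 34bb0519eAAA and note the returned port, then call `can_connect("34bb0519eAAA", port)` on a2935f150BBB. If that returns False while ssh still works, it would point to a firewall or port-mapping restriction rather than to ssh itself.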

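Here is the ssh connection:

ssh a2935f150BBB
Warning: Permanently added '[XX.XX.XX.XX]:XX' to the list of known hosts.
Welcome to Ubuntu 18.04.1 LTS (XXXXXXXXXXXXXXXXXX)

This system has been minimized by removing packages and content that are
not required on a system that users do not log into.

To restore this content, you can run the 'unminimize' command.
Last login: XXXXXXXXXXXXX from XXXXXXXXXX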