Unable to access ports of services across nodes on an overlay network in swarm mode

Rad*_*ads 5 docker docker-swarm docker-swarm-mode docker-network

I am using the following compose file for a stack deployment:

version: '3.8'
x-deploy: &Deploy
  replicas: 1
  placement: &DeployPlacement
    max_replicas_per_node: 1
  restart_policy:
    max_attempts: 15
    window: 60s
  resources: &DeployResources
    reservations: &DeployResourcesReservations
      cpus: '0.05'
      memory: 10M
services:
  serv1:
    image: alpine
    networks:
      - test_nw
    deploy:
      <<: *Deploy
    entrypoint: ["tail", "-f", "/dev/null"]
  serv2:
    image: nginx
    networks:
      - test_nw
    deploy:
      <<: *Deploy
      placement:
        <<: *DeployPlacement
        constraints:
          - "node.role!=manager"
    expose: # deprecated, but I leave it here anyway
      - "80"
networks:
  test_nw:
    name: test_nw
    driver: overlay

For convenience, I'll refer to test_serv1 running via container1 in host1 and test_serv2 running via container2 in host2 for the rest of this post, since the actual host and container names keep changing.

When I get into the shell of test_serv1, the following happens when I ping serv2:

ubuntu@host1:~$ sudo docker exec -it test_serv1.1.container1 ash
/ # ping serv2
PING serv2 (10.0.7.5): 56 data bytes
64 bytes from 10.0.7.5: seq=0 ttl=64 time=0.084 ms

However, inspecting container2 shows that the IP of container2 is 10.0.7.6:

ubuntu@host2:~$ sudo docker inspect test_serv2.1.container2
[
    {
****************
        "NetworkSettings": {
            "Bridge": "",
            "HairpinMode": false,
            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
            "Ports": {
                "80/tcp": null
            },
****************
            "Networks": {
                "test_nw": {
                    "IPAMConfig": {
                        "IPv4Address": "10.0.7.6"
                    },
                    "Links": null,
                    "Aliases": [
                        "80c06bb29a42"
                    ],
                    "NetworkID": "sp56aiqxnt56yglsd8mc1zqpv",
                    "EndpointID": "dac52f1d7fa148f5acac20f89d6b709193b3c11fc90201424cd052785121e706",
                    "Gateway": "",
                    "IPAddress": "10.0.7.6",
                    "IPPrefixLen": 24,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": "02:42:0a:00:07:06",
****************
            }
        }
    }
]

I can see that container2 is listening on port 80 on all interfaces, that it can itself ping both 10.0.7.5 and 10.0.7.6 (!!), and that it can reach port 80 on both IPs (!!).

ubuntu@host2:~$ sudo docker exec -it test_serv2.1.container2 bash
root@80c06bb29a42:/# ping 10.0.7.5
PING 10.0.7.5 (10.0.7.5) 56(84) bytes of data.
64 bytes from 10.0.7.5: icmp_seq=1 ttl=64 time=0.093 ms
64 bytes from 10.0.7.5: icmp_seq=2 ttl=64 time=0.094 ms
^C
--- 10.0.7.5 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 8ms
rtt min/avg/max/mdev = 0.093/0.093/0.094/0.009 ms
root@80c06bb29a42:/# ping 10.0.7.6
PING 10.0.7.6 (10.0.7.6) 56(84) bytes of data.
64 bytes from 10.0.7.6: icmp_seq=1 ttl=64 time=0.035 ms
64 bytes from 10.0.7.6: icmp_seq=2 ttl=64 time=0.059 ms
64 bytes from 10.0.7.6: icmp_seq=3 ttl=64 time=0.053 ms
^C
--- 10.0.7.6 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 50ms
rtt min/avg/max/mdev = 0.035/0.049/0.059/0.010 ms
root@80c06bb29a42:/# netstat -tuplen
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       User       Inode      PID/Program name    
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      0          33110      1/nginx: master pro 
tcp        0      0 127.0.0.11:35491        0.0.0.0:*               LISTEN      0          32855      -                   
tcp6       0      0 :::80                   :::*                    LISTEN      0          33111      1/nginx: master pro 
udp        0      0 127.0.0.11:43477        0.0.0.0:*                           0          32854      -                   
root@80c06bb29a42:/# curl 10.0.7.5:80
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
root@80c06bb29a42:/# curl 10.0.7.6:80
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
root@80c06bb29a42:/# 

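However, when I try the following from container1, I just want to throw my laptop against the wall, because I can't figure out why nobody else has run into this problem and/or posted a question about it :/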

ubuntu@host1:~$ sudo docker exec -it test_serv1.1.container1 ash
/ # ping serv2
PING serv2 (10.0.7.5): 56 data bytes
64 bytes from 10.0.7.5: seq=0 ttl=64 time=0.084 ms
64 bytes from 10.0.7.5: seq=1 ttl=64 time=0.086 ms
^C
--- serv2 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.084/0.085/0.086 ms
/ # curl serv2:80
^C
/ # curl --max-time 10 serv2:80
curl: (28) Connection timed out after 10001 milliseconds
/ # ping test_serv2
PING test_serv2 (10.0.7.5): 56 data bytes
64 bytes from 10.0.7.5: seq=0 ttl=64 time=0.071 ms
64 bytes from 10.0.7.5: seq=1 ttl=64 time=0.064 ms
64 bytes from 10.0.7.5: seq=2 ttl=64 time=0.125 ms
^C
--- test_serv2 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.064/0.086/0.125 ms
/ # curl --max-time 10 test_serv2:80
curl: (28) Connection timed out after 10001 milliseconds
/ # ping 10.0.7.6
PING 10.0.7.6 (10.0.7.6): 56 data bytes
^C
--- 10.0.7.6 ping statistics ---
87 packets transmitted, 0 packets received, 100% packet loss
/ # curl --max-time 10 10.0.7.6:80
curl: (28) Connection timed out after 10001 milliseconds
/ # 

I have checked that all the Docker ports (TCP 2376, 2377, 7946, 80 and UDP 7946, 4789) are open on both nodes.

What is going on here? Any help is really appreciated!

Rad*_*ads 0

I'm posting this for anyone who might come looking, since there is no answer yet.

A few things to consider (even though all of them are mentioned in the question):

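  1. Make sure, once more, that all the ports are open. Check iptables thoroughly, even if you have already set it up once. The Docker engine seems to change the configuration, and if you open the ports after Docker has started, it sometimes ends up in an unusable state (a restart won't fix it; you need to hard stop -> reset iptables -> start docker ce).
  2. Make sure the local IP addresses of your machines don't conflict. This is a big one. I can't describe it in detail, but you can read up on the various IP address classes and check whether there is a conflict.
  3. Possibly the most trivial, yet almost always omitted, instruction: remember to always init or join the swarm with --advertise-addr and --listen-addr. The --advertise-addr should be the public-facing IP address (even if it is not internet-facing, it is the IP address other hosts use to reach this host). It is not documented well enough, but --listen-addr has to be the IP of the interface that Docker should bind to.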
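
For point 3, here is a minimal sketch of what the init and join commands could look like. The addresses, the JOIN_TOKEN placeholder, and the helper function names are all made up for illustration. Port 2377 is assumed to be the swarm management port, as in the port list from the question. The script only prints the commands, so they can be reviewed before being run on each host.

```shell
#!/bin/sh
# Hypothetical addresses (documentation ranges); replace with your own.
ADVERTISE_ADDR="${ADVERTISE_ADDR:-203.0.113.10}"  # IP other nodes use to reach this host
LISTEN_ADDR="${LISTEN_ADDR:-10.0.0.10}"           # IP of the local interface Docker should bind to
SWARM_PORT=2377                                   # assumed swarm management port

# Print the command for initialising the swarm on the first manager.
# $1 = advertise address, $2 = listen address
swarm_init_cmd() {
    printf 'docker swarm init --advertise-addr %s --listen-addr %s:%s\n' \
        "$1" "$2" "$SWARM_PORT"
}

# Print the command for joining an existing swarm from another node.
# $1 = join token, $2 = manager address, $3 = advertise address, $4 = listen address
swarm_join_cmd() {
    printf 'docker swarm join --token %s --advertise-addr %s --listen-addr %s:%s %s:%s\n' \
        "$1" "$3" "$4" "$SWARM_PORT" "$2" "$SWARM_PORT"
}

# Show the init command for this host; nothing is executed against Docker.
swarm_init_cmd "$ADVERTISE_ADDR" "$LISTEN_ADDR"
```

On a joining node you would call something like swarm_join_cmd "$JOIN_TOKEN" 203.0.113.10 203.0.113.20 10.0.0.20, where the token comes from docker swarm join-token worker on a manager, and then run the printed command once it looks right.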

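Having done all of the above, note that AWS EC2 does not play well with cross-provider hosts. If your machines are spread across providers (e.g. IBM, Azure, GCP, etc.), EC2 plays spoilsport there. I'm curious how that is done (it has to be some low-level network interference), but I spent a considerable amount of time trying to make it work, and it wouldn't.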