ping: sendmsg: Operation not permitted (sometimes)

Nyx*_*nyx 5 networking linux ubuntu ping

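On an Ubuntu 14.04 machine running HAProxy, right after service haproxy reload, HAProxy suddenly reported all the servers behind it as down.

After some digging, I found that ping does not work reliably: sometimes it pings successfully, then a few seconds later we get the error ping: sendmsg: Operation not permitted

The machine also fails to resolve subdomain.domain.com

iptables -L shows no rules, and iptables --flush did not help.

Any ideas?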

root@some-test:~# ping 107.1.1.1

PING 107.1.1.1 (107.1.1.1) 56(84) bytes of data.
64 bytes from 107.1.1.1: icmp_seq=1 ttl=63 time=0.425 ms
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
64 bytes from 107.1.1.1: icmp_seq=6 ttl=63 time=0.390 ms
64 bytes from 107.1.1.1: icmp_seq=7 ttl=63 time=0.533 ms
64 bytes from 107.1.1.1: icmp_seq=8 ttl=63 time=0.357 ms
64 bytes from 107.1.1.1: icmp_seq=9 ttl=63 time=0.343 ms
64 bytes from 107.1.1.1: icmp_seq=10 ttl=63 time=0.380 ms
64 bytes from 107.1.1.1: icmp_seq=11 ttl=63 time=0.398 ms
64 bytes from 107.1.1.1: icmp_seq=12 ttl=63 time=0.423 ms
64 bytes from 107.1.1.1: icmp_seq=13 ttl=63 time=0.293 ms
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
64 bytes from 107.1.1.1: icmp_seq=16 ttl=63 time=0.371 ms
64 bytes from 107.1.1.1: icmp_seq=17 ttl=63 time=0.374 ms
64 bytes from 107.1.1.1: icmp_seq=18 ttl=63 time=0.305 ms
64 bytes from 107.1.1.1: icmp_seq=19 ttl=63 time=0.259 ms
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
64 bytes from 107.1.1.1: icmp_seq=24 ttl=63 time=0.370 ms
64 bytes from 107.1.1.1: icmp_seq=25 ttl=63 time=0.316 ms
64 bytes from 107.1.1.1: icmp_seq=26 ttl=63 time=0.412 ms
64 bytes from 107.1.1.1: icmp_seq=27 ttl=63 time=0.512 ms
64 bytes from 107.1.1.1: icmp_seq=28 ttl=63 time=0.375 ms
64 bytes from 107.1.1.1: icmp_seq=29 ttl=63 time=0.352 ms
64 bytes from 107.1.1.1: icmp_seq=30 ttl=63 time=0.331 ms
64 bytes from 107.1.1.1: icmp_seq=31 ttl=63 time=0.290 ms
64 bytes from 107.1.1.1: icmp_seq=32 ttl=63 time=0.353 ms
64 bytes from 107.1.1.1: icmp_seq=33 ttl=63 time=0.378 ms
64 bytes from 107.1.1.1: icmp_seq=34 ttl=63 time=0.523 ms
64 bytes from 107.1.1.1: icmp_seq=35 ttl=63 time=0.351 ms
64 bytes from 107.1.1.1: icmp_seq=36 ttl=63 time=0.302 ms
64 bytes from 107.1.1.1: icmp_seq=37 ttl=63 time=0.496 ms
64 bytes from 107.1.1.1: icmp_seq=38 ttl=63 time=0.377 ms
64 bytes from 107.1.1.1: icmp_seq=39 ttl=63 time=0.357 ms
64 bytes from 107.1.1.1: icmp_seq=40 ttl=63 time=0.396 ms
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
64 bytes from 107.1.1.1: icmp_seq=52 ttl=63 time=0.372 ms
64 bytes from 107.1.1.1: icmp_seq=53 ttl=63 time=0.412 ms
64 bytes from 107.1.1.1: icmp_seq=54 ttl=63 time=0.321 ms
64 bytes from 107.1.1.1: icmp_seq=55 ttl=63 time=0.366 ms
64 bytes from 107.1.1.1: icmp_seq=56 ttl=63 time=0.379 ms
64 bytes from 107.1.1.1: icmp_seq=57 ttl=63 time=0.395 ms
64 bytes from 107.1.1.1: icmp_seq=58 ttl=63 time=0.488 ms
64 bytes from 107.1.1.1: icmp_seq=59 ttl=63 time=0.513 ms
64 bytes from 107.1.1.1: icmp_seq=60 ttl=63 time=0.435 ms
^C
--- 107.1.1.1 ping statistics ---
60 packets transmitted, 39 received, 35% packet loss, time 59008ms
rtt min/avg/max/mdev = 0.259/0.385/0.533/0.067 ms

und*_*ine 5

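I think the problem is that the number of connections tracked by conntrack has hit the table's limit, so no new connection can be established until old entries expire. You can probably see something like this in dmesg: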

[1824447.285257] nf_conntrack: table full, dropping packet.
[1824447.522502] nf_conntrack: table full, dropping packet.

You can see the current conntrack maximum with:

undefine@uml:~$ sudo sysctl net.nf_conntrack_max
net.nf_conntrack_max = 65536

And the current conntrack count with:

undefine@uml:~$ sysctl net.netfilter.nf_conntrack_count
net.netfilter.nf_conntrack_count = 157

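You can list the connections currently being tracked with conntrack -L (a tool from the conntrack package). It is worth looking through them and checking what kind they are; some may not be needed.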
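To put the two sysctl values side by side and get a rough picture of what fills the table, something like the following sketch may help. The function names conntrack_usage_pct and conntrack_by_proto are made up for illustration, and the script assumes the counters are exposed under /proc/sys/net/netfilter (it prints nothing if they are not).

```shell
#!/bin/sh
# Sketch: how full is the conntrack table, and what is in it?
# Function names are illustrative, not part of any tool.

# Print the integer percentage of the table in use, given count and max.
conntrack_usage_pct() {
    echo $(( $1 * 100 / $2 ))
}

# Read `conntrack -L` output on stdin and count entries per protocol
# (the first field of each line, e.g. tcp, udp, icmp).
conntrack_by_proto() {
    awk 'NF { n[$1]++ } END { for (p in n) print p, n[p] }' | sort
}

# Only read the counters when the conntrack module exposes them.
dir=/proc/sys/net/netfilter
if [ -r "$dir/nf_conntrack_count" ] && [ -r "$dir/nf_conntrack_max" ]; then
    count=$(cat "$dir/nf_conntrack_count")
    max=$(cat "$dir/nf_conntrack_max")
    echo "conntrack: $count / $max ($(conntrack_usage_pct "$count" "$max")%)"
fi
```

With the functions loaded, sudo conntrack -L 2>/dev/null | conntrack_by_proto gives a per-protocol breakdown. A usage figure close to 100% would be consistent with the "table full" messages in dmesg.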

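You have three options:

  1. Do not use conntrack at all (simply put: do not use the nat table, and unload the nf_conntrack module).
  2. Disable conntrack for outgoing connections (in the raw table, use -j NOTRACK for the problematic connections).
  3. Increase the maximum number of tracked connections with: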

    undefine@uml:~$ sudo sysctl net.nf_conntrack_max=512000
    net.nf_conntrack_max = 512000

Or put net.nf_conntrack_max=512000 into /etc/sysctl.conf and then run sysctl -p to reload it.
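For option 2, a minimal sketch of what the raw-table rules could look like is shown here. The port 8080 is purely an assumption standing in for whatever port the backends listen on, and notrack_rules is a hypothetical helper name. The script only prints the commands so they can be reviewed first. Untracked packets will not match stateful rules or NAT, so this is only reasonable where, as in the question, no such rules are in use. On newer iptables versions the same effect is usually written as -j CT --notrack.

```shell
#!/bin/sh
# Sketch: print raw-table rules that exempt one TCP port from conntrack.
# Nothing is applied here; review the output, then pipe it to `sudo sh`.
# notrack_rules and the default port 8080 are illustrative assumptions.

notrack_rules() {
    port=$1
    # Outgoing packets to the port (e.g. health checks to backends).
    echo "iptables -t raw -A OUTPUT -p tcp --dport $port -j NOTRACK"
    # Replies coming back from that port.
    echo "iptables -t raw -A PREROUTING -p tcp --sport $port -j NOTRACK"
}

notrack_rules "${1:-8080}"
```

Both directions need a rule: if only the outgoing side is exempted, the replies would still create conntrack entries.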