Linux server drops RX packets in __netif_receive_skb_core

Question

Ber*_*ard 7 ubuntu linux-networking packetloss

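I have an Ubuntu 18.04 server that drops received packets, and I cannot figure out why.

Here is a graph of the dropped packets from netdata:

The server runs multiple docker containers and networks, so there are several Linux bridges and veth interfaces. The problem, however, relates to the physical interface. No VLANs are configured.

The machine has no iptables rules other than the ones generated by Docker.

The NIC is an Intel I210 (igb driver).

Copying data over TCP (rsync) works at 1G line speed, so not many TCP packets can be affected. (I would expect TCP drops to hurt transfer performance significantly because of the reduced window size.)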

# uname -a
Linux epyc 5.3.0-51-generic #44~18.04.2-Ubuntu SMP Thu Apr 23 14:27:18 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

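The RX-DRP counter increases on eno1, which is the machine's only physical interface. It grows by ca. 2 packets/sec while traffic is very light (admin ssh and a few DNS queries only).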

# netstat -ni
Kernel Interface table
Iface      MTU    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
br-69eb0  1500    45768      0      0 0         41072      0      0      0 BMRU
br-bf2c7  1500       71      0      0 0            85      0      0      0 BMRU
br-f4e34  1500   187676      0      0 0        192128      0      0      0 BMRU
docker0   1500    62739      0      0 0         70194      0      0      0 BMRU
eno1      1500 55517866      0 271391 35     19796132      0      0      0 BMRU
lo       65536     7381      0      0 0          7381      0      0      0 LRU
veth078d  1500    40657      0      0 0         48148      0      0      0 BMRU
veth231e  1500     2582      0      0 0          2323      0      0      0 BMRU
veth2f4f  1500       19      0      0 0           164      0      0      0 BMRU
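
To keep an eye on the counter without reading the whole table each time, the `netstat -ni` output can be filtered for interfaces with a non-zero RX-DRP. This is only a minimal sketch: the function name `rx_drops` and the shortened `SAMPLE` text are illustrative, and the column is looked up by its header name instead of a fixed position.

```python
def rx_drops(netstat_output: str) -> dict:
    """Return {interface: RX-DRP} for every interface with a non-zero RX-DRP."""
    lines = netstat_output.strip().splitlines()
    # Find the header row so the RX-DRP column position is not hard-coded.
    header_idx = next(i for i, l in enumerate(lines) if l.split()[:1] == ["Iface"])
    drp_col = lines[header_idx].split().index("RX-DRP")

    drops = {}
    for line in lines[header_idx + 1:]:
        fields = line.split()
        if len(fields) <= drp_col:
            continue  # skip malformed or truncated rows
        count = int(fields[drp_col])
        if count:
            drops[fields[0]] = count
    return drops


# Shortened copy of the table above (illustrative sample input).
SAMPLE = """\
Kernel Interface table
Iface      MTU    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
docker0   1500    62739      0      0 0         70194      0      0      0 BMRU
eno1      1500 55517866      0 271391 35     19796132      0      0      0 BMRU
lo       65536     7381      0      0 0          7381      0      0      0 LRU
"""

print(rx_drops(SAMPLE))  # {'eno1': 271391}
```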

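NIC settings (ethtool)

I tried disabling all RX hardware offload settings that I was able to change, but it did not help.

I also increased the buffers, which did not help either.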

# ethtool -S eno1 | grep rx
     rx_packets: 55580744
     rx_bytes: 76852450760
     rx_broadcast: 294019
     rx_multicast: 228993
     rx_crc_errors: 0
     rx_no_buffer_count: 0
     rx_missed_errors: 0
     rx_long_length_errors: 0
     rx_short_length_errors: 0
     rx_align_errors: 0
     rx_flow_control_xon: 0
     rx_flow_control_xoff: 0
     rx_long_byte_count: 76852450760
     rx_smbus: 66009
     os2bmc_rx_by_bmc: 19137
     os2bmc_rx_by_host: 190
     rx_hwtstamp_cleared: 0
     rx_errors: 0
     rx_length_errors: 0
     rx_over_errors: 0
     rx_frame_errors: 0
     rx_fifo_errors: 35
     rx_queue_0_packets: 16271369
     rx_queue_0_bytes: 22437386945
     rx_queue_0_drops: 0
     rx_queue_0_csum_err: 0
     rx_queue_0_alloc_failed: 0
     rx_queue_1_packets: 5913593
     rx_queue_1_bytes: 6458814275
     rx_queue_1_drops: 0
     rx_queue_1_csum_err: 1
     rx_queue_1_alloc_failed: 0
     rx_queue_2_packets: 29208019
     rx_queue_2_bytes: 42357497354
     rx_queue_2_drops: 35
     rx_queue_2_csum_err: 0
     rx_queue_2_alloc_failed: 0
     rx_queue_3_packets: 4121883
     rx_queue_3_bytes: 5366292094
     rx_queue_3_drops: 0
     rx_queue_3_csum_err: 0
     rx_queue_3_alloc_failed: 0

# ethtool -k eno1 | grep -vE 'tx|fixed'
Features for eno1:
rx-checksumming: off
scatter-gather: off
tcp-segmentation-offload: off
udp-fragmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: off
rx-vlan-offload: off
ntuple-filters: off
receive-hashing: off
rx-all: off
hw-tc-offload: on

# ethtool -g eno1
Ring parameters for eno1:
Pre-set maximums:
RX:     4096
RX Mini:    0
RX Jumbo:   0
TX:     4096
Current hardware settings:
RX:     256
RX Mini:    0
RX Jumbo:   0
TX:     256
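
The `ethtool -S` counters can be compared with the RX-DRP value reported by `netstat -ni`. The sketch below sums the per-queue `rx_queue_N_drops` counters from a shortened sample of the output above. The function name `queue_drops` is illustrative. The reading suggested in the comments, that drops not accounted for by the NIC queues happen later in the kernel stack, is an assumption rather than something the driver documentation is quoted on here.

```python
import re


def queue_drops(ethtool_stats: str) -> dict:
    """Return {queue_index: drops} parsed from `ethtool -S <iface>` output."""
    drops = {}
    for line in ethtool_stats.splitlines():
        m = re.match(r"\s*rx_queue_(\d+)_drops:\s*(\d+)\s*$", line)
        if m:
            drops[int(m.group(1))] = int(m.group(2))
    return drops


# Shortened copy of the `ethtool -S eno1` output above (illustrative sample input).
SAMPLE = """\
     rx_fifo_errors: 35
     rx_queue_0_drops: 0
     rx_queue_1_drops: 0
     rx_queue_2_drops: 35
     rx_queue_3_drops: 0
"""

per_queue = queue_drops(SAMPLE)
nic_drops = sum(per_queue.values())
rx_drp = 271391  # RX-DRP for eno1 taken from the `netstat -ni` table

print(per_queue)
# Assumption: whatever the NIC queues do not account for was dropped
# further up, inside the kernel network stack.
print(f"{nic_drops} of {rx_drp} drops are accounted for by the RX queues")
```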

Dropwatch

Through this blog post I found the tool dropwatch, which gives this output:

# sudo ./dropwatch -l kas
Initalizing kallsyms db
dropwatch> start
Enabling monitoring...
Kernel monitoring activated.
Issue Ctrl-C to stop monitoring
6 drops at __netif_receive_skb_core+4a0 (0xffffffff979002d0)
1 drops at icmpv6_rcv+310 (0xffffffff97a2e6a0)
1 drops at ip_rcv_finish_core.isra.18+1b4 (0xffffffff97976644)
1 drops at __udp4_lib_rcv+a34 (0xffffffff979b0fc4)
3 drops at __udp4_lib_rcv+a34 (0xffffffff979b0fc4)
1 drops at unix_release_sock+1a7 (0xffffffff979f9977)
1 drops at unix_release_sock+1a7 (0xffffffff979f9977)
1 drops at sk_stream_kill_queues+4d (0xffffffff978eeffd)
2 drops at unix_stream_connect+2e5 (0xffffffff979fae75)
12 drops at __netif_receive_skb_core+4a0 (0xffffffff979002d0)
1 drops at sk_stream_kill_queues+4d (0xffffffff978eeffd)
1 drops at sk_stream_kill_queues+4d (0xffffffff978eeffd)
2 drops at __udp4_lib_rcv+a34 (0xffffffff979b0fc4)
2 drops at unix_stream_connect+2e5 (0xffffffff979fae75)
6 drops at ip_forward+1b5 (0xffffffff97978615)
1 drops at unix_release_sock+1a7 (0xffffffff979f9977)
1 drops at __udp4_lib_rcv+a34 (0xffffffff979b0fc4)
1 drops at sk_stream_kill_queues+4d (0xffffffff978eeffd)
1 drops at sk_stream_kill_queues+4d (0xffffffff978eeffd)
2 drops at unix_stream_connect+2e5 (0xffffffff979fae75)
2 drops at unix_stream_connect+2e5 (0xffffffff979fae75)
1 drops at unix_release_sock+1a7 (0xffffffff979f9977)
12 drops at __netif_receive_skb_core+4a0 (0xffffffff979002d0)
6 drops at ip_forward+1b5 (0xffffffff97978615)
1 drops at tcp_v6_rcv+16c (0xffffffff97a3829c)
2 drops at unix_stream_connect+2e5 (0xffffffff979fae75)
12 drops at __netif_receive_skb_core+4a0 (0xffffffff979002d0)
1 drops at sk_stream_kill_queues+4d (0xffffffff978eeffd)
2 drops at unix_stream_connect+2e5 (0xffffffff979fae75)
^C
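
Because dropwatch prints one line per reporting interval, it is easier to judge where most drops happen after adding up the counts per kernel symbol. The following is a minimal sketch that does this for a few lines copied from the output above. The regular expression is an assumption based on the line format seen in this particular output, not a documented dropwatch format.

```python
import re
from collections import Counter

# Matches lines such as "6 drops at __netif_receive_skb_core+4a0 (0xffffffff979002d0)".
DROP_LINE = re.compile(r"^(\d+) drops at (\S+?)\+[0-9a-f]+ \(0x[0-9a-f]+\)$")


def drops_by_symbol(output: str) -> Counter:
    """Sum the dropwatch drop counts per kernel symbol, ignoring the offset."""
    totals = Counter()
    for line in output.splitlines():
        m = DROP_LINE.match(line.strip())
        if m:
            totals[m.group(2)] += int(m.group(1))
    return totals


# A few lines copied from the dropwatch output above (illustrative sample input).
SAMPLE = """\
6 drops at __netif_receive_skb_core+4a0 (0xffffffff979002d0)
1 drops at icmpv6_rcv+310 (0xffffffff97a2e6a0)
1 drops at ip_rcv_finish_core.isra.18+1b4 (0xffffffff97976644)
3 drops at __udp4_lib_rcv+a34 (0xffffffff979b0fc4)
12 drops at __netif_receive_skb_core+4a0 (0xffffffff979002d0)
6 drops at ip_forward+1b5 (0xffffffff97978615)
"""

for symbol, count in drops_by_symbol(SAMPLE).most_common():
    print(f"{count:4d}  {symbol}")
```

Lines that do not match the pattern, such as the dropwatch prompt and status messages, are skipped.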

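I read this as most drops happening in __netif_receive_skb_core.

The Red Hat Enterprise Linux Network Performance Tuning Guide says (chapter "Adapter Queue"):

The netif_receive_skb() kernel function will find the corresponding CPU for a packet, and enqueue packets in that CPU's queue. If the queue for that processor is full and already at maximum size, packets will be dropped. To tune this setting, first determine whether the backlog needs increasing. The /proc/net/softnet_stat file contains a counter in the 2nd column that is incremented when the netdev backlog queue overflows. If this value is incrementing over time, then netdev_max_backlog needs to be increased.

Increasing netdev_max_backlog did not help, but this led me to SoftIRQs:

SoftIRQ

According to the Red Hat document, there are several interesting aspects to SoftIRQ: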

# cat /proc/net/softnet_stat
00024f83 00000000 000000e8 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
000152c0 00000000 0000008d 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00010238 00000000 00000061 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00010d8c 00000000 00000081 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
000f3cb3 00000000 00000d83 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0009e391 00000000 0000050d 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0025265b 00000000 00001023 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00111a24 00000000 0000095a 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
008bcbf0 00000000 0000355d 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
004875d8 00000000 00002408 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0001c93c 00000000 000000cc 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00025fdb 00000000 000000fa 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0005d1e5 00000000 000005f2 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
000f9bfd 00000000 00000b9e 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
000448bc 00000000 00000407 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00044f25 00000000 00000415 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

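As you can see, the 2nd column is always 0, which does not match the statement about netif_receive_skb() above, but I see constantly increasing values in the 3rd column.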
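
The values in /proc/net/softnet_stat are hexadecimal, so the columns are easier to compare once decoded. This minimal sketch decodes the first three columns of two rows copied from the output above. The field names `processed`, `dropped` and `time_squeeze` follow common usage for these columns, and taking the row index as the CPU number is an assumption that holds only if all CPUs are online.

```python
def parse_softnet_stat(text: str) -> list:
    """Decode the first three hex columns of /proc/net/softnet_stat, one dict per row."""
    rows = []
    for index, line in enumerate(text.strip().splitlines()):
        fields = [int(value, 16) for value in line.split()]
        rows.append({
            "row": index,               # assumed to equal the CPU number
            "processed": fields[0],     # 1st column: packets processed
            "dropped": fields[1],       # 2nd column: backlog queue overflows
            "time_squeeze": fields[2],  # 3rd column: softirq ran out of budget or time
        })
    return rows


# Two rows copied from the output above (illustrative sample input).
SAMPLE = """\
00024f83 00000000 000000e8 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
008bcbf0 00000000 0000355d 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
"""

for row in parse_softnet_stat(SAMPLE):
    print(row)
```

On a live system the same function could be fed the contents of /proc/net/softnet_stat, sampled twice a few seconds apart to see which column is growing.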

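This is described in "SoftIRQ Misses". Again quoting the Red Hat document:

Occasionally, it is necessary to increase the time that SoftIRQs are allowed to run on the CPU. This is known as the netdev_budget. The default value of the budget is 300. This value can be doubled if the 3rd column in /proc/net/softnet_stat is increasing, which indicates that the SoftIRQ did not get enough CPU time.

So net.core.netdev_budget should be increased to 600. I did that, and it did not change anything.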

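Other things I tried

The statistics still show traffic for the BMC on the shared interface. I have since tried moving the BMC to a dedicated interface, but that did not improve the situation.

This document by SuSE gives some legitimate reasons for packet drops and a way to confirm that the drops are harmless: they should go away when the interface is set to PROMISC mode, because they are caused by unknown protocols or wrong VLAN tags. I enabled promisc mode, but packets are still being dropped.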
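
When repeating the PROMISC test, it helps to verify that the flag is really set on the interface. This is a minimal sketch, assuming the interface flags are read from /sys/class/net/<interface>/flags as a hexadecimal string. `IFF_PROMISC` is the value from <linux/if.h>, and the helper names are illustrative.

```python
IFF_PROMISC = 0x100  # interface flag bit from <linux/if.h>


def is_promisc(flags_text: str) -> bool:
    """Return True if the hex flags string (e.g. '0x1103') has IFF_PROMISC set."""
    return bool(int(flags_text.strip(), 16) & IFF_PROMISC)


def interface_is_promisc(ifname: str) -> bool:
    """Read the flags of a live interface from sysfs (Linux only)."""
    with open(f"/sys/class/net/{ifname}/flags") as f:
        return is_promisc(f.read())


# Offline examples: 0x1003 is UP|BROADCAST|MULTICAST, 0x1103 adds PROMISC.
print(is_promisc("0x1003"))  # False
print(is_promisc("0x1103"))  # True
```

On the server in question, the live check would be `interface_is_promisc("eno1")`.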

Suspecting oversized jumbo frames, I changed the MTU size to 9000. This did not help.

Answer 1

Ber*_*ard 5

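I finally found the source of the problem: these are Ethernet frames with an unknown EtherType. I thought these drops should go away in PROMISC mode, but apparently they do not.

In my case, the cause was Ethernet frames with EtherTypes 0x8912 and 0x88e1, which an AVM FritzBox router sends to detect powerline adapters. To confirm this, I blocked these frames with nftables, using the following ruleset in /etc/nftables.conf: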

table netdev filter {
    chain ingress {
        type filter hook ingress device eno1 priority 0; policy accept;
        meta protocol {0x8912, 0x88e1} drop
    }
}

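After that, the drops disappeared! Even when they are not blocked, these frames are harmless and do not interfere with my server. I will keep blocking them anyway, to keep the monitoring clean and to be able to see real interface drops or performance problems.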
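
For anyone who wants to check which EtherTypes arrive on their own interface, a raw packet socket can count frames by EtherType. This is only a sketch and not the method used above: the function names are illustrative, `ETH_P_ALL` is the value from <linux/if_ether.h>, and the capture part needs Linux and root privileges (CAP_NET_RAW). A socket bound with `ETH_P_ALL` also sees outgoing frames, so the counts are not limited to received traffic.

```python
import socket
import struct
from collections import Counter

ETH_P_ALL = 0x0003  # from <linux/if_ether.h>: receive frames of every protocol


def ethertype(frame: bytes) -> int:
    """Return the EtherType field (bytes 12-13) of a raw Ethernet frame."""
    (etype,) = struct.unpack("!H", frame[12:14])
    return etype


def count_ethertypes(ifname: str, frames: int = 1000) -> Counter:
    """Capture `frames` frames on `ifname` and count them by EtherType.

    Linux only; requires root or CAP_NET_RAW.
    """
    counts = Counter()
    with socket.socket(socket.AF_PACKET, socket.SOCK_RAW,
                       socket.htons(ETH_P_ALL)) as sock:
        sock.bind((ifname, 0))
        for _ in range(frames):
            frame = sock.recv(65535)
            if len(frame) >= 14:  # a full Ethernet header is 14 bytes
                counts[ethertype(frame)] += 1
    return counts


# Offline check with a hand-built frame: broadcast destination,
# arbitrary source address, EtherType 0x8912, zero padding as payload.
sample = bytes.fromhex("ffffffffffff" "001122334455" "8912") + b"\x00" * 46
print(hex(ethertype(sample)))  # 0x8912
```

A live run would look like `count_ethertypes("eno1")`, after which any EtherType the host has no protocol handler for is a candidate for the drops seen here.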

More information about these frames can be found here:
