How to fine-tune TCP performance on Linux over a 10Gb fiber connection

Question


use*_*029 8 networking redhat kernel tcp

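We have 2 Red Hat servers dedicated to customer speed testing. Both have 10Gb fiber connections and sit on 10Gb links, and all network equipment between them fully supports 10Gb/s. Using iperf or iperf3, the best I can get is about 6.67Gb/s. That said, one server is in production (customers are using it) and the other is online but unused (we're using it for testing atm). I should also mention that the 6.67Gb/s is in one direction only. We'll call these Server A and Server B.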

When Server A acts as the iperf server, we get 6.67Gb/s. When Server A acts as the client to Server B, it can only push about 20Mb/s.

What I've done:

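The only thing I've done so far is increase the TX/RX ring buffers on both servers to their maximums. One had been set to 512 and the other to 453 (RX only; TX was already maxed out). Here is what they look like after the update: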

Server A:
Ring parameters for em1:
Pre-set maximums:
RX:     4096
RX Mini:    0
RX Jumbo:   0
TX:     4096
Current hardware settings:
RX:     4096
RX Mini:    0
RX Jumbo:   0
TX:     4096

Server B:
Ring parameters for p1p1:
Pre-set maximums:
RX:     4078
RX Mini:    0
RX Jumbo:   0
TX:     4078
Current hardware settings:
RX:     4078
RX Mini:    0
RX Jumbo:   0
TX:     4078
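
The poster reports that both rings are now at their pre-set maximums. As a minimal sketch (not from the original thread), the Python below parses text in the ethtool -g layout shown above and checks that each current value equals its maximum. parse_ring_params and rings_at_max are hypothetical helper names, and the sketch assumes the command output has already been captured as a string.

```python
def parse_ring_params(text: str) -> dict:
    """Parse `ethtool -g IFACE` text into {"max": {...}, "current": {...}}."""
    rings = {"max": {}, "current": {}}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Pre-set maximums"):
            section = "max"
        elif line.startswith("Current hardware settings"):
            section = "current"
        elif section is not None and ":" in line:
            key, _, value = line.partition(":")
            value = value.strip()
            if value.isdigit():
                rings[section][key.strip()] = int(value)
    return rings


def rings_at_max(text: str) -> bool:
    """True if every current ring size equals its pre-set maximum."""
    rings = parse_ring_params(text)
    return all(rings["current"].get(k) == v for k, v in rings["max"].items())


# Usage with the Server B output from the question.
SERVER_B = """Ring parameters for p1p1:
Pre-set maximums:
RX:     4078
RX Mini:    0
RX Jumbo:   0
TX:     4078
Current hardware settings:
RX:     4078
RX Mini:    0
RX Jumbo:   0
TX:     4078
"""
print(rings_at_max(SERVER_B))  # True
```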


The NICs look like this:

Server A: 
ixgbe 0000:01:00.0: em1: NIC Link is Up 10 Gbps, Flow Control: RX/TX

Server B:
bnx2x 0000:05:00.0: p1p1: NIC Link is Up, 10000 Mbps full duplex,     Flow control: ON - receive & transmit

Server A ethtool stats:
 rx_errors: 0
 tx_errors: 0
 rx_over_errors: 0
 rx_crc_errors: 0
 rx_frame_errors: 0
 rx_fifo_errors: 0
 rx_missed_errors: 0
 tx_aborted_errors: 0
 tx_carrier_errors: 0
 tx_fifo_errors: 0
 tx_heartbeat_errors: 0
 rx_long_length_errors: 0
 rx_short_length_errors: 0
 rx_csum_offload_errors: 123049

 Server B ethtool stats:
 [0]: rx_phy_ip_err_discards: 0
 [0]: rx_csum_offload_errors: 0
 [1]: rx_phy_ip_err_discards: 0
 [1]: rx_csum_offload_errors: 0
 [2]: rx_phy_ip_err_discards: 0
 [2]: rx_csum_offload_errors: 0
 [3]: rx_phy_ip_err_discards: 0
 [3]: rx_csum_offload_errors: 0
 [4]: rx_phy_ip_err_discards: 0
 [4]: rx_csum_offload_errors: 0
 [5]: rx_phy_ip_err_discards: 0
 [5]: rx_csum_offload_errors: 0
 [6]: rx_phy_ip_err_discards: 0
 [6]: rx_csum_offload_errors: 0
 [7]: rx_phy_ip_err_discards: 0
 [7]: rx_csum_offload_errors: 0
 rx_error_bytes: 0
 rx_crc_errors: 0
 rx_align_errors: 0
 rx_phy_ip_err_discards: 0
 rx_csum_offload_errors: 0
 tx_error_bytes: 0
 tx_mac_errors: 0
 tx_carrier_errors: 0
 tx_deferred: 0
 recoverable_errors: 0
 unrecoverable_errors: 0
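
A rough way to scan statistics like these for anything nonzero is sketched below in Python. It is an illustration rather than part of the original thread: the keyword list is an assumption about which counter names indicate trouble (ixgbe and bnx2x name their counters differently), and nonzero_problem_counters is a hypothetical name. On the Server A excerpt it surfaces only rx_csum_offload_errors.

```python
# Assumed substrings that mark "problem" counters; adjust per driver.
KEYWORDS = ("error", "drop", "discard", "miss", "over", "fifo")


def nonzero_problem_counters(text: str, keywords=KEYWORDS) -> dict:
    """Return {name: value} for nonzero `ethtool -S` counters whose name
    contains one of `keywords`. Per-queue lines such as
    "[0]: rx_csum_offload_errors: 0" keep their "[N]:" prefix."""
    found = {}
    for raw in text.splitlines():
        name, sep, value = raw.strip().rpartition(":")
        value = value.strip()
        if not sep or not value.isdigit():
            continue  # headers such as "Server A ethtool stats:"
        name = name.strip()
        if int(value) > 0 and any(k in name for k in keywords):
            found[name] = int(value)
    return found


SERVER_A_EXCERPT = """ rx_errors: 0
 rx_missed_errors: 0
 rx_fifo_errors: 0
 rx_csum_offload_errors: 123049
"""
print(nonzero_problem_counters(SERVER_A_EXCERPT))
# {'rx_csum_offload_errors': 123049}
```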


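Potential issue: Server A has a large number of rx_csum_offload_errors. Server A is the one in production, and I can't help thinking that CPU interrupts may be an underlying factor here and may be what is causing the errors I'm seeing.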

cat /proc/interrupts from Server A:

122:   54938283          0          0          0          0            0          0          0          0          0          0          0            0          0          0          0          0          0          0           0          0          0          0          0  IR-PCI-MSI-edge      em1-  TxRx-0
123:   51653771          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0  IR-PCI-MSI-edge      em1-TxRx-1
124:   52277181          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0  IR-PCI-MSI-edge      em1-TxRx-2
125:   51823314          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0  IR-PCI-MSI-edge      em1-TxRx-3
126:   57975011          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0  IR-PCI-MSI-edge      em1-TxRx-4
127:   52333500          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0  IR-PCI-MSI-edge      em1-TxRx-5
128:   51899210          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0  IR-PCI-MSI-edge      em1-TxRx-6
129:   61106425          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0  IR-PCI-MSI-edge      em1-TxRx-7
130:   51774758          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0  IR-PCI-MSI-edge      em1-TxRx-8
131:   52476407          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0  IR-PCI-MSI-edge      em1-TxRx-9
132:   53331215          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0  IR-PCI-MSI-edge      em1-TxRx-10
133:   52135886          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0
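
Both answers read this dump as every em1 queue interrupt landing in the first CPU column (CPU0). To put a number on that, the Python sketch below (illustrative only; per_cpu_totals and cpu0_share are hypothetical helpers) sums the per-CPU columns for lines that mention a given interface. It assumes the usual /proc/interrupts layout of an IRQ number, one count per CPU, then the chip and handler names. The sample uses two of the lines above, trimmed to three CPU columns.

```python
def per_cpu_totals(text: str, match: str) -> list:
    """Sum interrupt counts per CPU column over the /proc/interrupts lines
    whose text contains `match` (e.g. a NIC name such as "em1")."""
    totals = []
    for line in text.splitlines():
        if match not in line:
            continue
        counts = []
        for field in line.split()[1:]:   # skip the leading "122:" IRQ number
            if not field.isdigit():
                break                    # reached the IRQ chip/handler name
            counts.append(int(field))
        if len(counts) > len(totals):
            totals.extend([0] * (len(counts) - len(totals)))
        for cpu, count in enumerate(counts):
            totals[cpu] += count
    return totals


def cpu0_share(totals: list) -> float:
    """Fraction of all counted interrupts that landed on CPU0."""
    total = sum(totals)
    return totals[0] / total if total else 0.0


SAMPLE = """122:   54938283          0          0  IR-PCI-MSI-edge      em1-TxRx-0
123:   51653771          0          0  IR-PCI-MSI-edge      em1-TxRx-1
"""
totals = per_cpu_totals(SAMPLE, "em1")
print(totals)              # [106592054, 0, 0]
print(cpu0_share(totals))  # 1.0
```

Running the same helpers on a later snapshot (after enabling irqbalance or setting affinity) should show the counts spreading across columns if the change took effect.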


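If this could be the issue, would disabling rx checksumming help? Also, I don't see CPU interrupts on the server that isn't in production, which makes sense, since its NIC doesn't need CPU time.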

Server A:
 ethtool -k em1
Features for em1:
rx-checksumming: on
tx-checksumming: on
tx-checksum-ipv4: on
tx-checksum-unneeded: off
tx-checksum-ip-generic: off
tx-checksum-ipv6: on
tx-checksum-fcoe-crc: on [fixed]
tx-checksum-sctp: on [fixed]
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
tx-tcp-segmentation: on
tx-tcp-ecn-segmentation: off
tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: on
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: on [fixed]
tx-gre-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
fcoe-mtu: off [fixed]
loopback: off [fixed]
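
On the question of disabling rx checksumming: features not marked [fixed] in this list can normally be changed with ethtool -K (for example, ethtool -K em1 rx off for rx-checksumming). The Python sketch below is not from the thread; it parses text in the ethtool -k layout and lists which features in a given state are toggleable. parse_features and toggleable are hypothetical names.

```python
def parse_features(text: str) -> dict:
    """Map feature name -> (enabled, fixed) from `ethtool -k` text."""
    features = {}
    for raw in text.splitlines():
        name, sep, rest = raw.strip().partition(":")
        state = rest.split()
        if not sep or not state or state[0] not in ("on", "off"):
            continue  # skips the "Features for em1:" header
        features[name.strip()] = (state[0] == "on", "[fixed]" in rest)
    return features


def toggleable(features: dict, enabled: bool = True) -> list:
    """Sorted names of features in the given state that are not [fixed]."""
    return sorted(name for name, (on, fixed) in features.items()
                  if on == enabled and not fixed)


# A few lines from Server A's output above.
SAMPLE_K = """Features for em1:
rx-checksumming: on
tx-checksum-fcoe-crc: on [fixed]
large-receive-offload: on
ntuple-filters: off
"""
print(toggleable(parse_features(SAMPLE_K)))
# ['large-receive-offload', 'rx-checksumming']
```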


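Apart from using jumbo frames, which isn't possible because our network equipment doesn't support them, what else can I do or check to get the best TCP performance out of my 10Gb network? Given that one of the servers is in production, and given my hypothesis about the CPU interrupts generated by the NIC, I guess 6.67Gb/s isn't bad. But 20Mb/s in the other direction on a 10Gb link is completely unacceptable. Any help would be greatly appreciated.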

Server A specs: x64, 24 vCPUs, 32GB RAM, RHEL 6.7

Server B specs: x64, 16 vCPUs, 16GB RAM, RHEL 6.7

Answer 1

Sav*_*btz 5

On Linux/Intel, I'd use the following methodology for performance analysis:

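Hardware:

turbostat
  Look at C/P states for the cores, frequencies, and the number of SMIs. [1]
cpufreq-info
  Look at the current driver, frequencies, and governor.
atop
  Look at the interrupt distribution across cores.
  Look for context switches and interrupts.
ethtool
  -S for statistics: look for errors, drops, overruns, missed interrupts, etc.
  -k for offloads: enable GRO/GSO, rss(/rps/rfs)/xps
  -g for ring sizes: increase them
  -c for interrupt coalescing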

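Kernel:

/proc/net/softnet_stat[2] and /proc/interrupts[3]
  Again: distribution, missed and delayed interrupts, (optional) NUMA affinity.
perf top
  See where the kernel/benchmark spends its time.
iptables
  Check for rules (if any) that might affect performance.
netstat -s, netstat -m, /proc/net/*
  Look for error counters and buffer counts.
sysctl / grub
  Lots of tuning here. Try increasing hash table sizes and playing with memory buffers, congestion control, and other knobs.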
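
Buffer tuning via sysctl often starts from the bandwidth-delay product (BDP): a single TCP flow can only fill the link if its send and receive windows are at least rate x RTT. As a worked sketch (not from the thread), the Python below computes the BDP for a 10 Gb/s link and prints candidate buffer limits in sysctl.conf syntax. The 1 ms RTT, the 2x headroom factor, and the min/default fields are assumptions to adjust for your own network; the sysctl names are the standard Linux ones.

```python
def bdp_bytes(rate_bits_per_s: float, rtt_s: float) -> int:
    """Bandwidth-delay product: bytes that must be in flight to fill the path."""
    return round(rate_bits_per_s * rtt_s / 8)


def buffer_sysctls(bdp: int, factor: int = 2) -> dict:
    """Hypothetical starting point letting socket buffers grow to factor x BDP.
    The min/default fields (4096, 87380, 65536) are common example values,
    not values taken from the thread."""
    size = bdp * factor
    return {
        "net.core.rmem_max": str(size),
        "net.core.wmem_max": str(size),
        "net.ipv4.tcp_rmem": f"4096 87380 {size}",
        "net.ipv4.tcp_wmem": f"4096 65536 {size}",
    }


# Assumed inputs: 10 Gb/s line rate and a 1 ms RTT (measure the real RTT with ping).
bdp = bdp_bytes(10e9, 0.001)
print(bdp)  # 1250000
for key, value in buffer_sysctls(bdp).items():
    print(f"{key} = {value}")  # lines in /etc/sysctl.conf syntax
```

On a LAN with a sub-millisecond RTT the BDP is small, so oversized buffers mainly cost memory; the RTT input matters more than the line rate here.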

In your case, your main problem is the distribution of interrupts across cores, so fixing that is your best course of action.
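
If you set affinity by hand instead of running irqbalance, each IRQ's CPU mask goes into /proc/irq/<N>/smp_affinity as a hex bitmask (comma-separated 32-bit words on larger systems). The Python sketch below is a hypothetical helper, not the script linked in [3]: it assigns IRQs round-robin to a list of CPUs and prints the echo commands rather than running them. The CPU list is an assumption that ignores NUMA locality, and a running irqbalance daemon may later overwrite manually set masks.

```python
def cpu_mask(cpu: int) -> str:
    """Hex CPU mask in /proc/irq/<N>/smp_affinity style: 32-bit words,
    most significant first, separated by commas."""
    value = 1 << cpu
    words = []
    while True:
        words.append(format(value & 0xFFFFFFFF, "08x"))
        value >>= 32
        if not value:
            break
    return ",".join(reversed(words))


def round_robin_affinity(irqs, cpus) -> dict:
    """Assign each IRQ to one CPU in turn; returns {irq: mask}."""
    return {irq: cpu_mask(cpus[i % len(cpus)]) for i, irq in enumerate(irqs)}


# IRQs 122 to 132 are the em1-TxRx-0..10 vectors in the question's dump; the
# CPU list is an assumption (ideally CPUs local to the NIC's NUMA node).
plan = round_robin_affinity(range(122, 133), cpus=list(range(12)))
for irq, mask in plan.items():
    print(f"echo {mask} > /proc/irq/{irq}/smp_affinity")
```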

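P.S. Don't forget that kernel and driver/firmware versions play a significant role in benchmarks like these.

P.P.S. You may want to install the latest ixgbe driver from Intel[4]. Don't forget to read the README there and check the scripts directory; it has lots of performance-related tips.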

[0] Intel also has good documentation on scaling network performance:
https://www.kernel.org/doc/Documentation/networking/scaling.txt
[1] You can pin the processor to a specific C-state with:
https://gist.github.com/SaveTheRbtz/f5e8d1ca7b55b6a7897b
[2] You can analyze that data with:
https://gist.github.com/SaveTheRbtz/172b2e2eb3cbd96b598d
[3] You can set affinity with:
https://gist.github.com/SaveTheRbtz/8875474
[4] https://sourceforge.net/projects/e1000/files/ixgbe%20stable/

Answer 2

eww*_*ite 4

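Are the servers the same spec (make and model)? Have you made any changes to sysctl.conf?

You should enable irqbalance, because your interrupts are only occurring on CPU0.

If you aren't using one of EL6's tuned profiles, you should pick one that is close to your workload, according to the table here.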

Archived:	9 years, 7 months ago
Views:	19088
Last activity:	7 years, 8 months ago