rom*_*man · 6 · Tags: optimization, fiber, 10gbethernet, linux-networking, nvme
While setting up an experimental lab cluster, I found that data received over the 10G fiber connection is written at roughly 10% of the local write speed.
I tested the transfer speed between two identical machines: iperf3 shows a good memory-to-memory speed of 9.43 Gbit/s, and a disk(read)-to-memory transfer runs at 9.35 Gbit/s:
test@rbox1:~$ iperf3 -s -B 10.0.0.21
test@rbox3:~$ iperf3 -c 10.0.0.21 -F /mnt/k8s/test.3g
Connecting to host 10.0.0.21, port 5201
Sent 9.00 GByte / 9.00 GByte (100%) of /mnt/k8s/test.3g
[ 5] 0.00-8.26 sec 9.00 GBytes 9.35 Gbits/sec
But sending the data over the 10G link and writing it to disk on the other machine is an order of magnitude slower:
test@rbox1:~$ iperf3 -s 10.0.0.21 -F /tmp/foo -B 10.0.0.21
test@rbox3:~$ iperf3 -c 10.0.0.21
Connecting to host 10.0.0.21, port 5201
[ 5] local 10.0.0.23 port 39970 connected to 10.0.0.21 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 103 MBytes 864 Mbits/sec 0 428 KBytes
[ 5] 1.00-2.00 sec 100 MBytes 842 Mbits/sec 0 428 KBytes
[ 5] 2.00-3.00 sec 98.6 MBytes 827 Mbits/sec 0 428 KBytes
[ 5] 3.00-4.00 sec 99.3 MBytes 833 Mbits/sec 0 428 KBytes
[ 5] 4.00-5.00 sec 91.5 MBytes 768 Mbits/sec 0 428 KBytes
[ 5] 5.00-6.00 sec 94.4 MBytes 792 Mbits/sec 0 428 KBytes
[ 5] 6.00-7.00 sec 98.1 MBytes 823 Mbits/sec 0 428 KBytes
[ 5] 7.00-8.00 sec 91.2 MBytes 765 Mbits/sec 0 428 KBytes
[ 5] 8.00-9.00 sec 91.0 MBytes 764 Mbits/sec 0 428 KBytes
[ 5] 9.00-10.00 sec 91.5 MBytes 767 Mbits/sec 0 428 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 959 MBytes 804 Mbits/sec 0 sender
Sent 959 MByte / 9.00 GByte (10%) of /mnt/k8s/test.3g
[ 5] 0.00-10.00 sec 953 MBytes 799 Mbits/sec receiver
The NVMe drives are capable of much faster local writes (detailed dd and fio measurements below): for a single process with 4k/8k/10M blocks, fio random-write speeds of 330/500/1300 MB/s.
I am trying to achieve write speeds close to the actual local write speed of the NVMe drive (so yes, to state the assumption explicitly: I expect to get very close to single-NVMe-drive write speed over the network, but I cannot even reach 20% of it).
At this point I am completely stumped and cannot see what else to try, short of a different kernel/OS. Any pointers, corrections and help would be greatly appreciated.
Here are some measurements, information and results:
Jumbo frames (MTU 9000) on both machines, verified to work with `ping -M do -s 8972` (a minimal sketch of this check follows after these notes).
To rule out the network switch, I connected the two machines directly with 2m Duplex OM3 multi-mode fiber cables (identical cables and transceivers on all machines) and bound the iperf3 server/client to those interfaces. The result was the same (slow).
Disconnected all other Ethernet/fiber cables during the tests (to rule out routing issues): no change.
Updated the firmware of the motherboards and the fiber NICs (again, no change). I have not updated the NVMe firmware; it already appears to be current.
Even tried moving the 10G card from PCIe slot 1 to the next available slot, wondering whether the NVMe drive and the 10G NIC were sharing and saturating the chipset's lane bandwidth (again, no measurable change).
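A minimal sketch of the jumbo-frame check mentioned above, using the interface name and peer address from this setup (the exact commands may have differed slightly):

```
# Raise the MTU on the 10G interface and confirm that 9000-byte frames pass
# unfragmented: 8972 = 9000 - 20 (IP header) - 8 (ICMP header).
sudo ip link set dev enp35s0f1np1 mtu 9000
ping -M do -s 8972 -c 4 10.0.0.21
```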
Edit 06/07:
Following @shodanshok's comment, I mounted the remote machine via NFS; the results are below:
nfs exports: /mnt/nfs *(rw,no_subtree_check,async,insecure,no_root_squash,fsid=0)
cat /etc/mtab | grep nfs
10.0.0.21:/mnt/nfs /mnt/nfs1 nfs rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.0.0.21,mountvers=3,mountport=52335,mountproto=udp,local_lock=none,addr=10.0.0.21 0 0
fio --name=random-write --ioengine=libaio --rw=randwrite --bs=$SIZE --numjobs=1 --iodepth=1 --runtime=30 --end_fsync=1 --size=3g
dd if=/dev/zero of=/mnt/nfs1/test bs=$SIZE count=$(3*1024/$SIZE)
|            | fio (bs=4k) | fio (bs=8k) | fio (bs=1M) | dd (bs=4k) | dd (bs=1M) |
|------------|-------------|-------------|-------------|------------|------------|
| nfs (udp)  | 153         | 210         | 984         | 907        | 962        |
| nfs (tcp)  | 157         | 205         | 947         | 946        | 916        |
All of these measurements are in MB/s. I am satisfied that with 1M blocks the throughput gets very close to the theoretical limit of the 10G connection.
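For completeness, the client-side mount would look roughly like this; it is reconstructed from the mtab entry above (the exact mount command was not recorded), with proto switched between tcp and udp for the two table rows:

```
# Reconstructed from the /etc/mtab entry above; the exact command used is an assumption.
sudo mount -t nfs -o vers=3,proto=tcp,rsize=1048576,wsize=1048576 10.0.0.21:/mnt/nfs /mnt/nfs1
# For the "nfs (udp)" row, the same mount with proto=udp instead of proto=tcp.
```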
It seems that `iperf3 -F ...` is not the right way to test network write speed, but I will also take this up with the iperf3 developers.
Each machine has an AMD Ryzen 3 3200G with 8GB RAM on an MPG X570 GAMING PLUS (MS-7C37) motherboard, one 1TB NVMe drive (a consumer-grade WD Blue SN550 NVMe SSD, WDS100T2B0C) in the M.2 slot closest to the CPU, and a SolarFlare S7120 10G fiber card in a PCIe slot.
NVMe disk info:
test@rbox1:~$ sudo nvme list
Node SN Model Namespace Usage Format FW Rev
---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1 21062Y803544 WDC WDS100T2B0C-00PXH0 1 1.00 TB / 1.00 TB 512 B + 0 B 211210WD
NVMe disk write speed (4k/8k/10M blocks):
test@rbox1:~$ dd if=/dev/zero of=/tmp/temp.bin bs=4k count=1000
1000+0 records in
1000+0 records out
4096000 bytes (4.1 MB, 3.9 MiB) copied, 0.00599554 s, 683 MB/s
test@rbox1:~$ dd if=/dev/zero of=/tmp/temp.bin bs=8k count=1000
1000+0 records in
1000+0 records out
8192000 bytes (8.2 MB, 7.8 MiB) copied, 0.00616639 s, 1.3 GB/s
test@rbox1:~$ dd if=/dev/zero of=/tmp/temp.bin bs=10M count=1000
1000+0 records in
1000+0 records out
10485760000 bytes (10 GB, 9.8 GiB) copied, 7.00594 s, 1.5 GB/s
Random write speed tested with fio-3.16:
test@rbox1:~$ fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k --numjobs=1 --iodepth=1 --runtime=30 --time_based --end_fsync=1
random-write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
Run status group 0 (all jobs):
WRITE: bw=313MiB/s (328MB/s), 313MiB/s-313MiB/s (328MB/s-328MB/s), io=9447MiB (9906MB), run=30174-30174msec
Disk stats (read/write):
dm-0: ios=2/969519, merge=0/0, ticks=0/797424, in_queue=797424, util=21.42%, aggrios=2/973290, aggrmerge=0/557, aggrticks=0/803892, aggrin_queue=803987, aggrutil=21.54%
nvme0n1: ios=2/973290, merge=0/557, ticks=0/803892, in_queue=803987, util=21.54%
test@rbox1:~$ fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=8k --numjobs=1 --iodepth=1 --runtime=30 --time_based --end_fsync=1
random-write: (g=0): rw=randwrite, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=posixaio, iodepth=1
Run status group 0 (all jobs):
WRITE: bw=491MiB/s (515MB/s), 491MiB/s-491MiB/s (515MB/s-515MB/s), io=14.5GiB (15.6GB), run=30213-30213msec
Disk stats (read/write):
dm-0: ios=1/662888, merge=0/0, ticks=0/1523644, in_queue=1523644, util=29.93%, aggrios=1/669483, aggrmerge=0/600, aggrticks=0/1556439, aggrin_queue=1556482, aggrutil=30.10%
nvme0n1: ios=1/669483, merge=0/600, ticks=0/1556439, in_queue=1556482, util=30.10%
test@rbox1:~$ fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=10m --numjobs=1 --iodepth=1 --runtime=30 --time_based --end_fsync=1
random-write: (g=0): rw=randwrite, bs=(R) 10.0MiB-10.0MiB, (W) 10.0MiB-10.0MiB, (T) 10.0MiB-10.0MiB, ioengine=posixaio, iodepth=1
Run status group 0 (all jobs):
WRITE: bw=1250MiB/s (1310MB/s), 1250MiB/s-1250MiB/s (1310MB/s-1310MB/s), io=36.9GiB (39.6GB), run=30207-30207msec
Disk stats (read/write):
dm-0: ios=9/14503, merge=0/0, ticks=0/540252, in_queue=540252, util=68.96%, aggrios=9/81551, aggrmerge=0/610, aggrticks=5/3420226, aggrin_queue=3420279, aggrutil=69.16%
nvme0n1: ios=9/81551, merge=0/610, ticks=5/3420226, in_queue=3420279, util=69.16%
Kernel:
test@rbox1:~$ uname -a
Linux rbox1 5.8.0-55-generic #62-Ubuntu SMP Tue Jun 1 08:21:18 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Fiber NIC:
test@rbox1:~$ sudo sfupdate
Solarflare firmware update utility [v8.2.2]
Copyright 2002-2020 Xilinx, Inc.
Loading firmware images from /usr/share/sfutils/sfupdate_images
enp35s0f0np0 - MAC: 00-0F-53-3B-7D-D0
Firmware version: v8.0.1
Controller type: Solarflare SFC9100 family
Controller version: v6.2.7.1001
Boot ROM version: v5.2.2.1006
The Boot ROM firmware is up to date
The controller firmware is up to date
Fiber NIC initialization and MTU setting:
test@rbox1:~$ sudo dmesg | grep sf
[ 0.210521] ACPI: 10 ACPI AML tables successfully acquired and loaded
[ 1.822946] sfc 0000:23:00.0 (unnamed net_device) (uninitialized): Solarflare NIC detected
[ 1.824954] sfc 0000:23:00.0 (unnamed net_device) (uninitialized): Part Number : SFN7x22F
[ 1.825434] sfc 0000:23:00.0 (unnamed net_device) (uninitialized): no PTP support
[ 1.958282] sfc 0000:23:00.1 (unnamed net_device) (uninitialized): Solarflare NIC detected
[ 2.015966] sfc 0000:23:00.1 (unnamed net_device) (uninitialized): Part Number : SFN7x22F
[ 2.031379] sfc 0000:23:00.1 (unnamed net_device) (uninitialized): no PTP support
[ 2.112729] sfc 0000:23:00.0 enp35s0f0np0: renamed from eth0
[ 2.220517] sfc 0000:23:00.1 enp35s0f1np1: renamed from eth1
[ 3.494367] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[ 1748.247082] sfc 0000:23:00.0 enp35s0f0np0: link up at 10000Mbps full-duplex (MTU 1500)
[ 1809.625958] sfc 0000:23:00.1 enp35s0f1np1: link up at 10000Mbps full-duplex (MTU 9000)
Motherboard details:
# dmidecode 3.2
Getting SMBIOS data from sysfs.
SMBIOS 2.8 present.
Handle 0x0001, DMI type 1, 27 bytes
System Information
Manufacturer: Micro-Star International Co., Ltd.
Product Name: MS-7C37
Version: 2.0
Other hardware information (mainly to list the physical connections/bridges):
test@rbox1:~$ hwinfo --short
cpu:
AMD Ryzen 3 3200G with Radeon Vega Graphics, 1500 MHz
AMD Ryzen 3 3200G with Radeon Vega Graphics, 1775 MHz
AMD Ryzen 3 3200G with Radeon Vega Graphics, 1266 MHz
AMD Ryzen 3 3200G with Radeon Vega Graphics, 2505 MHz
storage:
ASMedia ASM1062 Serial ATA Controller
Sandisk Non-Volatile memory controller
AMD FCH SATA Controller [AHCI mode]
AMD FCH SATA Controller [AHCI mode]
network:
enp35s0f1np1 Solarflare SFN7x22F-R3 Flareon Ultra 7000 Series 10G Adapter
enp35s0f0np0 Solarflare SFN7x22F-R3 Flareon Ultra 7000 Series 10G Adapter
enp39s0 Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
network interface:
br-0d1e233aeb68 Ethernet network interface
docker0 Ethernet network interface
vxlan.calico Ethernet network interface
veth0ef4ac4 Ethernet network interface
enp35s0f0np0 Ethernet network interface
enp35s0f1np1 Ethernet network interface
lo Loopback network interface
enp39s0 Ethernet network interface
disk:
/dev/nvme0n1 Sandisk Disk
/dev/sda WDC WD5000AAKS-4
partition:
/dev/nvme0n1p1 Partition
/dev/nvme0n1p2 Partition
/dev/nvme0n1p3 Partition
/dev/sda1 Partition
bridge:
AMD Matisse Switch Upstream
AMD Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
AMD Raven/Raven2 Device 24: Function 3
AMD Raven/Raven2 PCIe GPP Bridge [6:0]
AMD Matisse PCIe GPP Bridge
AMD Raven/Raven2 Device 24: Function 1
AMD Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
AMD FCH LPC Bridge
AMD Matisse PCIe GPP Bridge
AMD Matisse PCIe GPP Bridge
AMD Raven/Raven2 Device 24: Function 6
AMD Matisse PCIe GPP Bridge
AMD Raven/Raven2 Root Complex
AMD Raven/Raven2 Internal PCIe GPP Bridge 0 to Bus A
AMD Raven/Raven2 Device 24: Function 4
AMD Matisse PCIe GPP Bridge
AMD Raven/Raven2 Device 24: Function 2
AMD Matisse PCIe GPP Bridge
AMD Raven/Raven2 Device 24: Function 0
AMD Raven/Raven2 Device 24: Function 7
AMD Raven/Raven2 PCIe GPP Bridge [6:0]
AMD Raven/Raven2 Device 24: Function 5
This answer was inspired by @shodanshok, who posted his suggestion as a comment (so I cannot upvote his contribution and am posting an answer instead).
Edit 2021/06/09: the iperf3 developers have identified a possible issue; newer versions of the package may behave differently, YMMV. See https://github.com/esnet/iperf/issues/1159
Initially, I used `iperf3 -F ...` to measure the network write speed (to validate the 10G fiber connection). However, it produced results much slower than writing the data over NFS (benchmarked with fio).
This was very puzzling, because rsync was running far below that speed, at around 100 MB/s, and even accounting for encryption/decryption it should not be that slow over 10G fiber. So I kept digging in the wrong direction.
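As a sanity check on raw network-to-disk throughput that does not rely on iperf3's file mode, one can stream data over a plain TCP connection and write it to disk on the receiver. This is only a sketch of the idea, not a command from the original tests; the port number and output path are arbitrary:

```
# On the receiving machine (rbox1): accept a TCP stream and write it to disk in 1 MiB blocks.
# OpenBSD netcat syntax; traditional netcat needs "nc -l -p 5201" instead.
nc -l 5201 | dd of=/tmp/recv.bin bs=1M iflag=fullblock status=progress

# On the sending machine (rbox3): stream an existing file to the receiver.
# -N (or -q 0 on traditional netcat) makes the sender close the connection at EOF.
dd if=/mnt/k8s/test.3g bs=1M | nc -N 10.0.0.21 5201
```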
The measurements below show that the 10G network with a (single) NVMe disk is capable of sustaining more than 900 MB/s, with CPU capacity to spare.
In my setup I am using logical volumes (LVM), and oddly the LVM statistics are inconsistent with those of the NVMe partition; it is the only partition on the system, so it might be interesting to see what happens without LVM.
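To compare what the device-mapper (LVM) layer and the underlying NVMe namespace report, the I/O counters of both devices can be watched side by side. A minimal sketch, assuming the device names dm-0 and nvme0n1 seen in the fio output above:

```
# Watch extended I/O statistics and compare the dm-0 and nvme0n1 rows
# (iostat comes from the sysstat package).
iostat -x 1
# Or read the raw kernel counters for both devices directly:
cat /sys/block/dm-0/stat /sys/block/nvme0n1/stat
```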
NFS exports:
/mnt/nfs *(rw,no_subtree_check,async,insecure,no_root_squash,fsid=0)
cat /etc/mtab | grep nfs
10.0.0.21:/mnt/nfs /mnt/nfs1 nfs rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.0.0.21,mountvers=3,mountport=52335,mountproto=udp,local_lock=none,addr=10.0.0.21 0 0
Commands used to produce the measurements below:
`fio --name=random-write --ioengine=libaio --rw=randwrite --bs=$SIZE --numjobs=1 --iodepth=1 --runtime=30 --end_fsync=1 --size=3g`
`dd if=/dev/zero of=/mnt/nfs1/test bs=$SIZE count=$(3*1024/$SIZE)`
| | fio (bs=4k) | fio (bs=8k) | fio (bs=1M) | dd (bs=4k) | dd (bs=1M) |
|------------|----------------|----------------|---------------|---------------|------------|
|nfs (udp) | 153 | 210 | 984 | 907 | 962 |
|nfs (tcp) | 157 | 205 | 947 | 946 | 916 |
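For reference, concrete expansions of the parameterized dd command above for the 4k and 1M columns; the counts are chosen so that bs * count matches the 3 GiB total used with fio's --size=3g (these exact invocations are an assumption, not copied from the original runs):

```
dd if=/dev/zero of=/mnt/nfs1/test bs=4k count=786432   # 786432 * 4 KiB = 3 GiB
dd if=/dev/zero of=/mnt/nfs1/test bs=1M count=3072     # 3072 * 1 MiB = 3 GiB
```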
The screenshots below (local vs. NFS write speed) were produced with:
`fio --name=random-write --ioengine=libaio --rw=randwrite --bs=1m --numjobs=1 --iodepth=1 --runtime=30 --end_fsync=1 --size=20g`
[Screenshots: local write speed vs. NFS write speed over 10G fiber; images not reproduced here.]