Terrible random small-write performance with Linux software RAID 5 - advice on reconfiguring

Rib*_*die 3 linux raid5 raid10

I have three 1 TB drives and three 500 GB drives. Right now each size group is in its own RAID 5, and both arrays are in a single LVM volume group (with striped LVs).

I'm finding this far too slow for my workload, which is mostly small random writes. I've played with stripe sizes at both the RAID level and the LVM stripe level, and with increasing the stripe cache and readahead buffer sizes. I've also disabled NCQ, as the usual advice goes.
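For the record, that tuning was along the following lines; the values shown are illustrative rather than the exact ones I tried:

# Enlarge the md stripe cache (RAID 5/6 only; units are pages per device)
echo 8192 > /sys/block/md11/md/stripe_cache_size
echo 8192 > /sys/block/md12/md/stripe_cache_size

# Enlarge readahead on the arrays (units are 512-byte sectors)
blockdev --setra 4096 /dev/md11
blockdev --setra 4096 /dev/md12

# Disable NCQ on each member disk (queue depth 1 = NCQ off)
for d in sdc sdd sde sdf sdg sdh; do
    echo 1 > /sys/block/$d/device/queue_depth
done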

So I'm done with Linux software RAID 5. Without a dedicated controller, it's of no use for my purposes.

I'm about to add another 1 TB drive and another 500 GB drive, making four of each.

How would you configure these eight drives for the best small-random-write performance? Plain RAID 0 is excluded, of course, since the whole point of this setup is obviously redundancy as well. I've considered putting the four 500 GB disks into two RAID 0 pairs and then adding those to a RAID 10 with the other four 1 TB drives, for a six-device RAID 10, but I'm not sure that's the best solution. What do you say?

Edit: there's no budget left for hardware upgrades. What I'm really asking is this: given that the four 1 TB drives can quite straightforwardly become a RAID 10, what do I do with the four 500 GB drives so that they fit best alongside the 4x1TB RAID 10 without becoming a redundancy or performance problem? My other thought was to RAID 10 all four 500 GB drives together and then use LVM to add that capacity to the 4x1TB RAID 10. Can you think of anything better?
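For concreteness, that last idea would be built roughly like this. This is a sketch only; the device names, partition numbers and the VG name "bigarray" are placeholders for wherever the disks actually end up:

# Four 1 TB disks into one RAID 10
mdadm --create /dev/md20 --level=10 --raid-devices=4 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

# Four 500 GB disks into a second RAID 10
mdadm --create /dev/md21 --level=10 --raid-devices=4 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1

# Pool both arrays into one volume group
pvcreate /dev/md20 /dev/md21
vgcreate bigarray /dev/md20 /dev/md21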

Another edit: the existing arrays are formatted as follows:

A 1 TB ext4-formatted striped LV used as a file share. Shared to two Macs via AFP.
A 500 GB LV exported via iSCSI to a Mac, formatted as HFS+. Used as a Time Machine backup.
A 260 GB LV exported via iSCSI to a Mac, formatted as HFS+. Used as a Time Machine backup.
A 200 GB ext4-formatted LV, used as a disk device for a virtualised OS installation.
An LVM snapshot of the 500 GB Time Machine volume.

One thing I haven't tried yet is replacing the Time Machine LVs with files on the ext4 filesystem (so the iSCSI exports would point at files rather than block devices). I have a feeling that would fix my speed problems, but it would stop me taking snapshots of those volumes, so I'm not sure the trade-off is worth it.
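Roughly, that change would look like this. I'm assuming iSCSI Enterprise Target here (whatever target software is actually in use would have its own equivalent), and the backing-file path and IQN are made up for illustration:

# Create a sparse 500 GB backing file on the ext4 share
dd if=/dev/zero of=/mnt/array/data/etm.img bs=1M seek=512000 count=0

# /etc/iet/ietd.conf - export the file with fileio instead of a block device
Target iqn.2010-10.local.array:etm
    Lun 0 Path=/mnt/array/data/etm.img,Type=fileio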

Down the line I intend to move my iPhoto and iTunes libraries onto the server, on another HFS+ iSCSI mount; testing that is how I first noticed the dire random-write performance.

In case you're curious, I used the information in the RAID Math section of http://wiki.centos.org/HowTos/Disk_Optimization to work out how to set everything up for the ext4 partition (which is why I'm seeing excellent performance from it), but it doesn't seem to have done anything for the iSCSI-shared HFS+ volumes.
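(The relevant arithmetic from that page: stride = chunk size / filesystem block size, and stripe-width = stride x number of data disks. For one of these 3-disk, 256 KiB-chunk RAID 5s with 4 KiB blocks that gives stride = 64 and stripe-width = 128. The dumpe2fs output further down reports 128 for both, presumably reflecting the additional 256 KiB LVM stripe across the two arrays. As a sketch, the mkfs invocation would be:)

mkfs.ext4 -b 4096 -E stride=64,stripe-width=128 /dev/array/data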

More details:

Output of lvdisplay:

  --- Logical volume ---
  LV Name                /dev/array/data
  VG Name                array
  LV UUID                2Lgn1O-q1eA-E1dj-1Nfn-JS2q-lqRR-uEqzom
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                1.00 TiB
  Current LE             262144
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     2048
  Block device           251:0

  --- Logical volume ---
  LV Name                /dev/array/etm
  VG Name                array
  LV UUID                KSwnPb-B38S-Lu2h-sRTS-MG3T-miU2-LfCBU2
  LV Write Access        read/write
  LV snapshot status     source of
                         /dev/array/etm-snapshot [active]
  LV Status              available
  # open                 1
  LV Size                500.00 GiB
  Current LE             128000
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     2048
  Block device           251:1

  --- Logical volume ---
  LV Name                /dev/array/jtm
  VG Name                array
  LV UUID                wZAK5S-CseH-FtBo-5Fuf-J3le-fVed-WzjpOo
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                260.00 GiB
  Current LE             66560
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     2048
  Block device           251:2

  --- Logical volume ---
  LV Name                /dev/array/mappingvm
  VG Name                array
  LV UUID                69k2D7-XivP-Zf4o-3SVg-QAbD-jP9W-cG8foD
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                200.00 GiB
  Current LE             51200
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     2048
  Block device           251:3

  --- Logical volume ---
  LV Name                /dev/array/etm-snapshot
  VG Name                array
  LV UUID                92x9Eo-yFTY-90ib-M0gA-icFP-5kC6-gd25zW
  LV Write Access        read/write
  LV snapshot status     active destination for /dev/array/etm
  LV Status              available
  # open                 0
  LV Size                500.00 GiB
  Current LE             128000
  COW-table size         500.00 GiB
  COW-table LE           128000
  Allocated to snapshot  44.89% 
  Snapshot chunk size    4.00 KiB
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     2048
  Block device           251:7


Output of pvs --align -o pv_name,pe_start,stripe_size,stripes:

  PV         1st PE  Stripe  #Str
  /dev/md0   192.00k      0     1
  /dev/md0   192.00k      0     1
  /dev/md0   192.00k      0     1
  /dev/md0   192.00k      0     1
  /dev/md0   192.00k      0     0
  /dev/md11  512.00k 256.00k    2
  /dev/md11  512.00k 256.00k    2
  /dev/md11  512.00k 256.00k    2
  /dev/md11  512.00k      0     1
  /dev/md11  512.00k      0     1
  /dev/md11  512.00k      0     0
  /dev/md12  512.00k 256.00k    2
  /dev/md12  512.00k 256.00k    2
  /dev/md12  512.00k 256.00k    2
  /dev/md12  512.00k      0     0

Output of cat /proc/mdstat:

md12 : active raid5 sdc1[1] sde1[0] sdh1[2]
      976770560 blocks level 5, 256k chunk, algorithm 2 [3/3] [UUU]

md11 : active raid5 sdg1[2] sdf1[0] sdd1[1]
      1953521152 blocks level 5, 256k chunk, algorithm 2 [3/3] [UUU]



Output of vgdisplay:


--- Volume group ---
  VG Name               array
  System ID             
  Format                lvm2
  Metadata Areas        2
  Metadata Sequence No  8
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                5
  Open LV               3
  Max PV                0
  Cur PV                2
  Act PV                2
  VG Size               2.73 TiB
  PE Size               4.00 MiB
  Total PE              715402
  Alloc PE / Size       635904 / 2.43 TiB
  Free  PE / Size       79498 / 310.54 GiB
  VG UUID               PGE6Oz-jh96-B0Qc-zN9e-LKKX-TK6y-6olGJl



Output of dumpe2fs /dev/array/data | head -n 100 (or so):

dumpe2fs 1.41.12 (17-May-2010)
Filesystem volume name:   <none>
Last mounted on:          /mnt/array/data
Filesystem UUID:          b03e8fbb-19e5-479e-a62a-0dca0d1ba567
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash 
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              67108864
Block count:              268435456
Reserved block count:     13421772
Free blocks:              113399226
Free inodes:              67046222
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      960
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
RAID stride:              128
RAID stripe width:        128
Flex block group size:    16
Filesystem created:       Thu Jul 29 22:51:26 2010
Last mount time:          Sun Oct 31 14:26:40 2010
Last write time:          Sun Oct 31 14:26:40 2010
Mount count:              1
Maximum mount count:      22
Last checked:             Sun Oct 31 14:10:06 2010
Check interval:           15552000 (6 months)
Next check after:         Fri Apr 29 14:10:06 2011
Lifetime writes:          677 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      9e6a9db2-c179-495a-bd1a-49dfb57e4020
Journal backup:           inode blocks
Journal features:         journal_incompat_revoke
Journal size:             128M
Journal length:           32768
Journal sequence:         0x000059af
Journal start:            1




Output of lvs array --aligned -o seg_all,lv_all:

  Type    #Str Stripe  Stripe  Region Region Chunk Chunk Start Start SSize   Seg Tags PE Ranges                                       Devices                             LV UUID                                LV           Attr   Maj Min Rahead KMaj KMin KRahead LSize   #Seg Origin OSize   Snap%  Copy%  Move Convert LV Tags Log Modules 
  striped    2 256.00k 256.00k     0      0     0     0     0      0   1.00t          /dev/md11:0-131071 /dev/md12:0-131071           /dev/md11(0),/dev/md12(0)           2Lgn1O-q1eA-E1dj-1Nfn-JS2q-lqRR-uEqzom data         -wi-ao  -1  -1   auto 251  0      1.00m   1.00t    1             0                                                 
  striped    2 256.00k 256.00k     0      0     0     0     0      0 500.00g          /dev/md11:131072-195071 /dev/md12:131072-195071 /dev/md11(131072),/dev/md12(131072) KSwnPb-B38S-Lu2h-sRTS-MG3T-miU2-LfCBU2 etm          owi-ao  -1  -1   auto 251  1      1.00m 500.00g    1        500.00g                                        snapshot
  linear     1      0       0      0      0  4.00k 4.00k    0      0 500.00g          /dev/md11:279552-407551                         /dev/md11(279552)                   92x9Eo-yFTY-90ib-M0gA-icFP-5kC6-gd25zW etm-snapshot swi-a-  -1  -1   auto 251  7      1.00m 500.00g    1 etm    500.00g  44.89                                 snapshot
  striped    2 256.00k 256.00k     0      0     0     0     0      0 260.00g          /dev/md11:195072-228351 /dev/md12:195072-228351 /dev/md11(195072),/dev/md12(195072) wZAK5S-CseH-FtBo-5Fuf-J3le-fVed-WzjpOo jtm          -wi-ao  -1  -1   auto 251  2      1.00m 260.00g    1             0                                                 
  linear     1      0       0      0      0     0     0     0      0 200.00g          /dev/md11:228352-279551                         /dev/md11(228352)                   69k2D7-XivP-Zf4o-3SVg-QAbD-jP9W-cG8foD mappingvm    -wi-a-  -1  -1   auto 251  3      1.00m 200.00g    1             0                                                 




cat /sys/block/md11/queue/logical_block_size 
512
cat /sys/block/md11/queue/physical_block_size 
512
cat /sys/block/md11/queue/optimal_io_size 
524288
cat /sys/block/md11/queue/minimum_io_size 
262144

cat /sys/block/md12/queue/minimum_io_size 
262144
cat /sys/block/md12/queue/optimal_io_size 
524288
cat /sys/block/md12/queue/logical_block_size 
512
cat /sys/block/md12/queue/physical_block_size 
512
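
(Those values are consistent with the array geometry: minimum_io_size is the 256 KiB chunk, and optimal_io_size is the 512 KiB full stripe, i.e. two data disks x 256 KiB. Cross-checking against md itself, one would expect something like:)

mdadm --detail /dev/md11 | grep -i chunk
#     Chunk Size : 256K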

Edit: so nobody can tell me whether anything is wrong here? No concrete suggestions at all? Hmm.

Tom*_*Tom 5

Sorry, but RAID 5 is never any good for small writes unless the controller has plenty of cache: every small write turns into a read of the old data and the old parity followed by a write of the new data and the new parity, i.e. four I/Os for a single logical write.
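If you want to put a number on that penalty, a small random-write benchmark will show it. A sketch with fio; /dev/array/scratch is a hypothetical throwaway LV, since direct I/O will scribble over whatever you point it at:

# 4 KiB random writes, page cache bypassed - run only against a disposable LV
fio --name=randwrite --filename=/dev/array/scratch --rw=randwrite \
    --bs=4k --direct=1 --ioengine=libaio --iodepth=32 --runtime=60 \
    --group_reporting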

Your best bet is RAID 10 on a hardware controller. For truly screaming performance, get something like an Adaptec and make half the drives SSDs: all reads will then be served from the SSDs, which buys you an enormous amount of performance, although writes obviously still have to go to both halves. I'm not sure Linux software RAID can do the same.

The rest depends entirely on your usage pattern, and you've basically told us nothing about that.