具有 4 个磁盘的 RAID 5 无法在一个故障磁盘上运行?

Mei*_*Mei 6 linux raid mdadm

我发现了一个关于mdadm 备用磁盘的问题,它几乎回答了我的问题,但我不清楚发生了什么。

我们有一个 RAID5 设置了 4 个磁盘 - 所有在正常操作中都标记为active/sync

    Update Time : Sun Sep 29 03:44:01 2013
          State : clean 
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
Run Code Online (Sandbox Code Playgroud)

...

Number   Major   Minor   RaidDevice State
   0     202       32        0      active sync   /dev/sdc
   1     202       48        1      active sync   /dev/sdd
   2     202       64        2      active sync   /dev/sde 
   4     202       80        3      active sync   /dev/sdf
Run Code Online (Sandbox Code Playgroud)

但是当其中一个磁盘出现故障时,RAID 停止工作:

    Update Time : Sun Sep 29 01:00:01 2013
          State : clean, FAILED 
 Active Devices : 2
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 1
Run Code Online (Sandbox Code Playgroud)

...

Number   Major   Minor   RaidDevice State
   0     202       32        0      active sync   /dev/sdc
   1     202       48        1      active sync   /dev/sdd
   2       0        0        2      removed
   3       0        0        3      removed

   2     202       64        -      faulty spare   /dev/sde
   4     202       80        -      spare   /dev/sdf
Run Code Online (Sandbox Code Playgroud)

这到底是怎么回事??

修复方法是重新安装 RAID - 幸运的是我可以做到这一点。下次它可能会有一些重要的数据。我需要了解这一点,这样我才能拥有一个不会因为单个磁盘故障而失败的 RAID。

我意识到我没有列出我的期望与发生的事情。

我希望具有 3 个好磁盘和 1 个坏磁盘的 RAID5 将在降级模式下运行 - 3 个活动/同步和 1 个故障。

发生的事情是无中生有地创建了一个备件并宣布有故障 - 然后凭空创建了一个新的备件并宣布了声音 - 之后 RAID 被宣布为无效。

这是来自的输出blkid

$ blkid
/dev/xvda1: LABEL="/" UUID="4797c72d-85bd-421a-9c01-52243aa28f6c" TYPE="ext4" 
/dev/xvdc: UUID="feb2c515-6003-478b-beb0-089fed71b33f" TYPE="ext3" 
/dev/xvdd: UUID="feb2c515-6003-478b-beb0-089fed71b33f" SEC_TYPE="ext2" TYPE="ext3" 
/dev/xvde: UUID="feb2c515-6003-478b-beb0-089fed71b33f" SEC_TYPE="ext2" TYPE="ext3" 
/dev/xvdf: UUID="feb2c515-6003-478b-beb0-089fed71b33f" SEC_TYPE="ext2" TYPE="ext3" 
Run Code Online (Sandbox Code Playgroud)

TYPE 和 SEC_TYPE 很有趣,因为 RAID 具有 XFS,而不是 ext3....

在此磁盘上尝试挂载的日志 - 导致前面列出的最终结果,就像其他所有挂载一样 - 具有以下日志条目:

Oct  2 15:08:51 it kernel: [1686185.573233] md/raid:md0: device xvdc operational as raid disk 0
Oct  2 15:08:51 it kernel: [1686185.580020] md/raid:md0: device xvde operational as raid disk 2
Oct  2 15:08:51 it kernel: [1686185.588307] md/raid:md0: device xvdd operational as raid disk 1
Oct  2 15:08:51 it kernel: [1686185.595745] md/raid:md0: allocated 4312kB
Oct  2 15:08:51 it kernel: [1686185.600729] md/raid:md0: raid level 5 active with 3 out of 4 devices, algorithm 2
Oct  2 15:08:51 it kernel: [1686185.608928] md0: detected capacity change from 0 to 2705221484544
Oct  2 15:08:51 it kernel: [1686185.615772] md: recovery of RAID array md0
Oct  2 15:08:51 it kernel: [1686185.621150] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
Oct  2 15:08:51 it kernel: [1686185.627626] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
Oct  2 15:08:51 it kernel: [1686185.634024]  md0: unknown partition table
Oct  2 15:08:51 it kernel: [1686185.645882] md: using 128k window, over a total of 880605952k.
Oct  2 15:22:25 it kernel: [1686999.697076] XFS (md0): Mounting Filesystem
Oct  2 15:22:26 it kernel: [1686999.889961] XFS (md0): Ending clean mount
Oct  2 15:24:19 it kernel: [1687112.817845] end_request: I/O error, dev xvde, sector 881423360
Oct  2 15:24:19 it kernel: [1687112.820517] raid5_end_read_request: 1 callbacks suppressed
Oct  2 15:24:19 it kernel: [1687112.821837] md/raid:md0: read error not correctable (sector 881423360 on xvde).
Oct  2 15:24:19 it kernel: [1687112.821837] md/raid:md0: Disk failure on xvde, disabling device.
Oct  2 15:24:19 it kernel: [1687112.821837] md/raid:md0: Operation continuing on 2 devices.
Oct  2 15:24:19 it kernel: [1687112.821837] md/raid:md0: read error not correctable (sector 881423368 on xvde).
Oct  2 15:24:19 it kernel: [1687112.821837] md/raid:md0: read error not correctable (sector 881423376 on xvde).
Oct  2 15:24:19 it kernel: [1687112.821837] md/raid:md0: read error not correctable (sector 881423384 on xvde).
Oct  2 15:24:19 it kernel: [1687112.821837] md/raid:md0: read error not correctable (sector 881423392 on xvde).
Oct  2 15:24:19 it kernel: [1687112.821837] md/raid:md0: read error not correctable (sector 881423400 on xvde).
Oct  2 15:24:19 it kernel: [1687112.821837] md/raid:md0: read error not correctable (sector 881423408 on xvde).
Oct  2 15:24:19 it kernel: [1687112.821837] md/raid:md0: read error not correctable (sector 881423416 on xvde).
Oct  2 15:24:19 it kernel: [1687112.821837] md/raid:md0: read error not correctable (sector 881423424 on xvde).
Oct  2 15:24:19 it kernel: [1687112.821837] md/raid:md0: read error not correctable (sector 881423432 on xvde).
Oct  2 15:24:19 it kernel: [1687113.432129] md: md0: recovery done.
Oct  2 15:24:19 it kernel: [1687113.685151] Buffer I/O error on device md0, logical block 96
Oct  2 15:24:19 it kernel: [1687113.691386] Buffer I/O error on device md0, logical block 96
Oct  2 15:24:19 it kernel: [1687113.697529] Buffer I/O error on device md0, logical block 64
Oct  2 15:24:20 it kernel: [1687113.703589] Buffer I/O error on device md0, logical block 64
Oct  2 15:25:51 it kernel: [1687205.682022] Buffer I/O error on device md0, logical block 96
Oct  2 15:25:51 it kernel: [1687205.688477] Buffer I/O error on device md0, logical block 96
Oct  2 15:25:51 it kernel: [1687205.694591] Buffer I/O error on device md0, logical block 64
Oct  2 15:25:52 it kernel: [1687205.700728] Buffer I/O error on device md0, logical block 64
Oct  2 15:25:52 it kernel: [1687205.748751] XFS (md0): last sector read failed
Run Code Online (Sandbox Code Playgroud)

我没有看到那里列出了 xvdf...

slm*_*slm 2

我想象您正在像这样创建 RAID5 阵列:

$ mdadm --create /dev/md0 --level=5 --raid-devices=4 \
       /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
Run Code Online (Sandbox Code Playgroud)

这不完全是你想要的。相反,您需要像这样添加磁盘:

$ mdadm --create /dev/md0 --level=5 --raid-devices=4 \
       /dev/sda1 /dev/sdb1 /dev/sdc1
$ mdadm --add /dev/md0 /dev/sdd1
Run Code Online (Sandbox Code Playgroud)

或者您可以使用mdadm的选项来添加备件,如下所示:

$ mdadm --create /dev/md0 --level=5 --raid-devices=3 --spare-devices=1 \
       /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
Run Code Online (Sandbox Code Playgroud)

列表中的最后一个驱动器将是备用驱动器。

摘自mdadm 手册页

-n, --raid-devices=
      Specify the number of active devices in the array.  This, plus the 
      number of spare devices (see below) must  equal the  number  of  
      component-devices (including "missing" devices) that are listed on 
      the command line for --create. Setting a value of 1 is probably a 
      mistake and so requires that --force be specified first.  A  value 
      of  1  will then be allowed for linear, multipath, RAID0 and RAID1.  
      It is never allowed for RAID4, RAID5 or RAID6. This  number  can only 
      be changed using --grow for RAID1, RAID4, RAID5 and RAID6 arrays, and
      only on kernels which provide the necessary support.

-x, --spare-devices=
      Specify the number of spare (eXtra) devices in the initial array.  
      Spares can also be  added  and  removed  later. The  number  of component
      devices listed on the command line must equal the number of RAID devices 
      plus the number of spare devices.
Run Code Online (Sandbox Code Playgroud)