LVM recovery from a failed disk

CJS*_*ell 3 lvm

I have LVM set up on Debian 7.8, kernel 3.2.65-1+deb7u1, running OpenMediaVault.

The LV is made up of 4 disks:

Disk /dev/sdb: 4000.8 GB, 4000787030016 bytes
Disk /dev/sdc: 2000.4 GB, 2000398934016 bytes
Disk /dev/sdd: 2000.4 GB, 2000398934016 bytes
Disk /dev/sde: 1500.3 GB, 1500301910016 bytes

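Starting last night, access to the shares on the LV began to slow down, and eventually the shares became completely unresponsive.

Syslog repeatedly shows the following messages: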

ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata3.00: BMDMA stat 0x45
ata3.00: failed command: READ DMA
ata3.00: cmd c8/00:80:80:01:00/00:00:00:00:00/e0 tag 0 dma 65536 in
         res 51/40:6f:85:01:00/00:00:4b:00:00/e0 Emask 0x9 (media error)
ata3.00: status: { DRDY ERR }
ata3.00: error: { UNC }
ata3.00: configured for UDMA/133
ata3.01: configured for UDMA/133
ata3: EH complete 

Smartd also reported:

Device: /dev/disk/by-id/wwn-0x50014ee2af284bdd [SAT], SMART Usage Attribute: 193 Load_Cycle_Count changed from 23 to 22
Device: /dev/disk/by-id/wwn-0x50014ee2af284bdd [SAT], SMART Usage Attribute: 7 Seek_Error_Rate changed from 100 to 200
Device: /dev/disk/by-id/wwn-0x50014ee2af284bdd [SAT], SMART Usage Attribute: 7 Seek_Error_Rate changed from 200 to 100
Device: /dev/disk/by-id/wwn-0x50014ee2af284bdd [SAT], SMART Usage Attribute: 193 Load_Cycle_Count changed from 22 to 21
Device: /dev/disk/by-id/wwn-0x50014ee2af284bdd [SAT], SMART Usage Attribute: 7 Seek_Error_Rate changed from 100 to 200
Device: /dev/disk/by-id/wwn-0x50014ee2af284bdd [SAT], SMART Usage Attribute: 193 Load_Cycle_Count changed from 21 to 20
Device: /dev/disk/by-id/wwn-0x50014ee2af284bdd [SAT], 1 Currently unreadable (pending) sectors
Device: /dev/disk/by-id/wwn-0x50014ee2af284bdd [SAT], 689 Currently unreadable (pending) sectors (changed +688)
Device: /dev/disk/by-id/wwn-0x50014ee2af284bdd [SAT], SMART Usage Attribute: 197 Current_Pending_Sector changed from 200 to 198
Device: /dev/disk/by-id/wwn-0x50014ee2af284bdd [SAT], 1416 Currently unreadable (pending) sectors (changed +727)
Device: /dev/disk/by-id/wwn-0x50014ee2af284bdd [SAT], SMART Usage Attribute: 197 Current_Pending_Sector changed from 198 to 195
Device: /dev/disk/by-id/wwn-0x50014ee2af284bdd [SAT], 1465 Currently unreadable (pending) sectors (changed +49)
Device: /dev/disk/by-id/wwn-0x50014ee2af284bdd [SAT], 1465 Currently unreadable (pending) sectors
Device: /dev/disk/by-id/wwn-0x50014ee2af284bdd [SAT], 1465 Currently unreadable (pending) sectors
Device: /dev/disk/by-id/wwn-0x50014ee2af284bdd [SAT], 1465 Currently unreadable (pending) sectors
Device: /dev/disk/by-id/wwn-0x50014ee2af284bdd [SAT], ATA error count increased from 0 to 84

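I have identified /dev/sde as the problem disk, but I can no longer get LVM to run because it hangs.

I should have enough free space on sdb, sdc and sdd to remove sde, but any command such as pvmove hangs when it tries to read sde.

Any suggestions, or is my volume toast?

Thanks!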

# pvs
  PV         VG      Fmt  Attr PSize PFree
  /dev/sdb   storage lvm2 a--  3.64t    0
  /dev/sdc   storage lvm2 a--  1.82t    0
  /dev/sdd   storage lvm2 a--  1.82t    0
  /dev/sde   storage lvm2 a--  1.36t    0

# vgs
  VG      #PV #LV #SN Attr   VSize VFree
  storage   4   1   0 wz--n- 8.64t    0

# lvs
  LV      VG      Attr     LSize Pool Origin Data%  Move Log Copy%  Convert
  storage storage -wi----- 8.64t

CJS*_*ell 6

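So, after a week of ddrescue and a day or so of e2fsck, I have recovered something. Most of the data appears to be there and uncorrupted. A large part of it is still in lost+found, but it is readable.

Here is how I did it.
Important: my system disk was not part of the LVM. If your system disk has failed, you will probably need to boot from a live CD/USB drive to do this.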

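Getting the system to boot
My system would not boot: it hung while trying to start LVM. To get around this I unplugged the problem disk, sde, booted the machine, and waited until I was able to log in. Then I plugged sde back in and ran:
echo '0 0 0' > /sys/class/scsi_host/host3/scan
after which sde was picked up. (host3 is the port sde was on; it may be different for your disk.)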
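
If it is unclear which hostN a disk sits on, the number can be read from the disk's resolved sysfs path while the disk is still attached, that is, before unplugging it. A minimal sketch, assuming the usual Linux sysfs layout where the path contains a hostN component; the function name scsi_host_of_sysfs_path is made up for illustration and is not part of any tool:

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not from the original answer):
# print the "hostN" component of a resolved sysfs device path.
scsi_host_of_sysfs_path() {
    # Split the path on "/" and keep the first component named hostN.
    printf '%s\n' "$1" | tr '/' '\n' | grep -E '^host[0-9]+$' | head -n 1
}

# Usage while the disk is still visible to the kernel (sde as in this answer):
#   scsi_host_of_sysfs_path "$(readlink -f /sys/block/sde)"
```

The printed name is the directory to use under /sys/class/scsi_host/ for the rescan.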

Install ddrescue (on Debian)

apt-get install gddrescue

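Clone the dying disk with ddrescue (first pass: skip over errors to quickly read as much good data as possible. This takes a long time, depending on the errors and the size of the disk.)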

ddrescue -f -n /dev/sde /dev/sdf /root/sde.rescue.log


GNU ddrescue 1.16
Press Ctrl-C to interrupt
rescued:   644394 MB,  errsize:    372 kB,  current rate:    4390 kB/s
rescued:     1500 GB,  errsize:  22036 kB,  current rate:       66 B/s
   ipos:    200704 B,   errors:      77,    average rate:    4942 kB/s
   opos:    200704 B,     time since last successful read:       0 s
Finished

Try another pass (this skips the data already copied and retries 3 times before giving up. For me this took even longer than the first pass.)

ddrescue -d -f -r3 /dev/sde /dev/sdf /root/sde.rescue.log


GNU ddrescue 1.16
Press Ctrl-C to interrupt
Initial status (read from logfile)
rescued:     1500 GB,  errsize:  22036 kB,  errors:      77
Current status
rescued:     1500 GB,  errsize:  12014 kB,  current rate:      512 B/s
   ipos:    199680 B,   errors:     972,    average rate:      768 B/s
   opos:    199680 B,     time since last successful read:       0 s
Splitting failed blocks...
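
Between passes it can help to see how much is still unread without starting ddrescue again. The sketch below totals the block sizes recorded in the logfile by status. It assumes the logfile layout I believe ddrescue 1.16 writes (comment lines starting with #, one current-position line, then "pos size status" lines with hexadecimal values, where + is rescued and - is a bad sector); the function name ddrescue_log_summary is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not part of ddrescue): sum the block
# sizes in a ddrescue logfile by status. Runs in a subshell so that its
# variables do not leak into the caller.
ddrescue_log_summary() (
    rescued=0 bad=0 pending=0
    while read -r pos size status; do
        case "$pos" in '#'*|'') continue ;; esac   # skip comments and blank lines
        case "$status" in
            '+')         rescued=$((rescued + $size)) ;;   # finished blocks
            '-')         bad=$((bad + $size)) ;;           # failed (bad-sector) blocks
            '?'|'*'|'/') pending=$((pending + $size)) ;;   # not yet tried, trimmed or split
            *)           continue ;;                       # current-position line, unknown status
        esac
    done < "$1"
    printf 'rescued=%d bad=%d pending=%d\n' "$rescued" "$bad" "$pending"
)

# Usage with the logfile from the commands above:
#   ddrescue_log_summary /root/sde.rescue.log
```

All three totals are in bytes. The logfile is only read, so this is safe to run while deciding whether another retry pass is worthwhile.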

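I then shut the machine down, removed sde, plugged sdf into the same SATA port that sde had been on, and booted back up.
LVM came up at boot, but there were a lot of errors when trying to view files.

Repair the filesystem (answer yes to all questions, be verbose, and force a check of the filesystem)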

e2fsck -y -v -f /dev/mapper/storage-storage

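I was then able to mount the filesystem and start looking at the damage. As mentioned, a lot of the data ended up in lost+found. So far the only thing lost is the folder names. By looking at the contents of each folder, I can piece together where it all belongs.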
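
Working out where each recovered folder belongs goes faster with a quick look at a few entries from every directory in lost+found. A minimal sketch, assuming the filesystem is mounted and lost+found is readable; the function name peek_lost_found and the mount point in the usage comment are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not from the original answer): for each
# directory directly under the given path, print its name followed by up to
# N of its entries (default 5). Runs in a subshell to keep variables local.
peek_lost_found() (
    dir=$1
    n=${2:-5}
    for d in "$dir"/*/; do
        [ -d "$d" ] || continue          # skip when the glob matches nothing
        printf '== %s\n' "${d%/}"        # directory name without trailing slash
        ls -A "$d" | head -n "$n"        # first N entries, hidden files included
    done
)

# Usage (the mount point is an assumption):
#   peek_lost_found /mnt/storage/lost+found 10
```

This only lists names and changes nothing on disk, so it can be rerun freely while renaming folders by hand.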

References: