I have LVM set up on Debian 7.8 with kernel 3.2.65-1+deb7u1, running OpenMediaVault.
The LV spans 4 disks:
Disk /dev/sdb: 4000.8 GB, 4000787030016 bytes
Disk /dev/sdc: 2000.4 GB, 2000398934016 bytes
Disk /dev/sdd: 2000.4 GB, 2000398934016 bytes
Disk /dev/sde: 1500.3 GB, 1500301910016 bytes
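Since last night, access to the shares on the LV became slower and slower, until the shares stopped responding altogether.
Syslog repeatedly shows the following messages: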
ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata3.00: BMDMA stat 0x45
ata3.00: failed command: READ DMA
ata3.00: cmd c8/00:80:80:01:00/00:00:00:00:00/e0 tag 0 dma 65536 in
res 51/40:6f:85:01:00/00:00:4b:00:00/e0 Emask 0x9 (media error)
ata3.00: status: { DRDY ERR }
ata3.00: error: { UNC }
ata3.00: configured for UDMA/133
ata3.01: configured for UDMA/133
ata3: EH complete
smartd also reported:
Device: /dev/disk/by-id/wwn-0x50014ee2af284bdd [SAT], SMART Usage Attribute: 193 Load_Cycle_Count changed from 23 to 22
Device: /dev/disk/by-id/wwn-0x50014ee2af284bdd [SAT], SMART Usage Attribute: 7 Seek_Error_Rate changed from 100 to 200
Device: /dev/disk/by-id/wwn-0x50014ee2af284bdd [SAT], SMART Usage Attribute: 7 Seek_Error_Rate changed from 200 to 100
Device: /dev/disk/by-id/wwn-0x50014ee2af284bdd [SAT], SMART Usage Attribute: 193 Load_Cycle_Count changed from 22 to 21
Device: /dev/disk/by-id/wwn-0x50014ee2af284bdd [SAT], SMART Usage Attribute: 7 Seek_Error_Rate changed from 100 to 200
Device: /dev/disk/by-id/wwn-0x50014ee2af284bdd [SAT], SMART Usage Attribute: 193 Load_Cycle_Count changed from 21 to 20
Device: /dev/disk/by-id/wwn-0x50014ee2af284bdd [SAT], 1 Currently unreadable (pending) sectors
Device: /dev/disk/by-id/wwn-0x50014ee2af284bdd [SAT], 689 Currently unreadable (pending) sectors (changed +688)
Device: /dev/disk/by-id/wwn-0x50014ee2af284bdd [SAT], SMART Usage Attribute: 197 Current_Pending_Sector changed from 200 to 198
Device: /dev/disk/by-id/wwn-0x50014ee2af284bdd [SAT], 1416 Currently unreadable (pending) sectors (changed +727)
Device: /dev/disk/by-id/wwn-0x50014ee2af284bdd [SAT], SMART Usage Attribute: 197 Current_Pending_Sector changed from 198 to 195
Device: /dev/disk/by-id/wwn-0x50014ee2af284bdd [SAT], 1465 Currently unreadable (pending) sectors (changed +49)
Device: /dev/disk/by-id/wwn-0x50014ee2af284bdd [SAT], 1465 Currently unreadable (pending) sectors
Device: /dev/disk/by-id/wwn-0x50014ee2af284bdd [SAT], 1465 Currently unreadable (pending) sectors
Device: /dev/disk/by-id/wwn-0x50014ee2af284bdd [SAT], 1465 Currently unreadable (pending) sectors
Device: /dev/disk/by-id/wwn-0x50014ee2af284bdd [SAT], ATA error count increased from 0 to 84
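To see how fast the pending-sector count is growing, the smartd messages above can be reduced to the counts alone. This is a minimal sketch, assuming the messages are available as text on standard input; `pending_sectors` is a hypothetical helper name, not a smartmontools command, and the log path in the usage comment is an assumption.

```shell
# Print the pending-sector count from each matching smartd message on stdin.
# pending_sectors is an illustrative helper, not part of smartmontools.
pending_sectors() {
  # Keep only lines of the form "..., <N> Currently unreadable (pending) sectors"
  # and print <N>; all other smartd messages are dropped.
  sed -n 's/.*, \([0-9][0-9]*\) Currently unreadable (pending) sectors.*/\1/p'
}

# Example usage (log path is an assumption; adjust for your system):
# grep smartd /var/log/syslog | pending_sectors | tail -n 1
```

A count that keeps climbing between readings, as it does in the log above, suggests the disk should be copied as soon as possible rather than kept in service.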
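I have worked out that /dev/sde is the problem disk, but I can no longer get LVM to run because it hangs.
There should be enough free space across sdb, sdc and sdd to drop sde, but any command such as pvmove hangs when it tries to read sde.
Any suggestions, or is my volume toast?
Thanks!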
# pvs
PV VG Fmt Attr PSize PFree
/dev/sdb storage lvm2 a-- 3.64t 0
/dev/sdc storage lvm2 a-- 1.82t 0
/dev/sdd storage lvm2 a-- 1.82t 0
/dev/sde storage lvm2 a-- 1.36t 0
# vgs
VG #PV #LV #SN Attr VSize VFree
storage 4 1 0 wz--n- 8.64t 0
# lvs
LV VG Attr LSize Pool Origin Data% Move Log Copy% Convert
storage storage -wi----- 8.64t
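So, after a week of ddrescue and a day or so of e2fsck, I have recovered some of the contents. Most of the data appears to be there and uncorrupted; a large portion of it ended up in lost+found, but it is readable.
Here is how I did it.
Important: my system disk is not part of the LVM. If your system disk is the one failing, doing this will probably require booting from a live CD/USB drive.
Getting the system to boot
My system would not boot: it hung while trying to start LVM. To get around this I unplugged the problem disk, sde, booted the machine, and waited until I could log in. I then plugged sde back in and ran: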
echo '0 0 0' > /sys/class/scsi_host/host3/scan
after which sde was picked up. (host3 is the port that sde was on; it may be different for your disk.)
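If it is not clear which host the disk is attached to, a common alternative is to ask every SCSI host to rescan. This is a sketch rather than what was run here: `rescan_all_hosts` is a hypothetical helper name, and it writes the wildcard form `- - -` (any channel, any target, any LUN) instead of the `0 0 0` used above.

```shell
# Ask every SCSI host to rescan all channels, targets and LUNs.
# rescan_all_hosts is an illustrative helper; the optional argument
# overrides the sysfs directory so the function can be tried safely.
rescan_all_hosts() {
  hosts_dir="${1:-/sys/class/scsi_host}"
  for host in "$hosts_dir"/host*; do
    # Skip the unexpanded glob (or a host without a scan file).
    [ -e "$host/scan" ] || continue
    # '- - -' is the wildcard form: any channel, any target, any LUN.
    echo '- - -' > "$host/scan"
  done
}

# Example usage, as root:
# rescan_all_hosts
```

Scanning every host touches more controllers than strictly needed, but it avoids having to know the host number in advance.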
Install ddrescue (on Debian)
apt-get install gddrescue
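Clone the dying disk with ddrescue (first pass: skip over errors to read as much good data as quickly as possible. This takes a long time, depending on the number of errors and the size of the disk.)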
ddrescue -f -n /dev/sde /dev/sdf /root/sde.rescue.log
GNU ddrescue 1.16
Press Ctrl-C to interrupt
rescued: 644394 MB, errsize: 372 kB, current rate: 4390 kB/s
rescued: 1500 GB, errsize: 22036 kB, current rate: 66 B/s
ipos: 200704 B, errors: 77, average rate: 4942 kB/s
opos: 200704 B, time since last successful read: 0 s
Finished
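Run another pass (this skips the data already copied and retries the bad areas 3 times before giving up. For me this took even longer than the first pass.)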
ddrescue -d -f -r3 /dev/sde /dev/sdf /root/sde.rescue.log
GNU ddrescue 1.16
Press Ctrl-C to interrupt
Initial status (read from logfile)
rescued: 1500 GB, errsize: 22036 kB, errors: 77
Current status
rescued: 1500 GB, errsize: 12014 kB, current rate: 512 B/s
ipos: 199680 B, errors: 972, average rate: 768 B/s
opos: 199680 B, time since last successful read: 0 s
Splitting failed blocks...
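Between passes it can help to check how much of the disk is still unrecovered without starting ddrescue again. The sketch below sums the sizes of all blocks in the logfile that are not marked `+` (rescued). It assumes the plain-text logfile layout GNU ddrescue writes (comment lines starting with `#`, one status line, then `pos size status` rows); `unrescued_bytes` is a hypothetical helper name, not part of ddrescue.

```shell
# Sum the bytes ddrescue has not recovered yet, from a logfile.
# unrescued_bytes is an illustrative helper, not part of GNU ddrescue.
unrescued_bytes() {
  total=0
  while read -r pos size status rest; do
    # Skip comment lines and blank lines.
    case "$pos" in '#'*|'') continue ;; esac
    # Block rows carry a one-character status in the third field:
    # '?' non-tried, '*' and '/' failed but not fully processed,
    # '-' bad sector, '+' rescued. The current-position line does not
    # match any of these in its third field, so it is ignored.
    case "$status" in
      '?'|'*'|'/'|'-') total=$((total + $size)) ;;
    esac
  done < "$1"
  echo "$total"
}

# Example usage with the logfile from the commands above:
# unrescued_bytes /root/sde.rescue.log
```

The result is in bytes and should roughly track the `errsize` figure ddrescue prints, plus any areas not yet tried.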
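I then shut the machine down, removed sde, plugged sdf into the SATA port that sde had been on, and booted back up.
LVM came up on boot, but there were a lot of errors when trying to view files.
Repair the file system (answer yes to every question, be verbose, and force a check of the file system)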
e2fsck -y -v -f /dev/mapper/storage-storage
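I was then able to mount the file system and start assessing the damage. As mentioned, a lot of the data ended up in lost+found. So far the only thing lost is folder names; by checking the contents of each folder I can piece together where it all belongs.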
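Going through lost+found by hand is easier with an overview of what each orphaned directory holds. This is a minimal sketch, assuming the file system is mounted and lost+found is readable; `summarize_lost_found` is a hypothetical helper name and the mount point in the usage comment is an assumption.

```shell
# Print "<file count><TAB><name>" for each directory directly under
# the given lost+found directory. Illustrative helper only.
summarize_lost_found() {
  lf_dir="$1"
  for entry in "$lf_dir"/*; do
    # Only directories are summarized; loose files are skipped.
    [ -d "$entry" ] || continue
    # Count regular files at any depth below this directory.
    count=$(find "$entry" -type f | wc -l | tr -d ' ')
    printf '%s\t%s\n' "$count" "${entry##*/}"
  done
}

# Example usage (mount point is an assumption):
# summarize_lost_found /mnt/storage/lost+found | sort -rn
```

Sorting by file count puts the largest recovered directories first, which are usually the quickest to recognize and rename.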