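My backup NAS (Arch-based) reports a degraded pool. It also reports the degraded disk as "repairing". I'm confused by this. Assuming that faulted is worse than degraded, should I be worried?
zpool status -v: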
pool: zdata
state: DEGRADED
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: http://zfsonlinux.org/msg/ZFS-8000-8A
scan: scrub in progress since Mon Dec 16 11:35:37 2019
1.80T scanned at 438M/s, 996G issued at 73.7M/s, 2.22T total
1.21M repaired, 43.86% done, 0 days 04:55:13 to go
config:
NAME STATE READ WRITE CKSUM
zdata DEGRADED 0 0 0
wwn-0x50014ee0019b83a6-part1 ONLINE 0 0 0
wwn-0x50014ee057084591-part1 ONLINE 0 0 0
wwn-0x50014ee0ac59cb99-part1 DEGRADED 224 0 454 too many errors (repairing)
wwn-0x50014ee2b3f6d328-part1 ONLINE 0 0 0
logs
wwn-0x50000f0056424431-part5 ONLINE 0 0 0
cache
wwn-0x50000f0056424431-part4 ONLINE 0 0 0
errors: Permanent errors have been detected in the following files:
zdata/backup:<0x86697>
Also, the disk reported as faulty looks much smaller in zpool iostat -v:
capacity operations bandwidth
pool alloc free read write read write
------------------------------ ----- ----- ----- ----- ----- -----
zdata 2.22T 1.41T 33 34 31.3M 78.9K
wwn-0x50014ee0019b83a6-part1 711G 217G 11 8 10.8M 18.0K
wwn-0x50014ee057084591-part1 711G 217G 10 11 9.73M 24.6K
wwn-0x50014ee0ac59cb99-part1 103G 825G 0 10 0 29.1K
wwn-0x50014ee2b3f6d328-part1 744G 184G 11 2 10.7M 4.49K
logs - - - - - -
wwn-0x50000f0056424431-part5 4K 112M 0 0 0 0
cache - - - - - -
wwn-0x50000f0056424431-part4 94.9M 30.9G 0 1 0 128K
------------------------------ ----- ----- ----- ----- ----- -----
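[Edit] Since the disk kept reporting errors, I decided to replace it with a spare. First I issued an add spare command for the new disk, which then showed up in the pool, and then I issued a replace command to replace the degraded disk with the spare. It may not have improved things, because the pool now shows: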
pool: zdata
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Sun Dec 22 10:20:20 2019
36.5G scanned at 33.2M/s, 27.4G issued at 24.9M/s, 2.21T total
0B resilvered, 1.21% done, 1 days 01:35:59 to go
config:
NAME STATE READ WRITE CKSUM
zdata DEGRADED 0 0 0
wwn-0x50014ee0019b83a6-part1 ONLINE 0 0 0
wwn-0x50014ee057084591-part1 ONLINE 0 0 0
spare-2 DEGRADED 0 0 0
wwn-0x50014ee0ac59cb99-part1 DEGRADED 0 0 0 too many errors
wwn-0x50014ee25ea101ef ONLINE 0 0 0
wwn-0x50014ee2b3f6d328-part1 ONLINE 0 0 0
logs
wwn-0x50000f0056424431-part5 ONLINE 0 0 0
cache
wwn-0x50000f0056424431-part4 ONLINE 0 0 0
spares
wwn-0x50014ee25ea101ef INUSE currently in use
errors: No known data errors
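What worries me is that the "to go" time keeps increasing (!). As I write this, it now shows 1 days 05:40:10. I assume that if another disk, the controller, or the power supply fails in the meantime, the pool is lost for good.
[Edit] The new drive finished resilvering after about 4 hours, so ZFS's estimate was clearly not quite right. After detaching the faulty drive, I am now in the situation where the new drive shows only 103G used on a 1TB disk, just like the degraded drive did. How do I get it to use the full 1TB?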
In general, a DEGRADED disk is in better shape than a FAULTED one.
From the zpool man page (slightly reformatted):
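DEGRADED: The number of checksum errors exceeds acceptable levels and the device is degraded as an indication that something may be wrong. ZFS continues to use the device as necessary.
FAULTED: The number of I/O errors exceeds acceptable levels and the device is faulted to prevent further use of the device.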
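To see at a glance which devices are in one of these non-ONLINE states, the status output can be filtered. This is only a minimal sketch: the function name unhealthy_vdevs is made up for illustration, and it assumes the usual layout of zpool status, where the device table sits between the "config:" and "errors:" lines.

```shell
# List every device in `zpool status` output whose STATE is not ONLINE.
# Reads the status text on stdin, so it also works on saved output.
unhealthy_vdevs() {
    awk '
        $1 == "config:" { in_config = 1; next }   # device table starts here
        $1 == "errors:" { in_config = 0 }         # and ends here
        in_config && $2 ~ /^(DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)$/ {
            print $1, $2
        }
    '
}
```

Typical use would be `zpool status -v zdata | unhealthy_vdevs`. Note that the pool itself is listed too when it is DEGRADED, since it appears as the top row of the table.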
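In your specific case, the scrub found many read and checksum errors on one disk, and ZFS started repairing the affected disk. Meanwhile, ZED (the ZFS Event Daemon) noticed the burst of checksum errors and degraded the disk to avoid using/stressing it.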
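When the scrub finishes, I suggest you zpool clear the pool and run another zpool scrub. If the second scrub finds no errors, you can continue using the pool, but given how many errors the current scrub turned up, I would replace the disk as soon as possible.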
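As a rough sketch, the clear-then-scrub sequence could be wrapped like this. The function name rescrub_pool is hypothetical; the pool name is passed as an argument (zdata in the question), and the commands generally need root.

```shell
# Reset the error counters on a pool, then start a fresh scrub.
# Usage: rescrub_pool <pool>
rescrub_pool() {
    pool=$1
    zpool clear "$pool" || return 1   # zero the READ/WRITE/CKSUM counters
    zpool scrub "$pool" || return 1   # start a new scrub; returns immediately
    zpool status -v "$pool"           # show scrub progress and any new errors
}
```

Run it as `rescrub_pool zdata` only after the current scrub has completed, then check zpool status -v zdata again once the new scrub is done to see whether the counters stayed at zero.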
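If you have good reason to believe the disk itself is not at fault, you should analyze the dmesg and smartctl --all output to find the root cause. As an example: I once had a disk that was fine in itself but produced many real errors because of a noisy power supply/cable.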
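A possible starting point for that analysis is sketched below. The function name disk_report and the grep pattern are assumptions for illustration, not an exhaustive list of relevant kernel messages, and the device path is assumed to be the whole disk under /dev/disk/by-id.

```shell
# Collect SMART data and matching kernel messages for one disk.
# Usage: disk_report /dev/disk/by-id/<id>
disk_report() {
    dev=$1
    # Full SMART report: health status, attributes, and the drive error log.
    # Usually needs root.
    smartctl --all "$dev"
    # Kernel messages that often accompany link, cable, or media problems.
    dmesg | grep -iE 'i/o error|ata[0-9]+|medium error|reset'
}
```

For the disk in the question this might be called as `disk_report /dev/disk/by-id/wwn-0x50014ee0ac59cb99`. Link resets in dmesg point toward cabling or power, while errors recorded in the drive's own SMART log point toward the disk itself.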
In any case, the golden rule always applies: be sure to have an up-to-date backup of your pool data.