Hard drive performance drops drastically

Tae*_*ias 25 performance hard-drive hardware-failure

Basic hardware information:

The drive in question is a Seagate BarraCuda 4TB (model: ST4000DM004). For more details, see the `hdparm -I` output in the appendix at the end.


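Problem description and tests:

On the surface, the problem looks like data destined for the disk being cached while the actual write speed lags behind. In this case, however, things don't seem to be that simple.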


Copying files (on an NTFS file system):

When writing a reasonably large amount of data, the drive's performance drops suddenly and sharply. Again, usually this would simply be files cached in RAM, with the disk working through them for a while afterward. Here, however, monitoring the `/proc/meminfo` file (under Ubuntu) shows behavior that does not seem to support this. Even after writing the data (either large files or several smaller ones) and calling `sync`, the amount of "dirty" memory keeps decreasing for a while, then grinds to a near-complete halt. It keeps decreasing very slowly until it sometimes speeds up again. This can repeat, depending on the amount of data left. Reading from the device is also extremely sluggish while the write speed is down, and stays that way for a while even after `sync` completes, if it completes in "slow mode".
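To put numbers on the `/proc/meminfo` observation instead of watching it by eye, here is a minimal Python sketch (Linux only; `parse_meminfo` and `watch_dirty` are hypothetical helper names, not part of any existing tool) that samples the `Dirty` and `Writeback` counters and prints how fast `Dirty` drains between samples:

```python
import time
from pathlib import Path


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Most values are in kB; the HugePages_* fields are plain page counts.
    """
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        name, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def watch_dirty(interval_s: float = 1.0, samples: int = 10) -> None:
    """Print Dirty and Writeback every interval_s seconds (Linux only).

    A positive "drain" figure means Dirty shrank since the previous sample.
    """
    prev = None
    for _ in range(samples):
        info = parse_meminfo(Path("/proc/meminfo").read_text())
        dirty, writeback = info["Dirty"], info["Writeback"]
        line = f"Dirty={dirty} kB  Writeback={writeback} kB"
        if prev is not None:
            line += f"  drain={(prev - dirty) / interval_s:.0f} kB/s"
        print(line)
        prev = dirty
        time.sleep(interval_s)
```

Calling `watch_dirty()` in one terminal while the copy and `sync` run in another would show the drain rate collapsing during the "slow mode" phases, rather than leaving it to be inferred from a shrinking number.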


These initial tests were performed both from Ubuntu 21.10 and Windows 10, with similar behavior.


Additional remark for Windows:

When the disk stayed slow after the copy operation finished and I tried reading files from it (e.g. playing a video, which kept stuttering), Resource Monitor and Task Manager both showed disk utilization for the device at or near 100% while the actual transfer rate shown was <1 MB/s. (The OS also froze completely at one point, but that may or may not be directly related.)


Disk benchmarks:

To see whether this is caused by the file system or the hardware itself, I benchmarked the device with the gnome-disks utility. The result of one such benchmark, shown here, exemplifies what I described above: the read and write speeds drop sharply to almost nothing after a certain point, then recover later. (Blue and red are the read and write speeds, respectively, for each individual sample, taken at locations going from the outside of the disk toward the inside, 1000 samples in total; the green dots and lines belong to the access-time benchmark, which is separate from the others.)


[Image: read/write benchmark]


Note that, as I understand it, the benchmarking tool eliminates factors such as write caching. Additionally, `/proc/meminfo` showed little to no data waiting in the cache to be written during the slow periods; its complete contents are included in the appendix.


With writes disabled in the benchmark, the phenomenon does not appear, though there seems to be an anomalous sudden drop in speed in the inner sections of the disk:


[Image: read-only benchmark]


(The location of the drop depends not on elapsed time but on the physical location on the disk: other benchmarks with a different number of samples show the cutoff at the same spot.)
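Telling a position-dependent drop from a time-dependent one is easier with numbers than with a graph. The sketch below assumes the per-sample speeds are available as a plain list of numbers, however they were collected (the helper names and the 25%-of-median threshold are arbitrary choices for illustration). It finds runs of unusually slow samples and converts them to a fraction of the disk, so runs with different sample counts can be compared directly:

```python
import statistics


def find_slow_regions(speeds, fraction=0.25):
    """Return (start, end) sample-index pairs (end exclusive) for each run of
    consecutive samples slower than `fraction` times the median speed."""
    threshold = fraction * statistics.median(speeds)
    regions, start = [], None
    for i, speed in enumerate(speeds):
        if speed < threshold and start is None:
            start = i                      # a slow run begins
        elif speed >= threshold and start is not None:
            regions.append((start, i))     # the slow run ended at i
            start = None
    if start is not None:                  # slow run reaches the last sample
        regions.append((start, len(speeds)))
    return regions


def as_disk_fraction(region, n_samples):
    """Convert a sample-index region into a fraction of the disk, assuming
    samples are evenly spaced from the outer (0.0) to inner (1.0) edge."""
    start, end = region
    return start / n_samples, end / n_samples


speeds = [180, 178, 175, 2, 1, 3, 170, 168]   # made-up MB/s values
print(find_slow_regions(speeds))               # [(3, 6)]
print(as_disk_fraction((3, 6), len(speeds)))   # (0.375, 0.75)
```

If the fractions line up across a 1000-sample run and, say, a 500-sample run, the cutoff is tied to the disk location rather than to elapsed time.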


Equivalent benchmarks on other, presumably healthy hard disks in the system yield the expected, regular results like this:


[Image: read/write benchmark on a healthy disk]


Conclusion / Question:

From this I gather that the issue is likely caused by a hardware or firmware failure, but there may be any number of things I have overlooked.


What could be causing this behavior? What next steps should I take to diagnose the problem further? Any help is greatly appreciated.


Appendices:

Detailed hardware information (output of `hdparm -I`):

```
/dev/sdb:

ATA device, with non-removable media
        Model Number:       ST4000DM004-2CV104
        Serial Number:      ZFN3J8RH
        Firmware Revision:  0001
        Transport:          Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0
Standards:
        Used: unknown (minor revision code 0x006d)
        Supported: 10 9 8 7 6 5
        Likely used: 10
Configuration:
        Logical         max     current
        cylinders       16383   16383
        heads           16      16
        sectors/track   63      63
        --
        CHS current addressable sectors:    16514064
        LBA    user addressable sectors:   268435455
        LBA48  user addressable sectors:  7814037168
        Logical  Sector size:                   512 bytes
        Physical Sector size:                  4096 bytes
        Logical Sector-0 offset:                  0 bytes
        device size with M = 1024*1024:     3815447 MBytes
        device size with M = 1000*1000:     4000787 MBytes (4000 GB)
        cache/buffer size  = unknown
        Form Factor: 3.5 inch
        Nominal Media Rotation Rate: 5425
Capabilities:
        LBA, IORDY(can be disabled)
        Queue depth: 32
        Standby timer values: spec'd by Standard, no device specific minimum
        R/W multiple sector transfer: Max = 16  Current = 16
        Recommended acoustic management value: 208, current value: 208
        DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
             Cycle time: min=120ns recommended=120ns
        PIO: pio0 pio1 pio2 pio3 pio4
             Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
        Enabled Supported:
           *    SMART feature set
                Security Mode feature set
           *    Power Management feature set
           *    Write cache
           *    Look-ahead
           *    Host Protected Area feature set
           *    WRITE_BUFFER command
           *    READ_BUFFER command
           *    DOWNLOAD_MICROCODE
                Power-Up In Standby feature set
           *    SET_FEATURES required to spinup after power up
                SET_MAX security extension
           *    48-bit Address feature set
           *    Mandatory FLUSH_CACHE
           *    FLUSH_CACHE_EXT
           *    SMART error logging
           *    SMART self-test
           *    General Purpose Logging feature set
           *    WRITE_{DMA|MULTIPLE}_FUA_EXT
           *    64-bit World wide name
                Write-Read-Verify feature set
           *    WRITE_UNCORRECTABLE_EXT command
           *    {READ,WRITE}_DMA_EXT_GPL commands
           *    Segmented DOWNLOAD_MICROCODE
           *    unknown 119[6]
           *    unknown 119[7]
           *    Gen1 signaling speed (1.5Gb/s)
           *    Gen2 signaling speed (3.0Gb/s)
           *    Gen3 signaling speed (6.0Gb/s)
           *    Native Command Queueing (NCQ)
           *    Host-initiated interface power management
           *    Phy event counters
           *    READ_LOG_DMA_EXT equivalent to READ_LOG_EXT
           *    DMA Setup Auto-Activate optimization
                Device-initiated interface power management
           *    Software settings preservation
                unknown 78[7]
           *    SMART Command Transport (SCT) feature set
           *    SCT Write Same (AC2)
           *    SCT Data Tables (AC5)
                unknown 206[7]
                unknown 206[12] (vendor specific)
                unknown 206[13] (vendor specific)
           *    DOWNLOAD MICROCODE DMA command
Security:
        Master password revision code = 65534
                supported
        not     enabled
        not     locked
                frozen
        not     expired: security count
                supported: enhanced erase
        490min for SECURITY ERASE UNIT. 490min for ENHANCED SECURITY ERASE UNIT.
Logical Unit WWN Device Identifier: 5000c500c6a79fae
        NAA             : 5
        IEEE OUI        : 000c50
        Unique ID       : 0c6a79fae
Checksum: correct
```
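As a sanity check on the `hdparm -I` output, both "device size" lines follow from the LBA48 sector count multiplied by the 512-byte logical sector size, and the 4096-byte physical sector size means each physical sector holds 8 logical ones (a "512e" drive). The constants below are copied from that output; the variable names are only illustrative:

```python
LOGICAL_SECTOR = 512         # bytes, "Logical Sector size"
PHYSICAL_SECTOR = 4096       # bytes, "Physical Sector size"
LBA48_SECTORS = 7814037168   # "LBA48 user addressable sectors"

total_bytes = LBA48_SECTORS * LOGICAL_SECTOR
print(total_bytes)                        # 4000787030016
print(total_bytes // (1000 * 1000))       # 4000787  (M = 1000*1000)
print(total_bytes // (1024 * 1024))       # 3815447  (M = 1024*1024)
print(PHYSICAL_SECTOR // LOGICAL_SECTOR)  # 8 logical sectors per physical sector
```

Integer division matches how `hdparm` truncates the MBytes figures rather than rounding them.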

`/proc/meminfo` during the first benchmark, while the read/write speeds were slow:

```
MemTotal:       16323712 kB
MemFree:         9894056 kB
MemAvailable:   12815716 kB
Buffers:          138380 kB
Cached:          3038420 kB
SwapCached:            0 kB
Active:          1533040 kB
Inactive:        4396560 kB
Active(anon):       2960 kB
Inactive(anon):  2817480 kB
Active(file):    1530080 kB
Inactive(file):  1579080 kB
Unevictable:          32 kB
Mlocked:              32 kB
SwapTotal:      17577980 kB
SwapFree:       17577980 kB
Dirty:               176 kB
Writeback:             0 kB
AnonPages:       2752844 kB
Mapped:           694816 kB
Shmem:             73200 kB
KReclaimable:     137092 kB
Slab:             260112 kB
SReclaimable:     137092 kB
SUnreclaim:       123020 kB
KernelStack:       13872 kB
PageTables:        33292 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    25739836 kB
Committed_AS:    9749696 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       76616 kB
VmallocChunk:          0 kB
Percpu:             8128 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:      512904 kB
DirectMap2M:     7813120 kB
DirectMap1G:     8388608 kB
```

Eug*_*eck 45

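The Seagate ST4000DM004 uses SMR (shingled magnetic recording) to write data to the disk surface. This means that to write a single byte, multiple gigabytes may have to be rewritten.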

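In the "normal usage pattern" (as defined by the HDD vendor, not by the user!), this isn't much of a problem: data is written to a CMR cache on the outer edge of the disk. Later, when disk activity drops, the firmware moves the data to its final location in the SMR bands.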

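When a large amount of data is written at once, this CMR cache fills up, and writes have to go directly to the SMR bands, which is orders of magnitude slower.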

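Note: this is not a RAM cache. It is a small part of the disk surface written in CMR (i.e. without overlapping tracks), to make the horrors of SMR less visible to the user.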
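The cache-exhaustion effect can be illustrated with a deliberately crude model, written as a minimal Python sketch. The cache size and both speeds below are made-up illustrative numbers, not Seagate specifications. The model also ignores the firmware draining the cache during the burst, and real post-cache throughput is neither constant nor this tidy:

```python
def smr_write_profile(total_gb, cache_gb, fast_mb_s, slow_mb_s):
    """Throughput (MB/s) seen for each successive GB of one long write burst.

    Crude model: the first cache_gb GB land in the CMR media cache at
    fast_mb_s; once the cache is full, every further GB goes at slow_mb_s.
    """
    return [fast_mb_s if gb < cache_gb else slow_mb_s for gb in range(total_gb)]


def burst_seconds(profile):
    """Time to write the whole burst, using 1 GB = 1000 MB."""
    return sum(1000 / rate for rate in profile)


# Hypothetical numbers: 20 GB cache, 180 MB/s into the cache, 5 MB/s after.
profile = smr_write_profile(total_gb=40, cache_gb=20, fast_mb_s=180, slow_mb_s=5)
print(profile[19], profile[20])        # 180 5
print(round(burst_seconds(profile)))   # 4111
```

Even in this toy version, the second half of the copy takes about 36 times longer than the first, which is the "cliff" shape seen in the benchmark graph.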

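  • @TooTea No, SMR is definitely not suitable for RAID, and no firmware or other trick can make it work. (7 upvotes)
  • @manassehkatz-Moving2Codidact Oh, I use an SSD for system files and frequently used / I/O-intensive programs, no worries there. :) HDD space is still a lot cheaper, though, for backup/archival purposes, or for other rarely accessed files that usually don't need to be read, and especially written, quickly. (5 upvotes)
  • @TooTea I think that view unfairly marginalizes many use cases. HDD RAID storage is still common in many places and is well suited to medium-scale file and database applications, for which SMR disks are very poorly suited. Not everyone needs SSD speeds, but nobody can live with SMR speeds. (5 upvotes)
  • Yes, welcome to another dark corner of tech. Shingled drives have been sneaking into the supply chain under the radar, and people only discover how bad they are after installing them. WD got hit hard with a [class action lawsuit](https://arstechnica.com/gadgets/2020/05/western-digital-gets-sued-for-sneaking-smr-disks-into-its-nas-channel/) over this, which I'm quite pleased about, but I'd be tempted to suggest the OP return the drive as effectively defective and buy something that actually works. SMR is useful for archiving and basically nothing else. (4 upvotes)
  • @J... Yes, I work on such a system (it's hard to find public commentary that isn't hampered by NDAs, which is why discussion of it is so sparse). That is different from "no firmware or other trick can make it work", which is why I added it for completeness. (3 upvotes)
  • Man, HDD companies really want us to move to SSDs... (2 upvotes)
  • @J... SMR is well understood in [datacenter environments](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/44830.pdf), including cases that require redundancy. Outside datacenter environments, it's not unreasonable to expect the same SMR drives to handle RAID-style redundancy layered on top of their SMR zones perfectly well, as long as the file system on top has an SMR-friendly access pattern. (2 upvotes)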

plu*_*ash 5

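Hard drives write data into sectors on tracks, but there is a limit to how close together tracks can be placed without interfering with each other.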

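Hard drive vendors realized that the problem of adjacent tracks interfering with each other can be mitigated if they abandon the traditional random-write access model and write large areas of data sequentially. Each track slightly overlaps the previous one. This means each platter can hold more data, which means higher capacity and/or lower cost. This is called "shingled magnetic recording" (SMR), after the way roof shingles overlap.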
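The consequence of the overlap for in-place updates can be sketched in a few lines of Python. Assume a band of tracks written in order 0, 1, 2, ..., each partly on top of the one before it (the band size and function names are hypothetical). Rewriting track k would then overwrite the region that track k+1 was written onto, which forces k+1 to be rewritten, and so on to the end of the band:

```python
def tracks_to_rewrite(band_tracks: int, modified_track: int) -> list:
    """Tracks that must be rewritten to change one track in a shingled band.

    Track k+1 was written partly on top of track k, so rewriting k
    clobbers k+1, which clobbers k+2, and so on up to the band's end.
    """
    if not 0 <= modified_track < band_tracks:
        raise ValueError("track is outside the band")
    return list(range(modified_track, band_tracks))


def mean_rewrite_cost(band_tracks: int) -> float:
    """Average number of tracks rewritten per single-track update,
    assuming every track in the band is equally likely to be modified."""
    costs = [len(tracks_to_rewrite(band_tracks, k)) for k in range(band_tracks)]
    return sum(costs) / band_tracks


print(tracks_to_rewrite(5, 2))   # [2, 3, 4]
print(mean_rewrite_cost(5))      # 3.0
```

In general the average is (N + 1) / 2 tracks for a band of N tracks, which is why a small random write can turn into a large rewrite.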

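Of course, hard drives that require major changes to the operating system wouldn't sell very well. So vendors added translation firmware and a CMR cache area, making an SMR drive look like a regular drive to the OS. This isn't much different from what SSD vendors already do.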
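The translation idea can be illustrated with a toy model. This is a hypothetical Python sketch, not how any vendor's firmware actually works. New writes go into a small CMR media cache, a map records which LBAs currently live there, reads consult the cache first, and when the cache is full a slow "clean" step must fold its contents into the shingled bands before more writes are accepted:

```python
from typing import Dict, Optional


class ToyMediaCache:
    """Toy drive-managed SMR translation layer (illustration only)."""

    def __init__(self, cache_capacity: int):
        self.cache_capacity = cache_capacity
        self.cache: Dict[int, bytes] = {}  # LBA -> newest data, in CMR media cache
        self.bands: Dict[int, bytes] = {}  # LBA -> data at its home SMR location

    def write(self, lba: int, data: bytes) -> bool:
        """Store data; return True for the fast path, False if the cache was
        full and the write had to wait for a slow clean first."""
        fast = lba in self.cache or len(self.cache) < self.cache_capacity
        if not fast:
            self.clean()
        self.cache[lba] = data
        return fast

    def read(self, lba: int) -> Optional[bytes]:
        """The cached copy, if any, is the newest; otherwise read the band."""
        return self.cache.get(lba, self.bands.get(lba))

    def clean(self) -> None:
        """Fold cached data into its home bands (on a real drive, the slow
        read-modify-write of whole shingled bands) and empty the cache."""
        self.bands.update(self.cache)
        self.cache.clear()


disk = ToyMediaCache(cache_capacity=2)
print(disk.write(0, b"a"), disk.write(1, b"b"))  # True True
print(disk.write(2, b"c"))                       # False (cache full, clean first)
print(disk.read(0), disk.read(2))                # b'a' b'c'
```

From the host's side every LBA stays readable and writable as on a CMR drive; the only visible difference is that some writes take the slow path.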

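The difference is that flash is fast, so even with a translation layer, SSDs are still much faster than HDDs. SMR HDDs, on the other hand, fall off a performance cliff when the CMR cache area is exhausted and the drive has to hold off new writes during the slow process of rewriting the shingles.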

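Unfortunately, the three remaining HDD vendors decided that the way to roll out this technology was to slip it into their product lines without telling anyone. So instead of being able to make an informed choice about whether to accept the performance cliff in exchange for slightly lower cost per unit of storage, people received these drives unknowingly. Under pressure from the media, the vendors did eventually publish which drive models are SMR, but this is still not obvious to customers.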

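Since all three major HDD vendors did it, you can't simply boycott the culprit, so it seems the only option is to carefully check every hard drive you buy from now on.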

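Oddly, even though the original motivation behind SMR was capacity, the largest drives still seem to be mostly CMR, with SMR appearing mainly on drives in the low single-digit TB range.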

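  • Honestly, I don't quite understand why _linear_ writes to SMR have to be slow. It seems the CMR cache matters more for random access (where the same area has to be rewritten, which SMR can't do), but large linear writes are done the same way on both. (2 upvotes)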