Rare input/output errors - Linux server

R. *_*tzi 3 ubuntu hard-drive dell-poweredge dell-perc

Occasionally, we get input/output errors on one of our disks.

Our server (Dell PowerEdge R720, Ubuntu 14.04) uses a PERC H710 RAID controller, and the disk producing the errors is a Dell 600GB SAS 6Gbps 15k 3.5" drive.

We can always fix the errors with fsck.ext4, but we don't know what causes them.

We have updated the server firmware to the latest version and run every test we could think of.

What else can we do to find the root cause of the problem?
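One place to start is the kernel log, which records each I/O error the block layer reports, so counting them per device shows whether the errors always come from the same volume and how often they happen. Below is a minimal shell sketch, assuming the message formats used by 3.13-era kernels like the one this server runs (`end_request: I/O error, dev sda, ...` and `Buffer I/O error on device sda1, ...`). Other kernel versions word these lines differently, so treat the patterns as assumptions; the function name count_io_errors is hypothetical.

```sh
#!/bin/sh
# Count kernel I/O error messages per block device in a log read on stdin.
# Assumption: 3.13-era kernel message formats, e.g.
#   "end_request: I/O error, dev sda, sector 12345"
#   "Buffer I/O error on device sda1, logical block 678"
# Adjust the patterns if your kernel words these lines differently.
count_io_errors() {
    # Print only the device name from each matching line (-n suppresses the rest).
    sed -n \
        -e 's|.*I/O error, dev \([^ ,]*\),.*|\1|p' \
        -e 's|.*Buffer I/O error on device \([^ ,]*\),.*|\1|p' |
        sort | uniq -c |
        awk '{ print $2, $1 }'   # output: "<device> <count>", sorted by device
}
```

For example, `dmesg | count_io_errors` or `count_io_errors < /var/log/kern.log`. Note that behind the PERC the kernel only sees the RAID virtual disk, so every error will be attributed to sda (or its partitions), not to an individual physical drive.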

EDIT:

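About a week ago we contacted Dell. After walking me through several tests, they concluded that the server is fine: nothing abnormal showed up in any of the tests.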

I can't enable SMART support for the device:

$ sudo smartctl -a /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.13.0-55-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               DELL
Product:              PERC H710
Revision:             3.13
User Capacity:        1,199,101,181,952 bytes [1.19 TB]
Logical block size:   512 bytes
Logical Unit id:      0x6b8ca3a0f210dc0019eead8c1111fb0a
Serial number:        000afb11118cadee1900dc10f2a0a38c
Device type:          disk
Local Time is:        Wed Jul  8 10:47:35 2015 IDT
SMART support is:     Unavailable - device lacks SMART capability.

=== START OF READ SMART DATA SECTION ===

Error Counter logging not supported

Device does not support Self Test logging

I tried:

$ sudo smartctl -s on /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.13.0-55-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF ENABLE/DISABLE COMMANDS SECTION ===
unable to fetch IEC (SMART) mode page [unsupported field in scsi command]
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
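The smartctl output shows Vendor: DELL and Product: PERC H710, i.e. /dev/sda is the controller's RAID virtual disk, which has no SMART data of its own. The PERC H710 is built on an LSI MegaRAID chip, and on Linux smartctl can address the individual physical disks behind such a controller with the device type `-d megaraid,N`, where N is the controller's device ID for the disk. The sketch below assumes that, plus the SAS-style report lines `SMART Health Status:` and `Elements in grown defect list:`; the function names and the device IDs you pass in are hypothetical, so look up the real IDs with the controller's management tool first.

```sh
#!/bin/sh
# Summarize a SAS disk's health from smartctl text output read on stdin.
# Assumption: SAS/SCSI output contains lines such as
#   "SMART Health Status: OK"
#   "Elements in grown defect list: 0"
sas_health_summary() {
    awk -F': *' '
        /^SMART Health Status:/           { health = $2 }
        /^Elements in grown defect list:/ { defects = $2 }
        END {
            if (health == "")  health = "unknown"
            if (defects == "") defects = "unknown"
            print "health=" health " grown_defects=" defects
        }'
}

# Query each physical disk behind the PERC through the megaraid device type.
# Usage: check_perc_disks /dev/sda 0 1 2 3   (IDs here are hypothetical)
check_perc_disks() {
    dev=$1; shift
    for id in "$@"; do
        printf 'megaraid,%s: ' "$id"
        sudo smartctl -a -d "megaraid,$id" "$dev" | sas_health_summary
    done
}
```

A disk whose grown defect list keeps increasing between runs, or whose health status is not OK, would be the obvious suspect for the errors.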

Also, I don't know how to interpret this (Googling didn't help):

$ sudo hdparm -I /dev/sda

/dev/sda:
SG_IO: bad/missing sense data, sb[]:  70 00 05 00 00 00 00 0d 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

ATA device, with non-removable media
Standards:
    Likely used: 1
Configuration:
    Logical     max current
    cylinders   0   0
    heads       0   0
    sectors/track   0   0
    --
    Logical/Physical Sector size:           512 bytes
    device size with M = 1024*1024:           0 MBytes
    device size with M = 1000*1000:           0 MBytes 
    cache/buffer size  = unknown
Capabilities:
    IORDY not likely
    Cannot perform double-word IO
    R/W multiple sector transfer: not supported
    DMA: not supported
    PIO: pio0 
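The sb[] bytes hdparm prints are raw SCSI sense data returned for its request. In fixed-format sense data (response code 0x70), the low nibble of byte 2 is the sense key and bytes 12 and 13 are the additional sense code (ASC) and qualifier (ASCQ), as defined in the SCSI Primary Commands (SPC) standard. Here that gives sense key 0x5, ILLEGAL REQUEST, with ASC/ASCQ 0x20/0x00, INVALID COMMAND OPERATION CODE. Most likely the PERC rejected the ATA IDENTIFY command that hdparm -I sends, because /dev/sda is a RAID virtual disk and the physical drives are SAS rather than ATA; the zeroed fields that follow therefore say nothing about the disks. A small decoder sketch (decode_sense is a hypothetical helper name, and only the sense keys 0x0 to 0x6 are named):

```sh
#!/bin/sh
# Decode the key fields of fixed-format SCSI sense data (response code 0x70/0x71)
# given as space-separated hex bytes, like the sb[] dump printed by hdparm.
decode_sense() {
    set -- $1    # intentionally unquoted: split the hex bytes into $1, $2, ...
    if [ "$#" -lt 14 ]; then
        echo "too short for fixed-format sense data"; return 1
    fi
    resp=$(( 0x$1 & 0x7f ))                              # bits 0-6 of byte 0
    if [ "$resp" -ne 112 ] && [ "$resp" -ne 113 ]; then  # 0x70 / 0x71
        echo "not fixed-format sense data"; return 1
    fi
    key=$(( 0x$3 & 0x0f ))   # sense key: low nibble of byte 2
    shift 12                 # now $1 is byte 12 (ASC) and $2 is byte 13 (ASCQ)
    case $key in
        0) name="NO SENSE" ;;
        1) name="RECOVERED ERROR" ;;
        2) name="NOT READY" ;;
        3) name="MEDIUM ERROR" ;;
        4) name="HARDWARE ERROR" ;;
        5) name="ILLEGAL REQUEST" ;;
        6) name="UNIT ATTENTION" ;;
        *) name="OTHER" ;;
    esac
    printf 'sense_key=0x%x (%s) asc=0x%s ascq=0x%s\n' "$key" "$name" "$1" "$2"
}
```

Running `decode_sense "70 00 05 00 00 00 00 0d 00 00 00 00 20 00"` prints `sense_key=0x5 (ILLEGAL REQUEST) asc=0x20 ascq=0x00`.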

Any suggestions are most welcome!

Dan*_*com 8

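You have a drive in your RAID that misbehaves and occasionally produces errors? That sounds like a hardware problem, and it will probably get worse. You should consider replacing the drive. Yes, it's expensive, but how much is your time worth, and how bad would it be if the whole drive went south at an inopportune moment?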