I/O error, dev sda, sector xxxxxxxxxx

use*_*067 13 hard-drive


My machine crashed a few times this week. I ran a smartmontools test and got the following results:

=== START OF INFORMATION SECTION ===
Model Family:     Fujitsu MJA BH
Device Model:     FUJITSU MJA2250BH G2
Serial Number:    K94PT972B7RS
LU WWN Device Id: 5 00000e 043bcbddd
Firmware Version: 8919
User Capacity:    250,059,350,016 bytes [250 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 3f
Local Time is:    Mon Feb 10 09:24:22 2014 IST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 118) The previous self-test completed having
                    the read element of the test failed.
Total time to complete Offline 
data collection:        (  783) seconds.
Offline data collection
capabilities:            (0x51) SMART execute Offline immediate.
                    No Auto Offline data collection support.
                    Suspend Offline collection upon new
                    command.
                    No Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:    (   2) minutes.
Extended self-test routine
recommended polling time:    ( 111) minutes.
SCT capabilities:          (0x003f) SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   078   046    Pre-fail  Always       -       41112
  2 Throughput_Performance  0x0025   253   253   030    Pre-fail  Offline      -       33619968
  3 Spin_Up_Time            0x0023   100   100   025    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       4448
  5 Reallocated_Sector_Ct   0x0033   253   253   024    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002f   100   100   047    Pre-fail  Always       -       2140
  8 Seek_Time_Performance   0x0025   253   253   019    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   089   089   000    Old_age   Always       -       5655
 10 Spin_Retry_Count        0x0033   253   253   020    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0032   253   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       4319
180 Unused_Rsvd_Blk_Cnt_Tot 0x002f   100   100   098    Pre-fail  Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   000    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0032   253   100   000    Old_age   Always       -       327680
184 End-to-End_Error        0x0033   253   253   097    Pre-fail  Always       -       0
185 Unknown_Attribute       0x0030   100   100   000    Old_age   Offline      -       2
186 Unknown_Attribute       0x0032   253   253   000    Old_age   Always       -       1441792
187 Reported_Uncorrect      0x0032   100   026   000    Old_age   Always       -       281470684365183
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       1
189 High_Fly_Writes         0x003a   253   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   067   050   045    Old_age   Always       -       33 (Min/Max 23/33)
191 G-Sense_Error_Rate      0x0032   253   098   000    Old_age   Always       -       16580617
192 Power-Off_Retract_Count 0x0032   096   096   000    Old_age   Always       -       71566404
193 Load_Cycle_Count        0x0032   099   099   000    Old_age   Always       -       35363
195 Hardware_ECC_Recovered  0x003a   253   253   000    Old_age   Always       -       20430
196 Reallocated_Event_Count 0x0032   253   253   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   087   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0030   253   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   253   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 517 (device log contains only the most recent five errors)
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 517 occurred at disk power-on lifetime: 5654 hours (235 days + 14 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 00 00 00 a0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ec 00 00 00 00 00 a0 08      00:03:39.320  IDENTIFY DEVICE
  c8 00 80 80 28 97 ec 08      00:03:30.939  READ DMA
  c8 00 80 20 2a 97 ec 08      00:03:27.409  READ DMA
  c8 00 90 c0 5b e2 e5 08      00:03:27.394  READ DMA
  ca 00 98 00 9b 98 ec 08      00:03:27.393  WRITE DMA

Error 516 occurred at disk power-on lifetime: 5654 hours (235 days + 14 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 00 00 00 a0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ec 00 00 00 00 00 a0 08      00:03:23.216  IDENTIFY DEVICE
  c8 00 40 40 28 97 ec 08      00:03:14.822  READ DMA
  ef 10 02 00 00 00 a0 08      00:03:14.821  SET FEATURES [Reserved for Serial ATA]
  ec 00 00 00 00 00 a0 08      00:03:14.819  IDENTIFY DEVICE
  ef 03 45 00 00 00 a0 08      00:03:14.819  SET FEATURES [Set transfer mode]

Error 515 occurred at disk power-on lifetime: 5654 hours (235 days + 14 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 00 00 00 a0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ec 00 00 00 00 00 a0 08      00:03:14.815  IDENTIFY DEVICE
  c8 00 40 40 28 97 ec 08      00:03:06.445  READ DMA
  c8 00 08 18 2a 97 ec 08      00:03:04.772  READ DMA
  ef 10 02 00 00 00 a0 08      00:03:04.772  SET FEATURES [Reserved for Serial ATA]
  ec 00 00 00 00 00 a0 08      00:03:04.770  IDENTIFY DEVICE

Error 514 occurred at disk power-on lifetime: 5654 hours (235 days + 14 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 03 1d 2a 97 ec  Error: UNC 3 sectors at LBA = 0x0c972a1d = 211233309

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 18 2a 97 ec 08      00:03:00.416  READ DMA
  ef 10 02 00 00 00 a0 08      00:03:00.415  SET FEATURES [Reserved for Serial ATA]
  ec 00 00 00 00 00 a0 08      00:03:00.413  IDENTIFY DEVICE
  ef 03 45 00 00 00 a0 08      00:03:00.413  SET FEATURES [Set transfer mode]
  ef 10 02 00 00 00 a0 08      00:03:00.413  SET FEATURES [Reserved for Serial ATA]

Error 513 occurred at disk power-on lifetime: 5654 hours (235 days + 14 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 03 1d 2a 97 ec  Error: UNC 3 sectors at LBA = 0x0c972a1d = 211233309

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 18 2a 97 ec 08      00:02:56.010  READ DMA
  ea 00 00 00 00 00 a0 08      00:02:55.973  FLUSH CACHE EXT
  35 00 08 20 44 d6 e0 08      00:02:55.973  WRITE DMA EXT
  ea 00 00 00 00 00 a0 08      00:02:55.949  FLUSH CACHE EXT
  35 00 38 e8 43 d6 e0 08      00:02:55.949  WRITE DMA EXT

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       60%      5618         201724230
# 2  Short offline       Completed without error       00%      5617         -
# 3  Short offline       Completed without error       00%      5617         -
# 4  Extended offline    Completed without error       00%      5600         -
# 5  Short offline       Completed: read failure       90%      5595         239457889
# 6  Short offline       Completed: read failure       90%      5595         239457889
# 7  Short captive       Completed without error       00%      5305         -
# 8  Short captive       Completed without error       00%      5301         -
# 9  Short captive       Completed without error       00%      5301         -
#10  Short captive       Completed without error       00%      5301         -
#11  Short captive       Completed: read failure       90%      5301         214242167
#12  Extended offline    Completed: read failure       60%      4819         176075039
#13  Short offline       Completed without error       00%      4819         -
#14  Short offline       Aborted by host               90%       214         -
#15  Short offline       Aborted by host               90%       214         -
#16  Short offline       Completed without error       00%       214         -
#17  Short offline       Completed without error       00%       214         -
#18  Short offline       Completed without error       00%         4         -
#19  Short offline       Completed without error       00%         3         -
#20  Short offline       Completed without error       00%         2         -
#21  Short offline       Completed without error       00%         1         -
4 of 5 failed self-tests are outdated by newer successful extended offline self-test # 4

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
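The self-test log above reports how much of each test was left rather than how far it got, so an extended test that stopped at 40% shows "60%" under Remaining, and LBA_of_first_error is the first sector the test could not read. Below is a hypothetical parser for those rows (the names are made up for illustration), assuming smartctl's column layout exactly as printed above:

```python
import re
from typing import NamedTuple, Optional

# Hypothetical parser for rows of smartctl's "SMART Self-test log" table.
# Assumes the column layout shown in the report above.
ROW = re.compile(
    r"#\s*(?P<num>\d+)\s+(?P<desc>\S+ \S+)\s+(?P<status>.+?)\s+"
    r"(?P<remaining>\d+)%\s+(?P<hours>\d+)\s+(?P<lba>\d+|-)\s*$"
)


class SelfTest(NamedTuple):
    num: int
    description: str
    status: str
    percent_done: int  # 100 minus the "Remaining" column
    lifetime_hours: int
    first_error_lba: Optional[int]  # None when the log shows "-"


def parse_selftest_log(text: str) -> list[SelfTest]:
    tests = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            lba = m["lba"]
            tests.append(SelfTest(
                num=int(m["num"]),
                description=m["desc"],
                status=m["status"],
                percent_done=100 - int(m["remaining"]),
                lifetime_hours=int(m["hours"]),
                first_error_lba=None if lba == "-" else int(lba),
            ))
    return tests


# Rows copied from the self-test log above.
LOG = """\
# 1  Extended offline    Completed: read failure       60%      5618         201724230
# 2  Short offline       Completed without error       00%      5617         -
# 5  Short offline       Completed: read failure       90%      5595         239457889
#12  Extended offline    Completed: read failure       60%      4819         176075039
"""

# Print only the failed tests, with progress and first unreadable LBA.
for t in parse_selftest_log(LOG):
    if "failure" in t.status:
        print(t.num, t.description, f"{t.percent_done}% done", t.first_error_lba)
```

Lines that do not start with "#" (the header, the "4 of 5 failed self-tests..." summary) simply do not match and are skipped.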

Can someone tell me what this means? Should I replace the hard drive right away?

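Update: As landroni suggested, I ran a short and an extended self-test with gsmartcontrol. The short self-test ran without throwing any errors. The extended test aborted at 40% because of an error. Here is a paste of the self-test log: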

smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-51-generic] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Fujitsu MJA BH
Device Model:     FUJITSU MJA2250BH G2
Serial Number:    K94PT972B7RS
LU WWN Device Id: 5 00000e 043bcbddd
Firmware Version: 8919
User Capacity:    250,059,350,016 bytes [250 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 3f
Local Time is:    Sun Feb 23 21:13:50 2014 IST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 118) The previous self-test completed having
                    the read element of the test failed.
Total time to complete Offline 
data collection:        (  783) seconds.
Offline data collection
capabilities:            (0x51) SMART execute Offline immediate.
                    No Auto Offline data collection support.
                    Suspend Offline collection upon new
                    command.
                    No Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:    (   2) minutes.
Extended self-test routine
recommended polling time:    ( 111) minutes.
SCT capabilities:          (0x003f) SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   078   046    Pre-fail  Always       -       124861
  2 Throughput_Performance  0x0025   253   253   030    Pre-fail  Offline      -       33619968
  3 Spin_Up_Time            0x0023   100   100   025    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       4489
  5 Reallocated_Sector_Ct   0x0033   253   253   024    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002f   100   100   047    Pre-fail  Always       -       1157
  8 Seek_Time_Performance   0x0025   253   253   019    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   089   089   000    Old_age   Always       -       5693
 10 Spin_Retry_Count        0x0033   253   253   020    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0032   253   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       4342
180 Unused_Rsvd_Blk_Cnt_Tot 0x002f   100   100   098    Pre-fail  Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   000    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0032   253   100   000    Old_age   Always       -       327680
184 End-to-End_Error        0x0033   253   253   097    Pre-fail  Always       -       0
185 Unknown_Attribute       0x0030   100   100   000    Old_age   Offline      -       2
186 Unknown_Attribute       0x0032   253   253   000    Old_age   Always       -       1441792
187 Reported_Uncorrect      0x0032   100   026   000    Old_age   Always       -       281470684365183
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       1
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   059   050   045    Old_age   Always       -       41 (Min/Max 37/42)
191 G-Sense_Error_Rate      0x0032   253   098   000    Old_age   Always       -       16580617
192 Power-Off_Retract_Count 0x0032   096   096   000    Old_age   Always       -       71566404
193 Load_Cycle_Count        0x0032   099   099   000    Old_age   Always       -       35590
195 Hardware_ECC_Recovered  0x003a   253   253   000    Old_age   Always       -       68959
196 Reallocated_Event_Count 0x0032   253   253   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   087   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0030   253   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   253   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 519 (device log contains only the most recent five errors)
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 519 occurred at disk power-on lifetime: 5685 hours (236 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 03 10 00 00 00  Error: 

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  00 00 01 01 00 00 00 ff      00:01:40.036  NOP [Abort queued commands]
  00 00 01 01 00 00 00 ff      00:01:30.023  NOP [Abort queued commands]
  00 00 01 01 00 00 00 ff      00:01:20.011  NOP [Abort queued commands]
  2f 00 01 10 00 00 a0 08      00:01:15.009  READ LOG EXT
  60 08 38 f0 68 47 40 08      00:01:08.725  READ FPDMA QUEUED

Error 518 occurred at disk power-on lifetime: 5685 hours (236 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 03 d8 5b e2 40  Error: UNC at LBA = 0x00e25bd8 = 14834648

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 08 38 f0 68 47 40 08      00:01:08.725  READ FPDMA QUEUED
  60 08 30 40 09 84 40 08      00:01:08.568  READ FPDMA QUEUED
  61 08 28 70 09 9d 40 08      00:01:08.243  WRITE FPDMA QUEUED
  61 a0 20 00 55 d6 40 08      00:01:07.961  WRITE FPDMA QUEUED
  61 08 18 68 09 9d 40 08      00:01:07.594  WRITE FPDMA QUEUED

Error 517 occurred at disk power-on lifetime: 5654 hours (235 days + 14 hours)
  When the command that caused the error occurred, the device was active or idle.

  After comman
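In the error entries above, smartctl derives the failing sector from the register dump. A minimal sketch of that decoding, assuming 28-bit LBA addressing (the low nibble of DH carries LBA bits 24-27, and CH, CL, SN the lower three bytes), reproduces the LBAs printed for Error 514 and Error 518. It also checks that a 32-bit millisecond counter wraps after about 49.710 days, matching the Powered_Up_Time note; the 32-bit width is an assumption. The function names are illustrative, not smartmontools APIs.

```python
# Minimal sketch: recompute the LBA that smartctl prints after "Error: UNC" from
# the SN, CL, CH and DH register bytes, assuming 28-bit LBA addressing. These are
# 8-bit registers, so any LBA bits above bit 27 cannot appear in this dump.

WRAP_MS = 2**32  # assumption: Powered_Up_Time is a 32-bit millisecond counter


def lba28_from_registers(sn: int, cl: int, ch: int, dh: int) -> int:
    """DH bits 0-3 are LBA bits 24-27; CH, CL, SN are bits 16-23, 8-15, 0-7."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


def wrap_days() -> float:
    """Days until a 32-bit millisecond counter rolls over."""
    return WRAP_MS / 1000 / 86400


# Error 514: ER ST SC SN CL CH DH = 40 51 03 1d 2a 97 ec
print(lba28_from_registers(sn=0x1D, cl=0x2A, ch=0x97, dh=0xEC))  # 211233309
# Error 518: ER ST SC SN CL CH DH = 40 41 03 d8 5b e2 40
print(lba28_from_registers(sn=0xD8, cl=0x5B, ch=0xE2, dh=0x40))  # 14834648
print(f"{wrap_days():.3f}")  # 49.710
```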

lan*_*oni 10

To install gsmartcontrol, run: sudo apt-get install gsmartcontrol

Using gsmartcontrol:

  • run a short self-test;
  • if it completes without errors, run an extended self-test.
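Without a GUI, the same two-step procedure can be driven from smartctl, which gsmartcontrol uses under the hood. The sketch below is hypothetical glue code: it assumes smartmontools is installed, the disk is /dev/sda and the script has root privileges. "-t short", "-t long" and "-l selftest" are smartctl options (smartctl calls the extended test "long"); the helper names and the decision strings are made up to mirror the advice above.

```python
import subprocess

# Hypothetical helpers sketching "short self-test first, then extended".
# Assumes smartmontools is installed, the disk is /dev/sda, and root privileges.


def selftest_command(device: str, kind: str) -> list[str]:
    """Build the smartctl command that starts a background self-test."""
    if kind not in ("short", "long"):  # smartctl calls the extended test "long"
        raise ValueError(f"unsupported self-test type: {kind!r}")
    return ["smartctl", "-t", kind, device]


def start_selftest(device: str, kind: str) -> None:
    """Start the test; read the result later with 'smartctl -l selftest DEVICE'."""
    subprocess.run(selftest_command(device, kind), check=True)


def next_step(kind: str, status: str) -> str:
    """Decide what to do from the Status column of the newest self-test log entry."""
    if "failure" in status or status.startswith("Fatal"):
        return "back up with ddrescue as soon as possible"
    if status != "Completed without error":
        return "inconclusive: rerun the test"  # e.g. aborted or still in progress
    if kind == "short":
        return "run an extended (long) self-test"
    return "probably no reason to panic"


print(" ".join(selftest_command("/dev/sda", "short")))  # smartctl -t short /dev/sda
print(next_step("long", "Completed: read failure"))
```

The report above recommends polling after about 2 minutes for the short test and 111 minutes for the extended one.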

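If that one also comes back clean, there is probably no reason to panic. If the tests do detect bad blocks, however, you should make a backup as soon as possible using ddrescue, and then try to figure out what is wrong with the drive. It may be failing, or it may just have a few isolated bad sectors.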
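A typical first ddrescue pass looks something like "ddrescue -d /dev/sda sda.img sda.map" (device and file names are placeholders, and the image should go to a different, healthy disk; -d requests direct disc access). The mapfile records which areas could not be read, so the copy can be resumed or retried later. The hypothetical helper below totals the unreadable areas; the layout it parses is an assumption based on GNU ddrescue's mapfile format, and the sample is made up.

```python
# Sketch: total the bytes a GNU ddrescue run could not read, from its mapfile.
# Assumed mapfile layout: '#' comment lines, one status line
# ("current_pos current_status [current_pass]"), then one "pos size status"
# line per block, where status '-' marks a bad-sector area.


def bad_bytes(mapfile_text: str) -> int:
    rows = [line.split() for line in mapfile_text.splitlines()
            if line.strip() and not line.lstrip().startswith("#")]
    total = 0
    for _pos, size, status in (row[:3] for row in rows[1:]):  # rows[0] is the status line
        if status == "-":
            total += int(size, 0)  # base 0 accepts the "0x..." hex notation
    return total


# Made-up mapfile for illustration (not from the question's disk).
SAMPLE = """\
# Mapfile. Created by GNU ddrescue
# current_pos  current_status  current_pass
0x00100000     +               1
#      pos        size  status
0x00000000  0x00100000  +
0x00100000  0x00000600  -
0x00100600  0x00100000  +
"""

print(bad_bytes(SAMPLE), "bytes =", bad_bytes(SAMPLE) // 512, "sectors")
```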

See also:

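Update:
Given that there seem to be only a few bad sectors, you could try telling the filesystem which bad sectors to avoid with fsck.ext3 -c. But do read man fsck.ext3 before using it (assuming ext3 is your filesystem).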
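fsck.ext3 -c runs badblocks and records what it finds in filesystem blocks, whereas SMART and the kernel count 512-byte sectors from the start of the whole disk. To check where a specific failing LBA (such as 211233309 from the question) falls inside the filesystem, it can be converted as below. The partition start (2048) and block size (4096) are assumed example values; the real ones come from "fdisk -l" and "tune2fs -l" respectively, and the resulting block number is the kind of argument debugfs's icheck command takes.

```python
# Sketch: translate a disk LBA (512-byte sectors from the start of the disk)
# into a filesystem block number. part_start and block_size below are
# hypothetical example values, not read from the question's system.

SECTOR_SIZE = 512  # "Sector Size: 512 bytes logical/physical" in the report


def fs_block(lba: int, part_start: int, block_size: int) -> int:
    if lba < part_start:
        raise ValueError("LBA lies before the start of the partition")
    return (lba - part_start) * SECTOR_SIZE // block_size


print(fs_block(211233309, part_start=2048, block_size=4096))  # 26403907
```

With 4096-byte blocks, each filesystem block spans 8 sectors, so several neighbouring bad LBAs may map to the same block.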

See:


Eld*_*eek 7

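I recently had a similar problem, with SMART reporting 9 bad blocks. I booted from live media and then repaired the ext4 filesystem with e2fsck -c /dev/SDx, where SDx is the problematic drive (sda in my case). This turned up a few short reads; I ignored them and forced rewrites, and e2fsck found and fixed 5 inodes with multiply-claimed blocks.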

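If the drive holds critical data, you should of course back it up with a proper strategy before doing anything else. If, as in my case, it does not, read on. dmesg reported almost twice as many bad sectors as SMART had found, so I ran e2fsck -cc /dev/SDx, where SDx is the problematic drive, to perform a non-destructive read-write test. This is obviously a time-consuming process, but my only goal was to squeeze a few more hours out of what was, for all intents and purposes, a "scratch drive" for experiments with no critical data while I waited for the replacement drive to be delivered, so I felt it was probably worth it. After an hour the TB drive was only 15% done and I was having doubts, but with 3 days still to go before the replacement arrived, I stuck with it. In the end, all the bad sectors were added to the bad-blocks inode list, which prevents them from being allocated to files or directories.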
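The comparison above between dmesg and SMART depends on counting distinct sectors, since the kernel may log the same sector several times while retrying. A sketch of that count, assuming messages of the form "I/O error, dev sdX, sector N" (the same message as in this question's title); the sample lines are made up for illustration and reuse LBAs from the question's SMART log.

```python
import re

# Sketch: collect the distinct sectors named in kernel messages of the form
# "I/O error, dev sda, sector N" (the prefix before it, e.g. "end_request:" or
# "blk_update_request:", varies between kernel versions).
IO_ERROR = re.compile(r"I/O error, dev (\w+), sector (\d+)")


def bad_sectors(dmesg_text: str, device: str = "sda") -> set[int]:
    return {int(m.group(2)) for m in IO_ERROR.finditer(dmesg_text)
            if m.group(1) == device}


# Made-up dmesg lines for illustration; the sectors reuse LBAs from the question.
SAMPLE = """\
[  180.100000] end_request: I/O error, dev sda, sector 211233309
[  184.500000] end_request: I/O error, dev sda, sector 211233309
[  190.200000] end_request: I/O error, dev sda, sector 14834648
[  200.000000] end_request: I/O error, dev sdb, sector 42
"""

print(sorted(bad_sectors(SAMPLE)))  # [14834648, 211233309]
```

On a live system the text could come from subprocess.run(["dmesg"], capture_output=True, text=True).stdout; the repeated sector 211233309 in the sample is counted once.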


use*_*046 5

It looks like your disk is failing. I would back up my data as soon as possible and replace the faulty disk.