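Description
Today I plugged in another hard drive and unplugged my RAID drives, to make sure I wouldn't accidentally pick the wrong drive when wiping.
Now that I've plugged my drives back in, the software RAID 1 array is no longer mounted/recognized/found. Using Disk Utility, I can see that the drives are /dev/sda and /dev/sdb, so I tried running sudo mdadm -A /dev/sda /dev/sdb
Unfortunately, I keep getting an error message saying mdadm: device /dev/sda exists but is not an md array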
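For comparison, mdadm's assemble mode (-A / --assemble) takes the md device to create as its first non-option argument and the member devices after it, so in sudo mdadm -A /dev/sda /dev/sdb the first disk is probably being read as the array name, which would account for the "not an md array" message. Below is a minimal sketch of the usual order, assuming the array should appear as /dev/md0 (a hypothetical name, not one from this post) and that the members are the whole disks /dev/sda and /dev/sdb. The run helper and the DRY_RUN variable are illustrative additions of this sketch, not mdadm features: commands are only printed unless DRY_RUN=0.

```shell
#!/bin/sh
# Sketch: examine the members, then assemble a two-disk RAID 1 with mdadm.
# /dev/md0 is a hypothetical array name; DRY_RUN is a helper variable for
# this sketch, not an mdadm option.

# Print a command instead of executing it unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# assemble_mirror MD_DEVICE MEMBER...
# The md device comes first; the member disks follow it.
assemble_mirror() {
    md=$1
    shift
    for member in "$@"; do
        # Read-only check of each member; stop at the first one that fails.
        run mdadm --examine "$member" || return 1
    done
    run mdadm --assemble "$md" "$@"
}
```

Sourced into a shell, assemble_mirror /dev/md0 /dev/sda /dev/sdb prints the two --examine commands and the --assemble command; setting DRY_RUN=0 (as root) runs them. If --examine finds no md superblock on the members, --assemble has nothing to assemble either.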
Specs:
OS: Ubuntu 12.04 LTS Desktop (64-bit)
Drives: 2 x 3TB WD Red (same model, brand new); the OS is installed on a third drive (64GB SSD) (many Linux installs)
Motherboard: P55 FTW
Processor: Intel i7-870 (full specs)
Result of sudo mdadm --assemble --scan:
mdadm: No arrays found in config file or automatically
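When I boot into recovery mode, I get countless "ata1 error" codes scrolling past for a long time.
Can anyone tell me the correct steps to recover the array?
I'd be happy just to recover the data, if that's a possible alternative to rebuilding the array. I've read about TestDisk, and its wiki says it can find lost Linux RAID md 0.9/1.0/1.1/1.2 partitions, but I appear to be running mdadm version 3.2.5. Does anyone else have experience using it to recover data from a software RAID 1?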
Result of sudo mdadm --examine /dev/sd* | grep -E "(^\/dev|UUID)":
mdadm: No md superblock detected on /dev/sda.
mdadm: No md superblock detected on /dev/sdb.
mdadm: No md superblock detected on /dev/sdc1.
mdadm: No md superblock detected on /dev/sdc3.
mdadm: No md superblock detected on /dev/sdc5.
mdadm: No md superblock detected on /dev/sdd1.
mdadm: No md superblock detected on /dev/sdd2.
mdadm: No md superblock detected on /dev/sde.
/dev/sdc:
/dev/sdc2:
/dev/sdd:
Contents of mdadm.conf:
# mdadm.conf
#
# Please refer to mdadm.conf(5) for information about this file.
#
# by default (built-in), scan all partitions (/proc/partitions) and all
# containers for MD superblocks. alternatively, specify devices to scan, using
# wildcards if desired.
#DEVICE partitions containers
# auto-create devices with Debian standard permissions
CREATE owner=root group=disk mode=0660 auto=yes
# automatically tag new arrays as belonging to the local system
HOMEHOST <system>
# instruct the monitoring daemon where to send mail alerts
MAILADDR root
# definitions of existing MD arrays
# This file was auto-generated on Tue, 08 Jan 2013 19:53:56 +0000
# by mkconf $Id$
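The "definitions of existing MD arrays" section of this mdadm.conf contains no ARRAY lines. With the default DEVICE setting, mdadm --assemble --scan still probes everything in /proc/partitions for superblocks, so the empty section on its own probably isn't why nothing was found (the --examine output shows no superblocks at all). Once an array is running again, though, it is common on Debian/Ubuntu to record it so it is assembled the same way at boot. A minimal sketch, assuming the Debian/Ubuntu config path /etc/mdadm/mdadm.conf; the function names and the DRY_RUN helper are illustrative.

```shell
#!/bin/sh
# Sketch: make sure mdadm.conf lists the running arrays.
# CONF defaults to the Debian/Ubuntu location; DRY_RUN is a helper
# variable for this sketch (commands are printed unless DRY_RUN=0).

CONF=${CONF:-/etc/mdadm/mdadm.conf}

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Succeeds if the given file contains at least one uncommented ARRAY line.
has_array_lines() {
    grep -q '^[[:space:]]*ARRAY[[:space:]]' "$1"
}

# Append ARRAY lines for the arrays that are currently assembled, then
# rebuild the initramfs so the copy used at boot matches.
record_arrays() {
    if has_array_lines "$CONF"; then
        echo "ARRAY lines already present in $CONF"
        return 0
    fi
    run sh -c "mdadm --detail --scan >> $CONF"
    run update-initramfs -u
}
```

The design choice here is to append only when no ARRAY line exists, so running it twice does not duplicate entries.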
Result of sudo fdisk -l; as you can see, sda and sdb are missing.
Disk /dev/sdc: 64.0 GB, 64023257088 bytes
255 heads, 63 sectors/track, 7783 cylinders, total 125045424 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0009f38d
Device Boot Start End Blocks Id System
/dev/sdc1 * 2048 2000895 999424 82 Linux swap / Solaris
/dev/sdc2 2002942 60594175 29295617 5 Extended
/dev/sdc3 60594176 125044735 32225280 83 Linux
/dev/sdc5 2002944 60594175 29295616 83 Linux
Disk /dev/sdd: 60.0 GB, 60022480896 bytes
255 heads, 63 sectors/track, 7297 cylinders, total 117231408 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x58c29606
Device Boot Start End Blocks Id System
/dev/sdd1 * 2048 206847 102400 7 HPFS/NTFS/exFAT
/dev/sdd2 206848 234455039 117124096 7 HPFS/NTFS/exFAT
Disk /dev/sde: 60.0 GB, 60022480896 bytes
255 heads, 63 sectors/track, 7297 cylinders, total 117231408 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
Disk /dev/sde doesn't contain a valid partition table
The output of dmesg | grep ata is long, so here is a link: http://pastebin.com/raw.php?i=H2dph66y
Output of dmesg | grep ata | head -n 200, with the BIOS set to AHCI (I had to boot without the two disks):
[ 0.000000] BIOS-e820: 000000007f780000 - 000000007f78e000 (ACPI data)
[ 0.000000] Memory: 16408080k/18874368k available (6570k kernel code, 2106324k absent, 359964k reserved, 6634k data, 924k init)
[ 1.043555] libata version 3.00 loaded.
[ 1.381056] ata1: SATA max UDMA/133 abar m2048@0xfbff4000 port 0xfbff4100 irq 47
[ 1.381059] ata2: SATA max UDMA/133 abar m2048@0xfbff4000 port 0xfbff4180 irq 47
[ 1.381061] ata3: SATA max UDMA/133 abar m2048@0xfbff4000 port 0xfbff4200 irq 47
[ 1.381063] ata4: SATA max UDMA/133 abar m2048@0xfbff4000 port 0xfbff4280 irq 47
[ 1.381065] ata5: SATA max UDMA/133 abar m2048@0xfbff4000 port 0xfbff4300 irq 47
[ 1.381067] ata6: SATA max UDMA/133 abar m2048@0xfbff4000 port 0xfbff4380 irq 47
[ 1.381140] pata_acpi 0000:0b:00.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
[ 1.381157] pata_acpi 0000:0b:00.0: setting latency timer to 64
[ 1.381167] pata_acpi 0000:0b:00.0: PCI INT A disabled
[ 1.429675] ata_link link4: hash matches
[ 1.699735] ata1: SATA link down (SStatus 0 SControl 300)
[ 2.018981] ata2: SATA link down (SStatus 0 SControl 300)
[ 2.338066] ata3: SATA link down (SStatus 0 SControl 300)
[ 2.657266] ata4: SATA link down (SStatus 0 SControl 300)
[ 2.976528] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 2.979582] ata5.00: ATAPI: HL-DT-ST DVDRAM GH22NS50, TN03, max UDMA/100
[ 2.983356] ata5.00: configured for UDMA/100
[ 3.319598] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 3.320252] ata6.00: ATA-9: SAMSUNG SSD 830 Series, CXM03B1Q, max UDMA/133
[ 3.320258] ata6.00: 125045424 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[ 3.320803] ata6.00: configured for UDMA/133
[ 3.324863] Write protecting the kernel read-only data: 12288k
[ 3.374767] pata_marvell 0000:0b:00.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
[ 3.374795] pata_marvell 0000:0b:00.0: setting latency timer to 64
[ 3.375759] scsi6 : pata_marvell
[ 3.376650] scsi7 : pata_marvell
[ 3.376704] ata7: PATA max UDMA/100 cmd 0xdc00 ctl 0xd880 bmdma 0xd400 irq 18
[ 3.376707] ata8: PATA max UDMA/133 cmd 0xd800 ctl 0xd480 bmdma 0xd408 irq 18
[ 3.387938] sata_sil24 0000:07:00.0: version 1.1
[ 3.387951] sata_sil24 0000:07:00.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
[ 3.387974] sata_sil24 0000:07:00.0: Applying completion IRQ loss on PCI-X errata fix
[ 3.388621] scsi8 : sata_sil24
[ 3.388825] scsi9 : sata_sil24
[ 3.388887] scsi10 : sata_sil24
[ 3.388956] scsi11 : sata_sil24
[ 3.389001] ata9: SATA max UDMA/100 host m128@0xfbaffc00 port 0xfbaf0000 irq 19
[ 3.389004] ata10: SATA max UDMA/100 host m128@0xfbaffc00 port 0xfbaf2000 irq 19
[ 3.389007] ata11: SATA max UDMA/100 host m128@0xfbaffc00 port 0xfbaf4000 irq 19
[ 3.389010] ata12: SATA max UDMA/100 host m128@0xfbaffc00 port 0xfbaf6000 irq 19
[ 5.581907] ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[ 5.618168] ata9.00: ATA-8: OCZ-REVODRIVE, 1.20, max UDMA/133
[ 5.618175] ata9.00: 117231408 sectors, multi 16: LBA48 NCQ (depth 31/32)
[ 5.658070] ata9.00: configured for UDMA/100
[ 7.852250] ata10: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[ 7.891798] ata10.00: ATA-8: OCZ-REVODRIVE, 1.20, max UDMA/133
[ 7.891804] ata10.00: 117231408 sectors, multi 16: LBA48 NCQ (depth 31/32)
[ 7.931675] ata10.00: configured for UDMA/100
[ 10.022799] ata11: SATA link down (SStatus 0 SControl 0)
[ 12.097658] ata12: SATA link down (SStatus 0 SControl 0)
[ 12.738446] EXT4-fs (sda3): mounted filesystem with ordered data mode. Opts: (null)
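One way to read a log like this is to reduce it to the link state of each ATA port. A small sketch; sata_links is a hypothetical helper built on sed that reads kernel log text on stdin and assumes the "ataN: SATA link up/down" wording shown above.

```shell
#!/bin/sh
# Sketch: print one line per "SATA link" message: the port name and
# either "down" or "up <speed> Gbps". Reads dmesg-style text on stdin.
sata_links() {
    sed -nE 's/.*(ata[0-9]+): SATA link (up [0-9.]+ Gbps|down).*/\1 \2/p'
}
```

Typical use is dmesg | sata_links. Applied to the lines above, only ata5 (the DVD writer), ata6 (the Samsung SSD) and ata9/ata10 (the RevoDrive on the sata_sil24 controller) report a link up, which fits this log having been captured with the two WD Red drives unplugged.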
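SMART tests on the drives all came back "healthy", but I can't boot the machine with the drives plugged in while it is in AHCI mode (I don't know whether it matters, but these are 3TB WD Reds). I hope this means the drives are fine, since they weren't cheap and are brand new. Disk Utility shows lots of grey "Unknown" entries, as shown below:
I've since removed my RevoDrive to try to make things simpler/clearer.
As far as I know, the motherboard doesn't have two controllers. Maybe the RevoDrive, which plugs in via PCI and which I've since removed, was what was confusing things?
Does anyone have any suggestions on how to recover the data from the drives rather than rebuilding the array? That is, step-by-step use of TestDisk or some other data recovery program....
I've tried putting the drives in another machine. I ran into the same problem: the machine couldn't get past the BIOS screen, except that this one kept rebooting. The only way to get the machine to boot was to unplug the drives. I also tried different SATA cables, but nothing helped. I did once manage to get it to detect the drives, but again mdadm --examine showed no superblock. Does this indicate that my disks themselves are #@@#$#@, even though the short SMART tests say they are "healthy"?
It looks like the drives really are beyond saving. I can't even format the volumes in Disk Utility. GParted won't see the drives to put a partition table on them. I can't even issue a secure erase command to completely reset the drives. This was definitely a software RAID, which I set up after discovering that the hardware RAID I had originally tried was actually "fake" RAID and slower than software RAID.
Thanks for all your efforts to help me. I guess the "answer" is that if you somehow manage to kill both drives at the same time, there's nothing you can do.
I reran the SMART tests (this time from the command line instead of Disk Utility), and the drives did respond successfully with "no errors". However, I can't format the drives (with Disk Utility) or get GParted to recognize them, on this machine or any other. I also can't run hdparm's security-erase or security-set-pass commands on the drives. Maybe I need to dd /dev/zero over the whole drive? How on earth do they respond to SMART when two computers can't do anything with them? I'm now running long SMART tests on both drives and will post the results in 255 minutes (that's how long it says the test takes).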
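For reference, the long tests can be started from the command line with smartctl -t long and the result read back later from the self-test log with smartctl -l selftest. A minimal sketch, assuming the two WD Reds show up as something like /dev/sdb and /dev/sdc (device names can change between boots, so check first); the helper names and the DRY_RUN switch are illustrative, and the status parser assumes smartctl's usual self-test log layout, as in the output below.

```shell
#!/bin/sh
# Sketch: start extended SMART self-tests and read back the latest result.
# Device names are examples; DRY_RUN is a helper variable for this sketch.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Start a long (extended) self-test on each device; the drive runs it in
# the background and smartctl returns immediately.
start_long_tests() {
    for dev in "$@"; do
        run smartctl -t long "$dev"
    done
}

# Read `smartctl -l selftest` (or `smartctl -a`) output on stdin and print
# the Status column of entry "# 1", the most recent test.
latest_selftest_status() {
    sed -nE 's/^# *1 +(Short|Extended|Conveyance|Selective) (offline|captive) +([^ ].*[^ ]) +[0-9]+% .*/\3/p'
}
```

After the recommended polling time has passed, smartctl -l selftest /dev/sdb | latest_selftest_status prints just the status of the newest test.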
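I've looked up the processor info along with the other specs (via the motherboard, etc.), and it turns out to be a pre-Sandy Bridge architecture.
Output of an extended SMART scan on one drive: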
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-36-generic] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Device Model: WDC WD30EFRX-68AX9N0
Serial Number: WD-WMC1T1480750
LU WWN Device Id: 5 0014ee 058d18349
Firmware Version: 80.00A80
User Capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 9
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Sun Jan 27 18:21:48 2013 GMT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (41040) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 255) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x70bd) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 196 176 021 Pre-fail Always - 5175
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 29
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 439
10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 29
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 24
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 4
194 Temperature_Celsius 0x0022 121 113 000 Old_age Always - 29
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 437 -
# 2 Short offline Completed without error 00% 430 -
# 3 Extended offline Aborted by host 90% 430 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
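It says "Completed without error". Does that mean the drive should be fine, or only that the test was able to finish? Should I start a new question, since I'm now more concerned with reusing the drives than with the data/RAID array...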
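"Completed without error" for an extended offline test roughly means the drive's own read scan of the surface finished without hitting an unreadable sector; it does not look at the partition table, md metadata or ATA security state. The raw counters usually watched for media and link health are Reallocated_Sector_Ct (5), Reallocated_Event_Count (196), Current_Pending_Sector (197), Offline_Uncorrectable (198) and UDMA_CRC_Error_Count (199, cable/link errors), and all of them are 0 in the output above. A sketch that extracts them; the attribute selection is a common rule of thumb rather than an official pass/fail criterion, and it assumes RAW_VALUE is the tenth column of smartctl's attribute table.

```shell
#!/bin/sh
# Sketch: print NAME=RAW for the SMART attributes most tied to media and
# link errors, from `smartctl -A` (or -a) output on stdin.
# The attribute list (5, 196, 197, 198, 199) is a rule of thumb; the raw
# value is assumed to be the 10th column. Requiring a 0x... FLAG in column 3
# skips other tables in `smartctl -a` output that also start with numbers.
media_counters() {
    awk '($1 == 5 || $1 == 196 || $1 == 197 || $1 == 198 || $1 == 199) && $3 ~ /^0x/ {
        print $2 "=" $10
    }'
}

# Succeeds only if at least one counter was found and all of them are zero.
counters_all_zero() {
    media_counters | awk -F= '{ n++ } $2 != 0 { bad = 1 } END { exit (bad || n == 0) }'
}
```

Typical use is smartctl -A /dev/sdb | counters_all_zero && echo "media counters clean".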
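Well, today I was looking through my file system to see whether there was any data worth keeping before setting up CentOS. I noticed a folder in my home directory called dmraid.sil. I'm guessing this dates from when I first set up the RAID array with the fake RAID controller? I had made sure to remove that device before creating the "software RAID" with mdadm (long before this problem). Is there any way I missed a trick somewhere, so that the "fake" RAID is somehow running without the device, and that's what this dmraid.sil folder is all about? So confused. It contains files like sda.size, sda_0.dat, sda_0.offset, etc. Any advice on what this folder represents would be helpful.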
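A folder named dmraid.<format> holding per-device .dat, .offset and .size files looks like what dmraid -r -D (--dump_metadata) writes into the current directory: a copy of the BIOS-RAID metadata dmraid found on each disk, where sil is dmraid's name for the Silicon Image format. If that is what happened, the folder is a dump left by a diagnostic command rather than evidence of a running array. A sketch for checking whether any disk still carries such metadata; the helper names and DRY_RUN switch are illustrative, and erasing is destructive, so it only belongs on a disk whose data is already safe.

```shell
#!/bin/sh
# Sketch: check for, dump, or erase leftover BIOS ("fake") RAID metadata
# with dmraid. Helper names and DRY_RUN are illustrative.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Read-only: list devices on which dmraid recognises RAID metadata.
list_fakeraid() {
    run dmraid -r
}

# Read-only on the disks: dump that metadata into ./dmraid.<format>/.
dump_fakeraid() {
    run dmraid -r -D
}

# Destructive: erase fake-RAID metadata from one device.
erase_fakeraid() {
    [ -n "$1" ] || { echo "usage: erase_fakeraid DEVICE" >&2; return 2; }
    run dmraid -r -E "$1"
}
```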
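It turns out the drives were locked! I unlocked them easily with an hdparm command. That is probably what caused all the input/output errors. Unfortunately, I now have this problem:
I've successfully mounted the md device. Is it possible to unplug one drive, format it as a normal drive and copy the data onto it? I've had enough "fun" with RAID, and I think I'll go down the route of automated backups with rsync. I wanted to ask before doing anything that could cause data integrity problems.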
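With RAID 1 each member holds a complete copy, so one way to do this is to fail and remove one member from the running array, wipe its md superblock so it is not pulled back in, put a normal filesystem on it and copy the data across with rsync. A minimal sketch: /dev/md0, /dev/sdc, /mnt/raid and /mnt/backup are hypothetical names to replace, the array is assumed to be mounted at /mnt/raid already, and commands are only printed unless DRY_RUN=0. It is worth checking /proc/mdstat first: [UU] means both halves are in sync, and after the split the array runs on a single copy until the backup finishes.

```shell
#!/bin/sh
# Sketch: turn one half of a clean RAID 1 into a plain ext4 backup disk.
# All device names and mount points are hypothetical; override via env.

MD=${MD:-/dev/md0}          # assembled mirror
SPARE=${SPARE:-/dev/sdc}    # member to take out of the array
SRC=${SRC:-/mnt/raid}       # where $MD is mounted
DST=${DST:-/mnt/backup}     # mount point for the new filesystem

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Succeeds if /proc/mdstat-style text on stdin shows a fully synced mirror.
mirror_in_sync() {
    grep -q '\[UU\]'
}

split_mirror() {
    run mdadm "$MD" --fail "$SPARE" --remove "$SPARE"
    run mdadm --zero-superblock "$SPARE"     # so it is not re-assembled
    # GPT, because an MBR partition table cannot address a whole 3 TB disk.
    run parted -s "$SPARE" mklabel gpt mkpart primary ext4 0% 100%
    run mkfs.ext4 "${SPARE}1"
    run mkdir -p "$DST"
    run mount "${SPARE}1" "$DST"
    # Trailing slashes copy the contents of SRC into DST.
    run rsync -aHAX "$SRC/" "$DST/"
}
```

Typical use: mirror_in_sync < /proc/mdstat && split_mirror, then review the printed commands and rerun with DRY_RUN=0 as root.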
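The problem was that the drives had been "locked" at some point. This explains it:
After unlocking them with a simple hdparm command, sudo hdparm --user-master u --security-unlock p /dev/sdb (and the same for /dev/sdc), and rebooting, my mdxxx device was visible in GParted. I could then sudo mount it on a folder and see all my data! I don't know what caused the drives to "lock". I also seem to be missing an e2label, and I don't know what that is. Maybe someone can provide a better answer explaining: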
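For anyone hitting the same thing: hdparm -I shows a drive's ATA security state, and a locked drive refuses normal reads and writes, which fits the I/O errors and failed formatting described above. Below is a minimal sketch that checks the "locked" flag in hdparm -I output and builds the unlock command used in this post (p is the password from that command). The helper names, the DRY_RUN switch and the parsing, which assumes hdparm prints "not<TAB>locked" for an unlocked drive and an indented bare "locked" for a locked one, are assumptions of this sketch. An unlock only lasts until the drive loses power; hdparm --security-disable removes the password so the drive does not come back up locked.

```shell
#!/bin/sh
# Sketch: detect and clear an ATA security lock with hdparm.
# Helper names and DRY_RUN are illustrative; run as root with DRY_RUN=0.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Reads `hdparm -I DEVICE` output on stdin; succeeds if the Security
# section marks the drive as locked (an indented "locked" without "not").
is_locked() {
    sed -n '/^Security:/,/^[^[:space:]]/p' | grep -Eq '^[[:space:]]+locked'
}

# unlock_drive DEVICE PASSWORD [u|m]: unlock with the user or master password.
unlock_drive() {
    run hdparm --user-master "${3:-u}" --security-unlock "$2" "$1"
}

# disable_password DEVICE PASSWORD [u|m]: remove the password so the drive
# is not locked again at the next power-on.
disable_password() {
    run hdparm --user-master "${3:-u}" --security-disable "$2" "$1"
}
```

For example, hdparm -I /dev/sdb | is_locked && unlock_drive /dev/sdb p prints the unlock command only when the drive reports itself as locked.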