Question by Phi*_*bin (score 48). Tags: linux, sata, hotswap, scsi, linux-kernel
Hot-swapping out a failed SATA drive (/dev/sda) worked fine, but when I went to swap in a new drive, it was not recognized:
[root@fs-2 ~]# tail -18 /var/log/messages
May 5 16:54:35 fs-2 kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xe frozen
May 5 16:54:35 fs-2 kernel: ata1: SError: { PHYRdyChg CommWake }
May 5 16:54:40 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0)
May 5 16:54:45 fs-2 kernel: ata1: device not ready (errno=-16), forcing hardreset
May 5 16:54:45 fs-2 kernel: ata1: soft resetting link
May 5 16:54:50 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0)
May 5 16:54:55 fs-2 kernel: ata1: SRST failed (errno=-16)
May 5 16:54:55 fs-2 kernel: ata1: soft resetting link
May 5 16:55:00 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0)
May 5 16:55:05 fs-2 kernel: ata1: SRST failed (errno=-16)
May 5 16:55:05 fs-2 kernel: ata1: soft resetting link
May 5 16:55:10 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0)
May 5 16:55:40 fs-2 kernel: ata1: SRST failed (errno=-16)
May 5 16:55:40 fs-2 kernel: ata1: limiting SATA link speed to 1.5 Gbps
May 5 16:55:40 fs-2 kernel: ata1: soft resetting link
May 5 16:55:45 fs-2 kernel: ata1: SRST failed (errno=-16)
May 5 16:55:45 fs-2 kernel: ata1: reset failed, giving up
May 5 16:55:45 fs-2 kernel: ata1: EH complete
I tried a few things to get the server to find the new /dev/sda, such as rescan-scsi-bus.sh, but none of them worked:
[root@fs-2 ~]# echo "---" > /sys/class/scsi_host/host0/scan
-bash: echo: write error: Invalid argument
[root@fs-2 ~]#
[root@fs-2 ~]# /root/rescan-scsi-bus.sh -l
[snip]
0 new device(s) found.
0 device(s) removed.
[root@fs-2 ~]#
[root@fs-2 ~]# ls /dev/sda
ls: /dev/sda: No such file or directory
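I ended up rebooting the server. /dev/sda was recognized, I repaired the software RAID, and everything is fine now. But next time, how can I get Linux to recognize a newly hot-swapped SATA drive without rebooting?
The operating system in question is RHEL 5.3: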
[root@fs-2 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.3 (Tikanga)
The hard drive is a Seagate Barracuda ES.2 SATA 3.0 Gb/s 500 GB, model ST3500320NS.
Here is the lspci output:
[root@fs-2 ~]# lspci
00:00.0 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a2)
00:01.0 ISA bridge: nVidia Corporation MCP55 LPC Bridge (rev a3)
00:01.1 SMBus: nVidia Corporation MCP55 SMBus (rev a3)
00:02.0 USB Controller: nVidia Corporation MCP55 USB Controller (rev a1)
00:02.1 USB Controller: nVidia Corporation MCP55 USB Controller (rev a2)
00:04.0 IDE interface: nVidia Corporation MCP55 IDE (rev a1)
00:05.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
00:05.1 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
00:05.2 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
00:06.0 PCI bridge: nVidia Corporation MCP55 PCI bridge (rev a2)
00:08.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3)
00:09.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3)
00:0a.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0b.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0c.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0d.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0e.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0f.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
03:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200e [Pilot] ServerEngines (SEP1) (rev 02)
04:00.0 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06)
04:00.1 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06)
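Update: On about a dozen occasions we have been forced to reboot a server because hot swap did not "just work". Thanks for the additional answers about SATA controllers. I have included the lspci output for the problem system above (hostname: fs-2). I could still use some help understanding exactly what, on the hardware side, is not supported for hot swap on this system. Please let me know what other output besides lspci would be useful.
The good news is that hot swap "just worked" today on one of our servers (hostname: www-1), which is very rare for us. Here is its lspci output: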
[root@www-1 ~]# lspci
00:00.0 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a2)
00:01.0 ISA bridge: nVidia Corporation MCP55 LPC Bridge (rev a3)
00:01.1 SMBus: nVidia Corporation MCP55 SMBus (rev a3)
00:02.0 USB Controller: nVidia Corporation MCP55 USB Controller (rev a1)
00:02.1 USB Controller: nVidia Corporation MCP55 USB Controller (rev a2)
00:04.0 IDE interface: nVidia Corporation MCP55 IDE (rev a1)
00:05.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
00:05.1 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
00:05.2 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
00:06.0 PCI bridge: nVidia Corporation MCP55 PCI bridge (rev a2)
00:08.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3)
00:09.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3)
00:0b.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0c.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0f.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] HyperTransport Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Miscellaneous Control
00:18.4 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Link Control
00:19.0 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] HyperTransport Configuration
00:19.1 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Address Map
00:19.2 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] DRAM Controller
00:19.3 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Miscellaneous Control
00:19.4 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Link Control
03:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200e [Pilot] ServerEngines (SEP1) (rev 02)
04:00.0 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06)
04:00.1 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06)
09:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1064ET PCI-Express Fusion-MPT SAS (rev 04)
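Answer (score 52)
If your SATA controller supports hot swap, it should "just work(tm)".
To force a rescan of the SCSI bus (each SATA port appears as its own SCSI host) and look for new drives, you would use: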
echo "0 0 0" >/sys/class/scsi_host/host<n>/scan
Above, <n> is the SCSI host number, i.e. the N in /sys/class/scsi_host/hostN. "0 0 0" limits the scan to channel 0, target 0, LUN 0.
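If you do not know which host the new drive sits behind, a minimal sketch like the following simply asks every SCSI host to rescan with full wildcards ("- - -" = any channel, target and LUN). It assumes bash and root privileges; the function name and the `SYSFS_ROOT` override (only there so the loop can be tried against a scratch directory) are hypothetical, so leave `SYSFS_ROOT` unset on a real system to use /sys.

```shell
# Hypothetical helper: request a wildcard rescan on every SCSI host.
# SYSFS_ROOT is an illustrative override for testing; defaults to /sys.
rescan_all_scsi_hosts() {
    local root="${SYSFS_ROOT:-/sys}"
    local scan
    for scan in "$root"/class/scsi_host/host*/scan; do
        [ -e "$scan" ] || continue          # glob matched nothing
        # "- - -" = wildcard channel, target (id) and LUN
        echo "- - -" > "$scan" || echo "rescan failed: $scan" >&2
    done
}
```

After running `rescan_all_scsi_hosts`, check `dmesg | tail` or `cat /proc/scsi/scsi` to see whether the drive was picked up.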
Answer (score 22)
echo "- - -" >/sys/class/scsi_host/host<n>/scan
       ^ ^
       \_\_______ note spaces between the dashes.
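Answer (score 15)
In some cases, when a drive fails, Linux does not realize that you have actually pulled it out of the array physically. If you run into this problem (as I did this morning), you can do the following: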
echo 1 > /sys/block/<devnode>/device/delete
For example, in my case /dev/sda had failed and I did not want to reboot the server, so I did:
echo 1 > /sys/block/sda/device/delete
As soon as I did that, the new drive (which had already been physically inserted) became visible.
If it is still not visible at that point, you can also force a rescan:
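A small wrapper around the same `delete` write can guard against typing the wrong device name. This is a hedged sketch assuming bash: it accepts either `sda` or `/dev/sda` and refuses to write if the sysfs node does not exist. `remove_scsi_disk` and `SYSFS_ROOT` are hypothetical names for illustration; only use it on a disk that is no longer in use (for example, one already removed from its md array).

```shell
# Hypothetical helper: detach a dead disk from the SCSI layer.
# SYSFS_ROOT is an illustrative override for testing; defaults to /sys.
remove_scsi_disk() {
    local root="${SYSFS_ROOT:-/sys}"
    local dev="${1#/dev/}"                  # accept "sda" or "/dev/sda"
    local node="$root/block/$dev/device/delete"
    if [ -z "$dev" ] || [ ! -e "$node" ]; then
        echo "no such device: ${1:-<none>}" >&2
        return 1
    fi
    echo 1 > "$node"                        # same write as in the answer
}
```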
echo "- - -" > /sys/class/scsi_host/host<n>/scan
The three dashes in "- - -" are wildcards for channel, id (target), and LUN respectively, so you can restrict the scan to a subset by giving numbers instead.
Before you start, you can also run:
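To make the wildcard semantics concrete, here is a hedged sketch of a helper that writes whichever of channel, target and LUN you pass and uses `-` for the rest. `scan_scsi_host` and `SYSFS_ROOT` are hypothetical names. For a disk attached directly to a SATA port through libata, the device normally sits at channel 0, target 0, LUN 0, which is why "0 0 0" also works.

```shell
# Hypothetical helper: scan_scsi_host HOST [CHANNEL [TARGET [LUN]]]
# Omitted fields default to "-" (wildcard). SYSFS_ROOT defaults to /sys.
scan_scsi_host() {
    local root="${SYSFS_ROOT:-/sys}"
    local host="$1"
    local channel="${2:--}" target="${3:--}" lun="${4:--}"
    local scan="$root/class/scsi_host/host$host/scan"
    if [ -z "$host" ] || [ ! -e "$scan" ]; then
        echo "usage: scan_scsi_host HOST [CHANNEL [TARGET [LUN]]]" >&2
        return 1
    fi
    echo "$channel $target $lun" > "$scan"
}
```

For example, `scan_scsi_host 0` is equivalent to `echo "- - -" > /sys/class/scsi_host/host0/scan`, and `scan_scsi_host 0 0 0 0` to the "0 0 0" form.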
readlink /sys/block/<devnode>
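This shows the path including the correct host number, so that after the delete you can check /proc/scsi/scsi to confirm the device is gone.
Answer
I can't believe nobody has mentioned AHCI yet... Your SATA controller must be in AHCI mode for hot swap to work. Check this by looking at which driver is in use: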
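Extracting just the host number can be scripted. A hedged sketch, assuming bash and GNU `readlink -f`: it resolves `/sys/block/<dev>/device` rather than `/sys/block/<dev>`, because on older kernels such as RHEL 5's, `/sys/block/sda` may be a plain directory, so `readlink` on it prints nothing. The resolved path should contain a `.../hostN/targetN:C:T/...` component. `scsi_host_of` and `SYSFS_ROOT` are hypothetical names.

```shell
# Hypothetical helper: print the SCSI host number behind a block device.
# SYSFS_ROOT is an illustrative override for testing; defaults to /sys.
scsi_host_of() {
    local root="${SYSFS_ROOT:-/sys}"
    local dev="${1#/dev/}"
    local path host
    path=$(readlink -f "$root/block/$dev/device") || return 1
    # Pick the number out of the last ".../hostN/..." path component
    host=$(printf '%s\n' "$path" | sed -n 's|.*/host\([0-9][0-9]*\)/.*|\1|p')
    [ -n "$host" ] || return 1
    echo "$host"
}
```

Usage: `n=$(scsi_host_of sda)` before the delete, then `grep "Host: scsi$n " /proc/scsi/scsi` afterwards to confirm the entry has disappeared.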
root@peter:~ # find /sys -name sdk
/sys/devices/pci0000:00/0000:00:11.0/ata5/host4/target4:0:0/4:0:0:0/block/sdk
/sys/block/sdk
/sys/class/block/sdk
root@peter:~ # readlink /sys/devices/pci0000:00/0000:00:11.0/driver
../../../bus/pci/drivers/ahci
root@peter:~ # lspci -k | less
[... big long output... search for ahci or your pci address, or use the awk below ...]
root@peter:~ # lspci -k | awk '$1 == "00:11.0" {x=1}; x && /in use/ {print $0; exit}'
Kernel driver in use: ahci
Note that it says "ahci" there.
If it doesn't, just enable AHCI mode in your BIOS. Also, some BIOSes, especially on servers or UEFI systems, have a per-disk "Hot Plug = Enabled/Disabled" setting; if it exists, you should enable that too.
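The same check can be done for every SCSI host at once. A hedged sketch, assuming bash and GNU `readlink -f`: for each /sys/class/scsi_host/hostN it resolves the host's device path, walks up to the nearest ancestor that has a bound `driver` link (normally the PCI SATA controller), and prints the driver name. On a controller in AHCI mode you would expect `ahci`; a different name suggests it is not in AHCI mode. `list_scsi_host_drivers` and `SYSFS_ROOT` are hypothetical names.

```shell
# Hypothetical helper: print "hostN <driver>" for every SCSI host.
# SYSFS_ROOT is an illustrative override for testing; defaults to /sys.
list_scsi_host_drivers() {
    local root h dir
    root=$(readlink -f "${SYSFS_ROOT:-/sys}")
    for h in "$root"/class/scsi_host/host*; do
        [ -e "$h/device" ] || continue      # glob matched nothing
        dir=$(readlink -f "$h/device")
        [ -n "$dir" ] || continue
        # Climb until a directory has a bound "driver" symlink
        while [ "$dir" != "/" ] && [ "$dir" != "$root" ] && [ ! -L "$dir/driver" ]; do
            dir=$(dirname "$dir")
        done
        if [ -L "$dir/driver" ]; then
            echo "$(basename "$h") $(basename "$(readlink "$dir/driver")")"
        else
            echo "$(basename "$h") unknown"
        fi
    done
}
```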