How to fix mpt2sas "port enable: FAILED with timeout (timeout=300s)"?

Pro*_*kup 6 debian scsi

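Since the drives attached to the LSI SAS 2008 controller were set to "power-up in standby", they are no longer detected. Power-up in standby is also known as PUIS, POIS, or "ATA6 power-on in standby".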

In the Super Micro X8SI6-F BIOS, "Load onboard SAS Option Rom" is set to "Disabled".

Loading of the mpt2sas kernel module during boot is disabled with: echo 'blacklist mpt2sas' >> /etc/modprobe.d/mpt2sas.conf; depmod; update-initramfs -u -k $(uname -r)

modprobe mpt2sas is then run from /etc/rc.local.

PUIS was set on the drives with "/sbin/hdparm -s1 --yes-i-know-what-i-am-doing /dev/sdX".
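The blacklist command above uses `>>`, so running it a second time appends a duplicate line. A minimal sketch of an idempotent variant follows; `blacklist_module` is a hypothetical helper name, and the optional second argument (a target directory) is an assumption added only so the function can be tried outside /etc.

```shell
#!/bin/sh
# Sketch: add "blacklist <module>" to <dir>/<module>.conf only if it is missing.
# The directory defaults to /etc/modprobe.d; pass another one to try it safely.
blacklist_module() {
    module=$1
    conf_dir=${2:-/etc/modprobe.d}
    conf_file="$conf_dir/$module.conf"
    line="blacklist $module"
    # grep: -q quiet, -x match the whole line, -F treat the pattern as a fixed string
    if ! grep -qxF "$line" "$conf_file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$conf_file"
    fi
}

# As root, the steps from the question would then be:
#   blacklist_module mpt2sas && depmod && update-initramfs -u -k "$(uname -r)"
```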

# tail /var/log/messages
Dec 19 21:07:21 debian kernel: [   14.503509] mpt2sas0: Scatter Gather Elements per IO(128)
Dec 19 21:07:22 debian kernel: [   14.735785] mpt2sas0: LSISAS2008: FWVersion(14.00.01.00), ChipRevision(0x03), BiosVersion(07.27.00.00)
Dec 19 21:07:22 debian kernel: [   14.735878] mpt2sas0: Protocol=(Initiator), Capabilities=(Raid,TLR,EEDP,Snapshot Buffer,Diag Trace Buffer,Task Set Full,NCQ)
Dec 19 21:07:22 debian kernel: [   14.736748] mpt2sas0: sending port enable !!
Dec 19 21:07:22 debian kernel: [   15.294663] e1000e: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
Dec 19 21:07:22 debian kernel: [   15.294759] e1000e 0000:03:00.0: eth0: 10/100 speed: disabling TSO
Dec 19 21:07:22 debian kernel: [   15.296146] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Dec 19 21:07:23 debian kernel: [   16.257786] mpt2sas0: host_add: handle(0x0001), sas_addr(0x5003048007abbc00), phys(8)
Dec 19 21:12:22 debian kernel: [  314.234004] mpt2sas0: port enable: FAILED with timeout (timeout=300s)
Dec 19 21:12:54 debian kernel: [  346.439736] mpt2sas0: expander_add: handle(0x0009), parent(0x0001), sas_addr(0x50014380182cf0e6), phys(37)

# tail -n40 /var/log/syslog
Dec 19 21:41:11 debian kernel: [  240.376096] INFO: task modprobe:1341 blocked for more than 120 seconds.
Dec 19 21:41:11 debian kernel: [  240.376171] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 19 21:41:11 debian kernel: [  240.376263] modprobe      D 0000000000000000     0  1341   1287 0x00000000
Dec 19 21:41:11 debian kernel: [  240.376414]  ffff88023f06b880 0000000000000082 0000000000000000 000000000000bfc5
Dec 19 21:41:11 debian kernel: [  240.376656]  0000000000000096 ffffffff8104e54b 000000000000f9e0 ffff88023d681fd8
Dec 19 21:41:11 debian kernel: [  240.376895]  0000000000015780 0000000000015780 ffff88023bc80000 ffff88023bc802f8
Dec 19 21:41:11 debian kernel: [  240.377134] Call Trace:
Dec 19 21:41:11 debian kernel: [  240.377204]  [<ffffffff8104e54b>] ? release_console_sem+0x17e/0x1af
Dec 19 21:41:11 debian kernel: [  240.377278]  [<ffffffff8105aeba>] ? __mod_timer+0x141/0x153
Dec 19 21:41:11 debian kernel: [  240.377350]  [<ffffffff812fbec4>] ? schedule_timeout+0xa5/0xdd
Dec 19 21:41:11 debian kernel: [  240.377422]  [<ffffffff8105aa34>] ? process_timeout+0x0/0x5
Dec 19 21:41:11 debian kernel: [  240.377492]  [<ffffffff812fbd04>] ? wait_for_common+0xde/0x15b
Dec 19 21:41:11 debian kernel: [  240.377566]  [<ffffffff8104a461>] ? default_wake_function+0x0/0x9
Dec 19 21:41:11 debian kernel: [  240.377647]  [<ffffffffa021d6e1>] ? _base_make_ioc_operational+0x929/0xa6f [mpt2sas]
Dec 19 21:41:11 debian kernel: [  240.377743]  [<ffffffffa021fa85>] ? mpt2sas_base_attach+0xb73/0xc61 [mpt2sas]
Dec 19 21:41:11 debian kernel: [  240.377817]  [<ffffffff810412ee>] ? enqueue_task_fair+0x3e/0x82
Dec 19 21:41:11 debian kernel: [  240.377889]  [<ffffffff8103a311>] ? enqueue_task+0x5f/0x68
Dec 19 21:41:11 debian kernel: [  240.377956]  [<ffffffff8103a403>] ? activate_task+0x22/0x28
Dec 19 21:41:11 debian kernel: [  240.378037]  [<ffffffffa0222e21>] ? _scsih_probe+0x32c/0x501 [mpt2sas]
Dec 19 21:41:11 debian kernel: [  240.378115]  [<ffffffff811a2d46>] ? local_pci_probe+0x12/0x16
Dec 19 21:41:11 debian kernel: [  240.378188]  [<ffffffff811a3996>] ? pci_device_probe+0xc0/0xe9
Dec 19 21:41:11 debian kernel: [  240.378263]  [<ffffffff81221520>] ? driver_probe_device+0xa3/0x14b
Dec 19 21:41:11 debian kernel: [  240.378333]  [<ffffffff81221617>] ? __driver_attach+0x4f/0x6f
Dec 19 21:41:11 debian kernel: [  240.378404]  [<ffffffff812215c8>] ? __driver_attach+0x0/0x6f
Dec 19 21:41:11 debian kernel: [  240.378477]  [<ffffffff81220def>] ? bus_for_each_dev+0x43/0x74
Dec 19 21:41:11 debian kernel: [  240.378549]  [<ffffffff812207af>] ? bus_add_driver+0xaf/0x1f8
Dec 19 21:41:11 debian kernel: [  240.378621]  [<ffffffff812218cf>] ? driver_register+0xa7/0x111
Dec 19 21:41:11 debian kernel: [  240.378698]  [<ffffffffa015f000>] ? _scsih_init+0x0/0x112 [mpt2sas]
Dec 19 21:41:11 debian kernel: [  240.378772]  [<ffffffff811a3bdc>] ? __pci_register_driver+0x50/0xb8
Dec 19 21:41:11 debian kernel: [  240.378849]  [<ffffffffa015f000>] ? _scsih_init+0x0/0x112 [mpt2sas]
Dec 19 21:41:11 debian kernel: [  240.378928]  [<ffffffffa015f0fc>] ? _scsih_init+0xfc/0x112 [mpt2sas]
Dec 19 21:41:11 debian kernel: [  240.379002]  [<ffffffff8100a065>] ? do_one_initcall+0x64/0x174
Dec 19 21:41:11 debian kernel: [  240.379072]  [<ffffffff8107ab54>] ? sys_init_module+0xc5/0x21a
Dec 19 21:41:11 debian kernel: [  240.379144]  [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b
Dec 19 21:42:18 debian kernel: [  307.313037] mpt2sas0: _base_send_port_enable: timeout
Dec 19 21:42:18 debian kernel: [  307.313106] mpt2sas0: port enable: FAILED
Dec 19 21:42:18 debian kernel: [  307.313171] mpt2sas0: sending diag reset !!
Dec 19 21:42:19 debian kernel: [  308.430890] mpt2sas0: diag reset: SUCCESS
Dec 19 21:42:19 debian kernel: [  308.431001] mpt2sas 0000:01:00.0: PCI INT A disabled
Dec 19 21:42:19 debian kernel: [  308.431102] mpt2sas0: failure at /build/buildd-linux-2.6_2.6.32-46-amd64-_ApuPc/linux-2.6-2.6.32/debian/build/source_amd64_none/drivers/scsi/mpt2sas/mpt2sas_scsih.c:6021/_scsih_probe()!
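The kernel timestamps in the first excerpt show how long the driver waited: "sending port enable" is logged at 14.736748 and the failure at 314.234004, about 299.5 seconds later, which matches the 300s timeout. A small sketch that extracts this from a log file follows; `port_enable_elapsed` is a hypothetical helper name, and it assumes both messages appear in the same file with bracketed kernel timestamps.

```shell
#!/bin/sh
# Sketch: print the seconds between "sending port enable" and the
# "port enable: FAILED" message, based on the [seconds.micros] timestamps.
port_enable_elapsed() {
    awk '
        # Return the number inside the "[   14.736748]" field, or -1 if absent.
        function ktime(line) {
            if (match(line, /\[ *[0-9]+\.[0-9]+\]/))
                return substr(line, RSTART + 1, RLENGTH - 2) + 0
            return -1
        }
        /sending port enable/ { start = ktime($0) }
        /port enable: FAILED/ { if (start != "") printf "%.1f\n", ktime($0) - start }
    ' "$@"
}

# Example: port_enable_elapsed /var/log/messages
```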

Regression

Tested with these Debian Linux kernels:

  1. Linux debian 2.6.32-5-amd64 #1 SMP Sun Sep 23 10:07:46 UTC 2012 x86_64
  2. BPO.3(Linux debian 3.2.0-0.bpo.3-amd64 #1 SMP Thu Aug 23 07:41:30 UTC 2012 x86_64)
  3. BPO.4 (Linux debian 3.2.0-0.bpo.4-amd64 #1 SMP Debian 3.2.32-1~bpo60+1 x86_64)

Tested with the SAS BIOS enabled = no change.

After the timeout, bpo.3 and bpo.4 freeze. Even the PgUp/PgDown keys stop working on the console.

modinfo /lib/modules/2.6.32-5-amd64/kernel/drivers/scsi/mpt2sas/mpt2sas.ko | grep ^version:
version:        02.100.03.00

Update #1: Tested with LSI driver version 15.00.00.00 on an adapter with upgraded BIOS and firmware:

mpt2sas0: LSISAS2008: FWVersion(15.00.00.00), ChipRevision(0x03), BiosVersion(07.29.00.00)
mpt2sas0: port enable: FAILED with timeout (timeout=300s)

In addition, the system froze 3492 seconds after booting with mpt2sas driver version 15. The freeze was resolved by flashing the IT firmware.

Update #2: Some more detailed SMP reports

# smp_rep_phy_err_log /dev/bsg/expander-0\:0 -vvv
    Report phy error log request: 40 11 06 02 00 00 00 00 00 00 00 00 00 00 00 00 
Report phy error log response:
  Expander change count: 303
  phy identifier: 0
  invalid dword count: 18518
  running disparity error count: 18492
  loss of dword synchronization count: 2
  phy reset problem count: 0

# smp_rep_phy_err_log /dev/bsg/expander-0\:1 -vvv
    Report phy error log request: 40 11 06 02 00 00 00 00 00 00 00 00 00 00 00 00 
Report phy error log response:
  Expander change count: 715
  phy identifier: 0
  invalid dword count: 36103
  running disparity error count: 35004
  loss of dword synchronization count: 4
  phy reset problem count: 0

# smp_rep_phy_sata --phy=5 /dev/bsg/expander-0\:0 -vvv
    Report phy SATA request: 40 12 10 02 00 00 00 00 00 05 00 00 00 00 00 00 
Report phy SATA response:
  expander change count: 303
  phy identifier: 5
  STP I_T nexus loss occurred: 0
  affiliations supported: 1
  affiliation valid: 1
  STP SAS address: 0x50014380182cf0c5
  register device to host FIS:
    34 00 50 01 01 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 
  affiliated STP initiator SAS address: 0x5003048007abbc00
  STP I_T nexus loss SAS address: 0x0
  affiliation context: 0
  current affiliation contexts: 1
  maximum affiliation contexts: 1

# smp_rep_exp_route_tbl /dev/bsg/expander-0\:0 -vvv
    Report expander route table request: 
      40 22 ff 06 00 00 00 00  00 3e 00 00 00 00 00 00 
      00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 
Report expander route table response header:
  expander change count: 303
  expander route table change count: 1
  self configuring: 0
  zone configuring: 0
  configuring: 0
  zone enabled: 0
  expander route table descriptor length: 4 dwords
  number of expander route table descriptors: 0
  first routed SAS address index: 0
  last routed SAS address index: 0
  starting phy id: 0
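The phy error counters above are easier to compare across phys when each response is collapsed to one line. A minimal sketch follows, assuming the text layout of smp_rep_phy_err_log shown above; `summarize_phy_errors` is a hypothetical helper name, and the phy range 0 to 36 in the usage comment is inferred from the phys(37) reported for the expander.

```shell
#!/bin/sh
# Sketch: reduce each "Report phy error log response" to a single summary line.
summarize_phy_errors() {
    awk -F': *' '
        /phy identifier/                { phy = $2 }
        /invalid dword count/           { dword = $2 }
        /running disparity error count/ { disp = $2 }
        /loss of dword synchronization/ { sync = $2 }
        /phy reset problem count/ {
            printf "phy %s: invalid_dword=%s disparity=%s sync_loss=%s reset_problem=%s\n", phy, dword, disp, sync, $2
        }
    ' "$@"
}

# Possible usage over all phys of one expander:
#   for p in $(seq 0 36); do
#       smp_rep_phy_err_log --phy="$p" /dev/bsg/expander-0:0
#   done | summarize_phy_errors
```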

Update #3: Enabled verbose logging in mpt2sas.ko and increased SCSI logging with dev.scsi.logging_level = 0x180000F1 in /etc/sysctl.conf. The resulting /var/log/messages output:

debian kernel: [    0.927392] setting logging_level(0x00080000)
debian kernel: [    1.591808] mpt2sas0: sending port enable !!
debian kernel: [    3.113480] mpt2sas0: host_add: handle(0x0001), sas_addr(0x5003048007abbc00), phys(8)
debian kernel: [    3.124224] mpt2sas0: expander_add: handle(0x0009), parent(0x0001), sas_addr(0x50014380182cf0e6), phys(37)
debian kernel: [    3.137436] mpt2sas0: detecting: handle(0x000a), sas_address(0x50014380182cf0c0), phy(0)
debian kernel: [    3.137520] mpt2sas0: REPORT_LUNS: handle(0x000a), retries(0)
debian kernel: [    8.127417] mf:
debian kernel: [    8.127417]   0000000a 00000000 00000000 3a580000 00600000 00000018 00000000 000007f8 
debian kernel: [    8.127842]   00000000 0000000c 00000000 00000000 00000000 00000000 00000000 02000000 
debian kernel: [    8.128261]   000000a0 00000000 0000f807 00000000 00000000 00000000 00000000 00000000 
debian kernel: [    8.128679]   d30007f8 3c3a7000 00000002 00000000 
debian kernel: [    8.128980] mpt2sas0: issue target reset: handle(0x000a)
debian kernel: [    8.352363] mpt2sas0: log_info(0x31111000): originator(PL), code(0x11), sub_code(0x1000)
debian kernel: [    8.352500] mpt2sas0: target reset completed: handle(0x000a)
debian kernel: [    8.352563] mpt2sas0: issue retry: handle (0x000a)
debian kernel: [   11.347175] mpt2sas0: log_info(0x31111000): originator(PL), code(0x11), sub_code(0x1000)
debian kernel: [   11.347276] mpt2sas0: TEST_UNIT_READY: handle(0x000a), lun(0)
debian kernel: [   14.591621] mpt2sas0: log_info(0x31111000): originator(PL), code(0x11), sub_code(0x1000)
debian kernel: [   14.591720] mpt2sas0: SATA Initialization Timeout,sending a retry
debian kernel: [   14.591785] mpt2sas0: TEST_UNIT_READY: handle(0x000a), lun(0)
debian kernel: [   17.586480] mpt2sas0: log_info(0x31111000): originator(PL), code(0x11), sub_code(0x1000)
debian kernel: [   17.586836] mpt2sas0: detecting: handle(0x000b), sas_address(0x50014380182cf0c1), phy(1)
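The log_info(0x31111000) value repeated in this output packs several fields into 32 bits. A minimal sketch of the decode follows, assuming this layout: bus type in bits 31-28, originator in bits 27-24, code in bits 23-16 and sub-code in bits 15-0, with originator values 0, 1 and 2 meaning IOP, PL and IR. That layout is an assumption here, but it reproduces the originator(PL), code(0x11), sub_code(0x1000) that the driver itself printed. `decode_loginfo` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: split an mpt2sas log_info word into its assumed bit fields.
decode_loginfo() {
    v=$(( $1 ))
    bus_type=$(( (v >> 28) & 0xF ))
    originator=$(( (v >> 24) & 0xF ))
    code=$(( (v >> 16) & 0xFF ))
    sub_code=$(( v & 0xFFFF ))
    # Assumed originator names: 0 = IOP, 1 = PL, 2 = IR
    case $originator in
        0) name=IOP ;;
        1) name=PL ;;
        2) name=IR ;;
        *) name=unknown ;;
    esac
    printf 'bus_type(0x%x) originator(%s) code(0x%02x) sub_code(0x%04x)\n' \
        "$bus_type" "$name" "$code" "$sub_code"
}

# Example: decode_loginfo 0x31111000
```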

Update #4: With PUIS disabled, there is no timeout and the drives initialize correctly.

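The BIOSes of other adapters, such as the HP Smart Array P410/256MB controller (462862-B21) and the Highpoint Rocket 2720SGL, also fail to detect any PUIS/POIS Hitachi drives. The Highpoint controller BIOS reports that it is starting Group #1, but it still does not detect any drives behind the expander.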

How can drives in POIS/PUIS mode behind an HP SAS expander be detected with an LSISAS2008 controller?

Nil*_*ils 0

It looks to me as if you have to issue a SCSI start command in this state. Perhaps enabling your SCSI BIOS will do that. Sometimes there is a per-SCSI-ID option for "start unit" yes/no.
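One way to issue a SCSI START STOP UNIT command from Linux is sg_start from sg3_utils. The sketch below rests on several assumptions: sg3_utils is installed, the kernel already exposes a device node for each drive (which is not the case while detection itself fails, as in the question), and the controller translates the start command into a spin-up for a SATA drive in PUIS mode. `spin_up` and the DRY_RUN variable are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Sketch: send START UNIT to each given device node.
# With DRY_RUN=1 the commands are printed instead of executed.
spin_up() {
    for dev in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "sg_start --start $dev"
        else
            sg_start --start "$dev"
        fi
    done
}

# Example: DRY_RUN=1 spin_up /dev/sg1 /dev/sg2
```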