Xen DomU root filesystem becomes read-only on iSCSI virtual IP failover

Question

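My Xen servers run openSUSE 11.1 and use open-iscsi to connect to our iSCSI SAN cluster. The SAN modules are in an IP failover group behind a virtual IP, and the initiators connect to that virtual IP.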

If the primary SAN server goes down, the secondary takes over as the target server. This is all handled by the LeftHand SAN/iQ software and works fine most of the time.

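The problem I am having is that occasionally some of my Xen DomUs end up with a read-only root filesystem after an IP failover. It is not consistent, and a different subset of DomUs is affected each time a failover happens. They all run the same openSUSE 11.1 software image.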

The root filesystem of each DomU is attached via open-iscsi in the Dom0, and Xen then exposes it to the DomU using the standard block device driver.

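The exact symptom is that running touch /test as root returns the error "Read-only file system". However, the output of mount shows the filesystem mounted read-write. Of course, all other I/O on the DomU fails at that point too, so the machine is effectively down hard. Simply restarting the DomU with xm from the Dom0, without even reconnecting the iSCSI session, gets everything working again.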

On the Dom0 side, the syslog messages during a failover look similar to the following:

kernel: connection1:0: iscsi: detected conn error (1011)
iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
iscsid: connection1:0 is operational after recovery (1 attempts)

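I am having a hard time figuring out at which layer to debug this. Is it something in the DomU kernel? Or is it at the Dom0 or Xen level? I suspect there is a parameter somewhere that needs tuning to increase some kind of timeout, but I am not sure where to look.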

I don't really think this is an open-iscsi problem, because the attached block devices can still be read from and written to in the Dom0.
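
The mismatch described above (writes fail with "Read-only file system" while mount still reports read-write) can be checked against the kernel's own view of the mount. The following is a minimal sketch, assuming a Linux guest where /proc/mounts is available; the helper names mount_options and is_read_only are made up for illustration. One possible explanation, not confirmed in the question, is that mount reports from /etc/mtab, which is not rewritten when the kernel itself remounts a filesystem read-only after I/O errors.

```python
def mount_options(mounts_text, mount_point="/"):
    """Return the option list of the last entry for mount_point, or None."""
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Format: device mount_point fstype options dump pass
        if len(fields) >= 4 and fields[1] == mount_point:
            # Later entries shadow earlier ones (e.g. rootfs, then the real root).
            options = fields[3].split(",")
    return options


def is_read_only(mounts_text, mount_point="/"):
    """True if the kernel lists mount_point with the 'ro' option."""
    options = mount_options(mounts_text, mount_point)
    return options is not None and "ro" in options


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as f:
            print("root read-only according to kernel:", is_read_only(f.read()))
    except OSError as exc:
        print("could not read /proc/mounts:", exc)
```

If this reports the root as read-only while mount shows rw, the DomU kernel itself has flagged the filesystem, which would point at the guest's reaction to failed I/O and not at the mount table.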

Answer 1

Kam*_*iel 6

I eventually solved this by using the following advice and settings from the open-iscsi documentation:

8.2 iSCSI settings for iSCSI root
---------------------------------

When accessing the root partition directly through an iSCSI disk, the
iSCSI timers should be set so that iSCSI layer has several chances to try to
re-establish a session and so that commands are not quickly requeued to
the SCSI layer. Basically you want the opposite of when using dm-multipath.

For this setup, you can turn off iSCSI pings by setting:

node.conn[0].timeo.noop_out_interval = 0
node.conn[0].timeo.noop_out_timeout = 0

And you can set the replacement_timeout to a very long value:

node.session.timeo.replacement_timeout = 86400

After setting up the connection to each LUN as described above, failover works like a charm, even when it takes several minutes to complete.
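
The answer does not say how the settings were applied to each LUN. One way to do it, sketched here as an assumption and not as the author's actual method, is to update each existing node record with iscsiadm in node mode using -o update. The script below only builds and prints the command lines; the target IQN and portal are hypothetical placeholders, and build_update_commands is an illustrative helper name.

```python
import shlex

# Settings quoted in the answer from the open-iscsi documentation.
ISCSI_ROOT_SETTINGS = {
    "node.conn[0].timeo.noop_out_interval": "0",
    "node.conn[0].timeo.noop_out_timeout": "0",
    "node.session.timeo.replacement_timeout": "86400",
}


def build_update_commands(target, portal, settings=ISCSI_ROOT_SETTINGS):
    """Return one iscsiadm argument list per setting for a single node record."""
    return [
        ["iscsiadm", "-m", "node", "-T", target, "-p", portal,
         "-o", "update", "-n", name, "-v", value]
        for name, value in settings.items()
    ]


if __name__ == "__main__":
    # Hypothetical target IQN and portal; substitute the values for each LUN.
    target = "iqn.2003-10.com.example:domu1-root"
    portal = "192.0.2.10:3260"
    for cmd in build_update_commands(target, portal):
        # Printed rather than executed so the commands can be reviewed first.
        print(shlex.join(cmd))
```

Updated node records are likely to take effect only for sessions established after the change, so an existing session may need a logout and login. Setting the same values in iscsid.conf before discovery is another option for newly created records.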
