Failover cluster cannot fail over because of a mysterious IP conflict?

Sen*_*eer 5 windows-server-2008 failovercluster microsoft-cluster-server

I have a mysterious problem with my failover cluster:

Cluster name: PrintCluster01.domain.com
Members: PrintServer01.domain.com and PrintServer02.domain.com

In Failover Cluster Management, under Cluster Events, I received critical error messages 1135 and 1177:

Log Name: System
Source: Microsoft-Windows-FailoverClustering
Date: 15/06/2011 9:07:49 PM
Event ID: 1177
Task Category: None
Level: Critical
Keywords: 
User: SYSTEM
Computer: PrintServer01.domain.com
Description:
The Cluster service is shutting down because quorum was lost. This could be due to the loss of network connectivity between some or all nodes in the cluster, or a failover of the witness disk. 
Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapter. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.


Log Name: System
Source: Microsoft-Windows-FailoverClustering
Date: 15/06/2011 9:07:28 PM
Event ID: 1135
Task Category: None
Level: Critical
Keywords: 
User: SYSTEM
Computer: PrintServer01.domain.com
Description:
Cluster node 'PrintServer02' was removed from the active failover cluster membership. The Cluster service on this node may have stopped. This could also be due to the node having lost communication with other active nodes in the failover cluster. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapters on this node. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.

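After further investigation I found an interesting error. This is the first error message logged in Event Viewer on PrintServer02 around the time of the critical errors: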

Log Name: System
Source: Tcpip
Date: 15/06/2011 9:07:29 PM
Event ID: 4199
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: PrintServer02-VM.domain.com
Description:
The system detected an address conflict for IP address 192.168.127.142 with the system having network hardware address 00-50-56-AE-29-23. Network operations on this system may be disrupted as a result.

How can 192.168.127.142, which is a secondary IP of PrintServer01, be in conflict with the PrintServer01 node itself? Details below:

**From PrintServer01**
Ethernet adapter Local Area Connection* 8:

Connection-specific DNS Suffix . :
 Description . . . . . . . . . . . : Microsoft Failover Cluster Virtual Adapter
 Physical Address. . . . . . . . . : 02-50-56-AE-29-23
 DHCP Enabled. . . . . . . . . . . : No
 Autoconfiguration Enabled . . . . : Yes
 IPv4 Address. . . . . . . . . . . : 169.254.1.183(Preferred)
 Subnet Mask . . . . . . . . . . . : 255.255.0.0
 Default Gateway . . . . . . . . . :
 NetBIOS over Tcpip. . . . . . . . : Enabled

I have double-checked all cluster members, and all IP addresses are now unique.

I am also sure that my IPs are static and not assigned by DHCP, as the IPCONFIG output below shows:

From **PrintServer01** (the Active Node)
Windows IP Configuration

Host Name . . . . . . . . . . . . : PrintServer01
 Primary Dns Suffix . . . . . . . : domain.com
 Node Type . . . . . . . . . . . . : Hybrid
 IP Routing Enabled. . . . . . . . : No
 WINS Proxy Enabled. . . . . . . . : No
 DNS Suffix Search List. . . . . . : domain.com
 domain.com.au

Ethernet adapter Local Area Connection* 8:

Connection-specific DNS Suffix . :
 Description . . . . . . . . . . . : Microsoft Failover Cluster Virtual Adapter
 Physical Address. . . . . . . . . : 02-50-56-AE-29-23
 DHCP Enabled. . . . . . . . . . . : No
 Autoconfiguration Enabled . . . . : Yes
 IPv4 Address. . . . . . . . . . . : 169.254.1.183(Preferred)
 Subnet Mask . . . . . . . . . . . : 255.255.0.0
 Default Gateway . . . . . . . . . :
 NetBIOS over Tcpip. . . . . . . . : Enabled

Ethernet adapter Cluster Public Network:

Connection-specific DNS Suffix . :
 Description . . . . . . . . . . . : Intel® PRO/1000 MT Network Connection
 Physical Address. . . . . . . . . : 00-50-56-AE-29-23
 DHCP Enabled. . . . . . . . . . . : No
 Autoconfiguration Enabled . . . . : Yes
 IPv4 Address. . . . . . . . . . . : 192.168.127.155(Preferred)
 Subnet Mask . . . . . . . . . . . : 255.255.255.0
 IPv4 Address. . . . . . . . . . . : 192.168.127.88(Preferred)
 Subnet Mask . . . . . . . . . . . : 255.255.255.0
 IPv4 Address. . . . . . . . . . . : 192.168.127.142(Preferred)
 Subnet Mask . . . . . . . . . . . : 255.255.255.0
 IPv4 Address. . . . . . . . . . . : 192.168.127.143(Preferred)
 Subnet Mask . . . . . . . . . . . : 255.255.255.0
 IPv4 Address. . . . . . . . . . . : 192.168.127.144(Preferred)
 Subnet Mask . . . . . . . . . . . : 255.255.255.0
 Default Gateway . . . . . . . . . : 192.168.127.254
 DNS Servers . . . . . . . . . . . : 192.168.127.10
 192.168.127.11
 Primary WINS Server . . . . . . . : 192.168.127.10
 Secondary WINS Server . . . . . . : 192.168.127.11
 NetBIOS over Tcpip. . . . . . . . : Enabled

Ethernet adapter Cluster Private Network:

Connection-specific DNS Suffix . :
 Description . . . . . . . . . . . : Intel® PRO/1000 MT Network Connection #2
 Physical Address. . . . . . . . . : 00-50-56-AE-43-EC
 DHCP Enabled. . . . . . . . . . . : No
 Autoconfiguration Enabled . . . . : Yes
 IPv4 Address. . . . . . . . . . . : 10.184.2.2(Preferred)
 Subnet Mask . . . . . . . . . . . : 255.255.255.0
 Default Gateway . . . . . . . . . :
 NetBIOS over Tcpip. . . . . . . . : Disabled


From **PrintServer02**
Windows IP Configuration

Host Name . . . . . . . . . . . . : PrintServer02
 Primary Dns Suffix . . . . . . . : domain.com
 Node Type . . . . . . . . . . . . : Hybrid
 IP Routing Enabled. . . . . . . . : No
 WINS Proxy Enabled. . . . . . . . : No
 DNS Suffix Search List. . . . . . : domain.com
 domain.com.au

Ethernet adapter Local Area Connection* 8:

Connection-specific DNS Suffix . :
 Description . . . . . . . . . . . : Microsoft Failover Cluster Virtual Adapter
 Physical Address. . . . . . . . . : 02-50-56-AE-5F-E5
 DHCP Enabled. . . . . . . . . . . : No
 Autoconfiguration Enabled . . . . : Yes
 IPv4 Address. . . . . . . . . . . : 169.254.2.86(Preferred)
 Subnet Mask . . . . . . . . . . . : 255.255.0.0
 Default Gateway . . . . . . . . . :
 NetBIOS over Tcpip. . . . . . . . : Enabled

Ethernet adapter Cluster Public Network:

Connection-specific DNS Suffix . :
 Description . . . . . . . . . . . : Intel® PRO/1000 MT Network Connection
 Physical Address. . . . . . . . . : 00-50-56-AE-79-FA
 DHCP Enabled. . . . . . . . . . . : No
 Autoconfiguration Enabled . . . . : Yes
 IPv4 Address. . . . . . . . . . . : 192.168.127.172(Preferred)
 Subnet Mask . . . . . . . . . . . : 255.255.255.0
 IPv4 Address. . . . . . . . . . . : 192.168.127.119(Preferred)
 Subnet Mask . . . . . . . . . . . : 255.255.255.0
 Default Gateway . . . . . . . . . : 192.168.127.254
 DNS Servers . . . . . . . . . . . : 192.168.127.10
 192.168.127.11
 Primary WINS Server . . . . . . . : 192.168.127.11
 Secondary WINS Server . . . . . . : 192.168.127.10
 NetBIOS over Tcpip. . . . . . . . : Enabled

Ethernet adapter Cluster Private Network:

Connection-specific DNS Suffix . :
 Description . . . . . . . . . . . : Intel® PRO/1000 MT Network Connection #2
 Physical Address. . . . . . . . . : 00-50-56-AE-77-8D
 DHCP Enabled. . . . . . . . . . . : No
 Autoconfiguration Enabled . . . . : Yes
 IPv4 Address. . . . . . . . . . . : 10.184.2.3(Preferred)
 Subnet Mask . . . . . . . . . . . : 255.255.255.0
 Default Gateway . . . . . . . . . :
 NetBIOS over Tcpip. . . . . . . . : Disabled

Any help would be greatly appreciated.

Thanks, AWT

小智 2

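IP address conflict errors occur when more than one node in the cluster tries to bring a resource group (and its associated IPs) online at the same time.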

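This can happen if the cluster nodes temporarily lose contact with each other. Each node assumes the other has failed, so the "passive" node brings all resource groups online while they are in fact still online on the "active" node.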
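One way to check this against the logs in the question is to look up which adapter owns the hardware address named in the Tcpip 4199 event. The sketch below is only an illustration in Python: the event text and the physical addresses are copied from the question, while the `ADAPTERS` table and the `parse_conflict` helper are names made up for this example.

```python
import re

# Event 4199 description as logged on PrintServer02 (copied from the question).
EVENT_4199 = (
    "The system detected an address conflict for IP address 192.168.127.142 "
    "with the system having network hardware address 00-50-56-AE-29-23."
)

# Physical addresses taken from the ipconfig output in the question.
# The table itself is a hypothetical helper for this example.
ADAPTERS = {
    "00-50-56-AE-29-23": ("PrintServer01", "Cluster Public Network"),
    "00-50-56-AE-43-EC": ("PrintServer01", "Cluster Private Network"),
    "00-50-56-AE-79-FA": ("PrintServer02", "Cluster Public Network"),
    "00-50-56-AE-77-8D": ("PrintServer02", "Cluster Private Network"),
}


def parse_conflict(description):
    """Return (ip, mac) from a Tcpip 4199 description, or None if it does not match."""
    match = re.search(
        r"IP address (\d{1,3}(?:\.\d{1,3}){3}) .*?hardware address "
        r"((?:[0-9A-Fa-f]{2}-){5}[0-9A-Fa-f]{2})",
        description,
    )
    if match is None:
        return None
    return match.group(1), match.group(2).upper()


ip, mac = parse_conflict(EVENT_4199)
owner = ADAPTERS.get(mac)
print(f"{ip} was already in use by {mac}: {owner}")
```

The hardware address belongs to the Cluster Public Network adapter on PrintServer01, which fits the explanation above: PrintServer02 tried to bring 192.168.127.142 online while PrintServer01 still held it.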

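I have seen this problem in our VMware environment when one of the ESX(i) hosts is overloaded, and sometimes even just during an HBA bus rescan: the MSCS nodes suddenly lose contact with each other for a very short time and this confusion follows.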
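To see whether the events in the question fit a brief loss of contact, it can help to put them in chronological order. The following is a rough Python sketch: the timestamps, event IDs and computer names are copied from the question, the one-line summaries are paraphrased, the `timeline` function is a hypothetical helper, and it assumes the clocks on both nodes are synchronized.

```python
from datetime import datetime

# (timestamp, event ID, computer, summary), taken from the logs in the question.
EVENTS = [
    ("15/06/2011 9:07:49 PM", 1177, "PrintServer01.domain.com",
     "quorum lost, Cluster service shutting down"),
    ("15/06/2011 9:07:28 PM", 1135, "PrintServer01.domain.com",
     "PrintServer02 removed from active cluster membership"),
    ("15/06/2011 9:07:29 PM", 4199, "PrintServer02-VM.domain.com",
     "address conflict for 192.168.127.142"),
]


def timeline(events):
    """Sort events chronologically; timestamps are day/month/year on a 12-hour clock."""
    return sorted(
        events,
        key=lambda e: datetime.strptime(e[0], "%d/%m/%Y %I:%M:%S %p"),
    )


ordered = timeline(EVENTS)
for stamp, event_id, computer, summary in ordered:
    print(f"{stamp}  {event_id}  {computer}: {summary}")
```

Sorted this way, the membership loss (1135) comes first, the address conflict (4199) follows one second later, and the quorum loss (1177) comes about twenty seconds after that.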