Tags: linux, haproxy, fault-tolerance, keepalived
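I have two systems, both virtual machines configured to use bridged networking. I am trying to use keepalived to manage ownership of a VIP, 10.190.1.230. I have tried two versions of keepalived built from source: keepalived-1.2.2 and keepalived-1.2.1.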
ServerA - RHEL5.2 x64 - 10.190.1.228 - PRIORITY 50
ServerB - RHEL6 x64 - 10.190.1.229 - PRIORITY 101
VIP - 10.190.1.230
My problem seems to be that keepalived on ServerB is not sending multicast advertisements. It does see the multicast advertisements from ServerA:
[root@ServerB~]# tcpdump -vv -c 3 -i eth0 vrrp
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
10:18:10.760577 IP (tos 0x0, ttl 255, id 856, offset 0, flags [none], proto VRRP (112), length 40)
10.190.1.228 > 224.0.0.18: VRRPv2, Advertisement, vrid 151, prio 50, authtype none, intvl 1s, length 20, addrs: 10.190.1.230
10:18:11.762039 IP (tos 0x0, ttl 255, id 857, offset 0, flags [none], proto VRRP (112), length 40)
10.190.1.228 > 224.0.0.18: VRRPv2, Advertisement, vrid 151, prio 50, authtype none, intvl 1s, length 20, addrs: 10.190.1.230
10:18:12.762883 IP (tos 0x0, ttl 255, id 858, offset 0, flags [none], proto VRRP (112), length 40)
10.190.1.228 > 224.0.0.18: VRRPv2, Advertisement, vrid 151, prio 50, authtype none, intvl 1s, length 20, addrs: 10.190.1.230
3 packets captured
3 packets received by filter
0 packets dropped by kernel
[root@ServerB~]#
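The capture above matches VRRP from any source, so it only shows ServerB's silence indirectly. A narrower filter would make that explicit. This is a minimal sketch, assuming the standard pcap `vrrp` and `src host` primitives; `vrrp_from` is a hypothetical helper name, not part of tcpdump.

```shell
# vrrp_from (hypothetical helper) builds a pcap filter that matches only
# VRRP packets sent by the given host.
vrrp_from() {
    printf 'vrrp and src host %s' "$1"
}

# Example (as root, on either machine): watch for advertisements that
# originate from ServerB only.
#   tcpdump -n -c 3 -i eth0 "$(vrrp_from 10.190.1.229)"
```

If this capture stays empty while keepalived is running on ServerB, the daemon itself is not advertising, whatever the network is doing.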
If I kill keepalived on ServerA and leave tcpdump running, I see no packets at all. I am using the following simple keepalived configurations (the first, with priority 50, is ServerA's; the second is ServerB's):
vrrp_instance VI_1 {
    interface eth0
    state BACKUP
    virtual_router_id 151
    priority 50
    virtual_ipaddress {
        10.190.1.230
    }
}
vrrp_instance VI_1 {
    interface eth0
    state MASTER
    virtual_router_id 151
    priority 100
    virtual_ipaddress {
        10.190.1.230
    }
}
ServerA, correctly I assume, holds the VIP because it sees no VRRPv2 advertisements from the higher-priority keepalived on ServerB:
[root@ServerA~]# ip add sh eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
link/ether 08:00:27:59:58:c0 brd ff:ff:ff:ff:ff:ff
inet 10.190.1.228/24 brd 10.190.1.255 scope global eth0
inet 10.190.1.230/32 scope global eth0
inet6 fe80::a00:27ff:fe59:58c0/64 scope link
valid_lft forever preferred_lft forever
[root@ServerA~]#
The firewall is disabled on both machines. Both interfaces have the MULTICAST flag set.
I have used iperf to send traffic to the VRRP group:
[root@ServerB~]# iperf -u -c 224.0.0.18
------------------------------------------------------------
Client connecting to 224.0.0.18, UDP port 5001
Sending 1470 byte datagrams
Setting multicast TTL to 1
UDP buffer size: 122 KByte (default)
------------------------------------------------------------
[ 3] local 10.190.1.229 port 32929 connected with 224.0.0.18 port 5001
^C[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 0.6 sec 73.2 KBytes 1.05 Mbits/sec
[ 3] Sent 51 datagrams
[root@ServerB~]#
ServerA can see this traffic:
[root@ServerA~]# tcpdump -c 3 -i eth0 host 224.0.0.18
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
10:37:30.460427 IP 10.190.1.229.33088 > vrrp.mcast.net.commplex-link: UDP, length 1470
10:37:30.472247 IP 10.190.1.229.33088 > vrrp.mcast.net.commplex-link: UDP, length 1470
10:37:30.482908 IP 10.190.1.229.33088 > vrrp.mcast.net.commplex-link: UDP, length 1470
3 packets captured
10 packets received by filter
0 packets dropped by kernel
[root@ServerA~]#
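The above leads me to believe this is not a network problem. I have no multicast route in my routing table, but the above suggests I do not need one. Multicast traffic is going out via eth0.
Finally, here is the log output from keepalived on ServerB: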
May 18 10:40:46 ServerB Keepalived: Starting Keepalived v1.2.1 (05/17,2011)
May 18 10:40:46 ServerB Keepalived: Remove a zombie pid file /var/run/keepalived.pid
May 18 10:40:46 ServerB Keepalived: Registering Kernel netlink reflector
May 18 10:40:46 ServerB Keepalived: Registering Kernel netlink command channel
May 18 10:40:46 ServerB Keepalived: Registering gratutious ARP shared channel
May 18 10:40:46 ServerB Keepalived: Configuration is using : 55219 Bytes
May 18 10:40:46 ServerB Keepalived: Using LinkWatch kernel netlink reflector...
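I have not run it with the -D switch, which appears to be memory debugging and means little to me. I have uploaded the strace output here.
When I strace keepalived with the -n flag (don't fork), I get the following after the output linked above: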
sendto(3, "<30>May 18 10:58:50 Keepalived: "..., 68, MSG_NOSIGNAL, NULL, 0) = 68
sendto(3, "<30>May 18 10:58:50 Keepalived: "..., 75, MSG_NOSIGNAL, NULL, 0) = 75
rt_sigaction(SIGCHLD, {0x411b60, [], SA_RESTORER|SA_RESTART, 0x3db5a32a20}, {SIG_DFL, [], 0}, 8) = 0
select(1024, [4 6], [], [], {1, 0}) = 0 (Timeout)
select(1024, [4 6], [], [], {1, 0}) = 0 (Timeout)
select(1024, [4 6], [], [], {1, 0}) = 0 (Timeout)
select(1024, [4 6], [], [], {1, 0}) = 0 (Timeout)
[ etc ..]
This contrasts with the strace output of the working keepalived on ServerA, where I can see sendto(), sendmsg() and recvmsg() calls.
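Boy, do I feel stupid. I had saved my keepalived.conf file as keepalived.cfg in /etc/keepalived/ (I think I picked that up from haproxy.cfg). Keepalived looks for /etc/keepalived/keepalived.conf. I was starting keepalived without the -f flag, so it started with no configuration.
If I had used the -d option (dump the conf to syslog), I would have seen that it was using the default configuration rather than picking up my settings.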
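A quick check along these lines would have caught the misnamed file before any network debugging. This is only a sketch: `check_keepalived_conf` is a hypothetical helper name, and it assumes the default path mentioned above.

```shell
# check_keepalived_conf (hypothetical helper, not part of keepalived):
# report whether keepalived.conf exists in the given directory, and list
# near-miss names such as keepalived.cfg when it does not.
check_keepalived_conf() {
    dir="${1:-/etc/keepalived}"
    if [ -r "$dir/keepalived.conf" ]; then
        echo "ok: $dir/keepalived.conf"
        return 0
    fi
    echo "missing: $dir/keepalived.conf"
    for f in "$dir"/keepalived*; do
        # The unexpanded glob pattern is skipped when nothing matches.
        [ -e "$f" ] && echo "found instead: $f"
    done
    return 1
}

# Example (as root): if the default file is missing, run in the foreground
# (-n), log to the console (-l), dump the parsed configuration (-d), and
# name the config file explicitly (-f).
#   check_keepalived_conf || keepalived -n -l -d -f /etc/keepalived/keepalived.cfg
```

Renaming the file to keepalived.conf is the simpler fix; passing -f is only needed when the file has to keep a non-default name.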