RabbitMQ HAProxy configuration

msa*_*nce 3 load-testing load-balancing haproxy rabbitmq

I have an API that sends messages to RabbitMQ.

I have a high-availability RabbitMQ cluster behind HAProxy.

When I load test the API, I start seeing a lot of these:

Recovering from a network failure... Exception in the reader loop: AMQ::Protocol::EmptyResponseError: Empty response received from the server.

in my Unicorn logs.

If I bypass HAProxy and connect directly to RabbitMQ, I do not see them. Where am I going wrong? My HAProxy config looks like this:

global
  log 127.0.0.1   local0
  log 127.0.0.1   local1 notice
  #log loghost    local0 info
  maxconn 4096
  #debug
  #quiet
  user haproxy
  group haproxy

defaults
  log     global
  mode    http
  retries 3
  timeout client 50s
  timeout connect 10s
  timeout server 50s
  option dontlognull
  option forwardfor
  option httplog
  option redispatch
  balance  roundrobin

# Set up application listeners here.

listen http_frontend
  bind *:80
  mode http
  default_backend http_backend
  option httpclose
  reqadd X-Forwarded-Proto:\ http

listen https_frontend
  bind *:443 ssl crt /etc/haproxy.pem
  mode http
  default_backend http_backend
  reqadd X-Forwarded-Proto:\ https

listen http_bucky_frontend
  bind *:1880
  mode http
  default_backend http_bucky_backend
  option httpclose
  reqadd X-Forwarded-Proto:\ http

listen https_bucky_frontend
  bind *:1443 ssl crt /etc/haproxy.pem
  mode http
  default_backend http_bucky_backend
  reqadd X-Forwarded-Proto:\ https

listen rabbitmq_frontend
  bind *:5672
  mode tcp
  default_backend rabbitmq_backend
  option tcplog

listen admin
  bind 127.0.0.1:22002
  mode http
  stats uri /


backend http_backend
  mode http
  server 0-http_backend x.x.x.x:9000 maxconn 100 check
  server 1-http_backend x.x.x.x:9000 maxconn 100 check

backend http_bucky_backend
  mode http
  option httpchk GET /status
  http-check expect string up
  server 0-http_bucky_backend x.x.x.x:9000 maxconn 100 check
  server 1-http_bucky_backend x.x.x.x:9000 maxconn 100 check

backend rabbitmq_backend
  balance roundrobin
  mode tcp
  server 0-rabbitmq_backend x.x.x.x:5672 maxconn 4000 check
  server 1-rabbitmq_backend x.x.x.x:5672 maxconn 4000 check

During load testing, the load balancer typically sits at 20-30% CPU.

Ton*_*ony 9

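Nerijus is right: this problem is caused by HAProxy's client timeout. If a connection is considered idle for longer than that timeout (50s in your config), it is dropped.

TCP can send keepalive packets to make sure an idle connection stays open.

You can check the TCP keepalive parameter with the following command: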

$ cat /proc/sys/net/ipv4/tcp_keepalive_time

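By default this value is 7200 seconds, which means TCP only starts sending keepalive packets after the connection has been idle for more than 2 hours.

So simply raise your HAProxy client timeout to more than 2 hours, for example: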

timeout client 3h
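In the question's config, `timeout client` lives in `defaults`, so raising it there would also apply to the HTTP listeners. A minimal sketch, assuming the long timeout is only wanted on the AMQP proxy, is to override it in the RabbitMQ sections and leave `defaults` untouched. The matching `timeout server 3h` is an addition that is not part of this answer: the HAProxy documentation recommends keeping the client and server timeouts equal in TCP mode, and a shorter server timeout can still close idle connections.

```haproxy
# Sketch only: section and server names are copied from the question's config.
listen rabbitmq_frontend
  bind *:5672
  mode tcp
  option tcplog
  timeout client 3h    # overrides the 50s from defaults for AMQP clients only
  default_backend rabbitmq_backend

backend rabbitmq_backend
  balance roundrobin
  mode tcp
  timeout server 3h    # assumption: kept equal to timeout client
  server 0-rabbitmq_backend x.x.x.x:5672 maxconn 4000 check
  server 1-rabbitmq_backend x.x.x.x:5672 maxconn 4000 check
```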

And add the clitcpka option to your backend:

backend rabbitmq_backend
  balance roundrobin
  mode tcp
  option          clitcpka
  server 0-rabbitmq_backend x.x.x.x:5672 maxconn 4000 check
  server 1-rabbitmq_backend x.x.x.x:5672 maxconn 4000 check
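One thing worth verifying against the documentation for your HAProxy version: `option clitcpka` enables keepalives on the client-facing socket and is documented as valid in `defaults`, `frontend`, and `listen` sections, while `option srvtcpka` is its server-side counterpart, valid in `defaults`, `listen`, and `backend` sections. A hedged sketch of that placement, reusing the section names from the question:

```haproxy
# Sketch only: adjust to your own layout and HAProxy version.
listen rabbitmq_frontend
  bind *:5672
  mode tcp
  option tcplog
  option clitcpka      # keepalive probes on the client-to-HAProxy side
  default_backend rabbitmq_backend

backend rabbitmq_backend
  balance roundrobin
  mode tcp
  option srvtcpka      # assumption: also probe the HAProxy-to-RabbitMQ side
  server 0-rabbitmq_backend x.x.x.x:5672 maxconn 4000 check
  server 1-rabbitmq_backend x.x.x.x:5672 maxconn 4000 check
```

Running `haproxy -c -f` against your config file before reloading reports syntax errors and warnings such as options placed in a section where they are ignored.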