HAProxy high CPU usage - configuration problem?

UpT*_*eek 3 haproxy

We recently experienced a traffic spike - not a huge one, but it caused haproxy to max out one CPU core (and the server became unresponsive). I suspect I'm doing something inefficient in my configuration, so I'd like to ask all the haproxy experts out there whether they'd be kind enough to critique the config file below (mostly from a performance perspective).

The configuration is meant to distribute load across a group of HTTP application servers, a group of servers handling websocket connections (with several separate processes on different ports), and a static-file web server. Aside from the performance problem, it works well. (Some details have been redacted.)

Any guidance you can offer would be greatly appreciated!

HAProxy v1.4.8

#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
        global
                daemon
                maxconn         100000
                log             127.0.0.1 local0 notice
#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
        defaults
                log                     global
                mode                    http
                option                  httplog
                option                  httpclose               #http://serverfault.com/a/104782/52811
                timeout connect         5000ms
                timeout client          50000ms
                timeout server          5h                      #long timeouts to stop WS drops - when v1.5 is stable, use 'timeout tunnel';
#---------------------------------------------------------------------
# FRONTEND
#---------------------------------------------------------------------
        frontend public
                bind *:80
                maxconn         100000
                reqidel ^X-Forwarded-For:.*                     #Remove any x-forwarded-for headers
                option forwardfor                               #Set the forwarded for header (needs option httpclose)

                default_backend app
                redirect prefix http://xxxxxxxxxxxxxxxxx code 301 if { hdr(host) -i www.xxxxxxxxxxxxxxxxxxx }
                timeout client  5h                              #long timeouts to stop WS drops - when v1.5 is stable, use 'timeout tunnel';

        # ACLs
        ##########
        acl static_request hdr_beg(host) -i i.
        acl static_request hdr_beg(host) -i static.
        acl static_request path_beg /favicon.ico /robots.txt
        acl test_request hdr_beg(host) -i test.
        acl ws_request hdr_beg(host) -i ws
        # ws11
        acl ws11x1_request hdr_beg(host) -i ws11x1
        acl ws11x2_request hdr_beg(host) -i ws11x2
        acl ws11x3_request hdr_beg(host) -i ws11x3
        acl ws11x4_request hdr_beg(host) -i ws11x4
        acl ws11x5_request hdr_beg(host) -i ws11x5
        acl ws11x6_request hdr_beg(host) -i ws11x6
        # ws12
        acl ws12x1_request hdr_beg(host) -i ws12x1
        acl ws12x2_request hdr_beg(host) -i ws12x2
        acl ws12x3_request hdr_beg(host) -i ws12x3
        acl ws12x4_request hdr_beg(host) -i ws12x4
        acl ws12x5_request hdr_beg(host) -i ws12x5
        acl ws12x6_request hdr_beg(host) -i ws12x6

        # Which backend....
        ###################
        use_backend static   if static_request
        #ws11
        use_backend ws11x1 if ws11x1_request
        use_backend ws11x2 if ws11x2_request
        use_backend ws11x3 if ws11x3_request
        use_backend ws11x4 if ws11x4_request
        use_backend ws11x5 if ws11x5_request
        use_backend ws11x6 if ws11x6_request
        #ws12
        use_backend ws12x1 if ws12x1_request
        use_backend ws12x2 if ws12x2_request
        use_backend ws12x3 if ws12x3_request
        use_backend ws12x4 if ws12x4_request
        use_backend ws12x5 if ws12x5_request
        use_backend ws12x6 if ws12x6_request
#---------------------------------------------------------------------
# BACKEND - APP
#---------------------------------------------------------------------
        backend app
                timeout server          50000ms #To counter the WS default
                mode http
                balance roundrobin
                option httpchk HEAD /upchk.txt
                server app1 app1:8000             maxconn 100000 check
                server app2 app2:8000             maxconn 100000 check
                server app3 app3:8000             maxconn 100000 check
                server app4 app4:8000             maxconn 100000 check
#---------------------------------------------------------------------
# BACKENDs - WS
#---------------------------------------------------------------------
#Server ws11
        backend ws11x1
                server ws11 ws11:8001 maxconn 100000
        backend ws11x2
                server ws11 ws11:8002 maxconn 100000
        backend ws11x3
                server ws11 ws11:8003 maxconn 100000
        backend ws11x4
                server ws11 ws11:8004 maxconn 100000
        backend ws11x5
                server ws11 ws11:8005 maxconn 100000
        backend ws11x6
                server ws11 ws11:8006 maxconn 100000
#Server ws12
        backend ws12x1
                server ws12 ws12:8001 maxconn 100000
        backend ws12x2
                server ws12 ws12:8002 maxconn 100000
        backend ws12x3
                server ws12 ws12:8003 maxconn 100000
        backend ws12x4
                server ws12 ws12:8004 maxconn 100000
        backend ws12x5
                server ws12 ws12:8005 maxconn 100000
        backend ws12x6
                server ws12 ws12:8006 maxconn 100000
#---------------------------------------------------------------------
# BACKEND - STATIC
#---------------------------------------------------------------------
        backend static
                server static1 static1:80   maxconn 40000
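As an aside on the "timeout tunnel" notes in the config comments above: on HAProxy 1.5+, the five-hour client/server timeouts could be replaced with a tunnel timeout that only kicks in once a connection is upgraded. A sketch, not part of the original config:

        defaults
                mode http
                timeout connect 5000ms
                timeout client  50000ms
                timeout server  50000ms
                timeout tunnel  5h              # applies after the WebSocket upgrade;
                                                # plain HTTP keeps the shorter timeouts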

Dan*_*ick 5

100,000 connections is a lot... are you really pushing that many? If so... consider splitting the frontend so it binds to one IP for static content and another IP for app content, then run the static and app variants as separate haproxy processes (assuming you have a second core/CPU in the server)...

If nothing else, that will narrow things down to either app traffic or static traffic...
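A minimal sketch of that split, assuming a second IP address is available (the 192.0.2.x addresses, file names, and the taskset pinning are illustrative, not part of the original setup):

        # /etc/haproxy-app.cfg -- first instance, e.g.:
        #   taskset -c 0 haproxy -f /etc/haproxy-app.cfg
        frontend public_app
                bind 192.0.2.10:80              # app + websocket traffic only
                default_backend app

        # /etc/haproxy-static.cfg -- second instance, pinned to the other core, e.g.:
        #   taskset -c 1 haproxy -f /etc/haproxy-static.cfg
        frontend public_static
                bind 192.0.2.11:80              # static traffic only
                default_backend static

Each instance would also need its own global/defaults sections, backends, and pid file, and DNS for the static./i. hostnames would point at the static instance's IP.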


If I remember my Networking 101 class correctly... HAProxy shouldn't be able to open 100,000 connections to ws12:8001, or to any other backend host:port, because the theoretical ~65536-port limit is closer to 28232 on most systems (cat /proc/sys/net/ipv4/ip_local_port_range). You are probably exhausting local ports, which in turn could cause the CPU to hang while waiting for ports to be freed.

Perhaps lowering maxconn on each backend to something closer to 28,000 would mitigate the problem? Or changing the local port range to be more inclusive?
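A hedged sketch of both mitigations (the sysctl value and the 28,000 figure are illustrative; the point is simply to keep each backend's maxconn below the number of usable local ports):

        # Check the current ephemeral-port range (32768-61000 by default, i.e. ~28232 usable ports):
        #   cat /proc/sys/net/ipv4/ip_local_port_range
        # Widen it if needed (persist the setting in /etc/sysctl.conf):
        #   sysctl -w net.ipv4.ip_local_port_range="1024 65535"
        backend ws11x1
                server ws11 ws11:8001 maxconn 28000     # stays below the usable local-port count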