Why does the NGINX load balancer's passive health check fail to detect that an upstream server is offline?

Kuy*_*hii 8 load-balancing nginx nginx-config

I have an `upstream` block in my Nginx config file. This block lists multiple backend servers across which requests are load balanced.

```
...
upstream backend {
    server backend1.com;
    server backend2.com;
    server backend3.com;
}
...
```

Each of the 3 backend servers above runs a Node application.

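  1. If I stop the application process on backend1, Nginx recognizes this via the passive health check, and traffic is directed only to backend2 and backend3, as expected.
  2. However, if I power down the server hosting backend1, Nginx does not recognize that it is offline and keeps trying to send traffic/requests to it. The requests routed to the offline server result in an error: 504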

Can someone explain why this happens (scenario 2 above), and whether I am missing some further configuration?

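Update: I'm starting to wonder whether the behaviour I'm seeing is because the upstream block above is in the `http {}` Nginx context. If backend1 is indeed powered down, this would be a connection error, so (maybe I'm off the mark here, just thinking out loud) should this be a TCP health check?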

Update 2:

nginx.conf

```
user www-data;
worker_processes auto;
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;

events {
    worker_connections 768;
    # multi_accept on;
}

http {

    upstream backends {
        server xx.xx.xx.37:3000 fail_timeout=2s;
        server xx.xx.xx.52:3000 fail_timeout=2s;
        server xx.xx.xx.69:3000 fail_timeout=2s;
    }

    ##
    # Basic Settings
    ##

    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    # server_tokens off;

    # server_names_hash_bucket_size 64;
    # server_name_in_redirect off;

    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    ##
    # SSL Settings
    ##
    ssl_certificate     …
    ssl_certificate_key …
    ssl_ciphers         …;
    ssl_protocols TLSv1 TLSv1.1 TLSv1.2; # Dropping SSLv3, ref: POODLE
    ssl_prefer_server_ciphers on;

    ##
    # Logging Settings
    ##

    access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log;

    ##
    # Gzip Settings
    ##

    gzip on;

    # gzip_vary on;
    # gzip_proxied any;
    # gzip_comp_level 6;
    # gzip_buffers 16 8k;
    # gzip_http_version 1.1;
    # gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;

    ##
    # Virtual Host Configs
    ##

    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}
```

default

```
server {
    listen 80;
    listen [::]:80;
    return 301 https://$host$request_uri;
    #server_name ...;
}
server {

    listen              443 ssl;
    listen              [::]:443 ssl;
    # SSL configuration
    ...
    # Add index.php to the list if you are using PHP
    index index.html index.htm;

    server_name _;

    location / {
        # First attempt to serve request as file, then
        # as directory, then fall back to displaying a 404.
        try_files $uri $uri/ /index.html;
        #try_files $uri $uri/ =404;

    }

    location /api {
        rewrite /api/(.*) /$1  break;
        proxy_pass http://backends;
        proxy_redirect     off;
        proxy_set_header   Host $host;
        proxy_set_header   X-Real-IP $remote_addr;
        proxy_set_header   X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header   X-Forwarded-Host $server_name;
    }

    # Requests for socket.io are passed on to Node on port 3000
    location /socket.io/ {
        proxy_http_version 1.1;

        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";

        proxy_pass http://backends;
    }
}
```

Amj*_*yed 3

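The reason you get a 504 is that when nginx does its HTTP health check, it tries to connect to the location you configured (e.g. `/`, expecting a 200 status code); with passive checks, that attempt is the proxied request itself. Because backend1 is powered down, nothing is listening on the port and the socket is closed, so the connection attempt gets no reply at all.

It takes some time for the timeout to be reached, hence `504: gateway timeout`.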

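When you stop the application process, the situation is different. The host is still up but the port is no longer listening, so the attempt fails with `connection refused`, which is recognized very quickly, and the instance is marked as unavailable.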

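To work around this, you can set `fail_timeout=2s` on the server. It sets both the window in which failed attempts are counted and how long the server is then considered unavailable; the default is 10 seconds.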

https://nginx.org/en/docs/http/ngx_http_upstream_module.html?&_ga=2.174685482.969425228.1595841929-1716500038.1594281802#fail_timeout
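
As a rough sketch of how these settings fit together, the fragment below combines `fail_timeout` with a shorter connect timeout, so that a powered-down host is given up on sooner. It reuses the `backends` upstream and the `/api` location from the question. The 2s value for `proxy_connect_timeout` is an assumption chosen for illustration, not something from the original configuration, and would need tuning for a real deployment.

```nginx
# Goes inside the http {} context.
upstream backends {
    # Passive check: after max_fails failed attempts within fail_timeout,
    # the server is skipped for fail_timeout. max_fails=1 is the default,
    # spelled out here for clarity.
    server xx.xx.xx.37:3000 max_fails=1 fail_timeout=2s;
    server xx.xx.xx.52:3000 max_fails=1 fail_timeout=2s;
    server xx.xx.xx.69:3000 max_fails=1 fail_timeout=2s;
}

# Goes inside the existing HTTPS server {} block.
location /api {
    proxy_pass http://backends;

    # Assumed value: stop waiting on an unreachable host after 2s
    # instead of the 60s default.
    proxy_connect_timeout 2s;

    # Pass the request to the next server on connection errors and
    # timeouts. This is the default, shown explicitly.
    proxy_next_upstream error timeout;
}
```

With a short connect timeout, an attempt against a powered-down host should count as a failed attempt after about 2 seconds rather than hanging for the full default timeout, after which the passive check can take that server out of rotation for `fail_timeout`.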