Jch*_*ppa 7 nginx reverse-proxy load-balancing
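I am trying to use nginx on a CentOS 7 virtual machine as a load balancer, to replace an aging Coyote Point hardware appliance. However, with one of our web applications we see frequent, persistent upstream timeout errors in the logs, and clients report session problems while using the system.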
Here is the relevant part of our nginx.conf:
user nginx;
worker_processes 4;
error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;

events {
    worker_connections 1024;
}

upstream farm {
    ip_hash;
    server www1.domain.com:8080;
    server www2.domain.com:8080 down;
    server www3.domain.com:8080;
    server www4.domain.com:8080;
}

server {
    listen 192.168.1.87:80;
    server_name host.domain.com;
    return 301 https://$server_name$request_uri;
}

server {
    listen 192.168.1.87:443 ssl;
    server_name host.domain.com;

    ## Compression
    gzip on;
    gzip_buffers 16 8k;
    gzip_comp_level 4;
    gzip_http_version 1.0;
    gzip_min_length 1280;
    gzip_types text/plain text/css application/x-javascript text/xml application/xml application/xml+rss text/javascript image/x-icon image/bmp;
    gzip_vary on;

    tcp_nodelay on;
    tcp_nopush on;
    sendfile off;

    location / {
        proxy_connect_timeout 10;
        proxy_send_timeout 180;
        proxy_read_timeout 180; # to allow for large managers' reports
        proxy_buffering off;
        proxy_buffer_size 128k;
        proxy_buffers 4 256k;
        proxy_busy_buffers_size 256k;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_pass http://farm;

        location ~* \.(css|jpg|gif|ico|js)$ {
            proxy_cache mypms_cache;
            add_header X-Proxy-Cache $upstream_cache_status;
            proxy_cache_valid 200 60m;
            expires 60m;
            proxy_pass http://farm;
        }
    }

    location /basic_status {
        stub_status;
    }

    error_page 502 502 = /maintenance.html;
    location = /maintenance.html {
        root /www/;
    }
}
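The `upstream farm` block relies on `ip_hash` for session stickiness. According to the nginx documentation, `ip_hash` keys on the first three octets of a client's IPv4 address (or the whole IPv6 address), and marking a server `down` keeps the existing client-to-server mapping for everyone else. The Python below is a simplified model of that behaviour only: the CRC32 hash and the "walk forward past down servers" step are stand-ins chosen for illustration and are not nginx's actual algorithm. `SERVERS` mirrors the upstream block; the function names are hypothetical.

```python
import ipaddress
import zlib

# Hypothetical pool mirroring the "upstream farm" block; True marks a server as down.
SERVERS = [
    ("www1.domain.com:8080", False),
    ("www2.domain.com:8080", True),
    ("www3.domain.com:8080", False),
    ("www4.domain.com:8080", False),
]


def ip_hash_key(client_ip: str) -> bytes:
    """Return the part of the address ip_hash uses: first three IPv4 octets, or the whole IPv6 address."""
    addr = ipaddress.ip_address(client_ip)
    return addr.packed[:3] if addr.version == 4 else addr.packed


def pick_backend(client_ip: str) -> str:
    """Simplified stand-in for ip_hash (CRC32 and walk-forward are NOT nginx's real algorithm)."""
    start = zlib.crc32(ip_hash_key(client_ip)) % len(SERVERS)
    # The modulus runs over the full list, down servers included, so marking a
    # server down does not reshuffle clients that hash to the other servers.
    for step in range(len(SERVERS)):
        name, down = SERVERS[(start + step) % len(SERVERS)]
        if not down:
            return name
    raise RuntimeError("all servers are marked down")
```

In this model `pick_backend("74.223.167.14")` and `pick_backend("74.223.167.200")` always return the same server, because only `74.223.167` feeds the hash. The same keying applies in nginx, so all users whose addresses share a /24 (for example, an office behind one NAT range) are pinned to one backend.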
In the logs I frequently see entries like these:
2015/03/13 15:22:58 [error] 4482#0: *557390 upstream timed out (110: Connection timed out) while connecting to upstream, client: 72.160.92.101, server: host.domain.com, request: "GET /tapechart.php HTTP/1.1", upstream: "http://192.168.1.50:8080/tapechart.php", host: "host.domain.com", referrer: "https://host.domain.com/tapechart.php"
2015/03/13 15:23:14 [error] 4481#0: *557663 upstream timed out (110: Connection timed out) while connecting to upstream, client: 174.53.144.4, server: host.domain.com, request: "GET /bkgtabs.php?bookingID=3105543&show=0 HTTP/1.1", upstream: "http://192.168.1.50:8080/bkgtabs.php?bookingID=3105543&show=0", host: "host.domain.com", referrer: "https://host.domain.com/bkgtabs.php?bookingID=3105543&show=0"
2015/03/13 15:23:19 [error] 4481#0: *557550 upstream timed out (110: Connection timed out) while connecting to upstream, client: 50.134.133.213, server: host.domain.com, request: "GET /tbltapechart.php?numNights=30&startDate=1-Aug-2015&roomTypeID=-1&hideNav=N&bookingID=&roomFilter=-1 HTTP/1.1", upstream: "http://192.168.1.50:8080/tbltapechart.php?numNights=30&startDate=1-Aug-2015&roomTypeID=-1&hideNav=N&bookingID=&roomFilter=-1", host: "host.domain.com", referrer: "https://host.domain.com/tapechart.php"
2015/03/13 15:23:37 [error] 4483#0: *561705 upstream timed out (110: Connection timed out) while connecting to upstream, client: 74.223.167.14, server: host.domain.com, request: "GET /js/multiselect/jquery.multiselect.filter.css HTTP/1.1", upstream: "http://192.168.1.55:8080/js/multiselect/jquery.multiselect.filter.css", host: "host.domain.com", referrer: "https://host.domain.com/fdhome.php"
2015/03/13 15:23:40 [error] 4481#0: *561099 upstream timed out (110: Connection timed out) while connecting to upstream, client: 74.223.167.14, server: host.domain.com, request: "GET /img/tabs_left_bc.jpg HTTP/1.1", upstream: "http://192.168.1.55:8080/img/tabs_left_bc.jpg", host: "host.domain.com", referrer: "https://host.domain.com/fdhome.php"
2015/03/13 15:23:45 [error] 4481#0: *557214 upstream timed out (110: Connection timed out) while connecting to upstream, client: 75.37.141.182, server: host.domain.com, request: "GET /tapechart.php HTTP/1.1", upstream: "http://192.168.1.50:8080/tapechart.php", host: "host.domain.com", referrer: "https://host.domain.com/tapechart.php"
2015/03/13 15:23:52 [error] 4482#0: *557330 upstream timed out (110: Connection timed out) while connecting to upstream, client: 173.164.149.18, server: host.domain.com, request: "GET /bkgtabs.php?bookingID=658108460B&show=1&toFolioID=3361434 HTTP/1.1", upstream: "http://192.168.1.50:8080/bkgtabs.php?bookingID=658108460B&show=1&toFolioID=3361434", host: "host.domain.com", referrer: "https://host.domain.com/bkgtabs.php?bookingID=658108460B&show=1&toFolioID=3361434"
2015/03/13 15:24:14 [error] 4481#0: *557663 upstream timed out (110: Connection timed out) while connecting to upstream, client: 174.53.144.4, server: host.domain.com, request: "GET /bkgtabs.php?bookingID=3105543&show=0 HTTP/1.1", upstream: "http://192.168.1.50:8080/bkgtabs.php?bookingID=3105543&show=0", host: "host.domain.com", referrer: "https://host.domain.com/bkgtabs.php?bookingID=3105543&show=0"
2015/03/13 15:24:15 [error] 4481#0: *557752 upstream timed out (110: Connection timed out) while connecting to upstream, client: 24.158.4.70, server: host.domain.com, request: "GET /bkgtabs.php?bookingID=2070569 HTTP/1.1", upstream: "http://192.168.1.50:8080/bkgtabs.php?bookingID=2070569", host: "host.domain.com", referrer: "https://host.domain.com/tapechart.php"
2015/03/13 15:24:15 [error] 4482#0: *558613 upstream timed out (110: Connection timed out) while connecting to upstream, client: 199.102.121.3, server: host.domain.com, request: "GET /rptlanding.php HTTP/1.1", upstream: "http://192.168.1.50:8080/rptlanding.php", host: "host.domain.com", referrer: "https://host.domain.com/tapechart.php"
2015/03/13 15:24:17 [error] 4482#0: *557353 upstream timed out (110: Connection timed out) while connecting to upstream, client: 174.53.144.4, server: host.domain.com, request: "GET /js/multiselect/demo/assets/prettify.js HTTP/1.1", upstream: "http://192.168.1.50:8080/js/multiselect/demo/assets/prettify.js", host: "host.domain.com", referrer: "https://host.domain.com/bkgtabs.php?bookingID=3186044"
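Tallying these entries by backend and by phase makes any skew obvious. The sketch below assumes the default nginx error-log layout shown above (the `upstream: "http://host:port/..."` field and the `while <phase>, client:` wording); the script name in the usage comment is hypothetical.

```python
import re
import sys
from collections import Counter

# Pull the upstream host:port and the phase out of nginx "upstream timed out" entries.
UPSTREAM_RE = re.compile(r'upstream: "https?://([^/"]+)')
PHASE_RE = re.compile(r"\) while ([a-z ]+?), client:")


def summarize(lines):
    """Count upstream timeouts per backend and per phase."""
    by_upstream = Counter()
    by_phase = Counter()
    for line in lines:
        if "upstream timed out" not in line:
            continue
        upstream = UPSTREAM_RE.search(line)
        if upstream:
            by_upstream[upstream.group(1)] += 1
        phase = PHASE_RE.search(line)
        if phase:
            by_phase[phase.group(1)] += 1
    return by_upstream, by_phase


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize_timeouts.py /var/log/nginx/error.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        per_upstream, per_phase = summarize(log)
    for name, count in per_upstream.most_common():
        print(f"{count:6d}  {name}")
    for name, count in per_phase.most_common():
        print(f"{count:6d}  while {name}")
```

Run over the eleven entries quoted above, this reports 9 timeouts against 192.168.1.50:8080 and 2 against 192.168.1.55:8080, all in the "connecting to upstream" phase.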
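I originally found I had to set proxy_read_timeout this high because we have one very large report that takes at least 20 seconds to render fully for users with a moderate dataset; for the users with our largest datasets it can take up to 2 minutes. However, that report is rarely run (usually less than once a day), and its URL has never appeared in the GET requests in these log entries.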
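Every entry above says "while connecting to upstream". That phase is bounded by `proxy_connect_timeout` (10 seconds in this config), not by the 180-second `proxy_read_timeout`, so these errors mean the TCP connection to the backend was not established within 10 seconds. The small lookup below makes the mapping explicit; the timeout values are copied from the `location /` block, and the phase strings are the ones nginx commonly logs, so treat the table as a sketch rather than an exhaustive list.

```python
# Timeout values (seconds) copied from the "location /" block in the question.
PROXY_TIMEOUTS = {
    "proxy_connect_timeout": 10,
    "proxy_send_timeout": 180,
    "proxy_read_timeout": 180,
}

# Phase text that follows "while" in an nginx upstream error, mapped to the
# directive that bounds that phase.
PHASE_TO_DIRECTIVE = {
    "connecting to upstream": "proxy_connect_timeout",
    "sending request to upstream": "proxy_send_timeout",
    "reading response header from upstream": "proxy_read_timeout",
    "reading upstream": "proxy_read_timeout",
}


def governing_timeout(phase):
    """Return (directive, seconds) for a logged phase; raises KeyError for unknown phases."""
    directive = PHASE_TO_DIRECTIVE[phase]
    return directive, PROXY_TIMEOUTS[directive]
```

For example, `governing_timeout("connecting to upstream")` returns `("proxy_connect_timeout", 10)`.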
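The four backend servers are identical Apache servers, all running httpd 2.2.29 and PHP 5.5.22 built from source, all on the same CentOS release and fully up to date. Because I initially saw MaxClients being hit in the logs, I set the following on each Apache host: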
<IfModule mpm_prefork_module>
StartServers 10
MinSpareServers 10
MaxSpareServers 20
MaxClients 200
MaxRequestsPerChild 300
</IfModule>
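A rough ceiling check with the numbers from the question shows where saturation can occur first. nginx's `worker_connections` counts every connection a worker holds, including the upstream side, so each proxied request costs about two; each prefork child serves one request at a time. The constant and function names are illustrative, and the result is only an upper-bound estimate (it ignores keep-alive, cached static files and request duration).

```python
# Figures copied from the nginx.conf and prefork settings in the question.
WORKER_PROCESSES = 4
WORKER_CONNECTIONS = 1024
MAX_CLIENTS_PER_BACKEND = 200
BACKENDS_IN_SERVICE = 3  # www1, www3, www4; www2 is marked "down"


def nginx_proxy_ceiling(workers, connections_per_worker):
    """Rough max concurrent proxied requests: each holds a client and an upstream connection."""
    return workers * connections_per_worker // 2


def apache_pool_ceiling(max_clients, backends):
    """Max concurrent requests the prefork pool can run: one per child process."""
    return max_clients * backends


if __name__ == "__main__":
    front = nginx_proxy_ceiling(WORKER_PROCESSES, WORKER_CONNECTIONS)
    back = apache_pool_ceiling(MAX_CLIENTS_PER_BACKEND, BACKENDS_IN_SERVICE)
    print(f"nginx can hold about {front} proxied requests; Apache can run {back}")
    print(f"a single backend saturates at {MAX_CLIENTS_PER_BACKEND}")
```

This gives about 2048 on the nginx side against 600 on the Apache side. Because `ip_hash` does not guarantee an even split, one backend can reach its 200-child limit long before the pool total does; new connections to that host then have to wait for a free child, which nginx may experience as a slow or failed connect.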
The nginx server and the Apache servers are in the same data center, on the same subnet and VLAN, and I see nothing in the Apache error_log that points to a cause for the timeouts.
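One way to separate a network problem from a backend problem is to time raw TCP handshakes from the nginx host to each backend, outside nginx entirely. A minimal sketch, assuming Python 3 on the nginx host; the 10-second default mirrors `proxy_connect_timeout`, and the only backend addresses known from the logs are 192.168.1.50 and 192.168.1.55 (the script name is hypothetical).

```python
import socket
import sys
import time


def connect_time(host, port, timeout=10.0):
    """Seconds to complete a TCP handshake, or None if it failed or timed out."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:  # covers timeouts as well as refused/unreachable connections
        return None


def probe(backends, samples=5, timeout=10.0, pause=1.0):
    """Probe each (host, port) several times and return {backend: [seconds or None, ...]}."""
    results = {}
    for backend in backends:
        times = []
        for _ in range(samples):
            times.append(connect_time(*backend, timeout=timeout))
            time.sleep(pause)
        results[backend] = times
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python probe_backends.py 192.168.1.50:8080 192.168.1.55:8080
    targets = [(h, int(p)) for h, p in (arg.rsplit(":", 1) for arg in sys.argv[1:])]
    for (host, port), times in probe(targets).items():
        shown = ["failed" if t is None else f"{t * 1000:.1f} ms" for t in times]
        print(f"{host}:{port}  " + ", ".join(shown))
```

Handshakes that are fast while the timeouts are occurring point away from the network and toward the backend accepting connections slowly; handshakes that stall here as well point at the host or the path to it.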
Other things we have tried to resolve this include:
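At this point I doubt that this is a network or backend problem, because I have moved the web application back to the Coyote Point load balancer and the complaints have decreased.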
I'd really like to get to the bottom of this, but I'm not sure where to start. Any advice?
小智 2
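I ran into something similar with an nginx <-> apache2 setup. Apache was taking too long under load because MySQL was bogged down. To see how long Apache was taking, I changed its log format to: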
LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\" %DµSEC" timed
and the nginx log format to:
log_format timed_combined '$remote_addr - $remote_user [$time_local] '
That made it much easier to see that, although Apache was completing every request, it was handing the data back to nginx very late (many seconds late).
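The `%D` field in the Apache `timed` format records the time taken to serve each request in microseconds, so slow requests can be pulled out with a short script. A minimal sketch, assuming the format exactly as written above (the first quoted field is the request line and each line ends in `<digits>µSEC`) and a UTF-8 log file; the 10-second threshold and the script name are arbitrary choices for illustration.

```python
import re
import sys

MICRO = "\u00b5"  # the micro sign written literally by the LogFormat string

# Request line, status, and the trailing "%DµSEC" field of the "timed" format.
TIMED_RE = re.compile(
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) .* (?P<usec>\d+)' + MICRO + r"SEC$"
)


def slow_requests(lines, threshold_s=10.0):
    """Yield (request line, seconds) for entries that took longer than threshold_s."""
    for line in lines:
        match = TIMED_RE.search(line.rstrip("\n"))
        if not match:
            continue
        seconds = int(match.group("usec")) / 1_000_000
        if seconds > threshold_s:
            yield match.group("request"), seconds


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python slow_requests.py /path/to/timed_access_log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for request, seconds in slow_requests(log):
            print(f"{seconds:8.2f}s  {request}")
```

Comparing these Apache-side durations with the times nginx records for the same requests is what exposes a gap between "Apache finished" and "nginx received the response".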
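I'm not sure why haproxy would help in your case, unless one Apache server is much slower than the others. That can happen when one machine is having recoverable disk errors; those errors should show up in the system log.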