池 www 在信号 15 (SIGTERM) 上退出:执行超时

rah*_*123 6 nginx timeout php-fpm

我在 1GB 单核 VPS 上运行带有 Nginx 1.5.1 和 PHP-FPM (PHP 5.3.26) 的 Drupal 6,在 SSD 存储上有 3GB 的交换空间。我刚刚从共享主机切换到这个非托管 VPS,因为我的网站变得太重了,所以我仍在学习绳索。我有中等高的流量,我并没有真正密切监控它,但 Google Adsense 通常每天记录接近 30K 的页面浏览量。我通常有 50 到 80 个经过身份验证的用户登录,还有几百个匿名用户在任何给定时刻访问 Boost 静态 HTML 缓存。我最多配置了 10 个 PHP-FPM 子进程。我正在使用“按需”PHP-FPM 流程管理器。

我偶尔会遇到一个很难调试的错误,因为它似乎是随机的。在用户有意发布的大约 30 个帖子中,其中 1 个被复制。我是这样配置的,第一次点击后post按钮就被禁用了,所以不是双击的结果。事实上,当我发帖时,它甚至发生在我身上。双柱会在几秒钟内发生。我检查了日志文件,重复的帖子似乎总是对应于 nginx POST 错误:recv() failed (104: Connection reset by peer) while reading response header from upstream。而这个事件似乎与 index.php 的执行超时错误和 PHP-FPM 工作进程的后续 SIGTERM 的 PHP-FPM 错误有关。

以下是nginx访问和错误日​​志以及 PHP-FPM 错误日志:

nginx_acess.log 文件:

1.2.3.4 - - [02/Jul/2013:12:34:34 -0500] "POST /comment/reply/22802/420734?quote=1 HTTP/1.1" 302 5 "http://example.com/comment/reply/22802/420734?quote=1" "Mozilla/5.0 (X11; Linux i686; rv:21.0) Gecko/20100101 Firefox/21.0"
1.2.3.4 - - [02/Jul/2013:12:34:34 -0500] "GET /node/22802 HTTP/1.1" 200 18775 "http://example.com/comment/reply/22802/420734?quote=1" "Mozilla/5.0 (X11; Linux i686; rv:21.0) Gecko/20100101 Firefox/21.0"
1.2.3.4 - - [02/Jul/2013:12:34:35 -0500] "GET /sites/all/modules/ad/serve.php?o=image&a=21333 HTTP/1.1" 200 707 "http://example.com/node/22802" "Mozilla/5.0 (X11; Linux i686; rv:21.0) Gecko/20100101 Firefox/21.0"
1.2.3.4 - - [02/Jul/2013:12:34:35 -0500] "GET /sites/all/modules/ad/serve.php?o=image&a=21121 HTTP/1.1" 200 707 "http://example.com/node/22802" "Mozilla/5.0 (X11; Linux i686; rv:21.0) Gecko/20100101 Firefox/21.0"
1.2.3.4 - - [02/Jul/2013:12:34:35 -0500] "GET /sites/all/modules/ad/serve.php?o=image&a=21122 HTTP/1.1" 200 707 "http://example.com/node/22802" "Mozilla/5.0 (X11; Linux i686; rv:21.0) Gecko/20100101 Firefox/21.0"
1.2.3.4 - - [02/Jul/2013:12:34:43 -0500] "GET /comment/delete/420748 HTTP/1.1" 200 5262 "http://example.com/node/22802" "Mozilla/5.0 (X11; Linux i686; rv:21.0) Gecko/20100101 Firefox/21.0"
1.2.3.4 - - [02/Jul/2013:12:34:44 -0500] "GET /sites/all/modules/ad/serve.php?o=image&a=20342 HTTP/1.1" 200 707 "http://example.com/comment/delete/420748" "Mozilla/5.0 (X11; Linux i686; rv:21.0) Gecko/20100101 Firefox/21.0"
1.2.3.4 - - [02/Jul/2013:12:34:44 -0500] "GET /sites/all/modules/ad/serve.php?o=image&a=21333 HTTP/1.1" 200 707 "http://example.com/comment/delete/420748" "Mozilla/5.0 (X11; Linux i686; rv:21.0) Gecko/20100101 Firefox/21.0"
1.2.3.4 - - [02/Jul/2013:12:34:44 -0500] "GET /sites/all/modules/ad/serve.php?o=image&a=21121 HTTP/1.1" 200 707 "http://example.com/comment/delete/420748" "Mozilla/5.0 (X11; Linux i686; rv:21.0) Gecko/20100101 Firefox/21.0"
1.2.3.4 - - [02/Jul/2013:12:34:45 -0500] "POST /comment/delete/420748 HTTP/1.1" 302 5 "http://example.com/comment/delete/420748" "Mozilla/5.0 (X11; Linux i686; rv:21.0) Gecko/20100101 Firefox/21.0"
1.2.3.4 - - [02/Jul/2013:12:34:46 -0500] "GET /node/22802 HTTP/1.1" 200 18533 "http://example.com/comment/delete/420748" "Mozilla/5.0 (X11; Linux i686; rv:21.0) Gecko/20100101 Firefox/21.0"
1.2.3.4 - - [02/Jul/2013:12:34:47 -0500] "GET /sites/all/modules/ad/serve.php?o=image&a=21406 HTTP/1.1" 200 707 "http://example.com/node/22802" "Mozilla/5.0 (X11; Linux i686; rv:21.0) Gecko/20100101 Firefox/21.0"
1.2.3.4 - - [02/Jul/2013:12:34:47 -0500] "GET /sites/all/modules/ad/serve.php?o=image&a=21121 HTTP/1.1" 200 707 "http://example.com/node/22802" "Mozilla/5.0 (X11; Linux i686; rv:21.0) Gecko/20100101 Firefox/21.0"
1.2.3.4 - - [02/Jul/2013:12:34:47 -0500] "GET /sites/all/modules/ad/serve.php?o=image&a=20343 HTTP/1.1" 200 707 "http://example.com/node/22802" "Mozilla/5.0 (X11; Linux i686; rv:21.0) Gecko/20100101 Firefox/21.0"
Run Code Online (Sandbox Code Playgroud)

nginx_error.log 文件:

2013/07/02 11:12:52 [error] 1821#0: *2140 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 1.2.3.4, server: example.com, request: "GET / HTTP/1.1", upstream: "fastcgi://127.0.0.1:9532", host: "example.com", referrer: "http://example.com/node/22802"
2013/07/02 11:16:23 [error] 1821#0: *3020 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 1.2.3.4, server: example.com, request: "GET / HTTP/1.1", upstream: "fastcgi://127.0.0.1:9532", host: "example.com", referrer: "http://example.com/node/22802"
2013/07/02 11:18:13 [error] 1821#0: *3375 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 1.2.3.4, server: example.com, request: "GET /node/22763 HTTP/1.1", upstream: "fastcgi://127.0.0.1:9532", host: "example.com", referrer: "http://example.com/node/22763"
2013/07/02 11:18:43 [error] 1821#0: *3576 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 1.2.3.4, server: example.com, request: "GET /comment/edit/420694 HTTP/1.1", upstream: "fastcgi://127.0.0.1:9532", host: "example.com", referrer: "http://example.com/node/22763"
2013/07/02 11:19:33 [error] 1821#0: *3576 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 1.2.3.4, server: example.com, request: "GET /node/22763 HTTP/1.1", upstream: "fastcgi://127.0.0.1:9532", host: "example.com", referrer: "http://example.com/node/22763/edit"
2013/07/02 11:22:33 [error] 1821#0: *4397 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 1.2.3.4, server: example.com, request: "GET /forum HTTP/1.1", upstream: "fastcgi://127.0.0.1:9532", host: "example.com", referrer: "http://example.com/"
2013/07/02 11:29:23 [error] 1821#0: *5811 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 1.2.3.4, server: example.com, request: "GET /node/22470 HTTP/1.1", upstream: "fastcgi://127.0.0.1:9532", host: "example.com", referrer: "http://example.com/"
2013/07/02 11:34:43 [error] 1821#0: *6794 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 1.2.3.4, server: example.com, request: "GET /recent-posts HTTP/1.1", upstream: "fastcgi://127.0.0.1:9532", host: "example.com", referrer: "http://example.com/forum"
2013/07/02 11:41:33 [error] 1821#0: *8082 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 1.2.3.4, server: example.com, request: "GET /sites/all/modules/ad/serve.php?o=image&a=20343 HTTP/1.1", upstream: "fastcgi://127.0.0.1:9532", host: "example.com", referrer: "http://example.com/node/22802"
2013/07/02 11:50:03 [error] 1821#0: *9435 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 1.2.3.4, server: example.com, request: "GET /forum HTTP/1.1", upstream: "fastcgi://127.0.0.1:9532", host: "example.com", referrer: "http://example.com/"
2013/07/02 11:55:21 [error] 1821#0: *10378 open() "/var/www/drupal6/sites/all/modules/smileys/packs/Roving/no-swear.png" failed (2: No such file or directory), client: 1.2.3.4, server: example.com, request: "GET /sites/all/modules/smileys/packs/Roving/no-swear.png HTTP/1.1", host: "example.com", referrer: "http://example.com/node/22802/edit"
2013/07/02 12:02:33 [error] 1821#0: *11677 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 1.2.3.4, server: example.com, request: "GET /user/5170/track/navigation HTTP/1.1", upstream: "fastcgi://127.0.0.1:9532", host: "example.com", referrer: "http://example.com/user/5170"
2013/07/02 12:03:03 [error] 1821#0: *11736 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 1.2.3.4, server: example.com, request: "GET /node/15888 HTTP/1.1", upstream: "fastcgi://127.0.0.1:9532", host: "example.com", referrer: "http://example.com/user/5170/track/navigation"
2013/07/02 12:15:23 [error] 1821#0: *13882 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 1.2.3.4, server: example.com, request: "GET / HTTP/1.1", upstream: "fastcgi://127.0.0.1:9532", host: "example.com", referrer: "http://example.com/admin/reports/access/44258972"
2013/07/02 12:34:33 [error] 1821#0: *17088 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 1.2.3.4, server: example.com, request: "POST /comment/reply/22802/420734?quote=1 HTTP/1.1", upstream: "fastcgi://127.0.0.1:9532", host: "example.com", referrer: "http://example.com/comment/reply/22802/420734?quote=1"
Run Code Online (Sandbox Code Playgroud)

php-fpm_error.log 文件:

[02-Jul-2013 12:34:13] WARNING: [pool www] child 5768, script '/var/www/drupal6/index.php' (request: "GET /index.php") execution timed out (39.990074 sec), terminating
[02-Jul-2013 12:34:13] WARNING: [pool www] child 5767, script '/var/www/drupal6/index.php' (request: "GET /index.php") execution timed out (40.002037 sec), terminating
[02-Jul-2013 12:34:13] WARNING: [pool www] child 5767 exited on signal 15 (SIGTERM) after 50.005181 seconds from start
[02-Jul-2013 12:34:13] NOTICE: [pool www] child 5796 started
[02-Jul-2013 12:34:13] WARNING: [pool www] child 5768 exited on signal 15 (SIGTERM) after 40.019244 seconds from start
[02-Jul-2013 12:34:13] NOTICE: [pool www] child 5797 started
[02-Jul-2013 12:34:33] WARNING: [pool www] child 5769, script '/var/www/drupal6/index.php' (request: "POST /index.php") execution timed out (59.990557 sec), terminating
[02-Jul-2013 12:34:33] WARNING: [pool www] child 5769 exited on signal 15 (SIGTERM) after 60.014359 seconds from start
[02-Jul-2013 12:34:33] NOTICE: [pool www] child 5801 started
Run Code Online (Sandbox Code Playgroud)

它们被修剪并稍微混淆以显示我今天在 12:34 发生的重复发布错误,IP 地址为“1.2.3.4”。发生欺骗的节点是 22802。

当我在以前的共享虚拟主机上运行 Apache/FastCGI 时,没有出现这个问题。我还应该提到我使用 Redis 进行缓存和 Zend Optimizer+ opcache。但是我尝试禁用这两种机制以避免欺骗错误,但没有任何区别。

感谢您的任何帮助,您可以提供!

rah*_*123 8

好吧,我要回答我自己的问题。问题显然与request_terminate_timeout = 30s我使用的值有关,可能与ondemandFPM 流程管理器相结合。双重帖子总是与 PHP-FPM 超时错误相结合,并在之后立即杀死子进程。所以我禁用了request_terminate_timeout, 无论如何它似乎是多余的,因为 php.ini 文件已经指定了 30 秒超时。我也意识到我真的不需要ondemand进程管理器,因为我是这个盒子上唯一一个负载相当稳定的用户,所以我切换到static并将其设置为pm.max_requests相当低的 100。这可以防止内存泄漏。

这些变化中的一个或两个有效地消除了重复的帖子。

  • 请注意,PHP 的超时仅包括 CPU 时间,而 FPM 测量挂钟时间。我已经提交了 PHP 8.1 的 PHP RFC 以便更改此内容:https://wiki.php.net/rfc/max_execution_wall_time (2认同)