Monit: How best to monitor a URL

Bur*_*Leo 7 monitoring web-server monit

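My web server runs nginx with php5-fpm. When something goes wrong, it is usually php5-fpm that has hung, which results in a "bad gateway" server error. Of course, I can never be sure that nginx itself will not crash some day.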

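When something does happen, both processes (and their threads) are usually still present and simply need a restart. I am not particularly interested in the cause of the problem at hand; I just want to restart both processes. For that purpose I created two bash scripts, /etc/monit/webserver.start.sh and /etc/monit/webserver.stop.sh.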

This is my monit configuration file (in conf.d):

check process webserver with pidfile /var/run/nginx.pid
   start program = "/etc/monit/webserver.start.sh"
   stop program  = "/etc/monit/webserver.stop.sh"
   if failed (url https://www.myserver.com/example/ and content == 'test string' and timeout 20 seconds)
     then alert
   if failed (url https://www.myserver.com/example/ and content == 'test string' and timeout 20 seconds)
     for 2 cycles
     then restart
   if failed (url https://www.myserver.com/example/ and content == 'test string' and timeout 20 seconds)
     for 4 cycles
     then exec "/sbin/reboot"

This is not entirely wrong, but there are still some issues:

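  1. I do not actually want to monitor the nginx process here, but a port/URL. Is there another check type I can use instead of check process?
  2. To take different actions after 1, 2, and 4 failures, I need three if failed conditions, which results in three server requests. Is there a way to run a single request per cycle and take different actions after different numbers of failures?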

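I tried to find the answer in the official monit reference, but evidently I did not understand the possibilities described there. So I would very much appreciate some advice.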

Update

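After spending some time with the monit man page (which, in my opinion, is much better structured than the online manual), I came up with this optimization: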

CHECK HOST webserver WITH ADDRESS 127.0.0.1
  START PROGRAM = "/etc/monit/webserver.start.sh"
  STOP PROGRAM  = "/etc/monit/webserver.stop.sh"
  IF NOT EXIST THEN ALERT
  IF FAILED (url https://www.mydomain.tld/example/ and content == 'test content' and timeout 20 seconds)
    FOR 2 CYCLES
    THEN RESTART
  IF 2 RESTARTS WITHIN 5 CYCLES
    THEN EXEC "/sbin/reboot"

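This version does not include the alert on the first URL failure (a workaround would be to use dummy start/stop commands here), but it restarts the services after 2 failures and reboots the machine after 4 failures, with only one server request per cycle.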

It is still not perfect. If anyone knows how to do this better, suggestions are still appreciated :) Thanks!

Update

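After some testing, I advise against using monit's timeout feature (IF 2 RESTARTS WITHIN...) for second-order actions. It seems that in some cases the timeout action is run again after the reboot. In my case, this caused multiple reboots: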

[CET Dec 28 05:59:50] error    : skipping queued event /var/monit/id - unknown data format
[CET Dec 28 05:59:50] error    : skipping queued event /var/monit/state - unknown data format
[CET Dec 30 03:10:52] error    : 'webserver' failed protocol test [HTTP] at INET[www.myserver.com/example/] via TCPSSL -- HTTP: Error receiving data -- Resource temporarily unavailable
[CET Jan  1 03:08:10] error    : 'webserver' failed protocol test [HTTP] at INET[www.myserver.com/example/] via TCPSSL -- HTTP: Error receiving data -- Resource temporarily unavailable
[CET Jan  1 03:09:30] error    : 'webserver' failed protocol test [HTTP] at INET[www.myserver.com/example/] via TCPSSL -- HTTP: Error receiving data -- Resource temporarily unavailable
[CET Jan  1 03:09:31] info     : 'webserver' trying to restart
[CET Jan  1 03:09:31] info     : 'webserver' stop: /etc/monit/webserver.stop.sh
[CET Jan  1 03:09:31] info     : 'webserver' start: /etc/monit/webserver.start.sh
[CET Jan  1 03:10:31] error    : 'webserver' failed, cannot open a connection to INET[www.myserver.com/example/] via TCPSSL
[CET Jan  1 03:10:31] info     : 'webserver' trying to restart
[CET Jan  1 03:10:31] info     : 'webserver' stop: /etc/monit/webserver.stop.sh
[CET Jan  1 03:10:31] info     : 'webserver' start: /etc/monit/webserver.start.sh
[CET Jan  1 03:10:31] error    : 'php-fpm' process is not running
[CET Jan  1 03:10:31] info     : 'php-fpm' trying to restart
[CET Jan  1 03:10:31] info     : 'php-fpm' start: /usr/sbin/service
[CET Jan  1 03:10:31] error    : 'nginx' process is not running
[CET Jan  1 03:10:31] info     : 'nginx' trying to restart
[CET Jan  1 03:10:31] info     : 'nginx' start: /usr/sbin/service
[CET Jan  1 03:11:32] error    : 'webserver' service restarted 2 times within 2 cycles(s) - exec
[CET Jan  1 03:11:32] info     : 'webserver' exec: /sbin/reboot
[CET Jan  1 03:12:24] info     : Starting monit daemon with http interface at [0.0.0.0:2812]
[CET Jan  1 03:12:24] info     : Monit start delay set -- pause for 240s
[CET Jan  1 03:16:24] info     : Starting monit HTTP server at [0.0.0.0:2812]
[CET Jan  1 03:16:24] info     : monit HTTP server started
[CET Jan  1 03:16:24] info     : 'Memory' Monit started
[CET Jan  1 03:16:24] error    : skipping queued event /var/monit/id - unknown data format
[CET Jan  1 03:16:24] error    : skipping queued event /var/monit/state - unknown data format
[CET Jan  1 03:16:24] error    : 'webserver' service restarted 2 times within 2 cycles(s) - exec
[CET Jan  1 03:16:24] info     : 'webserver' exec: /sbin/reboot
[CET Jan  1 03:17:04] info     : Starting monit daemon with http interface at [0.0.0.0:2812]
[CET Jan  1 03:17:04] info     : Monit start delay set -- pause for 240s
[CET Jan  1 03:21:04] info     : Starting monit HTTP server at [0.0.0.0:2812]
[CET Jan  1 03:21:04] info     : monit HTTP server started
[CET Jan  1 03:21:04] info     : 'Memory' Monit started
[CET Jan  1 03:21:04] error    : skipping queued event /var/monit/id - unknown data format
[CET Jan  1 03:21:04] error    : skipping queued event /var/monit/state - unknown data format
[CET Jan  1 03:21:04] error    : 'webserver' service restarted 2 times within 2 cycles(s) - exec
[CET Jan  1 03:21:04] info     : 'webserver' exec: /sbin/reboot
[CET Jan  1 03:21:44] info     : Starting monit daemon with http interface at [0.0.0.0:2812]
[CET Jan  1 03:21:44] info     : Monit start delay set -- pause for 240s
[CET Jan  1 03:25:44] info     : Starting monit HTTP server at [0.0.0.0:2812]
[CET Jan  1 03:25:44] info     : monit HTTP server started
[CET Jan  1 03:25:44] info     : 'Memory' Monit started
[CET Jan  1 03:25:44] error    : skipping queued event /var/monit/id - unknown data format
[CET Jan  1 03:25:44] error    : skipping queued event /var/monit/state - unknown data format
[CET Jan  1 03:25:44] error    : 'webserver' service restarted 2 times within 2 cycles(s) - exec
[CET Jan  1 03:25:44] info     : 'webserver' exec: /sbin/reboot

Unless someone has a good idea, I will switch back to multiple requests. After all, they are not that expensive...

BurninLeo

小智 7

I do not actually want to monitor the nginx process here, but a port/URL. Is there another check type I can use instead of check process?

You can use a host check. Here is an example from the monit site:

check host mmonit.com with address mmonit.com 
    if failed
        port 80 protocol http
        with http headers [Host: mmonit.com, Cache-Control: no-cache, Cookie: csrftoken=nj1bI3CnMCaiNv4beqo8ZaCfAQQvpgLH]
        and request /monit/ with content = "Monit [0-9.]+"
    then alert
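Adapted to the setup in the question, the same kind of host check might look roughly like the sketch below. It is untested and rests on a few assumptions: the service name webserver and the host www.myserver.com are taken from the question, HTTPS is assumed to be served on port 443, and the installed monit version is assumed to accept protocol https and the content test (older releases may need type tcpssl protocol http instead).

```
# Sketch only: a host check for the URL from the question.
# Assumes HTTPS on port 443 and a monit version that supports
# "protocol https" and the "content" test.
check host webserver with address www.myserver.com
    if failed
        port 443 protocol https
        and request /example/ with content = "test string"
        with timeout 20 seconds
    then alert
```

Running monit -t after editing the file checks the control file syntax before the daemon is reloaded.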

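To take different actions after 1, 2, and 4 failures, I need three if failed conditions, which results in three server requests. Is there a way to run a single request per cycle and take different actions after different numbers of failures?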

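EXEC can be used to execute an arbitrary program and send an alert. If you choose this action, you must state the program to execute, and if the program requires arguments, you must enclose the program and its arguments in a quoted string. You may optionally specify the uid and gid the executed program should switch to upon start. For example: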

exec "/usr/local/tomcat/bin/startup.sh"
    as uid nobody and gid nobody
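Putting the two parts together, one possible arrangement is a host check that carries all three escalation steps, with exec used for the last one. This is only a sketch built from the statements already shown in the question: the scripts, URL, and test string are the asker's, and nothing here has been tested. It still uses one if failed statement per action, so it does not reduce the number of requests per cycle.

```
# Sketch only: escalation on a host check instead of a process check.
check host webserver with address www.myserver.com
    start program = "/etc/monit/webserver.start.sh"
    stop program  = "/etc/monit/webserver.stop.sh"
    # first failed cycle: send an alert only
    if failed (url https://www.myserver.com/example/ and content == 'test string' and timeout 20 seconds)
        then alert
    # two consecutive failed cycles: run the stop and start scripts
    if failed (url https://www.myserver.com/example/ and content == 'test string' and timeout 20 seconds)
        for 2 cycles
        then restart
    # four consecutive failed cycles: reboot the machine
    if failed (url https://www.myserver.com/example/ and content == 'test string' and timeout 20 seconds)
        for 4 cycles
        then exec "/sbin/reboot"
```

Because the reboot is tied to consecutive URL failures rather than to a restart counter, this layout avoids the IF 2 RESTARTS WITHIN rule that appears in the reboot loop in the question's log; whether that fully prevents repeated reboots would need to be verified on the actual system.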