Supervisor fails to restart half of the time

Pau*_* K. 21 debian uwsgi supervisord systemd

I am trying to deploy a Django application using uWSGI and supervisor on a machine running Debian 8.1.

When I restart it with sudo systemctl restart supervisor, it fails to restart half of the time.

$ root@host:/# systemctl start supervisor
    Job for supervisor.service failed. See 'systemctl status supervisor.service' and 'journalctl -xn' for details.
$ root@host:/# systemctl status supervisor.service
    ● supervisor.service - LSB: Start/stop supervisor
       Loaded: loaded (/etc/init.d/supervisor)
       Active: failed (Result: exit-code) since Wed 2015-09-23 11:12:01 UTC; 16s ago
      Process: 21505 ExecStop=/etc/init.d/supervisor stop (code=exited, status=0/SUCCESS)
      Process: 21511 ExecStart=/etc/init.d/supervisor start (code=exited, status=1/FAILURE)
    Sep 23 11:12:01 host supervisor[21511]: Starting supervisor:
    Sep 23 11:12:01 host systemd[1]: supervisor.service: control process exited, code=exited status=1
    Sep 23 11:12:01 host systemd[1]: Failed to start LSB: Start/stop supervisor.
    Sep 23 11:12:01 host systemd[1]: Unit supervisor.service entered failed state.

But there is nothing in the supervisor or uWSGI logs. Supervisor 3.0 is running with this configuration for uWSGI:

[program:uwsgi]
stopsignal=QUIT
command = uwsgi --ini uwsgi.ini
directory = /dir/
environment=ENVIRONMENT=STAGING
logfile-maxbytes = 300MB

stopsignal=QUIT was added because uWSGI ignores the default signal (SIGTERM) on stop and ends up being killed brutally with SIGKILL, which leaves orphaned workers behind.

Is there a way I can investigate what is happening?

EDIT:

I tried what mnencia suggested: /etc/init.d/supervisor stop && while /etc/init.d/supervisor status ; do sleep 1; done && /etc/init.d/supervisor start but it still fails half of the time.

    root@host:~# /etc/init.d/supervisor stop && while /etc/init.d/supervisor status ; do sleep 1; done && /etc/init.d/supervisor start
    [ ok ] Stopping supervisor (via systemctl): supervisor.service.
    ● supervisor.service - LSB: Start/stop supervisor
       Loaded: loaded (/etc/init.d/supervisor)
       Active: inactive (dead) since Tue 2015-11-24 13:04:32 UTC; 89ms ago
      Process: 23490 ExecStop=/etc/init.d/supervisor stop (code=exited, status=0/SUCCESS)
      Process: 23349 ExecStart=/etc/init.d/supervisor start (code=exited, status=0/SUCCESS)

    Nov 24 13:04:30 xxx supervisor[23349]: Starting supervisor: supervisord.
    Nov 24 13:04:30 xxx systemd[1]: Started LSB: Start/stop supervisor.
    Nov 24 13:04:32 xxx systemd[1]: Stopping LSB: Start/stop supervisor...
    Nov 24 13:04:32 xxx supervisor[23490]: Stopping supervisor: supervisord.
    Nov 24 13:04:32 xxx systemd[1]: Stopped LSB: Start/stop supervisor.
    [....] Starting supervisor (via systemctl): supervisor.serviceJob for supervisor.service failed. See 'systemctl status supervisor.service' and 'journalctl -xn' for details.
     failed!
    root@host:~# /etc/init.d/supervisor stop && while /etc/init.d/supervisor status ; do sleep 1; done && /etc/init.d/supervisor start
    [ ok ] Stopping supervisor (via systemctl): supervisor.service.
    ● supervisor.service - LSB: Start/stop supervisor
       Loaded: loaded (/etc/init.d/supervisor)
       Active: failed (Result: exit-code) since Tue 2015-11-24 13:04:32 UTC; 1s ago
      Process: 23490 ExecStop=/etc/init.d/supervisor stop (code=exited, status=0/SUCCESS)
      Process: 23526 ExecStart=/etc/init.d/supervisor start (code=exited, status=1/FAILURE)

    Nov 24 13:04:32 xxx systemd[1]: supervisor.service: control process exited, code=exited status=1
    Nov 24 13:04:32 xxx systemd[1]: Failed to start LSB: Start/stop supervisor.
    Nov 24 13:04:32 xxx systemd[1]: Unit supervisor.service entered failed state.
    Nov 24 13:04:32 xxx supervisor[23526]: Starting supervisor:
    Nov 24 13:04:33 xxx systemd[1]: Stopped LSB: Start/stop supervisor.
    [ ok ] Starting supervisor (via systemctl): supervisor.service.

mne*_*cia 20

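This is not necessarily a bug in supervisor. I can see from your systemctl status output that supervisor is started through the sysv-init compatibility layer, so the failure could be in the /etc/init.d/supervisor script. That would explain why there are no errors in the supervisord logs.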

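To debug the init script, the easiest way is to add set -x as the first non-comment instruction in that file and then look at the trace of the script execution in the journalctl output.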
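To give an idea of what such a trace looks like, here is a toy stand-in for an init script. The function name fake_init and its messages are made up for illustration; it does not start or stop anything and is not the Debian script.

```shell
#!/bin/sh
# Toy stand-in for an init script, showing what `set -x` tracing produces.
# fake_init is hypothetical and only echoes messages.
fake_init() {
    (
        set -x                      # print each command to stderr before running it
        action=$1
        case "$action" in
            start) echo "Starting demo" ;;
            stop)  echo "Stopping demo" ;;
            *)     echo "Usage: start|stop"; exit 1 ;;
        esac
    )
}
```

Calling fake_init start prints the normal message plus trace lines prefixed with + on stderr. In the real init script, the same kind of lines should end up next to the "Starting supervisor:" message in the journal.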

EDIT:

I have reproduced and debugged it on a test system running Debian Sid.

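The problem is that the stop target of the supervisor init script does not check whether the daemon has really terminated; it only sends a signal if the process exists. If the daemon process takes a while to shut down, the subsequent start action fails because the dying daemon process is still counted as running.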
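A minimal sketch of the missing check, assuming one simply polls the PID until it disappears. The helper name wait_for_exit and the 30-second default are my own choices for illustration; this is not the fix applied in the Debian package.

```shell
#!/bin/sh
# Sketch: wait until a PID has really gone away before starting again.
# wait_for_exit is a hypothetical helper, not part of the Debian init script.
wait_for_exit() {
    pid=$1
    timeout=${2:-30}               # seconds to wait before giving up (assumed default)
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$timeout" -le 0 ]; then
            return 1               # still alive after the timeout
        fi
        sleep 1
        timeout=$((timeout - 1))
    done
    return 0                       # process no longer exists
}
```

A stop target could call this with the PID read from supervisord's pidfile after sending the signal, and report failure if it returns non-zero. Note that kill -0 also succeeds for a zombie that its parent has not reaped yet.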

I have opened a bug on the Debian Bug Tracker: http://bugs.debian.org/805920

WORKAROUND:

You can work around the issue with:

/etc/init.d/supervisor force-stop && \
/etc/init.d/supervisor stop && \
/etc/init.d/supervisor start
  • force-stop makes sure that supervisord has been terminated (outside of systemd).
  • stop makes sure that systemd knows it has been terminated.
  • start starts it again.

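The stop after force-stop is required, otherwise systemd will ignore any subsequent start request. stop and start can be combined into restart, but I have written them out separately here to show how it works.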
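For repeated use, the workaround could be wrapped in a small function, using the combined restart form mentioned above. The names safe_restart and INIT_SCRIPT are hypothetical; the variable exists only so the path can be overridden, and it defaults to the script used in this answer.

```shell
#!/bin/sh
# Sketch of the workaround as a reusable function.
# INIT_SCRIPT and safe_restart are illustrative names, not part of any package.
INIT_SCRIPT=${INIT_SCRIPT:-/etc/init.d/supervisor}

safe_restart() {
    "$INIT_SCRIPT" force-stop &&   # make sure supervisord is gone (outside of systemd)
    "$INIT_SCRIPT" restart         # stop so systemd notices, then start again
}
```

Run safe_restart as root. It returns non-zero if either step fails, so it can be chained with && in a deployment script.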