Why does systemd hang on reboot?

Mar*_*tin 13 linux systemd

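About 1 time in 10, systemd hangs during reboot, and I don't understand why. What should I look at, and where, to track this down? I am using systemd v196 and cannot upgrade to version >=198, because the latter requires a recent kernel (with cgroups support), and the kernel cannot be updated due to customer requirements. I would like to know whether there is a reasonable way to find the cause of this behaviour and to make systemd reboot the system unconditionally.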

Note that this link did not help: http://freedesktop.org/wiki/Software/systemd/Debugging/#index2h1

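As you can read there:

Shutdown never finishes

If a normal reboot or poweroff never finishes even after waiting a few minutes, the above method of creating a shutdown log will not help, and the log must be obtained some other way. Two options that are useful for debugging boot problems can also be used for shutdown problems: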

use a serial console
use a debug shell - not only is it available from early boot, it also stays active until late shutdown.

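I am using a serial console and, for some reason, I can even log in, because the eth interface stays up or comes back up (after being disconnected during the reboot steps).

I can't see the cause.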

# cat /etc/systemd/system/
basic.target.wants/                          getty.target.wants/                          multi-user.target.wants/                     sysinit.target.wants/                        
dbus-org.freedesktop.NetworkManager.service  local-fs-pre.target.wants/                   sockets.target.wants/                        syslog.service                               
display-manager.service                      local-fs.target.wants/                       swap.target

Note swap.target. It is there, but we do not use a swap partition at all. I tried masking swap, but the hang persists. The last line on the console is:

[OK] Stopped target shutdown.

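Edit: as I said, I can log back in via ssh over eth.

Below are two logs. The first was captured when the reboot/shutdown hung, the second when the reboot succeeded.

Hang case, the output always looks like this (full log):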

[  OK  ] Stopped Network Time Service (one-shot ntpdate mode).
         Stopping Modem and VPN connections autoconnect...
         Stopping Login Service...
         Stopping LSB: Avahi mDNS/DNS-SD Daemon...
[  OK  ] Stopped Monitoring free system resources.
[  OK  ] Stopped Monitoring dropbear socket.
[  OK  ] Stopped Login Service.
[  OK  ] Stopped Modem and VPN c[  OK  ] Stopped Getty on tty1.
[  OK  ] Stopped Serial Getty on ttyO0.
[  OK  ] Unmounted /var/lib/opkg.
[  OK  ] Stopped Network Manager.
[  OK  ] Stopped LSB: Avahi mDNS/DNS-SD Daemon.
         Stopping D-Bus System Message Bus...
[  OK  ] Stopped target Remote File Systems.
[  OK  ] Stopped Suspend manager.
         Stopping X Server...
[  OK  ] Stopped X Server.
         Stopping System Logging Service...
[  OK  ] Stopped System Logging Service.
[   77.580000] g_ether gadget: using random self ethernet address
[   77.580000] g_ether gadget: using random host ethernet address
[   77.590000] usb0: MAC 6e:0d:de:b0:33:4f
[   77.590000] usb0: HOST MAC 62:7a:81:02:f3:ff
[   77.600000] g_ether gadget: Ethernet Gadget, version: Memorial Day 2008
[   77.600000] g_ether gadget: g_ether ready
[   77.610000] musb-hdrc musb-hdrc.0: MUSB HDRC host driver
[   77.610000] musb-hdrc musb-hdrc.0: new USB bus registered, assigned bus number 2
[   77.620000] usb usb2: New USB device found, idVendor=1d6b, idProduct=0002
[   77.630000] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[   77.640000] usb usb2: Product: MUSB HDRC host driver
[   77.640000] usb usb2: Manufacturer: Linux 2.6.37 musb-hcd
[   77.650000] usb usb2: SerialNumber: musb-hdrc.0
[   77.650000] hub 2-0:1.0: USB hub found
[   77.660000] hub 2-0:1.0: 1 port detected
[   77.690000] ADDRCONF(NETDEV_UP): usb0: link is not ready
[  OK  ] Stopped target Reboot.
[  OK  ] Stopped Reboot.
[  OK  ] Stopped target Unmount All Filesystems.
[  OK  ] Stopped target Shutdown.
[   78.330000] <46>systemd-journald[328]: Received SIGUSR1
<hang>

Normal reboot:

         Unmounting /var/lib/opkg...
[  OK  ] Stopped target Network.
         Stopping SSH Per-Connection Server...
[  OK  ] Stopped target Graphical Interface.
[  OK  ] Stopped target Multi-User.
         Stopping Monitoring free system resources...
         Stopping Monitoring dropbear socket...
         Stopping Network Time Service (one-shot ntpdate mode)...
[  OK  ] Stopped Network Time Service (one-shot ntpdate mode).
         Stopping Modem and VPN connections autoconnect...
         Stopping Login Service...
         Stopping LSB: Avahi mDNS/DNS-SD Daemon...
[  OK  ] Stopped Monitoring free system resources.
[  OK  ] Stopped Monitoring dropbear socket.
[  OK  ] Stopped Login Service.
[  OK  ] Unmounted /var/lib/opkg.
         Stopping Network Manager...
[  OK  ] Stopped Getty on tty1.
[  OK  ] Stopped Network Manager.
[  OK  ] Stopped Serial Getty on ttyO0.
[  OK  ] Stopped Suspend manager.
[  OK  ] Stopped LSB: Avahi mDNS/DNS-SD Daemon.
         Stopping D-Bus System Message Bus...
         Stopping X Server...
         Stopping Permit User Sessions...
[  OK  ] Stopped Permit User Sessions.
[  OK  ] Stopped target Remote File Systems.
[  OK  ] Stopped X Server.
[  OK  ] Stopped D-Bus System Message Bus.
         Stopping System Logging Service...
[  OK  ] Stopped System Logging Service.
[  OK  ] Stopped target Basic System.
[  OK  ] Stopped target Sockets.
[  OK  ] Closed dropbear.socket.
[  OK  ] Closed D-Bus System Message Bus Socket.
[  OK  ] Stopped target System Initialization.
         Stopping Import configuration from SD card...
[  OK  ] Stopped Import configuration from SD card.
         Stopping Load Kernel Modules...
         Stopping Apply Kernel Variables...
[  OK  ] Stopped Apply Kernel Variables.
[  OK  ] Stopped target Local File Systems.
         Unmounting /var...
         Unmounting /tmp...
[  OK  ] Closed Syslog Socket.
[  OK  ] Failed unmounting /var.
[  OK  ] Unmounted /tmp.
[  OK  ] Stopped Load Kernel Modules.
[  OK  ] Reached target Unmount All Filesystems.
[  OK  ] Stopped target Local File Systems (Pre).
         Stopping Remount Root and Kernel File Systems...
[  OK  ] Stopped Remount Root and Kernel File Systems.
[  OK  ] Reached target Shutdown.
[   52.340000] omap_wdt: Unexpected close, not stopping!
Sending SIGTERM to remaining processes...
[   52.490000] <46>systemd-journald[335]: Received SIGTERM
Sending SIGKILL to remaining processes...
Unmounting file systems.
Unmounting /sys/fs/fuse/connections.
Unmounting /var.
All filesystems unmounted.
Deactivating swaps.
All swaps deactivated.

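Update:

After some investigation and debugging, I found the cause of the interrupted shutdown, although I still cannot fix it. What happens is that, for some reason, a custom service gets started before the shutdown completes, and this makes the shutdown process hang. That is one kind of hang. The other kind is when the shutdown is not interrupted but simply stops at some point. For that reason, until I have resolved all the conflicts and the other possible hangs one at a time, I would like to enable the hardware watchdog unconditionally. To do that through systemd, I enabled and tested RuntimeWatchdogSec and ShutdownWatchdogSec, separately and together. Unfortunately, they did not help. By looking at the source code,

I am stuck. What I am asking for is a way to:

1. enable the watchdog unconditionally, at least from the point where shutdown starts
2. detect and resolve all the conflicts in a simple way

The first solution is preferred.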
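
For reference, the two watchdog options mentioned above are set in the [Manager] section of /etc/systemd/system.conf. A sketch of what enabling both looks like; the timeout values here are purely illustrative assumptions, not the values that were actually tested:

```ini
# /etc/systemd/system.conf
[Manager]
# Hardware watchdog timeout while the system is running (illustrative value).
RuntimeWatchdogSec=30s
# Hardware watchdog timeout used during reboot (illustrative value).
ShutdownWatchdogSec=2min
```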

Mar*_*iae 5

I'll venture a solution: try adding

  Before=basic.target

to /usr/lib/systemd/system/dbus.service.
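
If you would rather not edit the packaged file, a possible sketch is to copy it to /etc/systemd/system/dbus.service (unit files there take precedence over /usr/lib/systemd/system) and add the line to the copy. This assumes the packaged unit lives at the path above; as far as I recall, drop-in .d directories only appeared in v198, so they are probably not an option on v196.

```ini
# /etc/systemd/system/dbus.service
# Full copy of /usr/lib/systemd/system/dbus.service with one line added.
# Only the [Unit] section is sketched here; keep every other line of the
# packaged file unchanged.
[Unit]
# (existing [Unit] lines from the packaged file go here)
Before=basic.target
```

After changing the file, run `systemctl daemon-reload` so that systemd picks up the new ordering.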

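I was struck by an oddity in your logs, which reminded me of an incident I read about some time ago on the Arch Linux forums: that system would also hang on reboot. The solution above was offered there, on the grounds that the hang was caused by some service trying to talk to D-Bus after it had been stopped:

So by ordering it before basic.target, it is not only started before basic.target is reached, but it is also guaranteed to stay around until after basic.target has been stopped during shutdown.

In your unhealthy log we can indeed see that Basic System is never stopped, whereas in the healthy log it is stopped properly.

If this does not work, and given that you cannot upgrade, have you considered downgrading?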
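
To make that comparison less error-prone than reading the two logs side by side, here is a rough sketch that lists everything reported as "Stopped" in the healthy log but not in the hung one. The function names and the idea of saving the two console captures to files are my own assumptions. Note that interleaved console lines (such as the "Stopped Modem and VPN c[  OK  ] ..." one in your hang log) will be captured as one garbled name, so treat the output as a hint only.

```shell
#!/bin/sh
# Print the unit descriptions from "[  OK  ] Stopped <name>." lines of a log.
stopped_units() {
    sed -n 's/^\[ *OK *] Stopped \(.*\)\.$/\1/p' "$1" | LC_ALL=C sort -u
}

# Usage: missing_stops HANG_LOG OK_LOG
# Lists units that were stopped in OK_LOG but never stopped in HANG_LOG.
missing_stops() {
    hang_list=$(mktemp) && ok_list=$(mktemp) || return 1
    stopped_units "$1" > "$hang_list"
    stopped_units "$2" > "$ok_list"
    # comm -13 keeps only lines unique to the second (healthy) list.
    LC_ALL=C comm -13 "$hang_list" "$ok_list"
    rm -f "$hang_list" "$ok_list"
}
```

Called as `missing_stops hang.log ok.log` (hypothetical file names), it should include "target Basic System" among its output for the two logs in the question, since that line only appears in the healthy one.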


use*_*686 3

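By default, shutdown.target conflicts with all other units, so that they are stopped automatically when the shutdown process begins. The reverse also holds: if another unit is started, it causes shutdown.target to be stopped. So the problem is that something is causing some unit to be started during shutdown, which overrides the shutdown process.

This problem should be fixed in systemd v198, which makes shutdown jobs "irreversible".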
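
Going by the two logs in the question, a cancelled shutdown shows up on the console as "Stopped target Shutdown." where a healthy one prints "Reached target Shutdown.". A small sketch for classifying saved console captures on that basis; the function name is made up, and the match strings are taken from the logs above rather than from any documented interface:

```shell
#!/bin/sh
# Usage: classify_shutdown LOGFILE
# Prints "cancelled", "completed" or "unknown" depending on what the
# console log says happened to shutdown.target.
classify_shutdown() {
    if grep -Eqi 'Stopped target Shutdown\.' "$1"; then
        echo cancelled
    elif grep -Eqi 'Reached target Shutdown\.' "$1"; then
        echo completed
    else
        echo unknown
    fi
}
```

Running this over a batch of captures at least separates the "shutdown was overridden" hangs from the ones where the shutdown simply stalled.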