sda*_*aau 4 linux-kernel reboot
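I have a small server running Ubuntu 10.04; I operate it from another computer over ssh, and I am trying to use nfs to share files. This mostly works, until one of the clients unmounts and I want to shut down nfs-kernel-server on the server. Stopping it appears to work: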
$ sudo service nfs-kernel-server stop
* Stopping NFS kernel daemon [ OK ]
* Unexporting directories for NFS kernel daemon... [ OK ]
...but I do get messages like this in the log:
Feb 5 11:50:17 user init: statd main process (3806) killed by KILL signal
Feb 5 11:50:17 user init: statd main process ended, respawning
Feb 5 11:50:17 user init: idmapd main process (3808) killed by KILL signal
Feb 5 11:50:17 user init: idmapd main process ended, respawning
Feb 5 11:50:17 user statd-pre-start: local-filesystems started
Feb 5 11:50:17 user sm-notify[3815]: Already notifying clients; Exiting!
Feb 5 11:50:17 user rpc.statd[3830]: Version 1.1.6 Starting
Feb 5 11:50:17 user rpc.statd[3830]: Flags:
...which means that some NFS-related processes ignore the stop request and respawn. If at this point I try sudo service nfs-kernel-server start (again over ssh), the command freezes, and in /var/log/syslog I get the following:
Feb 5 11:43:55 user mountd[2045]: authenticated mount request from 192.168.0.2:1005 for /media/disk (/media/disk)
Feb 5 11:45:19 user mountd[2045]: Caught signal 15, un-registering and exiting.
Feb 5 11:45:19 user kernel: [27428.148368] nfsd: last server has exited, flushing export cache
Feb 5 11:45:19 user kernel: [27428.148431] BUG: Dentry d0bc8b28{i=1f6,n=} still in use (1) [unmount of vfat sdd8]
Feb 5 11:45:19 user kernel: [27428.148473] ------------[ cut here ]------------
Feb 5 11:45:19 user kernel: [27428.148481] kernel BUG at /build/buildd/linux-2.6.32/fs/dcache.c:670!
Feb 5 11:45:19 user kernel: [27428.148491] invalid opcode: 0000 [#1] SMP
Feb 5 11:45:19 user kernel: [27428.148501] last sysfs file: /sys/devices/system/cpu/cpu1/cpufreq/scaling_cur_freq
...
Feb 5 11:45:19 user kernel: [27428.148807] Call Trace:
Feb 5 11:45:19 user kernel: [27428.148824] [<c024c780>] ? vfs_quota_off+0x0/0x20
Feb 5 11:45:19 user kernel: [27428.148838] [<c021d4fc>] ? shrink_dcache_for_umount+0x3c/0x50
Feb 5 11:45:19 user kernel: [27428.148852] [<c020d090>] ? generic_shutdown_super+0x20/0xe0
...
Feb 5 11:45:19 user kernel: [27428.149511] EIP: [<c021d4a9>] shrink_dcache_for_umount_subtree+0x249/0x260 SS:ESP 0068:ccc6de6c
Feb 5 11:45:19 user kernel: [27428.149631] ---[ end trace 6198103bb62887ac ]---
Feb 5 11:49:53 user init: idmapd main process (838) killed by TERM signal
Feb 5 11:49:53 user init: idmapd main process ended, respawning
Feb 5 11:49:53 user rpc.statd[769]: Caught signal 15, un-registering and exiting.
Feb 5 11:49:53 user init: statd main process ended, respawning
Feb 5 11:49:53 user statd-pre-start: local-filesystems started
Feb 5 11:49:53 user sm-notify[3790]: Already notifying clients; Exiting!
Feb 5 11:49:53 user rpc.statd[3806]: Version 1.1.6 Starting
Feb 5 11:49:53 user rpc.statd[3806]: Flags:
...
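Now, here is the thing: after this bug occurs, the server's ssh daemon (for some reason) usually stays alive, so I can log in over ssh again and try to kill the processes (only to find that /usr/sbin/rpc.nfsd 8, which is the one hanging, cannot be killed).
However, if at this point I issue a reboot over ssh with sudo shutdown -r now && exit, the server PC starts the reboot process but never completes it; it drops to a terminal, dumps some error messages, and stays there :(
The problem is that the server PC is in a very hard-to-reach location, and going there to do Alt+SysRq + REISUB for a proper reboot (if the kernel even responds to that key combination; otherwise it is a hard power-off) is really difficult.
So my question is: is there some "hardcore reboot" command in Linux that I can issue over ssh and that "guarantees" the PC will reboot (rather than just hang or freeze), even after it has hit a kernel bug? Something equivalent to a hard power-off (for example, holding the power button for 10+ seconds) followed by a hard power-on?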
wur*_*tel 11
To make sure the system reboots no matter what, I always do this:
# echo s > /proc/sysrq-trigger
# echo u > /proc/sysrq-trigger
# echo s > /proc/sysrq-trigger
# echo b > /proc/sysrq-trigger
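This asks the kernel to sync the filesystems (s), remount them read-only (u), sync again (s), and reboot immediately (b). To power off instead of rebooting, use o in place of b. For a description of this feature, see e.g. here.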
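If you do this often over ssh, the sequence above can be wrapped in a small function. This is only a sketch: the names sysrq_send and emergency_reboot and the variables SYSRQ_TRIGGER, SYSRQ_DELAY and DRY_RUN are made up for illustration, and the short pause between steps is an assumption (the sync and remount may not finish instantly), not something the kernel requires.

```shell
#!/bin/sh
# Sketch only: wraps the s, u, s, b sequence in a function. Run as root.
# SYSRQ_TRIGGER, SYSRQ_DELAY and DRY_RUN are hypothetical knobs for this example.
SYSRQ_TRIGGER="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"

sysrq_send() {
    # Send one SysRq command letter, or just print it when DRY_RUN=1.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "would write '$1' to $SYSRQ_TRIGGER"
    else
        echo "$1" > "$SYSRQ_TRIGGER"
    fi
}

emergency_reboot() {
    sysrq_send s                  # sync filesystems
    sleep "${SYSRQ_DELAY:-2}"     # assumed pause, gives the sync time to finish
    sysrq_send u                  # remount filesystems read-only
    sleep "${SYSRQ_DELAY:-2}"
    sysrq_send s                  # sync again
    sleep "${SYSRQ_DELAY:-2}"
    sysrq_send b                  # reboot immediately
}
```

Source the file in a root shell and call emergency_reboot; with DRY_RUN=1 it only prints what it would write. The ssh session will simply drop once the b step runs.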
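You have to bypass the normal shutdown procedure of unmounting filesystems, stopping daemons, and so on. That is where it gets stuck: it cannot stop the processes cleanly. What you need is reboot -f or poweroff -f, depending on what you want to achieve (some init systems, systemd for example, may bring their own commands). The "force" option skips the regular shutdown procedure and goes straight to the hardware reset.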
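As a rough sketch of that, the two forced commands can sit behind one helper, so the choice between rebooting and powering off is explicit. The function name force_restart and the DRY_RUN variable are hypothetical, added only for this example; the actual work is done by reboot -f or poweroff -f.

```shell
#!/bin/sh
# Sketch only: forced reboot or power-off that skips the normal shutdown sequence.
# force_restart and DRY_RUN are hypothetical names for this illustration.
force_restart() {
    case "$1" in
        reboot|poweroff) action="$1" ;;
        *) echo "usage: force_restart reboot|poweroff" >&2; return 2 ;;
    esac
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of running it.
        echo "would run: $action -f"
        return 0
    fi
    "$action" -f    # -f: skip the regular shutdown procedure
}
```

Run as root, for example force_restart reboot. Setting DRY_RUN=1 first lets you check what would be executed without taking the machine down.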