Ubuntu server with systemd - how to get a backtrace or core dump?

cda*_*hms 5 debugging systemd ubuntu-18.04

I'm running an Ubuntu 18.04 server with systemd. Recently, a program developed by my department crashed twice in one day with the following error:

Jun 07 06:33:07 xxx systemd[1]: xxx.service: Main process exited, code=killed, status=11/SEGV
Jun 07 06:33:07 xxx systemd[1]: xxx.service: Failed with result 'signal'.

I think the next step is to get a backtrace or a core dump, but I'm not sure how to do that on an Ubuntu server with systemd.

I don't know whether I should be pursuing systemd-coredump, coredumpctl, or some other tool.

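Also, I'm not sure which commands to run. There is plenty of documentation on the various features of the utilities above, but I couldn't find a concise example along the lines of: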

sudo apt-get install xyz

(run x, y, z commands to get core dump)

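Can anyone provide a concise example, or a tutorial site that explains this well? I don't need or want the various elaborate features; I just want to get a basic core dump.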

Joh*_*ald 9

As an example, take a relatively simple service: the chrony NTP daemon.

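Install debug symbols from the -dbgsym packages. Unfortunately, the ddebs repository is not in the apt sources by default. There is also no convenient script for finding the right packages, so start by appending -dbgsym to the package name.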

sudo apt install chrony-dbgsym
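If the ddebs repository is not configured yet, it can be added by hand. The sketch below follows the Ubuntu wiki's "Debug Symbol Packages" page as I understand it: the suite list (release, -updates, -proposed), the file name `/etc/apt/sources.list.d/ddebs.list` and the `ubuntu-dbgsym-keyring` package come from that page, so double-check them for your release. The helper name `ddebs_sources` is hypothetical.

```shell
# Hypothetical helper: print apt source lines for Ubuntu's debug-symbol
# archive (ddebs.ubuntu.com) for one release codename, e.g. "bionic" (18.04).
# The suite list is an assumption taken from the Ubuntu wiki page.
ddebs_sources() {
  codename="$1"
  for suffix in "" "-updates" "-proposed"; do
    echo "deb http://ddebs.ubuntu.com ${codename}${suffix} main restricted universe multiverse"
  done
}

# Usage on the server (not executed here):
#   ddebs_sources "$(lsb_release -cs)" | sudo tee /etc/apt/sources.list.d/ddebs.list
#   sudo apt install ubuntu-dbgsym-keyring   # signing key for the ddebs archive
#   sudo apt update
#   sudo apt install chrony-dbgsym
```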

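You may want to read "How to handle core dumps on modern Linux servers"; in that case, they were considering going back to plain core dump files. Personally, I haven't gotten anything useful out of that approach on servers, but I have found coredumpctl useful. So, here is the systemd approach on Ubuntu 18.04: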

sudo systemctl stop apport
sudo systemctl mask --now apport
sudo apt install systemd-coredump
# Verify this changed the core pattern to a pipe to systemd-coredump
sysctl kernel.core_pattern

# Reproduce the crash.
sudo killall -s SIGSEGV chronyd

# List collected dumps.
coredumpctl

# Invoke debugger on the latest one.
sudo coredumpctl gdb
# systemd >= 239: the gdb verb was renamed to debug. Also, select the core by PID.
sudo coredumpctl debug 5809

# In GDB, the basic thing to get is a stack trace. Ask the developer what else they want.
(gdb) thread apply all bt
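The `sysctl kernel.core_pattern` check in the recipe can be turned into an explicit yes/no test. This is a sketch that relies only on how `core_pattern` works: a value starting with `|` makes the kernel pipe the core to that program. The exact argument list systemd-coredump installs varies between systemd versions, so only the program name is matched; the function name is hypothetical.

```shell
# Hypothetical helper: succeed if a kernel.core_pattern value pipes cores
# ("|program args...") to systemd-coredump, fail otherwise.
uses_systemd_coredump() {
  case "$1" in
    "|"*systemd-coredump*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage on the server (not executed here):
#   uses_systemd_coredump "$(sysctl -n kernel.core_pattern)" \
#     && echo "cores go to systemd-coredump" \
#     || echo "core_pattern does not point at systemd-coredump"
```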

Starting a debugger session might look like this:

John@coredump:~$ coredumpctl
TIME                            PID   UID   GID SIG COREFILE  EXE
Sat 2019-06-08 12:55:16 UTC    5809   111   115  11 error     /usr/sbin/chronyd
John@coredump:~$ sudo coredumpctl gdb
           PID: 5809 (chronyd)
           UID: 111 (_chrony)
           GID: 115 (_chrony)
        Signal: 11 (SEGV)
     Timestamp: Sat 2019-06-08 12:55:16 UTC (1h 19min ago)
  Command Line: /usr/sbin/chronyd
    Executable: /usr/sbin/chronyd
 Control Group: /system.slice/chrony.service
          Unit: chrony.service
         Slice: system.slice
       Boot ID: c9a0a69a73d245c1ae5dfe7d491ead0a
    Machine ID: d2934a6e67f81ae0097be31003da0b31
      Hostname: coredump
       Storage: /var/lib/systemd/coredump/core.chronyd.111.c9a0a69a73d245c1ae5dfe7d491ead0a.5809.1559998516000000.lz4
       Message: Process 5809 (chronyd) of user 111 dumped core.

                Stack trace of thread 5809:
                #0  0x00007eff1ce1403f __GI___select (libc.so.6)
                #1  0x00005597867eb3be n/a (chronyd)
                #2  0x00005597867e1071 n/a (chronyd)
                #3  0x00007eff1cd1eb97 __libc_start_main (libc.so.6)
                #4  0x00005597867e127a n/a (chronyd)

GNU gdb (Ubuntu 8.1-0ubuntu3) 8.1.0.20180409-git
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/sbin/chronyd...Reading symbols from /usr/lib/debug/.build-id/89/dcd398c87777f4c869bfd0831215eeb8b6c7fe.debug...done.
done.
[New LWP 5809]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/sbin/chronyd'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007eff1ce1403f in __GI___select (nfds=4, readfds=readfds@entry=0x7ffc0fd73c80,
    writefds=writefds@entry=0x0, exceptfds=exceptfds@entry=0x0, timeout=timeout@entry=0x7ffc0fd73be0)
    at ../sysdeps/unix/sysv/linux/select.c:41
41      ../sysdeps/unix/sysv/linux/select.c: No such file or directory.
(gdb) thread apply all bt

Thread 1 (Thread 0x7eff1df14740 (LWP 5809)):
#0  0x00007eff1ce1403f in __GI___select (nfds=4, readfds=readfds@entry=0x7ffc0fd73c80,
    writefds=writefds@entry=0x0, exceptfds=exceptfds@entry=0x0, timeout=timeout@entry=0x7ffc0fd73be0)
    at ../sysdeps/unix/sysv/linux/select.c:41
#1  0x00005597867eb3be in SCH_MainLoop () at sched.c:747
#2  0x00005597867e1071 in main (argc=<optimized out>, argv=0x7ffc0fd73fb8) at main.c:605

In this contrived example, it was caught in select() because I rudely sent a signal to a task that was waiting on I/O.
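When the developers only need the backtrace, it can be collected without an interactive session and handed over as a text file. The sketch below uses `coredumpctl -o FILE dump PID` to extract the stored core, then `gdb -batch -ex ...` to run `thread apply all bt full` (the command from the session above, plus local variables). The function name and its arguments are hypothetical, and it assumes it runs as root so that coredumpctl can read the stored cores.

```shell
# Hypothetical helper: write a full backtrace of every thread for one stored
# core dump to a text file, without an interactive gdb session.
# Arguments: PID of the crashed process, path of its executable, output file.
# Assumes it runs as root so coredumpctl can read the stored core.
collect_backtrace() {
  bt_pid="$1"
  bt_exe="$2"
  bt_out="${3:-backtrace-$bt_pid.txt}"
  bt_core="$(mktemp)" || return 1
  # Extract the stored core for this PID into a temporary file.
  if ! coredumpctl -o "$bt_core" dump "$bt_pid" >/dev/null; then
    rm -f "$bt_core"
    return 1
  fi
  # -batch exits after the -ex commands run; "bt full" also prints locals.
  gdb -batch -ex 'thread apply all bt full' "$bt_exe" "$bt_core" >"$bt_out" 2>&1
  bt_status=$?
  # Remove the extracted core; the original stays in coredumpctl's storage.
  rm -f "$bt_core"
  return "$bt_status"
}
```

For example, from a root shell (`sudo -i`), source the file and run `collect_backtrace 5809 /usr/sbin/chronyd chronyd-bt.txt`.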

More complex software may be missing other symbols; install those symbols and the source code, then continue debugging.
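To see which symbol packages might be missing, one approach is to map the executable and the shared libraries `ldd` reports back to their packages with `dpkg -S`, then append `-dbgsym` as described above. This is a sketch under those assumptions: the helper name is hypothetical, and `-dbgsym` is only a naming convention (some packages, glibc for example, have traditionally shipped their symbols as `libc6-dbg` instead). For an in-house binary, `dpkg -S` finds no owner, so only its libraries are listed.

```shell
# Hypothetical helper: list candidate "<package>-dbgsym" names for an
# executable and the shared libraries it links against (as reported by ldd).
dbgsym_candidates() {
  exe="$1"
  {
    printf '%s\n' "$exe"
    # ldd lines look like "libfoo.so.1 => /lib/x86_64-linux-gnu/libfoo.so.1 (0x...)"
    ldd "$exe" | awk '$2 == "=>" && $3 ~ /^\// { print $3 }'
  } | while read -r path; do
    # dpkg -S prints "package: /path" or "package:arch: /path";
    # skip diversion notices and keep only the package name.
    dpkg -S "$path" 2>/dev/null | grep -v '^diversion' | head -n 1 | cut -d: -f1
  done | sort -u | sed 's/$/-dbgsym/'
}
```

`apt` rejects the whole command if any one name doesn't exist, so it may be easier to install the candidates one at a time, e.g. `for p in $(dbgsym_candidates /usr/sbin/chronyd); do sudo apt install -y "$p"; done`.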