Postgres 卡在恢复模式

Fli*_*mzy 13 postgresql recovery

我有一个处于恢复模式的 PostgreSQL 独立实例。一直在说

2014-03-24 18:45:57 MDT FATAL:  the database system is starting up
Run Code Online (Sandbox Code Playgroud)

几个小时,并ps显示:

postgres  2637  0.1  0.1 116916  4420 ?        Ds   15:43   0:18 postgres: startup process   recovering 00000001000000040000007E
Run Code Online (Sandbox Code Playgroud)

有没有办法观察恢复的进度,最好是使用 ETA?我怎样才能让这个过程“解除卡住”?

我可以使用标准的启动/停止脚本停止 postgres,但是当我再次启动它时,它仍然停留在恢复模式。

Debian 7.4、Linux 内核 3.2.0、PostgreSQL 9.1.12


的输出strace -p 2637

Process 2637 attached - interrupt to quit
close(154)                              = 0
getppid()                               = 2600
open("pg_clog/0003", O_RDWR|O_CREAT, 0600) = 154
lseek(154, 221184, SEEK_SET)            = 221184
write(154, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
fsync(154)                              = 0
Run Code Online (Sandbox Code Playgroud)

上面的输出似乎无限重复。


一个 gdb 回溯(随后的三个 bts 几乎完全相同):

(gdb) bt
#0  0x00007f0c8f917d70 in fsync () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f0c9169f255 in pg_fsync_no_writethrough (fd=<optimized out>) at /home/cbe/projects/postgresql/9.1/postgresql-9.1-9.1.12/build/../src/backend/storage/file/fd.c:286
#2  0x00007f0c9169f265 in pg_fsync (fd=<optimized out>) at /home/cbe/projects/postgresql/9.1/postgresql-9.1-9.1.12/build/../src/backend/storage/file/fd.c:274
#3  0x00007f0c9151917b in SlruPhysicalWritePage (ctl=ctl@entry=0x7f0c91b856c0, pageno=pageno@entry=123, slotno=slotno@entry=3, fdata=fdata@entry=0x0) at /home/cbe/projects/postgresql/9.1/postgresql-9.1- 9.1.12/build/../src/backend/access/transam/slru.c:801
#4  0x00007f0c91519925 in SlruInternalWritePage (ctl=ctl@entry=0x7f0c91b856c0, slotno=3, fdata=fdata@entry=0x0) at /home/cbe/projects/postgresql/9.1/postgresql-9.1-9.1.12/build/../src/backend/access/transam/slru.c:528
#5  0x00007f0c9151a1c7 in SimpleLruWritePage (ctl=ctl@entry=0x7f0c91b856c0, slotno=<optimized out>) at /home/cbe/projects/postgresql/9.1/postgresql-9.1-9.1.12/build/../src/backend/access/transam/slru.c:565
#6  0x00007f0c915018e3 in clog_redo (lsn=..., record=<optimized out>) at /home/cbe/projects/postgresql/9.1/postgresql-9.1-9.1.12/build/../src/backend/access/transam/clog.c:720
#7  0x00007f0c91515f08 in StartupXLOG () at /home/cbe/projects/postgresql/9.1/postgresql-9.1-9.1.12/build/../src/backend/access/transam/xlog.c:6664
#8  0x00007f0c91518537 in StartupProcessMain () at /home/cbe/projects/postgresql/9.1/postgresql-9.1-9.1.12/build/../src/backend/access/transam/xlog.c:10179
#9  0x00007f0c9152f1ba in AuxiliaryProcessMain (argc=<optimized out>, argc@entry=2, argv=<optimized out>, argv@entry=0x7fff651b0610) at /home/cbe/projects/postgresql/9.1/postgresql-9.1-9.1.12/build/../src/backend/bootstrap/bootstrap.c:409
#10 0x00007f0c9167ddeb in StartChildProcess (type=StartupProcess) at /home/cbe/projects/postgresql/9.1/postgresql-9.1-9.1.12/build/../src/backend/postmaster/postmaster.c:4512
#11 0x00007f0c916806ac in PostmasterMain (argc=argc@entry=5, argv=argv@entry=0x7f0c91d8e180) at /home/cbe/projects/postgresql/9.1/postgresql-9.1-9.1.12/build/../src/backend/postmaster/postmaster.c:1118
#12 0x00007f0c914d392f in main (argc=5, argv=0x7f0c91d8e180) at /home/cbe/projects/postgresql/9.1/postgresql-9.1-9.1.12/build/../src/backend/main/main.c:209
Run Code Online (Sandbox Code Playgroud)

如果我设置:

fsync = off
Run Code Online (Sandbox Code Playgroud)

然后再次运行 bt:

(gdb) bt
#0  0x00007f3d99ab7160 in write () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f3d9b6be0c6 in SlruPhysicalWritePage (ctl=ctl@entry=0x7f3d9bd2a6c0, pageno=pageno@entry=123, slotno=slotno@entry=3, fdata=fdata@entry=0x0) at /home/cbe/projects/postgresql/9.1/postgresql-9.1-9.1.12/build/../src/backend/access/transam/slru.c:783
#2  0x00007f3d9b6be925 in SlruInternalWritePage (ctl=ctl@entry=0x7f3d9bd2a6c0, slotno=3, fdata=fdata@entry=0x0) at /home/cbe/projects/postgresql/9.1/postgresql-9.1-9.1.12/build/../src/backend/access/transam/slru.c:528
#3  0x00007f3d9b6bf1c7 in SimpleLruWritePage (ctl=ctl@entry=0x7f3d9bd2a6c0, slotno=<optimized out>) at /home/cbe/projects/postgresql/9.1/postgresql-9.1-9.1.12/build/../src/backend/access/transam/slru.c:565
#4  0x00007f3d9b6a68e3 in clog_redo (lsn=..., record=<optimized out>) at /home/cbe/projects/postgresql/9.1/postgresql-9.1-9.1.12/build/../src/backend/access/transam/clog.c:720
#5  0x00007f3d9b6baf08 in StartupXLOG () at /home/cbe/projects/postgresql/9.1/postgresql-9.1-9.1.12/build/../src/backend/access/transam/xlog.c:6664
#6  0x00007f3d9b6bd537 in StartupProcessMain () at /home/cbe/projects/postgresql/9.1/postgresql-9.1-9.1.12/build/../src/backend/access/transam/xlog.c:10179
#7  0x00007f3d9b6d41ba in AuxiliaryProcessMain (argc=<optimized out>, argc@entry=2, argv=<optimized out>, argv@entry=0x7fff99d477a0) at /home/cbe/projects/postgresql/9.1/postgresql-9.1-9.1.12/build/../src/backend/bootstrap/bootstrap.c:409
#8  0x00007f3d9b822deb in StartChildProcess (type=StartupProcess) at /home/cbe/projects/postgresql/9.1/postgresql-9.1-9.1.12/build/../src/backend/postmaster/postmaster.c:4512
#9  0x00007f3d9b8256ac in PostmasterMain (argc=argc@entry=5, argv=argv@entry=0x7f3d9d614180) at /home/cbe/projects/postgresql/9.1/postgresql-9.1-9.1.12/build/../src/backend/postmaster/postmaster.c:1118
#10 0x00007f3d9b67892f in main (argc=5, argv=0x7f3d9d614180) at /home/cbe/projects/postgresql/9.1/postgresql-9.1-9.1.12/build/../src/backend/main/main.c:209
Run Code Online (Sandbox Code Playgroud)

Fli*_*mzy 11

在谷歌搜索了几个小时后,我偶然发现了一个与我的问题并不真正相关的线程,我能说的最好,但它似乎无害,可以尝试,瞧!

我看了看/var/lib/postgresql/9.1/main/pg_clog,看到:

drwx------  2 postgres postgres   4096 Mar 15 15:20 .
drwx------ 13 postgres postgres   4096 Mar 25 12:15 ..
-rw-------  1 postgres postgres 262144 Feb  4 19:39 0000
-rw-------  1 postgres postgres 262144 Feb 13 11:10 0001
-rw-------  1 postgres postgres 262144 Mar 15 15:20 0002
-rw-------  1 postgres postgres 229376 Mar 25 14:51 0003
Run Code Online (Sandbox Code Playgroud)

注意到/var/lib/postgresql/9.1/main/pg_clog/0003正在strace输出中打开/查找文件,并且这是论坛帖子中的相关文件,我在论坛中尝试了建议的操作:

dd if=/dev/zero bs=8k count=1 >> /var/lib/postgresql/9.1/main/pg_clog/0003
Run Code Online (Sandbox Code Playgroud)

现在 postgres 又开始了。