Postgres 12 on Ubuntu 20.04 (replication checkpoint has wrong magic 539122744 instead of 307747550)

Sid*_*don 5 postgresql 20.04

On Ubuntu 20.04 with a Postgres 12 server, Postgres stopped working after a hard-disk problem. Below are my tests and my attempts to fix it:

~$ psql
psql: error: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?

When I try systemctl:

$ sudo systemctl start postgresql@12-main
Job for postgresql@12-main.service failed because the service did not take the steps required by its unit configuration.
See "systemctl status postgresql@12-main.service" and "journalctl -xe" for details.

Output of systemctl status postgresql@12-main.service:

$ systemctl status postgresql@12-main.service
● postgresql@12-main.service - PostgreSQL Cluster 12-main
     Loaded: loaded (/lib/systemd/system/postgresql@.service; enabled; vendor preset: enabled)
     Active: failed (Result: protocol) since Wed 2021-01-27 01:58:21 UTC; 1min 0s ago
    Process: 1075 ExecStart=/usr/bin/pg_ctlcluster --skip-systemctl-redirect 12-main start (code=exited, status=1/FAILURE)

Jan 27 01:58:21 znserver postgresql@12-main[1075]: 2021-01-27 01:58:21.191 UTC [1096] LOG:  could not remove cache file "global/pg_internal.>
Jan 27 01:58:21 znserver postgresql@12-main[1075]: 2021-01-27 01:58:21.191 UTC [1096] PANIC:  replication checkpoint has wrong magic 5391227>
Jan 27 01:58:21 znserver postgresql@12-main[1075]: 2021-01-27 01:58:21.424 UTC [1095] LOG:  startup process (PID 1096) was terminated by sig>
Jan 27 01:58:21 znserver postgresql@12-main[1075]: 2021-01-27 01:58:21.424 UTC [1095] LOG:  aborting startup due to startup process failure
Jan 27 01:58:21 znserver postgresql@12-main[1075]: 2021-01-27 01:58:21.425 UTC [1095] LOG:  database system is shut down
Jan 27 01:58:21 znserver postgresql@12-main[1075]: pg_ctl: could not start server
Jan 27 01:58:21 znserver postgresql@12-main[1075]: Examine the log output.
Jan 27 01:58:21 znserver systemd[1]: postgresql@12-main.service: Can't open PID file /run/postgresql/12-main.pid (yet?) after start: Operati>
Jan 27 01:58:21 znserver systemd[1]: postgresql@12-main.service: Failed with result 'protocol'.
Jan 27 01:58:21 znserver systemd[1]: Failed to start PostgreSQL Cluster 12-main.
lines 1-15/15 (END)

With the service command I get:

$ sudo service postgresql start
(base) sidon@znserver:~$ sudo service postgresql status
● postgresql.service - PostgreSQL RDBMS
     Loaded: loaded (/lib/systemd/system/postgresql.service; enabled; vendor preset: enabled)
     Active: active (exited) since Wed 2021-01-27 02:05:24 UTC; 4s ago
    Process: 1246 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
   Main PID: 1246 (code=exited, status=0/SUCCESS)

Jan 27 02:05:24 znserver systemd[1]: Starting PostgreSQL RDBMS...
Jan 27 02:05:24 znserver systemd[1]: Finished PostgreSQL RDBMS.

Any help?


Sid*_*don 7

After hours of research without success, I found the solution by trial and error:

Short answer:

~$ sudo chown postgres:postgres /var/lib/postgresql/12/main/global/pg_internal.init
~$ sudo rm -rf /var/lib/postgresql/12/main/global/pg_internal.init
~$ sudo rm -rf /var/lib/postgresql/12/main/pg_logical/replorigin_checkpoint
~$ sudo -i -u postgres
~$ /usr/lib/postgresql/12/bin/pg_ctl restart -D /var/lib/postgresql/12/main

Long answer:

First, I tried to restart the server with the following sequence:

~$ sudo -i -u postgres
~$ /usr/lib/postgresql/12/bin/pg_ctl restart -D /var/lib/postgresql/12/main

I got the following result:

pg_ctl: PID file "/var/lib/postgresql/12/main/postmaster.pid" does not exist
Is server running?
trying to start server anyway
waiting for server to start....2021-02-13 16:40:12.633 UTC [3806] LOG:  starting PostgreSQL 12.5 (Ubuntu 12.5-0ubuntu0.20.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit
2021-02-13 16:40:12.636 UTC [3806] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2021-02-13 16:40:12.636 UTC [3806] LOG:  listening on IPv6 address "::", port 5432
2021-02-13 16:40:12.681 UTC [3806] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2021-02-13 16:40:12.824 UTC [3809] LOG:  database system was interrupted; last known up at 2021-01-07 10:56:38 UTC
2021-02-13 16:40:13.138 UTC [3809] LOG:  could not open directory "./global/pg_internal.init": Permission denied
2021-02-13 16:40:13.148 UTC [3809] LOG:  could not remove cache file "global/pg_internal.init": Is a directory
2021-02-13 16:40:13.148 UTC [3809] PANIC:  replication checkpoint has wrong magic 539122744 instead of 307747550
2021-02-13 16:40:13.385 UTC [3806] LOG:  startup process (PID 3809) was terminated by signal 6: Aborted
2021-02-13 16:40:13.385 UTC [3806] LOG:  aborting startup due to startup process failure
2021-02-13 16:40:13.387 UTC [3806] LOG:  database system is shut down
stopped waiting
pg_ctl: could not start server
Examine the log output.
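A startup log like this one is easier to read once it is narrowed to the lines that matter. Here is a minimal sketch, assuming the log text arrives on stdin and uses the prefix format seen above (timestamp, PID in brackets, then the severity). `show_fatal_lines` and `started_ok` are hypothetical helper names, not part of PostgreSQL:

```shell
#!/bin/sh
# Hypothetical helpers for reading PostgreSQL startup logs from stdin.

# Print only the PANIC/FATAL/ERROR lines.
show_fatal_lines() {
    grep -E ' (PANIC|FATAL|ERROR): '
}

# Return 0 if the log reports that the server came up, non-zero otherwise.
started_ok() {
    grep -q 'database system is ready to accept connections'
}
```

With the output saved to a file (say `startup.log`, a made-up name), `show_fatal_lines < startup.log` should leave just the PANIC line from the log above.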

After a quick investigation I found that the directory /var/lib/postgresql/12/main/global/pg_internal.init was owned by root. I changed the owner:

sudo chown postgres:postgres /var/lib/postgresql/12/main/global/pg_internal.init
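Since one root-owned entry was enough to get in the way, it may be worth checking the whole data directory rather than a single path. This is a rough sketch, assuming the data directory and owner used in this answer. `list_foreign_owned` is a hypothetical helper built on the `-user` test of `find`:

```shell
#!/bin/sh
# Hypothetical helper: list every path under a directory that is NOT owned
# by the given user (a user name or a numeric uid).
list_foreign_owned() {
    dir=$1
    owner=$2
    find "$dir" ! -user "$owner" -print
}

# Example call (run as root, since the data directory is not world-readable):
#   list_foreign_owned /var/lib/postgresql/12/main postgres
# An empty result means everything is owned by postgres.
```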

Then I tried again (step 1):

sudo -i -u postgres

The result was slightly different:

/usr/lib/postgresql/12/bin/pg_ctl restart -D /var/lib/postgresql/12/main     
pg_ctl: PID file "/var/lib/postgresql/12/main/postmaster.pid" does not exist
Is server running?
trying to start server anyway
waiting for server to start....2021-02-13 16:53:26.132 UTC [4024] LOG:  starting PostgreSQL 12.5 (Ubuntu 12.5-0ubuntu0.20.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit
2021-02-13 16:53:26.132 UTC [4024] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2021-02-13 16:53:26.132 UTC [4024] LOG:  listening on IPv6 address "::", port 5432
2021-02-13 16:53:26.171 UTC [4024] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2021-02-13 16:53:26.314 UTC [4025] LOG:  database system was interrupted; last known up at 2021-01-07 10:56:38 UTC
2021-02-13 16:53:26.615 UTC [4025] LOG:  could not remove cache file "global/pg_internal.init": Is a directory
2021-02-13 16:53:26.615 UTC [4025] PANIC:  replication checkpoint has wrong magic 539122744 instead of 307747550
2021-02-13 16:53:26.851 UTC [4024] LOG:  startup process (PID 4025) was terminated by signal 6: Aborted
2021-02-13 16:53:26.851 UTC [4024] LOG:  aborting startup due to startup process failure
2021-02-13 16:53:26.852 UTC [4024] LOG:  database system is shut down
stopped waiting
pg_ctl: could not start server
Examine the log output.

So I decided to delete 12/main/global/pg_internal.init (which, according to the log above, was a directory rather than a file):

rm -rf /var/lib/postgresql/12/main/global/pg_internal.init
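`rm -rf` cannot be undone, which is uncomfortable on a disk that has just had problems. A more cautious variant moves the path aside instead of deleting it. This is only a sketch, assuming there is room for a copy outside the data directory. `move_aside` and the backup location are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: move a file or directory into a backup directory
# instead of deleting it. A timestamp suffix keeps repeated runs from
# colliding. Prints the new location on success.
move_aside() {
    src=$1
    backup_dir=$2
    dest="$backup_dir/$(basename "$src").$(date +%Y%m%d%H%M%S)"
    mv -- "$src" "$dest" && printf '%s\n' "$dest"
}

# Example call (the backup directory is an assumption; it must already exist):
#   move_aside /var/lib/postgresql/12/main/global/pg_internal.init /root/pg-broken
```

If the server then starts cleanly, the moved copy can be removed later.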

I ran step 1 again:

/usr/lib/postgresql/12/bin/pg_ctl restart -D /var/lib/postgresql/12/main
pg_ctl: PID file "/var/lib/postgresql/12/main/postmaster.pid" does not exist
Is server running?
trying to start server anyway
waiting for server to start....2021-02-13 17:00:33.310 UTC [4072] LOG:  starting PostgreSQL 12.5 (Ubuntu 12.5-0ubuntu0.20.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit
2021-02-13 17:00:33.310 UTC [4072] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2021-02-13 17:00:33.310 UTC [4072] LOG:  listening on IPv6 address "::", port 5432
2021-02-13 17:00:33.348 UTC [4072] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2021-02-13 17:00:33.483 UTC [4073] LOG:  database system was interrupted; last known up at 2021-01-07 10:56:38 UTC
2021-02-13 17:00:33.792 UTC [4073] PANIC:  replication checkpoint has wrong magic 539122744 instead of 307747550
2021-02-13 17:00:34.030 UTC [4072] LOG:  startup process (PID 4073) was terminated by signal 6: Aborted
2021-02-13 17:00:34.030 UTC [4072] LOG:  aborting startup due to startup process failure
2021-02-13 17:00:34.031 UTC [4072] LOG:  database system is shut down
stopped waiting
pg_ctl: could not start server
Examine the log output.

So I deleted the file /var/lib/postgresql/12/main/pg_logical/replorigin_checkpoint:

sudo rm -rf /var/lib/postgresql/12/main/pg_logical/replorigin_checkpoint
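The PANIC message can be cross-checked against the file itself by reading its first four bytes (ideally before deleting it). This is a small sketch, assuming a little-endian host (the logs show x86_64) and taking the expected magic to be the 307747550 from the PANIC message. `read_magic` and `check_magic` are hypothetical helper names:

```shell
#!/bin/sh
# Expected magic, copied from the PANIC message (307747550 = 0x1257DADE).
EXPECTED_MAGIC=307747550

# Print the first 4 bytes of a file as an unsigned 32-bit integer in host
# byte order (little-endian on the x86_64 server from the logs).
read_magic() {
    od -An -tu4 -N4 "$1" | tr -d ' \n'
}

# Return 0 if the file starts with the expected magic, non-zero otherwise.
check_magic() {
    [ "$(read_magic "$1")" = "$EXPECTED_MAGIC" ]
}

# Example call (as root or postgres):
#   read_magic /var/lib/postgresql/12/main/pg_logical/replorigin_checkpoint
```

On this server the call would presumably have printed 539122744. That value is 0x20225C38, which on a little-endian machine is the byte sequence `8\" `. It looks more like stray text than a binary header, which would fit the disk problem, though that is only a guess.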

I ran step 1 again:

/usr/lib/postgresql/12/bin/pg_ctl restart -D /var/lib/postgresql/12/main
pg_ctl: PID file "/var/lib/postgresql/12/main/postmaster.pid" does not exist
Is server running?
trying to start server anyway
waiting for server to start....2021-02-13 17:08:02.913 UTC [4186] LOG:  starting PostgreSQL 12.5 (Ubuntu 12.5-0ubuntu0.20.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit
2021-02-13 17:08:02.913 UTC [4186] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2021-02-13 17:08:02.913 UTC [4186] LOG:  listening on IPv6 address "::", port 5432
2021-02-13 17:08:02.952 UTC [4186] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2021-02-13 17:08:03.103 UTC [4187] LOG:  database system was interrupted; last known up at 2021-01-07 10:56:38 UTC
2021-02-13 17:08:03.412 UTC [4187] LOG:  database system was not properly shut down; automatic recovery in progress
2021-02-13 17:08:03.481 UTC [4187] LOG:  redo starts at 0/2FFEC58
2021-02-13 17:08:03.481 UTC [4187] LOG:  invalid record length at 0/2FFEC90: wanted 24, got 0
2021-02-13 17:08:03.481 UTC [4187] LOG:  redo done at 0/2FFEC58
2021-02-13 17:08:03.683 UTC [4186] LOG:  database system is ready to accept connections
done
server started

OK, after all this, everything was recovered: the Postgres installation and the data!