Gon*_*uez 2 postgresql replication data-synchronization postgresql-9.4 master-slave-replication
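We have a PostgreSQL 9.4.9 production server that replicates to a slave instance, but today I found that the slave is out of sync!
The obvious course of action is to re-create the slave and to set up metrics and proper alerts for replication activity, so that we can effectively monitor the sync state between master and slave.
However, since synchronization failed, I would first like to diagnose the problem and try to determine its root cause, because this is the second time it has happened in about 6 months.
Question: how can I diagnose what failed in the replication process, so that it can be set up better this time?
Version info: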
PostgreSQL 9.4.9 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit
On the slave, in /var/log/postgresql/postgresql-9.4-main.log, I can see:
2017-07-18 19:43:55 UTC [12816-1] LOG: started streaming WAL from primary at 125D/68000000 on timeline 1
2017-07-18 19:43:55 UTC [12816-2] FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000000010000125D00000068 has already been removed
2017-07-18 19:44:00 UTC [12817-1] LOG: started streaming WAL from primary at 125D/68000000 on timeline 1
2017-07-18 19:44:00 UTC [12817-2] FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000000010000125D00000068 has already been removed
2017-07-18 19:44:05 UTC [12821-1] LOG: started streaming WAL from primary at 125D/68000000 on timeline 1
2017-07-18 19:44:05 UTC [12821-2] FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000000010000125D00000068 has already been removed
2017-07-18 19:44:10 UTC [12825-1] LOG: started streaming WAL from primary at 125D/68000000 on timeline 1
2017-07-18 19:44:10 UTC [12825-2] FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000000010000125D00000068 has already been removed
2017-07-18 19:44:15 UTC [12826-1] LOG: started streaming WAL from primary at 125D/68000000 on timeline 1
2017-07-18 19:44:15 UTC [12826-2] FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000000010000125D00000068 has already been removed
New question: how can I see where the actual problem occurred?
Master postgresql.conf: https://pastebin.com/NJX5ku6m
Slave postgresql.conf: https://pastebin.com/CUZcyazC
Slave recovery.conf:
standby_mode = on
primary_conninfo = 'host=10.1.1.65 port=5432 user=replicador password=replicador'
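Based on this, I'd say that you didn't have enough wal_keep_segments on the master, weren't using a replication slot and hot_standby_feedback, and the standby's connection dropped, or the standby fell behind, for long enough that the master removed WAL the standby still needed.
You probably also aren't using WAL archiving (archive_command on the master, restore_command on the replica) as a fallback.
As a result, the master removed transaction logs (WAL segments) that the standby still required.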
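To relate the LSN in the standby's log to the segment file the master reports as removed, the mapping can be sketched in Python. This is a minimal sketch, assuming the default 16 MB WAL segment size (it can be changed at build time); the function name wal_segment_name is made up for illustration and is not part of PostgreSQL.

```python
# Assumption: default WAL segment size of 16 MB (not changed at build time).
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def wal_segment_name(timeline: int, lsn: str,
                     segment_size: int = WAL_SEGMENT_SIZE) -> str:
    """Return the WAL segment file name that contains the given LSN.

    The LSN is written as two hex numbers, "high/low". The file name is
    three 8-digit hex fields: timeline, high part of the LSN, and the
    index of the segment inside that 4 GB range.
    """
    high_hex, low_hex = lsn.split("/")
    high = int(high_hex, 16)
    low = int(low_hex, 16)
    return "%08X%08X%08X" % (timeline, high, low // segment_size)


if __name__ == "__main__":
    # Values taken from the standby log in the question.
    print(wal_segment_name(1, "125D/68000000"))
    # prints 000000010000125D00000068
```

The result matches the segment named in the FATAL lines, so the standby is asking for exactly the file that the master has already recycled. Checking whether that file still exists in the master's pg_xlog directory (or in an archive, if there is one) tells you whether the standby can still catch up.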
You will need to re-create the standby. Then either:
set up the standby to use a replication slot and enable hot_standby_feedback; or
enable archive_command and restore_command.
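For the monitoring and alerting mentioned in the question, one rough approach is to compare the standby's lag in bytes against the amount of WAL that wal_keep_segments keeps on the master. The sketch below is only an illustration: the helper names, the 0.8 threshold and the master LSN in the example are assumptions, and it again assumes 16 MB segments. On 9.4 the two positions could come from pg_current_xlog_location() on the master and pg_last_xlog_replay_location() on the standby.

```python
# Assumption: default WAL segment size of 16 MB.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def lsn_to_int(lsn: str) -> int:
    """Convert an LSN such as '125D/68000000' to an absolute byte position."""
    high_hex, low_hex = lsn.split("/")
    return (int(high_hex, 16) << 32) + int(low_hex, 16)


def replication_lag_bytes(master_lsn: str, standby_lsn: str) -> int:
    """Bytes of WAL the standby still has to receive or replay."""
    return lsn_to_int(master_lsn) - lsn_to_int(standby_lsn)


def standby_at_risk(master_lsn: str, standby_lsn: str,
                    wal_keep_segments: int, threshold: float = 0.8) -> bool:
    """True when the lag has used up `threshold` of the retained WAL.

    wal_keep_segments is only a minimum, so the master may hold more WAL
    than this; the check is deliberately pessimistic.
    """
    budget = wal_keep_segments * WAL_SEGMENT_SIZE
    lag = replication_lag_bytes(master_lsn, standby_lsn)
    return lag >= threshold * budget


if __name__ == "__main__":
    standby = "125D/68000000"   # from the log in the question
    master = "125D/70000000"    # hypothetical master position
    print(replication_lag_bytes(master, standby))   # 134217728 (8 segments)
    print(standby_at_risk(master, standby, wal_keep_segments=8))    # True
    print(standby_at_risk(master, standby, wal_keep_segments=32))   # False
```

An alert on a check like this would fire before the master recycles the segment the standby needs, not after. With a replication slot the master retains WAL for the standby regardless of wal_keep_segments, so in that setup the thing to watch is disk usage on the master instead.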