MySQL replication lag behaves erratically

Question

shl*_*oid 4 mysql replication

I have a plain MySQL replication setup using "mixed" mode.

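The MySQL slave lag calculation behaves very strangely: one minute it is 0, then it jumps to 3630 seconds (or a similar number), then back to 0, and so on. Obviously something is wrong with the replication configuration, since MySQL calculates the lag from the timestamps in the relay log.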

I checked the servers' time and it is the same (using SELECT NOW() in MySQL). I also checked the time zone (SELECT @@system_time_zone), which is set to CDT on both master and slave.

What else can I verify to make sure this issue gets resolved? Has anyone else run into this?

Answer 1

Rol*_*DBA 7

MySQL replication performs a quirky calculation of Seconds_Behind_Master based on a few things:

What NOW() returned for the SQL run on the Master, as recorded in the relay logs
What NOW() returns on the Slave
The last position from the Master that was executed

Here is a SHOW SLAVE STATUS\G and how to identify the last SQL executed from the Master:

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.17.20.102
                  Master_User: replicant
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.002814
          Read_Master_Log_Pos: 823078734
               Relay_Log_File: relay-bin.007364
                Relay_Log_Pos: 823078879
        Relay_Master_Log_File: mysql-bin.002814
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 823078734
              Relay_Log_Space: 823079071
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
1 row in set (0.00 sec)


Note the following fields in SHOW SLAVE STATUS\G:

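Master_Log_File (line 6): the log file on the Master for the last position read
Read_Master_Log_Pos (line 7): the last position on the Master read by the Slave
Relay_Master_Log_File (line 10): the log file on the Master for the last position executed
Exec_Master_Log_Pos (line 22): the last position from the Master executed on the Slave
Relay_Log_Space (line 23): the total size in bytes of all relay logs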
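If you want to watch these fields from a script instead of by eye, the \G output can be turned into a field/value map. This is a minimal Python sketch; the function name parse_slave_status is hypothetical, and it assumes the mysql client's usual layout of one "Field: value" pair per line.

```python
import re

# One "Field: value" pair per line; leading padding is ignored.
FIELD_LINE = re.compile(r"^\s*([A-Za-z_]+):\s?(.*)$")

def parse_slave_status(text: str) -> dict:
    """Parse the vertical (\\G) output of SHOW SLAVE STATUS into a dict.

    The "*** 1. row ***" banner, the mysql> prompt and the
    "1 row in set" footer do not match the pattern and are skipped.
    Values are kept as strings; empty fields become "".
    """
    status = {}
    for line in text.splitlines():
        m = FIELD_LINE.match(line)
        if m:
            status[m.group(1)] = m.group(2).strip()
    return status
```

Feeding it the output above gives, for example, status["Relay_Master_Log_File"] == "mysql-bin.002814" and status["Exec_Master_Log_Pos"] == "823078734".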

To compute Seconds_Behind_Master, it basically calculates

NOW() on the Slave
minus
NOW() as recorded in Master LogFile 'Master_Log_File' LogPos 'Read_Master_Log_Pos'
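For comparison, the MySQL manual describes Seconds_Behind_Master as the difference between the slave's current time and the master-side timestamp of the event the SQL thread is currently processing, and reports 0 when the SQL thread has caught up with the I/O thread. The minimal Python sketch below models that description using Unix epoch seconds. The clock_diff_with_master correction (slave clock minus master clock, sampled when the I/O thread connects) and the clamping of negative results to 0 are assumptions about the server internals; the function and parameter names are hypothetical.

```python
def seconds_behind_master(slave_now: int,
                          event_timestamp: int,
                          clock_diff_with_master: int = 0,
                          sql_thread_idle: bool = False) -> int:
    """Approximate Seconds_Behind_Master (all times in Unix epoch seconds).

    event_timestamp: TIMESTAMP the master wrote into the binary log for
        the event the SQL thread is applying.
    clock_diff_with_master: assumed slave-minus-master clock offset,
        measured once when the I/O thread connected.
    sql_thread_idle: True when the SQL thread has executed everything
        the I/O thread fetched.
    """
    if sql_thread_idle:
        # Nothing left to apply: reported as 0.
        return 0
    lag = (slave_now - event_timestamp) - clock_diff_with_master
    # Assumed: a negative result (slave clock behind) is shown as 0.
    return max(0, lag)
```

With no correction, a one-hour disagreement between the clocks feeds straight into the result: seconds_behind_master(1_000_003_600, 1_000_000_000) returns 3600, the same order of magnitude as the jumps described in the question.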


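This number grows intermittently because the I/O thread can collect more entries from the Master and load them into the last relay log while the SQL thread is still processing the statement at Master LogFile 'Relay_Master_Log_File', Master LogPos 'Exec_Master_Log_Pos'. The growth also shows up as changes in three (3) variables: Relay_Log_Space, Master_Log_File and Read_Master_Log_Pos. If Seconds_Behind_Master is 0, that is a good indication that Read_Master_Log_Pos and Exec_Master_Log_Pos are identical or nearly so. It is also worth noting that a long-running SQL statement can trick you into thinking replication is falling behind. Once that statement finishes and no other statements are significantly backlogged, Seconds_Behind_Master can drop sharply, even to 0.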
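The gap between the I/O thread and the SQL thread described above can also be measured in bytes of master binlog. This minimal Python sketch (the function name relay_backlog_bytes is hypothetical) takes SHOW SLAVE STATUS fields as strings and only subtracts positions when both threads are in the same master binlog file, since offsets in different files are not comparable.

```python
from typing import Optional

def relay_backlog_bytes(status: dict) -> Optional[int]:
    """Bytes of master binlog fetched by the I/O thread but not yet executed.

    Returns None when the I/O thread (Master_Log_File) has already moved
    on to a different master binlog than the SQL thread
    (Relay_Master_Log_File), because positions in different files
    cannot be subtracted meaningfully.
    """
    if status["Master_Log_File"] != status["Relay_Master_Log_File"]:
        return None
    read_pos = int(status["Read_Master_Log_Pos"])
    exec_pos = int(status["Exec_Master_Log_Pos"])
    return read_pos - exec_pos
```

For the SHOW SLAVE STATUS\G shown earlier both positions are 823078734, so the backlog is 0, consistent with Seconds_Behind_Master: 0.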

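So why does Seconds_Behind_Master bounce back and forth between 0 and an increasing number? Network latency while the I/O thread transfers SQL can throw off the calculation, because the TIMESTAMP needed for it has not yet arrived at the tail end of the last relay log (remember, this is asynchronous replication). To verify this, run mysqlbinlog against any binary log or relay log and look at where the TIMESTAMP variable is written between statements.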
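To eyeball those TIMESTAMP values, you can save the mysqlbinlog text output to a file and pull out the SET TIMESTAMP lines. This is a minimal Python sketch assuming the statements appear as SET TIMESTAMP=<epoch seconds>/*!*/; (some versions append a fractional part, which the pattern ignores); the name timestamp_gaps is hypothetical. It lists the spacing between consecutive master-side timestamps in the log.

```python
import re

# Integer part of each "SET TIMESTAMP=..." written by mysqlbinlog.
SET_TIMESTAMP = re.compile(r"SET TIMESTAMP=(\d+)")

def timestamp_gaps(mysqlbinlog_text: str) -> list:
    """Return the gaps in seconds between consecutive SET TIMESTAMP values."""
    stamps = [int(m.group(1)) for m in SET_TIMESTAMP.finditer(mysqlbinlog_text)]
    # Pair each timestamp with the next one and subtract.
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]
```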

This behavior was first addressed on June 22, 2007, and the bug was supposedly fixed at that time.

Here is a surefire way to take MySQL replication and screw its head on straight (using the same SHOW SLAVE STATUS\G for this example):

Using Relay_Master_Log_File (mysql-bin.002814) and Exec_Master_Log_Pos (823078734) as the log file and position to restart from, run these commands:

STOP SLAVE;
CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.002814',MASTER_LOG_POS=823078734;
START SLAVE;


These steps should 1) kill the I/O and SQL threads, 2) erase all collected relay logs, 3) start a fresh relay log, and 4) start new I/O and SQL threads.
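The restart coordinates come from the SQL thread's fields (Relay_Master_Log_File and Exec_Master_Log_Pos) rather than from Master_Log_File and Read_Master_Log_Pos, since the latter only record how far the I/O thread has read; restarting from them would skip events that were fetched but not yet executed. As a minimal Python sketch (the function name restart_statements is hypothetical), the three statements can be generated from SHOW SLAVE STATUS fields. It only builds the SQL text and does not connect to MySQL.

```python
def restart_statements(status: dict) -> list:
    """Build the STOP SLAVE / CHANGE MASTER TO / START SLAVE sequence.

    Uses the coordinates of the last master event actually executed by
    the SQL thread, not the I/O thread's read position.
    """
    log_file = status["Relay_Master_Log_File"]
    # int() rejects a non-numeric position before it reaches the SQL text.
    log_pos = int(status["Exec_Master_Log_Pos"])
    return [
        "STOP SLAVE;",
        f"CHANGE MASTER TO MASTER_LOG_FILE='{log_file}',MASTER_LOG_POS={log_pos};",
        "START SLAVE;",
    ]
```

For the example status, this reproduces the three commands listed above.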

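Replication should be fine from there. As for whether this bug will ever truly be fixed, that is now in Oracle's hands (that last sentence doesn't inspire much confidence, does it???)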

Asked:	14 years, 5 months ago
Viewed:	2765 times
Last activity:	10 years, 8 months ago