MySQL 5.6: Slave_IO thread stops working

HTF*_*HTF 5 mysql replication

Standard replication stops working for no apparent reason.

mysql> SELECT @@version, @@version_comment;
+---------------+----------------------------------------------------------------------------+
| @@version     | @@version_comment                                                          |
+---------------+----------------------------------------------------------------------------+
| 5.6.15-56-log | Percona XtraDB Cluster (GPL), Release 25.5, Revision 759, wsrep_25.5.r4061 |
+---------------+----------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> SHOW VARIABLES LIKE 'wsrep_on';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| wsrep_on      | OFF   |
+---------------+-------+
1 row in set (0.00 sec)

Crash-safe replication is enabled:

master_info_repository = TABLE
relay_log_info_repository = TABLE
relay_log_recovery = 1

The slave is running fine:

# mysql -e "SHOW SLAVE STATUS\G" | grep "Slave"
               Slave_IO_State: Waiting for master to send event
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it

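But after a while, the master no longer shows any connected slave, and the Binlog Dump thread that serves the slave's IO thread disappears from the master: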

mysql> SELECT * FROM information_schema.processlist WHERE command = 'Binlog Dump';
Empty set (0.10 sec)

mysql> SHOW SLAVE HOSTS;
Empty set (0.00 sec)

Master:

mysql> SHOW MASTER STATUS;
+------------------+----------+--------------+------------------+-------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000003 |   568210 |              |                  |                   |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)

Slave:

# mysql -e "SHOW SLAVE STATUS\G" | grep "Master_Log"
              Master_Log_File: mysql-bin.000003
          Read_Master_Log_Pos: 568210
        Relay_Master_Log_File: mysql-bin.000003
          Exec_Master_Log_Pos: 568210

Master:

mysql> CREATE DATABASE IF NOT EXISTS repl_test; SHOW MASTER STATUS;
Query OK, 1 row affected (0.00 sec)

+------------------+----------+--------------+------------------+-------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000003 |   568333 |              |                  |                   |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)

No change on the slave:

# mysql -e "SHOW SLAVE STATUS\G" | grep "Master_Log"
              Master_Log_File: mysql-bin.000003
          Read_Master_Log_Pos: 568210
        Relay_Master_Log_File: mysql-bin.000003
          Exec_Master_Log_Pos: 568210

The slave picks up the change after the IO thread is restarted:

# mysql -e "STOP SLAVE IO_THREAD; START SLAVE IO_THREAD;"
# mysql -e "SHOW SLAVE STATUS\G" | grep "Master_Log"
              Master_Log_File: mysql-bin.000003
          Read_Master_Log_Pos: 568333
        Relay_Master_Log_File: mysql-bin.000003
          Exec_Master_Log_Pos: 568333

Update: Fri Jul 18 14:15:59 BST 2014

mysql> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: $IP
                  Master_User: repluser
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000007
          Read_Master_Log_Pos: 433
               Relay_Log_File: mysql-relay.000265
                Relay_Log_Pos: 283
        Relay_Master_Log_File: mysql-bin.000007
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 433
              Relay_Log_Space: 710
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: Yes
           Master_SSL_CA_File: /etc/mysql/ca-cert.pem
           Master_SSL_CA_Path: 
              Master_SSL_Cert: /etc/mysql/client-cert.pem
            Master_SSL_Cipher: 
               Master_SSL_Key: /etc/mysql/client-key.pem
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 1124732721
                  Master_UUID: 4412a455-e1d0-11e3-835a-5254007fe78d
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: /etc/mysql/ca-cert.pem
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 
                Auto_Position: 0
1 row in set (0.00 sec)

Mic*_*bot 8

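What you are experiencing can easily happen in a low-traffic situation, particularly when the two servers are separated by a firewall or another device that performs stateful packet inspection (this can happen, for example, in Amazon EC2/VPC). Intermediate network hardware can "forget" the TCP connection between the servers, because the connection sits idle when no data is being replicated. A replication heartbeat, configured on the slave, keeps the connection from going idle: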

mysql> STOP SLAVE;
Query OK, 0 rows affected (0.09 sec)

mysql> CHANGE MASTER TO MASTER_HEARTBEAT_PERIOD = 60;
Query OK, 0 rows affected (0.13 sec)

mysql> START SLAVE;
Query OK, 0 rows affected (0.00 sec)

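When the slave connects to the master, it asks the master to inject a heartbeat message into the binlog stream every 60 seconds, but only if no replication events were sent during that interval. The heartbeat therefore has no effect when there is plenty of replication traffic, but when traffic is light, heartbeat events are sent and the connection stays active.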
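To confirm that the heartbeat setting took effect, the slave can be queried as sketched below. This assumes MySQL 5.6 status variable names (later versions expose the same information elsewhere) and relies on `master_info_repository = TABLE`, as configured in the question, for the second query.

```sql
-- Run on the slave.
-- Matches Slave_heartbeat_period, Slave_last_heartbeat and
-- Slave_received_heartbeats in MySQL 5.6.
SHOW GLOBAL STATUS LIKE 'Slave%heartbeat%';

-- With master_info_repository = TABLE, the configured period (in seconds)
-- is also stored in the Heartbeat column of the master info table.
SELECT Host, Heartbeat FROM mysql.slave_master_info;
```

On an idle link, Slave_received_heartbeats should increase roughly once per heartbeat period; if it stays flat while the master is quiet, the heartbeats are not arriving.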

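Note that CHANGE MASTER TO is usually a destructive command that can reset your replication configuration. In this case, when MASTER_HEARTBEAT_PERIOD is the only parameter provided, the slave's configuration is not reset.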

http://dev.mysql.com/doc/refman/5.6/en/change-master-to.html

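Also consider setting the global variable slave_net_timeout to a value shorter than the default, but not less than twice the value you use for the master heartbeat period. This causes the slave to disconnect and retry its connection to the master if nothing happens on the replication stream within the configured period.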
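A minimal sketch of that adjustment follows. The value 120 is only an illustration, chosen as twice the 60-second heartbeat period used above. Restarting the IO thread so that the new timeout applies to a fresh connection is an assumption about when the setting is picked up, not something stated in the answer.

```sql
-- Run on the slave.
-- Inspect the current value (the MySQL 5.6 default is 3600 seconds).
SHOW GLOBAL VARIABLES LIKE 'slave_net_timeout';

-- Illustrative value: 2 x MASTER_HEARTBEAT_PERIOD (60 s) = 120 s.
SET GLOBAL slave_net_timeout = 120;

-- Assumption: reconnect so the IO thread uses the new timeout.
STOP SLAVE IO_THREAD;
START SLAVE IO_THREAD;
```

SET GLOBAL does not survive a server restart, so the same value would also need to go under [mysqld] in the configuration file (slave_net_timeout = 120) to persist.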