MySQL stuck at "wsrep: initiating replication for write set (-1)"

Jos*_*osh 6 mysql percona-server galera mysql-5.7

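I recently ran into a MySQL situation I had never seen before. We have a 3-node Percona cluster. The master stopped processing queries, and the PHP-FPM web application we host became unresponsive. When I checked SHOW PROCESSLIST, the MySQL processes were stuck in the state: wsrep: initiating replication for write set (-1)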

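This was the case on both the primary database, to which PHP directs all queries, and one of the two secondaries. This is the output I saw from SHOW PROCESSLIST: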

mysql> show processlist;
+-------+-------------+--------------------+-----------------+---------+-------+--------------------------------------------------------+--------------------------------------------------------------------------------------+-----------+---------------+
| Id    | User        | Host               | db              | Command | Time  | State                                                  | Info                                                                                 | Rows_sent | Rows_examined |
+-------+-------------+--------------------+-----------------+---------+-------+--------------------------------------------------------+--------------------------------------------------------------------------------------+-----------+---------------+
|     1 | system user |                    | NULL            | Sleep   |  2171 | wsrep: committing write set (542480920)                | NULL                                                                                 |         0 |             0 |
|     2 | system user |                    | NULL            | Sleep   | 17169 | wsrep: aborter idle                                    | NULL                                                                                 |         0 |             0 |
|     4 | system user |                    | NULL            | Sleep   |  3250 | wsrep: deleting row for write-set (542480919)          | NULL                                                                                 |         0 |             0 |
| 46944 | $user1      | 172.24.62.92:54004 | $user1_db1      | Query   |  2158 | wsrep: initiating pre-commit for write set (542481004) | delete from $table where $col < '$val'                                               |         0 |             1 |
| 47126 | $user1      | 172.24.62.92:54745 | $user1_db2      | Query   |  2096 | wsrep: initiating replication for write set (-1)       | update $table2 set $col = current_timestamp where $col2 = 393 and $col3 = 176935     |         0 |             1 |
| 47155 | $user1      | 172.24.62.92:54841 | $user1_db3      | Query   |  2089 | wsrep: initiating replication for write set (-1)       | UPDATE $table SET $col5 = 'something' WHERE $somecol = '$someval'                    |         0 |             1 |
| 47416 | $user1      | 172.24.62.92:55891 | $user1_db3      | Query   |  1950 | wsrep: initiating replication for write set (-1)       | UPDATE $table SET $col5 = 'something' WHERE $somecol = 's'                           |         0 |             1 |
| 47576 | $user1      | 172.24.62.92:56493 | $user1_db3      | Query   |  1849 | wsrep: initiating replication for write set (-1)       | INSERT INTO $table3  ($column6, $column7, $column8, $column9, $column10 ...          |         0 |             0 |
| 47654 | $user1      | 172.24.62.92:56808 | $user1_db2      | Query   |  1924 | wsrep: initiating replication for write set (-1)       | update $table4 set $col11 = current_timestamp where $col12 = 393 and $col1 = 43625 a |         0 |             1 |
| 48036 | $user1      | 172.24.62.92:58343 | $user1_db2      | Query   |  1795 | wsrep: initiating replication for write set (-1)       | update $table4 set $col11 = current_timestamp where $col12 = 393 and $col1 = 248528  |         0 |             1 |
| 48936 | $user1      | 172.24.62.92:61929 | $user1_db2      | Query   |  1495 | wsrep: initiating replication for write set (-1)       | update $table4 set $col11 = current_timestamp where $col12 = 393 and $col1 = 156001  |         0 |             1 |
| 48952 | $user1      | 172.24.62.92:61982 | $user1_db2      | Query   |  1490 | wsrep: initiating replication for write set (-1)       | update $table4 set $col11 = current_timestamp where $col12 = 393 and $col1 = 205495  |         0 |             1 |
| 49497 | $user1      | 172.24.62.92:64167 | $user1_db2      | Query   |  1306 | wsrep: initiating replication for write set (-1)       | update $table4 set $col11 = current_timestamp where $col12 = 393 and $col1 = 234457  |         0 |             1 |
| 49510 | $user1      | 172.24.62.92:64218 | $user1_db2      | Query   |  1302 | wsrep: initiating replication for write set (-1)       | update $table4 set $col11 = current_timestamp where $col12 = 393 and $col1 = 209489  |         0 |             1 |
| 49839 | $user1      | 172.24.62.92:65534 | $user1_db2      | Query   |  1192 | wsrep: initiating replication for write set (-1)       | update $table4 set $col11 = current_timestamp where $col12 = 393 and $col1 = 70958 a |         0 |             1 |
| 49970 | $user1      | 172.24.62.92:1539  | $user1_db2      | Query   |  1096 | wsrep: initiating replication for write set (-1)       | update $table set $col11 = $col11 + 1 where id = $val                                |         0 |             1 |
| 50292 | $user1      | 172.24.62.92:2819  | $user1_db2      | Query   |  1041 | wsrep: initiating replication for write set (-1)       | update $table4 set $col11 = current_timestamp where $col12 = 393 and $col1 = 193078  |         0 |             1 |
| 50398 | $user1      | 172.24.62.92:3240  | $user1_db2      | Query   |  1006 | wsrep: initiating replication for write set (-1)       | update $table4 set $col11 = current_timestamp where $col12 = 393 and $col1 = 242842  |         0 |             1 |
| 51120 | $user1      | 172.24.62.92:6135  | $user1_db2      | Query   |   763 | wsrep: initiating replication for write set (-1)       | update $table4 set $col11 = current_timestamp where $col12 = 393 and $col1 = 173382  |         0 |             1 |
| 51453 | $user1      | 172.24.62.92:7456  | $user1_db1      | Query   |   653 | wsrep: initiating replication for write set (-1)       | delete from $table5 where expiry < 1496379436                                        |         0 |             2 |
| 51460 | $user1      | 172.24.62.92:7475  | $user1_db1      | Query   |   651 | wsrep: initiating replication for write set (-1)       | insert into $table5 values ('...                                                     |         0 |             0 |
| 51504 | $user1      | 172.24.62.92:7646  | $user1_db1      | Query   |   587 | wsrep: initiating replication for write set (-1)       | insert into $table5 values ('...                                                     |         0 |             0 |
| 51525 | $user1      | 172.24.62.92:7721  | $user1_db1      | Query   |   631 | wsrep: initiating replication for write set (-1)       | insert into $table5 values ('...                                                     |         0 |             0 |
| 51998 | $user1      | 172.24.62.92:9585  | $user1_db2      | Query   |   475 | wsrep: initiating replication for write set (-1)       | update $table4 set $col11 = current_timestamp where $col12 = 393 and $col1 = 203223  |         0 |             1 |
| 52290 | $user1      | 172.24.62.92:10759 | $user1_db2      | Query   |   377 | wsrep: initiating replication for write set (-1)       | update $table4 set $col11 = current_timestamp where $col12 = 393 and $col1 = 185874  |         0 |             1 |
| 53055 | $user1      | 172.24.62.92:13797 | $user1_db2      | Query   |   123 | wsrep: initiating replication for write set (-1)       | update $table4 set $col11 = current_timestamp where $col12 = 393 and $col1 = 89879 a |         0 |             1 |
| 53303 | $user1      | 172.24.62.92:14793 | $user1_db2      | Query   |    39 | wsrep: initiating replication for write set (-1)       | update $table4 set $col11 = current_timestamp where $col12 = 393 and $col1 = 146551  |         0 |             1 |
| 53396 | $user1      | 172.24.62.92:15176 | $user1_db2      | Query   |     7 | updating                                               | update $table4 set $col11 = current_timestamp where $col12 = 393 and $col1 = 146551  |         0 |             0 |
| 53403 | $user1      | 172.24.62.92:15205 | $user1_db2      | Query   |     5 | updating                                               | update $table4 set $col11 = current_timestamp where $col12 = 393 and $col1 = 146551  |         0 |             0 |
| 53410 | root        | localhost          | NULL            | Query   |     0 | starting                                               | show processlist                                                                     |         0 |             0 |
+-------+-------------+--------------------+-----------------+---------+-------+--------------------------------------------------------+--------------------------------------------------------------------------------------+-----------+---------------+
30 rows in set (0.00 sec)

(Table and column names have been changed to protect the innocent.)

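I could not figure out how to recover from this situation. We ended up having to restart the whole cluster (a rolling restart) and re-bootstrap it with service mysql bootstrap-pxc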
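For anyone hitting the same state, a minimal sketch of status checks that could be run on each node before restarting anything. The variable names are standard Galera/PXC status counters; which of them (if any) was abnormal in this incident is not known from the output above.

```sql
-- Run on every node and compare the results.
SHOW GLOBAL STATUS WHERE Variable_name IN (
  'wsrep_cluster_status',       -- 'Primary' on a healthy node
  'wsrep_cluster_size',         -- number of nodes this node can see (3 expected here)
  'wsrep_local_state_comment',  -- 'Synced' on a healthy node
  'wsrep_ready',                -- 'ON' when the node accepts queries
  'wsrep_flow_control_paused',  -- fraction of time replication was paused by flow control
  'wsrep_local_recv_queue',     -- write sets received but not yet applied
  'wsrep_local_send_queue'      -- write sets waiting to be sent
);
```

A node that reports a non-Primary cluster status, or a receive queue that keeps growing, would be a reasonable place to start looking. This is a diagnostic aid only and does not by itself explain the (-1) write-set state.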

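We had only recently upgraded to Percona 5.7. We also ran SET GLOBAL tx_isolation='READ-COMMITTED'; (as described under SET TRANSACTION in the MySQL docs) to try to improve SELECT performance. We are not sure whether either of these could have caused the situation we ran into.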
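To rule the isolation-level change in or out, it helps to know which level is actually in effect. A small sketch for MySQL 5.7, where the tx_isolation system variable is still available:

```sql
-- Global default (applies to connections opened after the change)
-- versus the level used by the current connection.
SELECT @@GLOBAL.tx_isolation  AS global_level,
       @@SESSION.tx_isolation AS session_level;
```

Note that SET GLOBAL only affects sessions opened afterwards, and the value is not persisted across a server restart unless it is also set in the server configuration file. After the full cluster restart described above, the nodes may therefore be back on their configured default.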

What does it mean when MySQL is stuck in the state wsrep: initiating replication for write set (-1), and what are the possible causes (and fixes)?

Ric*_*mes 0

(This does not answer the question as asked, but it may make the problem go away.)

It sounds like you are missing a composite INDEX($col1, $col2) (in either order).

And the table is rather large.

Between those two, each UPDATE seems to take a minute or two, which obviously gets in the way of the next UPDATE.
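A rough sketch of how the suggestion could be checked and applied. The names table4, col12 and col1 are hypothetical stand-ins for the redacted $table4, $col12 and $col1 in the process list, and idx_col12_col1 is an invented index name.

```sql
-- 1. See which indexes already exist on the table.
SHOW INDEX FROM table4;

-- 2. Ask the optimizer how it would run one of the stuck statements.
--    A large "rows" estimate with key = NULL would suggest a full scan.
EXPLAIN
UPDATE table4
   SET col11 = CURRENT_TIMESTAMP
 WHERE col12 = 393 AND col1 = 146551;

-- 3. If no suitable index exists, add a composite index covering both
--    columns of the WHERE clause.
ALTER TABLE table4 ADD INDEX idx_col12_col1 (col12, col1);
```

On a Galera cluster, DDL runs under Total Order Isolation by default, so the ALTER TABLE blocks writes cluster-wide while it executes. On a large table it is probably best scheduled for a quiet period.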