MySQL Cluster exceeded MaxBufferedEpochs

Question

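I have a MySQL Cluster with 4 API nodes, 2 management nodes, and 4 data nodes. Today I had trouble connecting to the database: all queries hung in the "Opening tables" state. When I checked the logs, I found these errors:

API node errors: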

2015-08-20 19:44:14 15540 [Note] NDB Schema dist: Data node: 5 failed, subscriber bitmask 00
2015-08-20 19:44:14 15540 [Note] NDB Schema dist: Data node: 6 failed, subscriber bitmask 00
2015-08-20 19:44:14 15540 [Note] NDB Schema dist: Data node: 7 failed, subscriber bitmask 00
2015-08-20 19:44:14 15540 [Note] NDB Schema dist: Data node: 8 failed, subscriber bitmask 00
2015-08-20 19:44:14 15540 [Note] NDB Schema dist: cluster failure at epoch 3313124/17.
2015-08-20 19:44:14 15540 [Note] NDB Binlog: ndb tables initially read only on reconnect.
2015-08-20 19:44:14 15540 [ERROR] /opt/mysql/server-5.6/bin/mysqld: Got temporary error 4028 'Node failure caused abort of transaction' from NDBCLUSTER
2015-08-20 19:44:14 15540 [ERROR] /opt/mysql/server-5.6/bin/mysqld: Sort aborted: Got temporary error 4028 'Node failure caused abort of transaction' from NDBCLUSTER
2015-08-20 19:44:14 15540 [ERROR] Got error 4010 when reading table './database_name/table'
2015-08-20 19:44:14 15540 [Note] NDB Binlog: cluster failure for ./database_name/table_name at epoch 3313124/17.

mysql> show processlist;

Id  User    Host    db  Command Time    State   Info
1   system user     NULL    Daemon  1497    Waiting for ndbcluster to start NULL

Data node errors:

2015-08-20 19:44:14 [ndbd] ERROR -- c_gcp_list.seize() failed: gci: 14229759227592721 nodes: 0000000000000000000000000000040000000000000000000000000000001a00
2015-08-20 19:44:14 [ndbd] WARNING -- ACK wo/ gcp record (gci: 3313124/17) ref: 0fa2000b from: 0fa2000b
2015-08-20 19:44:14 [ndbd] WARNING -- ACK wo/ gcp record (gci: 3313124/17) ref: 0fa2000c from: 0fa2000c
2015-08-20 19:44:14 [ndbd] WARNING -- ACK wo/ gcp record (gci: 3313124/17) ref: 0fa2008a from: 0fa2008a
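
A side note on reading these lines: the large `gci` value in the `c_gcp_list.seize()` error appears to be the same epoch that the other messages print as `3313124/17`, packed into one 64-bit integer. The sketch below assumes the high 32 bits hold the epoch and the low 32 bits hold the part after the slash; `split_gci` is a hypothetical helper name, not part of any NDB tool.

```python
from typing import Tuple


def split_gci(gci: int) -> Tuple[int, int]:
    """Split a 64-bit GCI into (high 32 bits, low 32 bits).

    Assumption: the value is packed as (epoch << 32) | micro.
    """
    return gci >> 32, gci & 0xFFFFFFFF


# Value copied from the data node error line above.
epoch, micro = split_gci(14229759227592721)
print(f"{epoch}/{micro}")  # prints 3313124/17
```

Under that assumption, the failed `seize()` refers to the same epoch at which the API nodes report the cluster failure.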

Management node errors:

2015-08-20 19:44:14 [MgmtSrvr] INFO     -- Node 5: Disconnecting lagging nodes '0000000000000000000000000000000000000000000000000000000000000200',
2015-08-20 19:44:14 [MgmtSrvr] WARNING  -- Node 5: Disconnecting node 9 because it has exceeded MaxBufferedEpochs (100 > 100), epoch 3313119/4
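
To find every occurrence of this disconnect in a long management log, a small parser along these lines may help. It is only a sketch: the regular expression is written against the exact message wording shown above, and `parse_lag_disconnect` is a hypothetical helper name. Other NDB versions may word the message differently.

```python
import re

# Pattern based on the management node warning quoted above.
LAG_RE = re.compile(
    r"Disconnecting node (?P<node>\d+) because it has exceeded "
    r"MaxBufferedEpochs \((?P<buffered>\d+) > (?P<limit>\d+)\), "
    r"epoch (?P<epoch>\d+)/(?P<micro>\d+)"
)


def parse_lag_disconnect(line):
    """Return the numeric fields of a lag-disconnect warning, or None."""
    match = LAG_RE.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


line = (
    "2015-08-20 19:44:14 [MgmtSrvr] WARNING  -- Node 5: Disconnecting node 9 "
    "because it has exceeded MaxBufferedEpochs (100 > 100), epoch 3313119/4"
)
print(parse_lag_disconnect(line))
```

For the line above, the lagging subscriber is node 9 and the limit it hit is 100 buffered epochs.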

Detailed logs and configuration

Data node configuration:

https://gist.github.com/sdemircan/730fa49fcc14b4376c42

API node configuration:

https://gist.github.com/sdemircan/f9d230d32700b86564fd

Management node configuration:

https://gist.github.com/sdemircan/d6fbd54799daaae01bf2

API node log:

https://gist.github.com/sdemircan/2d62b1c92176de9de9d3

Data node log:

https://gist.github.com/sdemircan/d0c97b82457a9c33deaa

Data node log:

https://gist.github.com/sdemircan/3faa1e41367bc7655210

Management node log:

https://gist.github.com/sdemircan/a026ac57757fafdafaa9

What could cause MaxBufferedEpochs to reach its limit?

Answer 1

KCD*_*KCD 0

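You probably had a large transaction, or a query retrieving many rows, that pulled back a large amount of data and saturated node 9's network connection.

On reconnect the API node is read-only, so the culprit must be a query that ran before this message: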

NDB Binlog: ndb tables initially read only on reconnect
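
As a rough illustration of how little slack the limit allows: the management log shows a limit of 100 buffered epochs, and if `TimeBetweenEpochs` is left at its default of 100 ms (an assumption; the posted configuration may set it differently), a subscriber only needs to fall about 10 seconds behind before it is disconnected. The helper name `lag_window_seconds` is hypothetical.

```python
def lag_window_seconds(max_buffered_epochs: int, time_between_epochs_ms: int) -> float:
    """Approximate how long a subscriber can lag before hitting the epoch limit."""
    return max_buffered_epochs * time_between_epochs_ms / 1000.0


# 100 comes from the "(100 > 100)" in the management log;
# 100 ms is the assumed default TimeBetweenEpochs.
print(lag_window_seconds(100, 100))  # prints 10.0
```

A single query that keeps the API node's connection saturated for longer than that window would be enough to trigger the disconnect.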

Archived:	10 years, 1 month ago
Views:	782
Last activity:	1 year, 7 months ago