Long-running MySQL "cleaning up" transaction

Mar*_*rcF 6 mysql locking transactions amazon-web-services amazon-rds

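I have been trying to debug a "Lock wait timeout exceeded" error in MySQL (AWS RDS) v5.6.19a. The error is thrown occasionally when I try to select a row for update by its primary ID, i.e.: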

SELECT primary_id FROM tbl_widgets WHERE primary_id = 5 FOR UPDATE

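After several hours of debugging, I have ruled out another part of my application "directly" locking the same row, which was the obvious culprit. So I started going down the rabbit hole of MySQL locking and noticed a correlation between the "Lock wait timeout exceeded" errors and the information provided by: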

SHOW ENGINE INNODB STATUS;
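Rather than running the statement by hand ten times, the snapshots can be captured on a fixed schedule. The following is a minimal sketch, not part of the original post. It assumes a DB-API 2.0 connection (for example from PyMySQL) and an account with the PROCESS privilege, which `SHOW ENGINE INNODB STATUS` requires. The helper name `capture_innodb_status` is hypothetical. The statement returns a single row with the columns Type, Name and Status, and the report text is in Status.

```python
import time
from datetime import datetime


def capture_innodb_status(conn, samples=10, interval_s=30.0,
                          sleep=time.sleep, now=datetime.now):
    """Hypothetical helper: poll SHOW ENGINE INNODB STATUS.

    Returns a list of (timestamp, status_text) pairs. `conn` is assumed to be
    any DB-API 2.0 connection; `sleep` and `now` are injectable for testing.
    """
    snapshots = []
    for i in range(samples):
        cur = conn.cursor()
        try:
            cur.execute("SHOW ENGINE INNODB STATUS")
            # Single row: (Type, Name, Status); the report is in Status.
            _type, _name, status = cur.fetchone()
        finally:
            cur.close()
        snapshots.append((now(), status))
        if i < samples - 1:
            sleep(interval_s)  # no need to wait after the last sample
    return snapshots
```

Saving the full text of each snapshot, not just the lines of interest, keeps the rest of the report (for example the section that shows the latest lock waits) available for later comparison.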

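There appears to be a long-running TRANSACTION in the "cleaning up" state. Over roughly 10 minutes it holds locks on a slowly increasing number of rows. Here are the relevant lines for this transaction from 10 manual INNODB STATUS queries: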

2015-08-19 13:29:04
---TRANSACTION 25861246681, ACTIVE 158 sec
10 lock struct(s), heap size 1184, 21 row lock(s), undo log entries 20
MySQL thread id 5110120, OS thread handle 0x2ba082506700, query id 7146839061 10.0.1.154 mfuser cleaning up
Trx read view will not see trx with id >= 25861246682, sees < 25861246682

2015-08-19 13:29:42
---TRANSACTION 25861246681, ACTIVE 196 sec
13 lock struct(s), heap size 2936, 28 row lock(s), undo log entries 27
MySQL thread id 5110120, OS thread handle 0x2ba082506700, query id 7147149416 10.0.1.154 mfuser cleaning up
Trx read view will not see trx with id >= 25861246682, sees < 25861246682

2015-08-19 13:30:10
---TRANSACTION 25861246681, ACTIVE 224 sec
13 lock struct(s), heap size 2936, 31 row lock(s), undo log entries 30
MySQL thread id 5110120, OS thread handle 0x2ba082506700, query id 7147321023 10.0.1.154 mfuser cleaning up
Trx read view will not see trx with id >= 25861246682, sees < 25861246682

2015-08-19 13:30:41
---TRANSACTION 25861246681, ACTIVE 255 sec
13 lock struct(s), heap size 2936, 35 row lock(s), undo log entries 34
MySQL thread id 5110120, OS thread handle 0x2ba082506700, query id 7147511090 10.0.1.154 mfuser cleaning up
Trx read view will not see trx with id >= 25861246682, sees < 25861246682

2015-08-19 13:31:12
---TRANSACTION 25861246681, ACTIVE 286 sec
15 lock struct(s), heap size 2936, 38 row lock(s), undo log entries 37
MySQL thread id 5110120, OS thread handle 0x2ba082506700, query id 7147604774 10.0.1.154 mfuser cleaning up
Trx read view will not see trx with id >= 25861246682, sees < 25861246682

2015-08-19 13:31:30
---TRANSACTION 25861246681, ACTIVE 304 sec
21 lock struct(s), heap size 2936, 42 row lock(s), undo log entries 39
MySQL thread id 5110120, OS thread handle 0x2ba082506700, query id 7147789789 10.0.1.154 mfuser cleaning up
Trx read view will not see trx with id >= 25861246682, sees < 25861246682

2015-08-19 13:31:57
---TRANSACTION 25861246681, ACTIVE 331 sec
21 lock struct(s), heap size 2936, 46 row lock(s), undo log entries 43
MySQL thread id 5110120, OS thread handle 0x2ba082506700, query id 7147837536 10.0.1.154 mfuser cleaning up
Trx read view will not see trx with id >= 25861246682, sees < 25861246682

2015-08-19 13:32:28
---TRANSACTION 25861246681, ACTIVE 362 sec
22 lock struct(s), heap size 2936, 51 row lock(s), undo log entries 48
MySQL thread id 5110120, OS thread handle 0x2ba082506700, query id 7147905807 10.0.1.154 mfuser cleaning up
Trx read view will not see trx with id >= 25861246682, sees < 25861246682

2015-08-19 13:33:16
---TRANSACTION 25861246681, ACTIVE 410 sec
23 lock struct(s), heap size 2936, 58 row lock(s), undo log entries 55
MySQL thread id 5110120, OS thread handle 0x2ba082506700, query id 7148317478 10.0.1.154 mfuser cleaning up
Trx read view will not see trx with id >= 25861246682, sees < 25861246682

2015-08-19 13:33:49
---TRANSACTION 25861246681, ACTIVE 443 sec
24 lock struct(s), heap size 2936, 64 row lock(s), undo log entries 61
MySQL thread id 5110120, OS thread handle 0x2ba082506700, query id 7148471519 10.0.1.154 mfuser cleaning up
Trx read view will not see trx with id >= 25861246682, sees < 25861246682
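The trend in these snapshots can be quantified. Between ACTIVE 158 sec and ACTIVE 443 sec, row locks go from 21 to 64 and undo log entries go from 20 to 61, and the query id for thread 5110120 keeps increasing. That pattern suggests the same connection (mfuser from 10.0.1.154) keeps issuing writes inside one transaction that is never committed. Below is a minimal sketch for extracting those numbers from saved status text. It assumes the MySQL 5.6 line format shown above; the names `TrxSnapshot`, `parse_transactions` and `lock_growth_per_minute` are hypothetical.

```python
import re
from dataclasses import dataclass

# Patterns for the three header lines of a transaction block (5.6 format, as above).
_TRX_RE = re.compile(r"---TRANSACTION (\d+), ACTIVE (\d+) sec")
_LOCKS_RE = re.compile(r"(\d+) lock struct\(s\), heap size \d+, (\d+) row lock\(s\)"
                       r"(?:, undo log entries (\d+))?")
_THREAD_RE = re.compile(r"MySQL thread id (\d+), OS thread handle [^,]+, "
                        r"query id (\d+) (\S+) (\S+) ?(.*)")


@dataclass
class TrxSnapshot:
    trx_id: int
    active_s: int
    row_locks: int
    undo_entries: int
    thread_id: int
    query_id: int
    host: str
    user: str
    state: str


def parse_transactions(status_text):
    """Return one TrxSnapshot per active '---TRANSACTION' block."""
    result = []
    for block in status_text.split("---TRANSACTION")[1:]:
        block = "---TRANSACTION" + block
        trx, locks, thread = (_TRX_RE.search(block), _LOCKS_RE.search(block),
                              _THREAD_RE.search(block))
        if not (trx and locks and thread):
            continue  # e.g. "not started" transactions have no ACTIVE/lock lines
        result.append(TrxSnapshot(
            trx_id=int(trx.group(1)), active_s=int(trx.group(2)),
            row_locks=int(locks.group(2)), undo_entries=int(locks.group(3) or 0),
            thread_id=int(thread.group(1)), query_id=int(thread.group(2)),
            host=thread.group(3), user=thread.group(4), state=thread.group(5).strip(),
        ))
    return result


def lock_growth_per_minute(snapshots):
    """Row locks gained per minute between the first and last snapshot of one trx."""
    first, last = snapshots[0], snapshots[-1]
    if first.trx_id != last.trx_id:
        raise ValueError("snapshots belong to different transactions")
    elapsed = last.active_s - first.active_s
    if elapsed <= 0:
        raise ValueError("need snapshots taken at different times")
    return 60.0 * (last.row_locks - first.row_locks) / elapsed
```

For the first and last snapshots above this gives 60 × 43 / 285, or about 9 row locks per minute.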

I found the following blog post (http://databaseblog.myname.nl/2014/10/when-your-query-is-blocked-but-there-is_26.html), which suggests a way to find out what is going on inside such a long-running transaction, specifically by setting:

set GLOBAL innodb_status_output_locks=ON;

Unfortunately, this cannot be done on RDS because of restricted privileges.
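Without `innodb_status_output_locks`, MySQL 5.6 still exposes the locks involved in a wait through the INFORMATION_SCHEMA tables `INNODB_TRX`, `INNODB_LOCKS` and `INNODB_LOCK_WAITS`. A limitation: `INNODB_LOCKS` only lists locks that are being waited for or that block another transaction. The query therefore has to run while a `SELECT ... FOR UPDATE` is actually waiting. The following is a sketch, not a tested RDS recipe. It assumes a DB-API connection on an account with the PROCESS privilege, and `current_lock_waits` is a hypothetical helper name. For a blocker that sits in "cleaning up", `trx_query` is typically NULL because no statement is running, so `lock_table`, `lock_index` and `lock_data` are the useful columns.

```python
# Who is waiting, who blocks them, and which table/index/row the blocking lock is on.
LOCK_WAITS_SQL = """
SELECT r.trx_mysql_thread_id AS waiting_thread,
       r.trx_query           AS waiting_query,
       b.trx_mysql_thread_id AS blocking_thread,
       b.trx_started         AS blocking_since,
       b.trx_rows_locked     AS blocking_rows_locked,
       bl.lock_table, bl.lock_index, bl.lock_mode, bl.lock_data
FROM information_schema.INNODB_LOCK_WAITS w
JOIN information_schema.INNODB_TRX   b  ON b.trx_id   = w.blocking_trx_id
JOIN information_schema.INNODB_TRX   r  ON r.trx_id   = w.requesting_trx_id
JOIN information_schema.INNODB_LOCKS bl ON bl.lock_id = w.blocking_lock_id
"""


def current_lock_waits(conn):
    """Hypothetical helper: one dict per (waiting trx, blocking lock) pair."""
    cur = conn.cursor()
    try:
        cur.execute(LOCK_WAITS_SQL)
        names = [d[0] for d in cur.description]  # column names/aliases
        return [dict(zip(names, row)) for row in cur.fetchall()]
    finally:
        cur.close()
```

Running this from a second session while the application is stuck on the row with `primary_id = 5` should show whether the blocking thread id is the same one (5110120 in the snapshots) that sits in "cleaning up".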

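I would appreciate some debugging help: how can I find out what is happening inside this "cleaning up" transaction, and how might I avoid the situation altogether?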

Edit to add: average CPU usage on the MySQL instance is 20%.

Bam*_*fer 2

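In my case, my "cleaning up" locks went away after I killed the JVM that was running the debugger. Apparently they were left over from an earlier debugging run that I had interrupted before it could finish its transaction.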
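Killing the client process closes its connection, and InnoDB then rolls back the open transaction and releases its locks. To find out which client owns such a transaction before killing anything, the sketch below joins `INNODB_TRX` with `PROCESSLIST`. The helper name `find_stale_transactions` is hypothetical. It assumes a DB-API driver that uses `%s` placeholders (such as PyMySQL) and the PROCESS privilege. For TCP connections, `HOST` includes the client port, which helps pin down the exact process on a machine such as 10.0.1.154. An idle connection holding an open transaction typically shows `COMMAND = 'Sleep'`. On RDS, a connection can be terminated with `CALL mysql.rds_kill(<thread id>)`.

```python
# Transactions open longer than a threshold, with the client that owns them.
STALE_TRX_SQL = """
SELECT t.trx_id, t.trx_started, t.trx_rows_locked, t.trx_rows_modified,
       t.trx_mysql_thread_id AS thread_id,
       p.USER AS user, p.HOST AS host, p.COMMAND AS command,
       p.TIME AS time_in_state_s
FROM information_schema.INNODB_TRX t
JOIN information_schema.PROCESSLIST p ON p.ID = t.trx_mysql_thread_id
WHERE t.trx_started < NOW() - INTERVAL %s SECOND
ORDER BY t.trx_started
"""


def find_stale_transactions(conn, min_age_s=60):
    """Hypothetical helper: list transactions older than `min_age_s` seconds.

    Only reads; it does not kill anything.
    """
    cur = conn.cursor()
    try:
        cur.execute(STALE_TRX_SQL, (int(min_age_s),))
        names = [d[0] for d in cur.description]
        return [dict(zip(names, row)) for row in cur.fetchall()]
    finally:
        cur.close()
```

Run against the instance in the question with `min_age_s=120`, the transaction from the snapshots would be expected to appear with user `mfuser` and a host on 10.0.1.154.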

This may not help you, but here are some suggestions for debugging this kind of situation.

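  1. You do have one piece of information: the number of locks. Using breakpoints, you can pause your application at different points to pin down exactly when the count goes up. (Or it may only go up after certain errors appear in your log, or only after certain user actions.)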

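  2. If you cannot use breakpoints, you have another tool: a select for update statement, which blocks once the lock has been taken. You may be able to sprinkle it through your code, possibly with extra logging, to find out where the blocking starts.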

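  3. Consider temporarily debugging the application against a locally installed MySQL database, either on a local server or on your development machine. It may be a hassle to set up, but it has other benefits (for example, a test bed for database scripts, and the ability to work on a laptop while offline).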
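Suggestions 1 and 2 can both be turned into small logging hooks. The sketch below is illustrative only, and both helper names are hypothetical. `my_rows_locked` reports how many rows the transaction on the application's own connection has locked, using `CONNECTION_ID()`, which matches `trx_mysql_thread_id` in `INNODB_TRX`. `row_is_locked` is the "select for update" probe. It must run on a separate connection, because a transaction is never blocked by its own locks. It shortens `innodb_lock_wait_timeout` for that session so it fails fast instead of hanging. It assumes a driver such as PyMySQL whose exceptions carry the MySQL error code as `args[0]` (1205 is "Lock wait timeout exceeded"). The table and column come from the question.

```python
ER_LOCK_WAIT_TIMEOUT = 1205  # MySQL error code for "Lock wait timeout exceeded"


def my_rows_locked(conn):
    """Rows locked by the transaction currently open on `conn` (0 if none)."""
    cur = conn.cursor()
    try:
        cur.execute("SELECT trx_rows_locked FROM information_schema.INNODB_TRX "
                    "WHERE trx_mysql_thread_id = CONNECTION_ID()")
        row = cur.fetchone()
        return row[0] if row else 0
    finally:
        cur.close()


def row_is_locked(probe_conn, primary_id, timeout_s=1):
    """True if another transaction holds a lock on the tbl_widgets row.

    `probe_conn` must be a different connection from the code under test.
    """
    cur = probe_conn.cursor()
    try:
        # Session-scoped, so other connections keep the normal timeout.
        cur.execute("SET SESSION innodb_lock_wait_timeout = %s", (int(timeout_s),))
        cur.execute("SELECT primary_id FROM tbl_widgets "
                    "WHERE primary_id = %s FOR UPDATE", (primary_id,))
        cur.fetchall()
        return False
    except Exception as exc:
        if exc.args and exc.args[0] == ER_LOCK_WAIT_TIMEOUT:
            return True
        raise
    finally:
        probe_conn.rollback()  # release any lock the probe itself acquired
        cur.close()
```

Logging `my_rows_locked(app_conn)` and `row_is_locked(probe_conn, 5)` at a few checkpoints narrows down the code path after which the count starts to grow or the row stays locked.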

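All of this assumes the locking is caused by your own code rather than by some other job. (In your log, the user doing the cleaning up is "mfuser".) That assumption lets you reproduce the problem on demand.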