MySQL: delete...where..in() vs delete..from..join, and locked tables on delete with subselect

0x8*_*x89 9 mysql index join delete optimization

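Disclaimer: please excuse my lack of knowledge about database internals. Here it goes:

We run an application (not written by us) that has a big performance problem in a periodic cleanup job for the database. The query looks like this: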

delete from VARIABLE_SUBSTITUTION where BUILDRESULTSUMMARY_ID in (
       select BUILDRESULTSUMMARY_ID from BUILDRESULTSUMMARY
       where BUILDRESULTSUMMARY.BUILD_KEY = "BAM-1");

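Straightforward, easy to read, and standard SQL. Unfortunately it is also very slow. Explaining the query shows that the existing index on VARIABLE_SUBSTITUTION.BUILDRESULTSUMMARY_ID is not used: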

mysql> explain delete from VARIABLE_SUBSTITUTION where BUILDRESULTSUMMARY_ID in (
    ->        select BUILDRESULTSUMMARY_ID from BUILDRESULTSUMMARY
    ->        where BUILDRESULTSUMMARY.BUILD_KEY = "BAM-1");
| id | select_type        | table                 | type            | possible_keys                    | key     | key_len | ref  | rows    | Extra       |
+----+--------------------+-----------------------+-----------------+----------------------------------+---------+---------+------+---------+-------------+
|  1 | PRIMARY            | VARIABLE_SUBSTITUTION | ALL             | NULL                             | NULL    | NULL    | NULL | 7300039 | Using where |
|  2 | DEPENDENT SUBQUERY | BUILDRESULTSUMMARY    | unique_subquery | PRIMARY,key_number_results_index | PRIMARY | 8       | func |       1 | Using where |

This makes it very slow (120 seconds or more). On top of that, it seems to block queries that try to insert into BUILDRESULTSUMMARY. Output from show engine innodb status:

---TRANSACTION 68603695, ACTIVE 157 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 360, 1 row lock(s)
MySQL thread id 127964, OS thread handle 0x7facd0670700, query id 956555826 localhost 127.0.0.1 bamboosrv updating
update BUILDRESULTSUMMARY set CREATED_DATE='2015-06-18 09:22:05', UPDATED_DATE='2015-06-18 09:22:32', BUILD_KEY='BLA-RELEASE1-JOB1', BUILD_NUMBER=8, BUILD_STATE='Unknown', LIFE_CYCLE_STATE='InProgress', BUILD_DATE='2015-06-18 09:22:31.792', BUILD_CANCELLED_DATE=null, BUILD_COMPLETED_DATE='2015-06-18 09:52:02.483', DURATION=1770691, PROCESSING_DURATION=1770691, TIME_TO_FIX=null, TRIGGER_REASON='com.atlassian.bamboo.plugin.system.triggerReason:CodeChangedTriggerReason', DELTA_STATE=null, BUILD_AGENT_ID=199688199, STAGERESULT_ID=230943366, RESTART_COUNT=0, QUEUE_TIME='2015-06-18 09:22:04.52
------- TRX HAS BEEN WAITING 157 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 38 page no 30140 n bits 112 index `PRIMARY` of table `bamboong`.`BUILDRESULTSUMMARY` trx id 68603695 lock_mode X locks rec but not gap waiting
------------------
---TRANSACTION 68594818, ACTIVE 378 sec starting index read
mysql tables in use 2, locked 2
646590 lock struct(s), heap size 63993384, 3775190 row lock(s), undo log entries 117
MySQL thread id 127845, OS thread handle 0x7facc6bf8700, query id 956652201 localhost 127.0.0.1 bamboosrv preparing
delete from VARIABLE_SUBSTITUTION  where BUILDRESULTSUMMARY_ID in   (select BUILDRESULTSUMMARY_ID from BUILDRESULTSUMMARY where BUILDRESULTSUMMARY.BUILD_KEY = 'BLA-BLUBB10-SON')

This slows down the system and forces us to increase innodb_lock_wait_timeout.

Since we run MySQL, we rewrote the delete query to use "delete from join":

delete VARIABLE_SUBSTITUTION from VARIABLE_SUBSTITUTION join BUILDRESULTSUMMARY
   on VARIABLE_SUBSTITUTION.BUILDRESULTSUMMARY_ID = BUILDRESULTSUMMARY.BUILDRESULTSUMMARY_ID
   where BUILDRESULTSUMMARY.BUILD_KEY = "BAM-1";

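This is slightly less easy to read and unfortunately not standard SQL (as far as I could find out), but it is much faster (0.02 seconds or so) because it uses the index: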

mysql> explain delete VARIABLE_SUBSTITUTION from VARIABLE_SUBSTITUTION join BUILDRESULTSUMMARY
    ->    on VARIABLE_SUBSTITUTION.BUILDRESULTSUMMARY_ID = BUILDRESULTSUMMARY.BUILDRESULTSUMMARY_ID
    ->    where BUILDRESULTSUMMARY.BUILD_KEY = "BAM-1";
| id | select_type | table                 | type | possible_keys                    | key                      | key_len | ref                                                    | rows | Extra                    |
+----+-------------+-----------------------+------+----------------------------------+--------------------------+---------+--------------------------------------------------------+------+--------------------------+
|  1 | SIMPLE      | BUILDRESULTSUMMARY    | ref  | PRIMARY,key_number_results_index | key_number_results_index | 768     | const                                                  |    1 | Using where; Using index |
|  1 | SIMPLE      | VARIABLE_SUBSTITUTION | ref  | var_subst_result_idx             | var_subst_result_idx     | 8       | bamboo_latest.BUILDRESULTSUMMARY.BUILDRESULTSUMMARY_ID |   26 | NULL                     |

Additional info:

mysql> SHOW CREATE TABLE VARIABLE_SUBSTITUTION;
| Table                 | Create Table |
| VARIABLE_SUBSTITUTION | CREATE TABLE `VARIABLE_SUBSTITUTION` (
  `VARIABLE_SUBSTITUTION_ID` bigint(20) NOT NULL,
  `VARIABLE_KEY` varchar(255) COLLATE utf8_bin NOT NULL,
  `VARIABLE_VALUE` varchar(4000) COLLATE utf8_bin DEFAULT NULL,
  `VARIABLE_TYPE` varchar(255) COLLATE utf8_bin DEFAULT NULL,
  `BUILDRESULTSUMMARY_ID` bigint(20) NOT NULL,
  PRIMARY KEY (`VARIABLE_SUBSTITUTION_ID`),
  KEY `var_subst_result_idx` (`BUILDRESULTSUMMARY_ID`),
  KEY `var_subst_type_idx` (`VARIABLE_TYPE`),
  CONSTRAINT `FK684A7BE0A958B29F` FOREIGN KEY (`BUILDRESULTSUMMARY_ID`) REFERENCES `BUILDRESULTSUMMARY` (`BUILDRESULTSUMMARY_ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin |

mysql> SHOW CREATE TABLE BUILDRESULTSUMMARY;
| Table              | Create Table |
| BUILDRESULTSUMMARY | CREATE TABLE `BUILDRESULTSUMMARY` (
  `BUILDRESULTSUMMARY_ID` bigint(20) NOT NULL,
....
  `SKIPPED_TEST_COUNT` int(11) DEFAULT NULL,
  PRIMARY KEY (`BUILDRESULTSUMMARY_ID`),
  KEY `FK26506D3B9E6537B` (`CHAIN_RESULT`),
  KEY `FK26506D3BCCACF65` (`MERGERESULT_ID`),
  KEY `key_number_delta_state` (`DELTA_STATE`),
  KEY `brs_build_state_idx` (`BUILD_STATE`),
  KEY `brs_life_cycle_state_idx` (`LIFE_CYCLE_STATE`),
  KEY `brs_deletion_idx` (`MARKED_FOR_DELETION`),
  KEY `brs_stage_result_id_idx` (`STAGERESULT_ID`),
  KEY `key_number_results_index` (`BUILD_KEY`,`BUILD_NUMBER`),
  KEY `brs_agent_idx` (`BUILD_AGENT_ID`),
  KEY `rs_ctx_baseline_idx` (`VARIABLE_CONTEXT_BASELINE_ID`),
  KEY `brs_chain_result_summary_idx` (`CHAIN_RESULT`),
  KEY `brs_log_size_idx` (`LOG_SIZE`),
  CONSTRAINT `FK26506D3B9E6537B` FOREIGN KEY (`CHAIN_RESULT`) REFERENCES `BUILDRESULTSUMMARY` (`BUILDRESULTSUMMARY_ID`),
  CONSTRAINT `FK26506D3BCCACF65` FOREIGN KEY (`MERGERESULT_ID`) REFERENCES `MERGE_RESULT` (`MERGERESULT_ID`),
  CONSTRAINT `FK26506D3BCEDEEF5F` FOREIGN KEY (`STAGERESULT_ID`) REFERENCES `CHAIN_STAGE_RESULT` (`STAGERESULT_ID`),
  CONSTRAINT `FK26506D3BE3B5B062` FOREIGN KEY (`VARIABLE_CONTEXT_BASELINE_ID`) REFERENCES `VARIABLE_CONTEXT_BASELINE` (`VARIABLE_CONTEXT_BASELINE_ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin |

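(Some columns are omitted; it is quite a wide table.)

So I have a few questions:

  • Why is the query optimizer unable to use the index for the delete in the subquery version, while it can in the join version?
  • Is there any (ideally standards-compliant) way to trick it into using the index? Or
  • Is there a portable way to write a delete from join? The application supports PostgreSQL, MySQL, Oracle and Microsoft SQL Server, used via JDBC and Hibernate.
  • Why does the delete from VARIABLE_SUBSTITUTION block inserts into BUILDRESULTSUMMARY, which is only used in the subselect?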

ype*_*eᵀᴹ 7

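  • Why is the query optimizer unable to use the index for the delete in the subquery version, while it can in the join version?

Because the optimizer is (or was) a bit dumb in this regard. Not only for DELETE and UPDATE but for SELECT statements as well, anything like WHERE column IN (SELECT ...) was not fully optimized. The execution plan usually involved running the subquery for every row of the outer table (VARIABLE_SUBSTITUTION in this case). If that table is small, everything is fine. If it is big, there is no hope. In even older versions, an IN subquery containing another IN subquery could make even the EXPLAIN run for ages.

What you can do, if you want to keep this query, is use the latest versions, which have implemented several optimizations, and test again. Latest versions meaning: MySQL 5.6 (and 5.7 once it is out of beta) and MariaDB 5.5 / 10.0.

(Update) You already use 5.6, which has the optimization improvements, and this one is relevant: Optimizing Subqueries with Semi-Join Transformations
I suggest adding an index on (BUILD_KEY) alone. There is a composite one, but it is not very useful for this query.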
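
A minimal sketch of the index suggestion above, assuming a hypothetical index name brs_build_key_idx (it is not part of the schema shown in the question). Note that the join plan in the question already resolves the BUILD_KEY lookup through key_number_results_index, so it is worth comparing the EXPLAIN output before and after rather than assuming a gain.

```sql
-- Hypothetical index name; the table and column come from the question's schema.
ALTER TABLE BUILDRESULTSUMMARY ADD INDEX brs_build_key_idx (BUILD_KEY);

-- Re-check the plan of the original subquery delete afterwards.
EXPLAIN
DELETE FROM VARIABLE_SUBSTITUTION
WHERE BUILDRESULTSUMMARY_ID IN (
    SELECT BUILDRESULTSUMMARY_ID
    FROM BUILDRESULTSUMMARY
    WHERE BUILD_KEY = 'BAM-1');
```

If VARIABLE_SUBSTITUTION still shows type ALL in the first row, the extra index has not changed how the outer table is scanned, and the join rewrite remains the practical fix.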

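  • Is there any (ideally standards-compliant) way to trick it into using the index?

None that I can think of. In my opinion, there is not much value in trying to stick to standard SQL. Every DBMS has so many differences and minor quirks (UPDATE and DELETE statements are good examples of such differences) that when you try to use only what works everywhere, the result is a very limited subset of SQL.

  • Is there a portable way to write a delete from join? The application supports PostgreSQL, MySQL, Oracle and Microsoft SQL Server, used via JDBC and Hibernate.

Same answer as for the previous question.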
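
To illustrate the lack of a portable form, here is roughly how the same joined delete is spelled in two of the other supported systems. This is a sketch, not tested against the application's schema: the aliases vs and brs are made up for illustration, identifier case handling may differ depending on how the tables were created, and the two statements belong to different dialects, so they are not meant to run as one script. Oracle is left out.

```sql
-- PostgreSQL: DELETE ... USING
DELETE FROM VARIABLE_SUBSTITUTION AS vs
USING BUILDRESULTSUMMARY AS brs
WHERE vs.BUILDRESULTSUMMARY_ID = brs.BUILDRESULTSUMMARY_ID
  AND brs.BUILD_KEY = 'BAM-1';

-- Microsoft SQL Server: DELETE <alias> FROM ... JOIN
DELETE vs
FROM VARIABLE_SUBSTITUTION AS vs
JOIN BUILDRESULTSUMMARY AS brs
  ON vs.BUILDRESULTSUMMARY_ID = brs.BUILDRESULTSUMMARY_ID
WHERE brs.BUILD_KEY = 'BAM-1';
```

Neither form matches the MySQL multi-table delete from the question, which is why an application targeting all of these databases tends to fall back on the IN (SELECT ...) version.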

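  • Why does the delete from VARIABLE_SUBSTITUTION block inserts into BUILDRESULTSUMMARY, which is only used in the subselect?

Not 100% sure, but I think it has to do with the subquery being run multiple times and the type of locks it takes on the table.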


Mas*_*oud 3

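Here are answers to two of your questions.

  • The optimizer cannot use the index because the where clause changes for each row. After passing through the optimizer, the delete statement looks something like this: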

    delete from VARIABLE_SUBSTITUTION where EXISTS (
    select BUILDRESULTSUMMARY_ID from BUILDRESULTSUMMARY
    where BUILDRESULTSUMMARY.BUILDRESULTSUMMARY_ID = VARIABLE_SUBSTITUTION.BUILDRESULTSUMMARY_ID AND BUILDRESULTSUMMARY.BUILD_KEY = "BAM-1");
    

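But when you use the join, the server is able to identify the rows to be deleted.

  • The trick is to hold the BUILDRESULTSUMMARY_ID values in a variable and use the variable instead of the query. Note that both the variable initialization and the delete query must run within the same session. Something like this: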

    SET @ids = (SELECT GROUP_CONCAT(BUILDRESULTSUMMARY_ID) 
            from BUILDRESULTSUMMARY where BUILDRESULTSUMMARY.BUILD_KEY = "BAM-1" ); 
    delete from VARIABLE_SUBSTITUTION where FIND_IN_SET(BUILDRESULTSUMMARY_ID,@ids) > 0;
    

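    You may run into problems if the query returns too many ids, and this is not a standard approach. It is just a workaround.

    I have no answer for your other two questions :)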
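
One concrete way the "too many ids" problem can show up: the result of GROUP_CONCAT is capped by the group_concat_max_len variable (1024 bytes by default in MySQL 5.6), and a longer list is truncated with a warning, so the delete would silently skip rows. A sketch of raising the limit for the session before building the list; the value 1000000 is an arbitrary assumption, not a recommendation.

```sql
-- Raise the cap for this session only (arbitrary illustrative value).
SET SESSION group_concat_max_len = 1000000;

SET @ids = (SELECT GROUP_CONCAT(BUILDRESULTSUMMARY_ID)
            FROM BUILDRESULTSUMMARY
            WHERE BUILD_KEY = 'BAM-1');

-- Sanity check: a list length close to the cap suggests truncation.
SELECT CHAR_LENGTH(@ids) AS list_length,
       @@session.group_concat_max_len AS max_len;

DELETE FROM VARIABLE_SUBSTITUTION
WHERE FIND_IN_SET(BUILDRESULTSUMMARY_ID, @ids) > 0;
```

Because the column is wrapped in FIND_IN_SET, the delete may still not use var_subst_result_idx, so it is worth running EXPLAIN on it before relying on this workaround.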