DELETE query performance

Question

Original query

DELETE B
FROM TABLE_BASE B,
     TABLE_INC I
WHERE B.ID = I.ID AND B.NUM = I.NUM;

Performance statistics for the query above

+-------------------+---------+-----------+
|    Response Time  | SumCPU  | ImpactCPU |
+-------------------+---------+-----------+
|   00:05:29.190000 |   2852  |  319672   |
+-------------------+---------+-----------+

Optimized query 1

DEL FROM TABLE_BASE WHERE (ID, NUM) IN 
(SELECT ID, NUM FROM TABLE_INC);

Statistics for the query above

+-----------------+--------+-----------+
|   QryRespTime   | SumCPU | ImpactCPU |
+-----------------+--------+-----------+
| 00:00:00.570000 |  15.42 |     49.92 |
+-----------------+--------+-----------+

Optimized query 2

DELETE FROM TABLE_BASE B WHERE EXISTS
(SELECT * FROM TABLE_INC I WHERE B.ID = I.ID AND B.NUM = I.NUM);

Statistics for the query above

+-----------------+--------+-----------+
|   QryRespTime   | SumCPU | ImpactCPU |
+-----------------+--------+-----------+
| 00:00:00.400000 |  11.96 |     44.93 |
+-----------------+--------+-----------+

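My questions:

How and why do optimized queries 1 and 2 improve performance so significantly?
What is the best practice for this kind of DELETE query?
Should I choose Query 1 or Query 2? Which one is ideal, better, or more reliable? I would expect Query 1 to be ideal, because it uses SELECT ID, NUM (only two columns) rather than SELECT *, yet Query 2 shows better results.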

QUERY 1 EXPLAIN PLAN

 This query is optimized using type 2 profile T2_Linux64, profileid 21.
  1) First, we lock TEMP_DB.TABLE_BASE for write on a
     reserved RowHash to prevent global deadlock.
  2) Next, we lock TEMP_DB_T.TABLE_INC for access, and we
     lock TEMP_DB.TABLE_BASE for write.
  3) We execute the following steps in parallel.
       1) We do an all-AMPs RETRIEVE step from
          TEMP_DB.TABLE_BASE by way of an all-rows scan
          with no residual conditions into Spool 2 (all_amps), which is
          redistributed by the hash code of (
          TEMP_DB.TABLE_BASE.NUM,
          TEMP_DB.TABLE_BASE.ID) to all AMPs.  Then
          we do a SORT to order Spool 2 by row hash.  The size of Spool
          2 is estimated with low confidence to be 168,480 rows (
          5,054,400 bytes).  The estimated time for this step is 0.03
          seconds.
       2) We do an all-AMPs RETRIEVE step from
          TEMP_DB_T.TABLE_INC by way of an all-rows scan
          with no residual conditions into Spool 3 (all_amps), which is
          redistributed by the hash code of (
          TEMP_DB_T.TABLE_INC.NUM,
          TEMP_DB_T.TABLE_INC.ID) to all AMPs.  Then
          we do a SORT to order Spool 3 by row hash and the sort key in
          spool field1 eliminating duplicate rows.  The size of Spool 3
          is estimated with high confidence to be 5,640 rows (310,200
          bytes).  The estimated time for this step is 0.03 seconds.
  4) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an
     all-rows scan, which is joined to Spool 3 (Last Use) by way of an
     all-rows scan.  Spool 2 and Spool 3 are joined using an inclusion
     merge join, with a join condition of ("(ID = ID) AND
     (NUM = NUM)").  The result goes into Spool 1 (all_amps),
     which is redistributed by the hash code of (
     TEMP_DB.TABLE_BASE.ROWID) to all AMPs.  Then we do
     a SORT to order Spool 1 by row hash and the sort key in spool
     field1 eliminating duplicate rows.  The size of Spool 1 is
     estimated with no confidence to be 168,480 rows (3,032,640 bytes).
     The estimated time for this step is 1.32 seconds.
  5) We do an all-AMPs MERGE DELETE to
     TEMP_DB.TABLE_BASE from Spool 1 (Last Use) via the
     row id.  The size is estimated with no confidence to be 168,480
     rows.  The estimated time for this step is 42.95 seconds.
  6) We spoil the parser's dictionary cache for the table.
  7) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> No rows are returned to the user as the result of statement 1.

QUERY 2 EXPLAIN PLAN

 This query is optimized using type 2 profile T2_Linux64, profileid 21.
  1) First, we lock TEMP_DB.TABLE_BASE for write on a reserved RowHash to
     prevent global deadlock.
  2) Next, we lock TEMP_DB_T.TABLE_INC for access, and we
     lock TEMP_DB.TABLE_BASE for write.
  3) We execute the following steps in parallel.
       1) We do an all-AMPs RETRIEVE step from TEMP_DB.TABLE_BASE by way of
          an all-rows scan with no residual conditions into Spool 2
          (all_amps), which is redistributed by the hash code of (
          TEMP_DB.TABLE_BASE.NUM, TEMP_DB.TABLE_BASE.ID) to all AMPs.
          Then we do a SORT to order Spool 2 by row hash.  The size of
          Spool 2 is estimated with low confidence to be 168,480 rows (
          5,054,400 bytes).  The estimated time for this step is 0.03
          seconds.
       2) We do an all-AMPs RETRIEVE step from
          TEMP_DB_T.TABLE_INC by way of an all-rows scan
          with no residual conditions into Spool 3 (all_amps), which is
          redistributed by the hash code of (
          TEMP_DB_T.TABLE_INC.NUM,
          TEMP_DB_T.TABLE_INC.ID) to all AMPs.  Then
          we do a SORT to order Spool 3 by row hash and the sort key in
          spool field1 eliminating duplicate rows.  The size of Spool 3
          is estimated with high confidence to be 5,640 rows (310,200
          bytes).  The estimated time for this step is 0.03 seconds.
  4) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an
     all-rows scan, which is joined to Spool 3 (Last Use) by way of an
     all-rows scan.  Spool 2 and Spool 3 are joined using an inclusion
     merge join, with a join condition of ("(NUM = NUM) AND
     (ID = ID)").  The result goes into Spool 1 (all_amps), which
     is redistributed by the hash code of (TEMP_DB.TABLE_BASE.ROWID) to all
     AMPs.  Then we do a SORT to order Spool 1 by row hash and the sort
     key in spool field1 eliminating duplicate rows.  The size of Spool
     1 is estimated with no confidence to be 168,480 rows (3,032,640
     bytes).  The estimated time for this step is 1.32 seconds.
  5) We do an all-AMPs MERGE DELETE to TEMP_DB.TABLE_BASE from Spool 1 (Last
     Use) via the row id.  The size is estimated with no confidence to
     be 168,480 rows.  The estimated time for this step is 42.95
     seconds.
  6) We spoil the parser's dictionary cache for the table.
  7) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> No rows are returned to the user as the result of statement 1.

For TABLE_BASE

+----------------+----------+
|  table_bytes   | skewness |
+----------------+----------+
| 16842085888.00 |    22.78 |
+----------------+----------+

For TABLE_INC

+-------------+----------+
| table_bytes | skewness |
+-------------+----------+
|  5317120.00 |    44.52 |
+-------------+----------+

Answer 1

dno*_*eth 1

What is the relationship between TABLE_BASE and TABLE_INC?

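If it is one-to-many, Q1 (the original join) probably creates a huge spool first, while Q2 and Q3 (the two rewrites) might apply a DISTINCT before the join.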
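
One way to check this is to look for (ID, NUM) pairs that occur more than once in TABLE_INC. This is only a sketch, assuming ID and NUM are the join columns as in the queries in the question; the same check can be run against TABLE_BASE.

```sql
-- List (ID, NUM) pairs that appear more than once in TABLE_INC.
-- An empty result means each base row matches at most one
-- TABLE_INC row on these two columns.
SELECT ID, NUM, COUNT(*) AS dup_count
FROM TABLE_INC
GROUP BY ID, NUM
HAVING COUNT(*) > 1
ORDER BY dup_count DESC;
```

If this returns rows, a plain join produces one result row per matching pair, whereas IN and EXISTS only test whether a match exists.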

As for IN vs. EXISTS, there should be hardly any difference. Have you checked dbc.QryLogStepsV?
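
A possible way to compare the steps actually executed is a query like the following. It is a sketch under several assumptions: step-level query logging must already be enabled for the user (for example via BEGIN QUERY LOGGING WITH STEPINFO), the column names are the ones commonly found in this view and may differ between releases, and the QueryID value is a hypothetical placeholder.

```sql
-- Per-step estimates vs. actuals for one logged request.
-- 123456789012345678 is a placeholder; substitute the QueryID
-- of the DELETE as recorded in the query log.
SELECT StepLev1Num,
       StepLev2Num,
       StepName,
       EstRowCount,   -- optimizer estimate
       RowCount,      -- actual rows
       CPUTime,
       IOCount,
       SpoolUsage
FROM dbc.QryLogStepsV
WHERE QueryID = 123456789012345678
ORDER BY StepLev1Num, StepLev2Num;
```

Running it once for each variant of the DELETE should show which step accounts for the difference in CPU and spool.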

Edit:

If (ID, Num) is the PI (primary index) of the target table, rewriting the statement as a MERGE DELETE should give the best performance:

-- Delete every TABLE_BASE row that has a match in TABLE_INC
MERGE INTO TABLE_BASE AS tgt
USING TABLE_INC AS src
ON src.ID = tgt.ID
AND src.Num = tgt.Num
WHEN MATCHED
THEN DELETE;
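
Whether (ID, Num) really is the primary index can be checked in the data dictionary. The sketch below assumes the table lives in TEMP_DB, as the EXPLAIN output in the question suggests, and that the dbc.IndicesV view is readable; SHOW TABLE TEMP_DB.TABLE_BASE would show the same information in the DDL.

```sql
-- Columns of the primary index of TEMP_DB.TABLE_BASE.
-- IndexType 'P' = nonpartitioned primary index,
-- IndexType 'Q' = partitioned primary index.
SELECT ColumnName,
       ColumnPosition,
       IndexType,
       UniqueFlag
FROM dbc.IndicesV
WHERE DatabaseName = 'TEMP_DB'
  AND TableName = 'TABLE_BASE'
  AND IndexType IN ('P', 'Q')
ORDER BY ColumnPosition;
```

Both plans in the question redistribute TABLE_BASE by the hash of (NUM, ID), which hints that these columns may not be the current PI, so it is worth confirming before relying on the MERGE.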

@BhaveshGhodasara: Because the implementation is different :-) (2 upvotes)
