SELECT行的最有效方法WHERE ID EXISTS IN第二个表

Tot*_*Zam 4 sql sql-server performance sql-server-2012

我正在寻找从第二个表中存在ID的一个表中选择所有记录.

以下两个查询返回正确的结果:

查询1:

SELECT *
FROM Table1 t1
WHERE EXISTS (SELECT 1 FROM Table2 t2 WHERE t1.ID = t2.ID)
Run Code Online (Sandbox Code Playgroud)

查询2:

SELECT *
FROM Table1 t1
WHERE t1.ID IN (SELECT t2.ID FROM Table2 t2)
Run Code Online (Sandbox Code Playgroud)

其中一个查询比另一个更有效吗?我应该使用一个吗?还有第三种方法我没想到会更有效吗?

The*_*war 7

摘要:

IN和EXISTS在所有场景中表现相似.以下是用于验证的参数..

执行成本,时间:
两者相同,优化器生成相同的计划.
内存授予:
对于两个查询都相同
Cpu时间,逻辑读取:
尽管读取相同,但Exists在CPU时间方面似乎比IN位略优于IN.

我使用下面的测试数据集运行每个查询10次.

  1. 一个非常大的子查询结果集(100000行)
  2. 重复的行
  3. 空行

对于所有上述情况下,都INEXISTS以相同的方式进行.

有关用于测试的Performance V3数据库的一些信息.20000个客户拥有1000000个订单,因此每个客户在订单表中随机复制(范围为10到100).

执行成本,时间:
下面是两个查询运行的截图.观察每个查询的相对成本.

在此输入图像描述

内存成本:
两个查询的内存授权也相同.我强制MDOP 1,以免将它们溢出到TEMPDB ..

在此输入图像描述

CPU时间,读取:

存在:

Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Customers'. Scan count 1, logical reads 109, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Orders'. Scan count 1, logical reads 3855, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 469 ms,  elapsed time = 595 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.
Run Code Online (Sandbox Code Playgroud)

对于IN:

(20000 row(s) affected)
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Customers'. Scan count 1, logical reads 109, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Orders'. Scan count 1, logical reads 3855, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 547 ms,  elapsed time = 669 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.
Run Code Online (Sandbox Code Playgroud)

在每种情况下,优化器都足够智能以重新排列查询.

我倾向于使用EXISTS(我的意见).要使用的一个用例EXISTS是当您不想返回第二个表结果集时.

根据Martin Smith的查询更新:

我运行以下查询以找到从第一个表中获取第二个表中存在引用的行的最有效方法.

SELECT DISTINCT c.*
FROM Customers c
JOIN Orders o ON o.custid = c.custid   

SELECT c.*
FROM Customers c
INNER JOIN (SELECT DISTINCT custid FROM Orders) AS o ON o.custid = c.custid

SELECT *
FROM Customers C
WHERE EXISTS(SELECT 1 FROM Orders o WHERE o.custid = c.custid)

SELECT *
FROM Customers c
WHERE custid IN (SELECT custid FROM Orders)
Run Code Online (Sandbox Code Playgroud)

以上所有查询共享相同的成本,但第二个除外,INNER JOIN其余的计划相同.

在此输入图像描述

内存授予:
此查询

SELECT DISTINCT c.*
FROM Customers c
JOIN Orders o ON o.custid = c.custid 
Run Code Online (Sandbox Code Playgroud)

所需的内存授予

在此输入图像描述

这个查询

SELECT c.*
FROM Customers c
INNER JOIN (SELECT DISTINCT custid FROM Orders) AS o ON o.custid = c.custid 
Run Code Online (Sandbox Code Playgroud)

所需的内存授予..

在此输入图像描述

CPU时间,读取:
查询:

SELECT DISTINCT c.*
FROM Customers c
JOIN Orders o ON o.custid = c.custid   

(20000 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 48, logical reads 1344, physical reads 96, read-ahead reads 1248, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Orders'. Scan count 5, logical reads 3929, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Customers'. Scan count 5, logical reads 322, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 1453 ms,  elapsed time = 781 ms.
Run Code Online (Sandbox Code Playgroud)

对于查询:

SELECT c.*
FROM Customers c
INNER JOIN (SELECT DISTINCT custid FROM Orders) AS o ON o.custid = c.custid

(20000 row(s) affected)
Table 'Customers'. Scan count 5, logical reads 322, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Orders'. Scan count 5, logical reads 3929, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 1499 ms,  elapsed time = 403 ms.
Run Code Online (Sandbox Code Playgroud)

  • 实际上我刚下载了该数据库并进行了测试,并设法将第一个额外的`DISTINCT`查询转换为半连接. (2认同)
  • 我在原始问题中没有提到的东西,但我认为有用的是,DISTINCT不能很好地处理"文本"数据类型的字段,因此即使JOIN解决方案的效率相同,它们也可能不起作用作为替代品. (2认同)