How to rewrite a NOT IN subquery as a join

Ale*_*tis 2 mysql sql join subquery

Let's assume the following MySQL tables describe documents contained in folders.

mysql> select * from folder;
+----+----------------+
| ID | PATH           |
+----+----------------+
|  1 | matches/1      |
|  2 | matches/2      |
|  3 | shared/3       |
|  4 | no/match/4     |
|  5 | unreferenced/5 |
+----+----------------+


mysql> select * from DOC;
+----+------+------------+
| ID | F_ID | DATE       |
+----+------+------------+
|  1 |    1 | 2000-01-01 |
|  2 |    2 | 2000-01-02 |
|  3 |    2 | 2000-01-03 |
|  4 |    3 | 2000-01-04 |
|  5 |    3 | 2000-01-05 |
|  6 |    3 | 2000-01-06 |
|  7 |    4 | 2000-01-07 |
|  8 |    4 | 2000-01-08 |
|  9 |    4 | 2000-01-09 |
| 10 |    4 | 2000-01-10 |
+----+------+------------+

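The ID columns are primary keys, and column F_ID of table DOC is a non-null foreign key referencing the primary key of table FOLDER. Filtering on the document's DATE in the where clause, I want to find which folders contain only the selected documents. For documents older than 2000-01-05, this can be written as: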

SELECT DISTINCT d1.F_ID 
FROM DOC d1 
WHERE d1.DATE < '2000-01-05' 
AND d1.F_ID NOT IN (
    SELECT d2.F_ID 
    FROM DOC d2 WHERE NOT (d2.DATE < '2000-01-05')
);

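It correctly returns '1' and '2'. According to http://dev.mysql.com/doc/refman/5.5/en/rewriting-subqueries.html , replacing a subquery with a join can improve performance on large tables. I have found questions related to NOT IN and JOINs, but none that is quite what I am looking for. So, any ideas on how to write this using joins?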

Gor*_*off 6

The general answer is that a query of this form:

select t.*
from t
where t.id not in (select id from s)

can be rewritten as:

select t.*
from t left outer join
     (select distinct id from s) s
     on t.id = s.id
where s.id is null

I think you can apply this to your case.
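
Applied to the DOC table from the question, the general pattern above might look like the following. This is only a sketch: it assumes the table, columns, and 2000-01-05 cutoff shown in the question, and the alias `late_docs` is an illustrative name that does not come from the original post.

```sql
-- Folders whose documents are all dated before 2000-01-05.
-- late_docs (hypothetical alias) lists every folder that has at least
-- one document on or after the cutoff; the LEFT JOIN ... IS NULL
-- anti-join then discards those folders.
SELECT DISTINCT d1.F_ID
FROM DOC d1
LEFT OUTER JOIN (
    SELECT DISTINCT d2.F_ID
    FROM DOC d2
    WHERE NOT (d2.DATE < '2000-01-05')
) late_docs ON d1.F_ID = late_docs.F_ID
WHERE d1.DATE < '2000-01-05'
  AND late_docs.F_ID IS NULL;
```

On the sample data, the derived table contains folders 3 and 4, so the query returns 1 and 2, the same as the NOT IN version. The two forms are only interchangeable because F_ID is declared non-null: if the subquery could return a NULL, NOT IN would match no rows at all, while the anti-join would still return the unmatched ones.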