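I have a MySQL database table that references different words and their position in a document. I want to return the IDs of the documents that contain all of the words.
Here is a sample table.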
docid wordid
1 4
2 4
1 2
1 5
OK, now suppose someone queries the database for the words with WORDID 4, 2 and 5.
My incorrect SQL SELECT statement is something like:
Select docid from table where wordid = 4 and wordid = 2 and wordid = 5
This gives me 0 results.
Elsewhere I saw someone suggest using a WHERE IN clause. If I understand correctly, that is another way of writing an OR clause. I tried this:
select docid from table where wordid in (4,2,5)
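However, this gives me all of the rows. It should exclude docid 2, because that document does not contain the other words. I expect to get only docid 1.
I may well be using the WHERE IN clause incorrectly, since I have very little database experience.
How do I return the docid of the documents that contain all of the words?
Also note that my WHERE clause will be generated dynamically in a FOR loop. The query could be as simple as one or two words, or as many as 10 or 12. I am looking for a query structure that takes speed into account. Let me know if you need more information.
For reference, I am trying to convert this code to PHP/MySQL, but I don't understand the SQL statements here or their MySQL equivalents: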
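This is a relational division problem. There is a question on SO with many ways to write this query, plus a performance analysis for PostgreSQL: How to filter SQL results in a has-many-through relation
Shamelessly copying the code from there, and removing or changing the answers that rely on features MySQL lacks (CTEs, EXCEPT, INTERSECT, etc.), here are some ways to do this.
Assumptions: the factors table has a UNIQUE constraint on (wordid, docid), and there is a documents table and a words table.
Easy to write, medium efficiency: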
-- Query 1 -- by Martin
SELECT d.docid, d.docname
FROM documents d
JOIN factors f USING (docid)
WHERE f.wordid IN (2, 4, 5)
GROUP BY d.docid
HAVING COUNT(*) = 3 ; -- number of words
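Since the question says the word list is built in a loop, here is a minimal sketch of generating the Query 1 pattern for any number of words. It is written in Python against an in-memory SQLite database purely so it can be run as-is; the function name docs_with_all_words is made up for illustration, the table is called factors as in the assumptions above, and a MySQL driver may use a different placeholder style (for example %s instead of ?).

```python
import sqlite3


def docs_with_all_words(conn, word_ids):
    """Return the docids that contain every wordid in word_ids (Query 1 pattern)."""
    wanted = sorted(set(word_ids))  # de-duplicate so the HAVING count is right
    if not wanted:
        return []
    # One "?" per word: the values are bound by the driver, never pasted in.
    placeholders = ", ".join("?" for _ in wanted)
    sql = (
        "SELECT docid FROM factors "
        f"WHERE wordid IN ({placeholders}) "
        "GROUP BY docid "
        "HAVING COUNT(*) = ? "  # relies on UNIQUE (wordid, docid)
        "ORDER BY docid"
    )
    rows = conn.execute(sql, (*wanted, len(wanted))).fetchall()
    return [row[0] for row in rows]


# Sample rows from the question, loaded into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE factors (docid INTEGER, wordid INTEGER, UNIQUE (wordid, docid))"
)
conn.executemany(
    "INSERT INTO factors VALUES (?, ?)", [(1, 4), (2, 4), (1, 2), (1, 5)]
)

print(docs_with_all_words(conn, [4, 2, 5]))  # only docid 1 has all three words
print(docs_with_all_words(conn, [4]))        # both documents contain word 4
```

The only parts that change with the number of words are the placeholder list and the number compared in HAVING, so the same statement shape covers one word or twelve.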
Easy to write, medium efficiency:
-- Query 2 -- by Erwin
SELECT d.docid, d.docname
FROM documents d
JOIN (
SELECT docid
FROM factors
WHERE wordid IN (2, 4, 5)
GROUP BY docid
HAVING COUNT(*) = 3
) f USING (docid) ;
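Queries 1 and 2 compare COUNT(*) with the number of words, which is only safe because of the UNIQUE (wordid, docid) assumption. The sketch below, again Python with an in-memory SQLite database as a stand-in for MySQL, shows what can happen without that constraint and how COUNT(DISTINCT wordid) avoids it. The duplicated rows and the helper name matching_docs are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Deliberately NO unique constraint here, to show why the assumption matters.
conn.execute("CREATE TABLE factors (docid INTEGER, wordid INTEGER)")
conn.executemany(
    "INSERT INTO factors VALUES (?, ?)",
    [
        (1, 4), (1, 2), (1, 5),  # document 1 really has all three words
        (2, 4), (2, 4), (2, 4),  # document 2 has word 4 stored three times
    ],
)


def matching_docs(count_expr):
    """Run the GROUP BY / HAVING pattern with the given counting expression."""
    # count_expr is a fixed string chosen in this script, not user input.
    sql = (
        "SELECT docid FROM factors "
        "WHERE wordid IN (2, 4, 5) "
        "GROUP BY docid "
        f"HAVING {count_expr} = 3 "
        "ORDER BY docid"
    )
    return [row[0] for row in conn.execute(sql)]


naive = matching_docs("COUNT(*)")               # counts duplicate rows too
safe = matching_docs("COUNT(DISTINCT wordid)")  # counts each word once

print(naive)  # document 2 slips through because its three rows are duplicates
print(safe)   # only document 1 remains
```

If the table can hold the same (docid, wordid) pair more than once, the DISTINCT form is the safer choice, possibly at some cost in speed.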
More complex to write, very good efficiency in Postgres, but probably terrible in MySQL:
-- Query 4 -- by Derek
SELECT d.docid, d.docname
FROM documents d
WHERE d.docid IN (SELECT docid FROM factors WHERE wordid = 2)
AND d.docid IN (SELECT docid FROM factors WHERE wordid = 4)
AND d.docid IN (SELECT docid FROM factors WHERE wordid = 5);
More complex to write, very good efficiency in Postgres, and probably in MySQL as well:
-- Query 5 -- by Erwin
SELECT d.docid, d.docname
FROM documents d
WHERE EXISTS (SELECT * FROM factors
WHERE docid = d.docid AND wordid = 2)
AND EXISTS (SELECT * FROM factors
WHERE docid = d.docid AND wordid = 4)
AND EXISTS (SELECT * FROM factors
WHERE docid = d.docid AND wordid = 5) ;
More complex to write, very good efficiency in Postgres, and probably in MySQL as well:
-- Query 6 -- by Sean
SELECT d.docid, d.docname
FROM documents d
JOIN factors x ON d.docid = x.docid
JOIN factors y ON d.docid = y.docid
JOIN factors z ON d.docid = z.docid
WHERE x.wordid = 2
AND y.wordid = 4
AND z.wordid = 5 ;
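The multi-join form needs one extra JOIN and one extra condition per word, so it has to be generated when the word list varies. A rough sketch of that generation follows, in Python with an in-memory SQLite database standing in for MySQL. The function name build_join_query, the aliases f0, f1, ... and the docname values are made up for illustration.

```python
import sqlite3


def build_join_query(word_ids):
    """Build a Query 6 style statement: one self-join of factors per word."""
    wanted = sorted(set(word_ids))
    if not wanted:
        raise ValueError("at least one wordid is required")
    # Aliases come from a counter, so no user input ends up in the SQL text.
    joins = [
        f"JOIN factors f{i} ON d.docid = f{i}.docid" for i in range(len(wanted))
    ]
    conditions = [f"f{i}.wordid = ?" for i in range(len(wanted))]
    sql = (
        "SELECT d.docid, d.docname FROM documents d "
        + " ".join(joins)
        + " WHERE "
        + " AND ".join(conditions)
        + " ORDER BY d.docid"
    )
    return sql, wanted


# Sample data: the question's rows plus hypothetical document names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (docid INTEGER PRIMARY KEY, docname TEXT)")
conn.execute(
    "CREATE TABLE factors (docid INTEGER, wordid INTEGER, UNIQUE (wordid, docid))"
)
conn.executemany(
    "INSERT INTO documents VALUES (?, ?)", [(1, "doc one"), (2, "doc two")]
)
conn.executemany(
    "INSERT INTO factors VALUES (?, ?)", [(1, 4), (2, 4), (1, 2), (1, 5)]
)

sql, params = build_join_query([2, 4, 5])
all_three = conn.execute(sql, params).fetchall()
sql_one, params_one = build_join_query([4])
only_four = conn.execute(sql_one, params_one).fetchall()

print(all_three)  # just document 1
print(only_four)  # both documents
```

The same loop could emit the EXISTS form of Query 5 instead by producing one correlated subquery per word in place of each join.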
Easy to write and to extend to an arbitrary set of words, but not as efficient as the JOIN and EXISTS solutions:
-- Query 7 -- by ypercube
SELECT d.docid, d.docname
FROM documents d
WHERE NOT EXISTS (
SELECT *
FROM words AS w
WHERE w.wordid IN (2, 4, 5)
AND NOT EXISTS (
SELECT *
FROM factors AS f
WHERE f.docid = d.docid
AND f.wordid = w.wordid
)
);
Easy to write, not very efficient:
-- Query 8 -- by ypercube
SELECT d.docid, d.docname
FROM documents d
WHERE NOT EXISTS (
SELECT *
FROM (
SELECT 2 AS wordid UNION ALL
SELECT 4 UNION ALL
SELECT 5
) AS w
WHERE NOT EXISTS (
SELECT *
FROM factors AS f
WHERE f.docid = d.docid
AND f.wordid = w.wordid
)
);
Enjoy testing them :)
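As a starting point for that testing, here is a small harness that loads the question's sample rows and checks that several of the queries above agree. It is only a sketch: it uses Python with an in-memory SQLite database rather than MySQL, the queries are trimmed to return docid only, and the docname values are invented. It checks correctness, not speed; timings would need real data on a real MySQL server.

```python
import sqlite3

# Queries 1, 2, 5 and 7 from above, trimmed to select only docid.
QUERIES = {
    "Query 1": """
        SELECT d.docid FROM documents d
        JOIN factors f USING (docid)
        WHERE f.wordid IN (2, 4, 5)
        GROUP BY d.docid
        HAVING COUNT(*) = 3
    """,
    "Query 2": """
        SELECT d.docid FROM documents d
        JOIN (SELECT docid FROM factors
              WHERE wordid IN (2, 4, 5)
              GROUP BY docid
              HAVING COUNT(*) = 3) f USING (docid)
    """,
    "Query 5": """
        SELECT d.docid FROM documents d
        WHERE EXISTS (SELECT * FROM factors
                      WHERE docid = d.docid AND wordid = 2)
          AND EXISTS (SELECT * FROM factors
                      WHERE docid = d.docid AND wordid = 4)
          AND EXISTS (SELECT * FROM factors
                      WHERE docid = d.docid AND wordid = 5)
    """,
    "Query 7": """
        SELECT d.docid FROM documents d
        WHERE NOT EXISTS (
            SELECT * FROM words AS w
            WHERE w.wordid IN (2, 4, 5)
              AND NOT EXISTS (
                  SELECT * FROM factors AS f
                  WHERE f.docid = d.docid AND f.wordid = w.wordid))
    """,
}

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (docid INTEGER PRIMARY KEY, docname TEXT);
    CREATE TABLE words (wordid INTEGER PRIMARY KEY);
    CREATE TABLE factors (docid INTEGER, wordid INTEGER,
                          UNIQUE (wordid, docid));
    INSERT INTO documents VALUES (1, 'doc one'), (2, 'doc two');
    INSERT INTO words VALUES (2), (4), (5);
    INSERT INTO factors VALUES (1, 4), (2, 4), (1, 2), (1, 5);
""")

# Run every query and collect the docids it returns.
results = {
    name: sorted(row[0] for row in conn.execute(sql))
    for name, sql in QUERIES.items()
}

for name, docids in results.items():
    print(name, docids)  # each query should report only docid 1
```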