Matching a single column against multiple values in MySQL without self-joining the table

Chr*_*ong 14 mysql join database-design

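We have a table that stores answers to questions. We need to be able to find users who gave specific answers to specific questions. So, if our table contains the following data: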

user_id     question_id     answer_value  
Sally        1               Pooch  
Sally        2               Peach  
John         1               Pooch  
John         2               Duke

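and we want to find users who answered "Pooch" to question 1 and "Peach" to question 2, the following SQL will (obviously) not work, because no single row can have question_id equal to both 1 and 2: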

select user_id 
from answers 
where question_id=1 
  and answer_value = 'Pooch'
  and question_id=2
  and answer_value='Peach'

My first thought was to self-join the table once for each answer we are looking for:

select a.user_id 
from answers a, answers b 
where a.user_id = b.user_id
  and a.question_id=1
  and a.answer_value = 'Pooch'
  and b.question_id=2
  and b.answer_value='Peach'
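Because the number of search filters varies, a self-join like this has to be generated with one alias per filter. A minimal sketch, using Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL so it runs on its own; `build_self_join` and the `filters` list of `(question_id, answer_value)` pairs are hypothetical names. `SELECT DISTINCT` collapses the duplicate rows a user produces by filling in the questionnaire twice.

```python
import sqlite3

ROWS = [
    ("Sally", 1, "Pooch"), ("Sally", 2, "Peach"),
    ("John", 1, "Pooch"), ("John", 2, "Duke"),
]


def build_self_join(filters):
    """Hypothetical helper: one alias per (question_id, answer_value) filter, all joined on user_id."""
    aliases = [f"a{i}" for i in range(len(filters))]
    from_clause = f"answers {aliases[0]}" + "".join(
        f" JOIN answers {a} ON {a}.user_id = {aliases[0]}.user_id" for a in aliases[1:]
    )
    # Each alias must match its own question/answer pair; values are bound as parameters.
    where = " AND ".join(
        f"{a}.question_id = ? AND {a}.answer_value = ?" for a in aliases
    )
    params = [value for pair in filters for value in pair]
    return f"SELECT DISTINCT {aliases[0]}.user_id FROM {from_clause} WHERE {where}", params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE answers (user_id TEXT, question_id INTEGER, answer_value TEXT)")
conn.executemany("INSERT INTO answers VALUES (?, ?, ?)", ROWS)

sql, params = build_self_join([(1, "Pooch"), (2, "Peach")])
matches = [row[0] for row in conn.execute(sql, params)]
print(matches)  # ['Sally']
```

Each extra filter adds another join, which is exactly the scaling cost the question is worried about.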

This works, but since we allow an arbitrary number of search filters, we need to find something more efficient. My next solution was this:

select user_id, count(question_id) 
from answers 
where (
       (question_id=2 and answer_value = 'Peach') 
    or (question_id=1 and answer_value = 'Pooch')
      )
group by user_id 
having count(question_id)>1

However, we want users to be able to fill in the same questionnaire twice, so they may have two answers to question 1 in the answers table.
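The failure mode can be reproduced with a small sketch (Python's sqlite3 with an in-memory SQLite table standing in for MySQL; the duplicate rows for John are made-up sample data). Counting rows lets a single question answered twice satisfy `COUNT(question_id) > 1`, while `COUNT(DISTINCT question_id)` counts each matched question only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE answers (user_id TEXT, question_id INTEGER, answer_value TEXT)")
conn.executemany(
    "INSERT INTO answers VALUES (?, ?, ?)",
    [
        ("Sally", 1, "Pooch"), ("Sally", 2, "Peach"),
        # John filled in the questionnaire twice: two matching rows, but only for question 1.
        ("John", 1, "Pooch"), ("John", 2, "Duke"),
        ("John", 1, "Pooch"), ("John", 2, "Duke"),
    ],
)

QUERY = """
SELECT user_id,
       COUNT(question_id)          AS matched_rows,
       COUNT(DISTINCT question_id) AS matched_questions
FROM answers
WHERE (question_id = 2 AND answer_value = 'Peach')
   OR (question_id = 1 AND answer_value = 'Pooch')
GROUP BY user_id
ORDER BY user_id
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
# ('John', 2, 1)   passes COUNT(question_id) > 1 although John never answered 'Peach'
# ('Sally', 2, 2)
```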

So now I am at a loss. What is the best way to solve this? Thanks!

Rol*_*DBA 8

I found a neat way to do this query without a self-join.

I ran these commands in MySQL 5.5.8 for Windows and got the following results:

use test
DROP TABLE IF EXISTS answers;
CREATE TABLE answers (user_id VARCHAR(10),question_id INT,answer_value VARCHAR(20));
INSERT INTO answers VALUES
('Sally',1,'Pouch'),
('Sally',2,'Peach'),
('John',1,'Pooch'),
('John',2,'Duke');
INSERT INTO answers VALUES
('Sally',1,'Pooch'),
('Sally',2,'Peach'),
('John',1,'Pooch'),
('John',2,'Duck');

SELECT user_id,question_id,GROUP_CONCAT(DISTINCT answer_value) given_answers
FROM answers GROUP BY user_id,question_id;

+---------+-------------+---------------+
| user_id | question_id | given_answers |
+---------+-------------+---------------+
| John    |           1 | Pooch         |
| John    |           2 | Duke,Duck     |
| Sally   |           1 | Pouch,Pooch   |
| Sally   |           2 | Peach         |
+---------+-------------+---------------+

This output shows that John gave two different answers to question 2, and Sally gave two different answers to question 1.
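The `GROUP_CONCAT(DISTINCT ...)` step can be reproduced outside MySQL. This sketch uses Python's sqlite3 module with an in-memory SQLite table as a stand-in; SQLite's `group_concat(DISTINCT x)` also joins values with a comma. Neither engine guarantees the order of values inside the list (MySQL offers an `ORDER BY` clause inside `GROUP_CONCAT` for that), so the sketch sorts each list before comparing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE answers (user_id TEXT, question_id INTEGER, answer_value TEXT)")
conn.executemany(
    "INSERT INTO answers VALUES (?, ?, ?)",
    [
        ("Sally", 1, "Pouch"), ("Sally", 2, "Peach"), ("John", 1, "Pooch"), ("John", 2, "Duke"),
        ("Sally", 1, "Pooch"), ("Sally", 2, "Peach"), ("John", 1, "Pooch"), ("John", 2, "Duck"),
    ],
)

rows = conn.execute(
    """
    SELECT user_id, question_id, GROUP_CONCAT(DISTINCT answer_value) AS given_answers
    FROM answers
    GROUP BY user_id, question_id
    ORDER BY user_id, question_id
    """
).fetchall()

# Split and sort each comma-separated list, since its internal order is not guaranteed.
given = {(user, q): sorted(answers.split(",")) for user, q, answers in rows}
for key, answers in given.items():
    print(key, answers)
```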

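To find out which questions each user answered differently, just put the query above in a subquery and count the commas in the given_answers list to get the number of distinct answers, like this: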

SELECT user_id,question_id,given_answers,
(LENGTH(given_answers) - LENGTH(REPLACE(given_answers,',','')))+1 multianswer_count
FROM (SELECT user_id,question_id,GROUP_CONCAT(DISTINCT answer_value) given_answers
FROM answers GROUP BY user_id,question_id) A;

I got:

+---------+-------------+---------------+-------------------+
| user_id | question_id | given_answers | multianswer_count |
+---------+-------------+---------------+-------------------+
| John    |           1 | Pooch         |                 1 |
| John    |           2 | Duke,Duck     |                 2 |
| Sally   |           1 | Pouch,Pooch   |                 2 |
| Sally   |           2 | Peach         |                 1 |
+---------+-------------+---------------+-------------------+
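The comma-counting expression assumes that no answer contains a comma itself. The sketch below (SQLite via Python's sqlite3 as a stand-in for MySQL; the `Ann` row is a hypothetical free-text answer) computes it next to `COUNT(DISTINCT answer_value)`, which counts distinct answers directly and is unaffected by commas in the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE answers (user_id TEXT, question_id INTEGER, answer_value TEXT)")
conn.executemany(
    "INSERT INTO answers VALUES (?, ?, ?)",
    [
        ("Sally", 1, "Pouch"), ("Sally", 2, "Peach"), ("John", 1, "Pooch"), ("John", 2, "Duke"),
        ("Sally", 1, "Pooch"), ("Sally", 2, "Peach"), ("John", 1, "Pooch"), ("John", 2, "Duck"),
        # Hypothetical extra row: one answer that happens to contain a comma.
        ("Ann", 1, "Pooch, Sr."),
    ],
)

rows = conn.execute(
    """
    SELECT user_id, question_id,
           (LENGTH(given_answers) - LENGTH(REPLACE(given_answers, ',', ''))) + 1 AS comma_count,
           distinct_count
    FROM (SELECT user_id, question_id,
                 GROUP_CONCAT(DISTINCT answer_value) AS given_answers,
                 COUNT(DISTINCT answer_value)        AS distinct_count
          FROM answers
          GROUP BY user_id, question_id) A
    ORDER BY user_id, question_id
    """
).fetchall()
for row in rows:
    print(row)
# ('Ann', 1, 2, 1)   the comma trick miscounts a group with a single answer
# ('John', 1, 1, 1)
# ('John', 2, 2, 2)
# ('Sally', 1, 2, 2)
# ('Sally', 2, 1, 1)
```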

Now just wrap that in another subquery to filter out the rows where multianswer_count = 1:

SELECT * FROM (SELECT user_id,question_id,given_answers,
(LENGTH(given_answers) - LENGTH(REPLACE(given_answers,',','')))+1 multianswer_count
FROM (SELECT user_id,question_id,GROUP_CONCAT(DISTINCT answer_value) given_answers
FROM answers GROUP BY user_id,question_id) A) AA WHERE multianswer_count > 1;

Here is what I got:

+---------+-------------+---------------+-------------------+
| user_id | question_id | given_answers | multianswer_count |
+---------+-------------+---------------+-------------------+
| John    |           2 | Duke,Duck     |                 2 |
| Sally   |           1 | Pouch,Pooch   |                 2 |
+---------+-------------+---------------+-------------------+

Essentially, I performed three table scans: one on the main table and two on the small subqueries. No joins!!!
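For comparison, the same rows can be produced with a single `GROUP BY` and a `HAVING` clause on `COUNT(DISTINCT answer_value)`, without the nested derived tables or the comma arithmetic. This is an alternative formulation, not the query from this answer, sketched in SQLite through Python's sqlite3 module as a stand-in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE answers (user_id TEXT, question_id INTEGER, answer_value TEXT)")
conn.executemany(
    "INSERT INTO answers VALUES (?, ?, ?)",
    [
        ("Sally", 1, "Pouch"), ("Sally", 2, "Peach"), ("John", 1, "Pooch"), ("John", 2, "Duke"),
        ("Sally", 1, "Pooch"), ("Sally", 2, "Peach"), ("John", 1, "Pooch"), ("John", 2, "Duck"),
    ],
)

# Group once, then keep only the groups with more than one distinct answer.
rows = conn.execute(
    """
    SELECT user_id, question_id,
           GROUP_CONCAT(DISTINCT answer_value) AS given_answers,
           COUNT(DISTINCT answer_value)        AS multianswer_count
    FROM answers
    GROUP BY user_id, question_id
    HAVING COUNT(DISTINCT answer_value) > 1
    ORDER BY user_id, question_id
    """
).fetchall()

# Sort each concatenated list because the order inside GROUP_CONCAT is not guaranteed.
result = [(user, q, sorted(answers.split(",")), n) for user, q, answers, n in rows]
for row in result:
    print(row)
```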

Give it a try!!!

  • I always appreciate the effort you put into your answers. (2 upvotes)

Der*_*ney 7

I like the join approach myself:

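-- Each alias must be tied to the outer a.user_id; otherwise the joins form a cross product
SELECT a.user_id FROM answers a
INNER JOIN answers a1 ON a1.user_id=a.user_id AND a1.question_id=1 AND a1.answer_value='Pooch'
INNER JOIN answers a2 ON a2.user_id=a.user_id AND a2.question_id=2 AND a2.answer_value='Peach'
GROUP BY a.user_id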

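Update: after testing with a larger table (about 1 million rows), this method took much longer than the simple OR approach mentioned in the original question.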
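Each alias in a join like this needs its own `user_id = a.user_id` condition. Without it, `a1` and `a2` match rows belonging to any user, the joins degenerate into a cross product, and every user is returned as soon as anyone matches both filters; the intermediate row count also grows multiplicatively, which may contribute to the slowdown reported above (not measured here). A sketch comparing both forms in SQLite via Python's sqlite3, as a stand-in for MySQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE answers (user_id TEXT, question_id INTEGER, answer_value TEXT)")
conn.executemany(
    "INSERT INTO answers VALUES (?, ?, ?)",
    [("Sally", 1, "Pooch"), ("Sally", 2, "Peach"), ("John", 1, "Pooch"), ("John", 2, "Duke")],
)

# a1 and a2 are not tied to a.user_id, so they match rows from any user.
UNCORRELATED = """
SELECT a.user_id FROM answers a
INNER JOIN answers a1 ON a1.question_id = 1 AND a1.answer_value = 'Pooch'
INNER JOIN answers a2 ON a2.question_id = 2 AND a2.answer_value = 'Peach'
GROUP BY a.user_id
ORDER BY a.user_id
"""

# Every alias is restricted to the same user as a.
CORRELATED = """
SELECT a.user_id FROM answers a
INNER JOIN answers a1 ON a1.user_id = a.user_id AND a1.question_id = 1 AND a1.answer_value = 'Pooch'
INNER JOIN answers a2 ON a2.user_id = a.user_id AND a2.question_id = 2 AND a2.answer_value = 'Peach'
GROUP BY a.user_id
ORDER BY a.user_id
"""

uncorrelated = [row[0] for row in conn.execute(UNCORRELATED)]
correlated = [row[0] for row in conn.execute(CORRELATED)]
print(uncorrelated)  # ['John', 'Sally']
print(correlated)    # ['Sally']
```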


Chr*_*ong 5

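We join the answers table on user_id in a chain of joins to pull in data from other tables, but isolating the answers-table SQL and writing it in such simple terms helped me spot the solution: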

SELECT user_id, COUNT(question_id) 
FROM answers 
WHERE
  (question_id = 2 AND answer_value = 'Peach') 
  OR (question_id = 1 AND answer_value = 'Pooch')
GROUP by user_id 
HAVING COUNT(question_id) > 1

We were using a second subquery unnecessarily.
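Because users may fill in the questionnaire twice, a safer threshold for the `HAVING` clause is `COUNT(DISTINCT question_id)` equal to the number of filters, which ignores repeated matching rows. A minimal sketch under these assumptions: SQLite via Python's sqlite3 stands in for MySQL, `users_matching` is a hypothetical helper, and each filter targets a different question. Since the table has no submission identifier, matches may come from different submissions by the same user.

```python
import sqlite3


def users_matching(conn, filters):
    """Hypothetical helper: users whose answers satisfy every (question_id, answer_value) filter.

    COUNT(DISTINCT question_id) ignores repeated rows from a questionnaire filled in
    more than once, so the threshold is simply the number of filters.
    Assumes each filter targets a different question_id.
    """
    conditions = " OR ".join("(question_id = ? AND answer_value = ?)" for _ in filters)
    params = [value for pair in filters for value in pair]
    sql = (
        "SELECT user_id FROM answers "
        f"WHERE {conditions} "
        "GROUP BY user_id "
        "HAVING COUNT(DISTINCT question_id) = ? "
        "ORDER BY user_id"
    )
    return [row[0] for row in conn.execute(sql, params + [len(filters)])]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE answers (user_id TEXT, question_id INTEGER, answer_value TEXT)")
conn.executemany(
    "INSERT INTO answers VALUES (?, ?, ?)",
    [
        ("Sally", 1, "Pooch"), ("Sally", 2, "Peach"),
        ("Sally", 1, "Pooch"), ("Sally", 2, "Peach"),  # Sally filled it in twice
        ("John", 1, "Pooch"), ("John", 2, "Duke"),
        ("John", 1, "Pooch"), ("John", 2, "Duke"),     # so did John
    ],
)

print(users_matching(conn, [(1, "Pooch"), (2, "Peach")]))  # ['Sally']
print(users_matching(conn, [(1, "Pooch")]))                # ['John', 'Sally']
```

With `COUNT(question_id) > 1` instead, John's two matching rows for question 1 would make him a false positive for the two-filter search.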