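I'm trying to find pairs of users who like exactly the same set of TV shows. Here is a simplified example.
Suppose I have a table with one row per user for each TV show they like: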
|USER | Show |
|-----|-------------|
|001 | Lost |
|001 | South Park |
|002 | Lost |
|003 | Lost |
|003 | South Park |
|004 | South Park |
|005 | Lost |
|006 | Lost |
I would then like to get a result like this:
|USER1 |USER2 |
|------|------|
|001 |003 |
|003 |001 |
|002 |005 |
|002 |006 |
|005 |002 |
|005 |006 |
|006 |002 |
|006 |005 |
Or, even better, a version that lists each pair only once:
|USER1 |USER2 |
|------|------|
|001 |003 |
|002 |005 |
|002 |006 |
|005 |006 |
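This basically says: user 001 likes the same set of shows as user 003.
I've been playing around with GROUP BY and JOIN, but I still can't find the answer :(.
The best I've come up with so far is: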
SELECT s1.User AS USER1, s2.User AS USER2, s1.`Show` AS `Show`
FROM Shows s1 JOIN Shows s2
ON s1.`Show` = s2.`Show` AND s1.User != s2.User;
This produces pairs of users together with the shows they have in common, but I don't know where to go from here.
If you can accept CSV rather than tabulated results, you can simply group the table twice:
SELECT GROUP_CONCAT(User) FROM (
SELECT User, GROUP_CONCAT(DISTINCT `Show` ORDER BY `Show` SEPARATOR 0x1e) AS s
FROM Shows
GROUP BY User
) t GROUP BY s
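To illustrate why grouping twice works, here is a minimal Python sketch of the same idea outside the database. It is not the query itself: the rows are copied from the question's table, and the name `group_users_by_show_set` is hypothetical. The first grouping builds each user's set of shows (the role played by the inner `GROUP_CONCAT`), and the second grouping collects users whose sets are equal.

```python
from collections import defaultdict

# Sample rows copied from the question's table: (user, show).
rows = [
    ("001", "Lost"), ("001", "South Park"),
    ("002", "Lost"),
    ("003", "Lost"), ("003", "South Park"),
    ("004", "South Park"),
    ("005", "Lost"),
    ("006", "Lost"),
]


def group_users_by_show_set(rows):
    """Map each distinct show set to the sorted users who like exactly that set."""
    # First grouping: collect each user's shows.
    # A set drops duplicate (user, show) rows, much like DISTINCT does.
    shows_by_user = defaultdict(set)
    for user, show in rows:
        shows_by_user[user].add(show)

    # Second grouping: users with equal show sets share one key.
    users_by_shows = defaultdict(list)
    for user, shows in shows_by_user.items():
        users_by_shows[frozenset(shows)].append(user)

    return {shows: sorted(users) for shows, users in users_by_shows.items()}


groups = group_users_by_show_set(rows)
for users in groups.values():
    # One comma-separated line per group, analogous to GROUP_CONCAT(User).
    print(",".join(users))
```

Note that a user whose set of shows nobody else shares (004 in the sample data) still forms a group of one, both in this sketch and in the query above.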
Otherwise, you can join the above subquery to itself:
SELECT DISTINCT LEAST(t.User, u.User) AS User1,
GREATEST(t.User, u.User) AS User2
FROM (
SELECT User, GROUP_CONCAT(DISTINCT `Show` ORDER BY `Show` SEPARATOR 0x1e) AS s
FROM Shows
GROUP BY User
) t JOIN (
SELECT User, GROUP_CONCAT(DISTINCT `Show` ORDER BY `Show` SEPARATOR 0x1e) AS s
FROM Shows
GROUP BY User
) u USING (s)
WHERE t.User <> u.User
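The pairing step of the self-join can be sketched in Python in the same spirit. This is an illustration under assumptions, not a translation of the query: the rows are copied from the question, and `matching_pairs` is a hypothetical name. Taking two-element combinations of the sorted users in each group plays the role of `LEAST`/`GREATEST` plus `DISTINCT`, so each pair appears once with the smaller user first.

```python
from collections import defaultdict
from itertools import combinations

# Sample rows copied from the question's table: (user, show).
rows = [
    ("001", "Lost"), ("001", "South Park"),
    ("002", "Lost"),
    ("003", "Lost"), ("003", "South Park"),
    ("004", "South Park"),
    ("005", "Lost"),
    ("006", "Lost"),
]


def matching_pairs(rows):
    """Return sorted (user1, user2) pairs of users who like exactly the same shows."""
    # Build each user's set of shows.
    shows_by_user = defaultdict(set)
    for user, show in rows:
        shows_by_user[user].add(show)

    # Group users by their (hashable) show set.
    users_by_shows = defaultdict(list)
    for user, shows in shows_by_user.items():
        users_by_shows[frozenset(shows)].append(user)

    # Within each group, emit every unordered pair exactly once.
    pairs = []
    for users in users_by_shows.values():
        pairs.extend(combinations(sorted(users), 2))
    return sorted(pairs)


pairs = matching_pairs(rows)
for user1, user2 in pairs:
    print(user1, user2)
```

Groups with a single user, such as 004, yield no combinations and so drop out, just as `WHERE t.User <> u.User` removes them from the joined result.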
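See them on sqlfiddle.
Of course, if duplicate (User, Show) pairs are guaranteed not to exist in Shows, you could improve performance by removing the DISTINCT keyword from the GROUP_CONCAT() aggregations.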