http://sqlfiddle.com/#!2/e6382
id  id_news  word
1   6        superman
2   6        movie
3   6        review
4   6        excellent
5   7        review
6   7        guardians of the galaxy
7   7        great
8   8        review
9   8        superman
10  8        movie
11  8        great
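I have a small problem: I'm trying to relate different news items by their words, using a threshold. In the example provided, id_news 6 should be related to 8 but not to 7, because 7 shares only 2 words with 8 (and just 1 with 6). I only want to detect news items that have at least 3 words in common.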
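As a reference for what "related" means here, the following is a minimal Python sketch. It is not part of the original question: `ROWS`, `related_pairs` and `threshold` are hypothetical names. It groups the sample rows into one set of words per news item, compares every unordered pair, and keeps pairs that share at least `threshold` distinct words. Any SQL solution should produce the same pairs.

```python
from collections import defaultdict
from itertools import combinations

# Sample rows copied from the table above: (id, id_news, word)
ROWS = [
    (1, 6, "superman"), (2, 6, "movie"), (3, 6, "review"), (4, 6, "excellent"),
    (5, 7, "review"), (6, 7, "guardians of the galaxy"), (7, 7, "great"),
    (8, 8, "review"), (9, 8, "superman"), (10, 8, "movie"), (11, 8, "great"),
]


def related_pairs(rows, threshold=3):
    """Return (news_a, news_b, shared_count) for every unordered pair of
    news items that share at least `threshold` distinct words."""
    # Build one set of words per news item.
    words_by_news = defaultdict(set)
    for _id, id_news, word in rows:
        words_by_news[id_news].add(word)

    result = []
    # Compare each pair once (a < b) and count the words they share.
    for a, b in combinations(sorted(words_by_news), 2):
        shared = len(words_by_news[a] & words_by_news[b])
        if shared >= threshold:
            result.append((a, b, shared))
    return result


print(related_pairs(ROWS))  # [(6, 8, 3)]
```

With `threshold=2` the pair (7, 8) also qualifies, which matches the counts described in the question.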
Try this self-join:
SELECT
    wa1.id_news id_news_1,
    wa2.id_news id_news_2,
    count(wa2.word) cnt_words
FROM word_analysis wa1
INNER JOIN word_analysis wa2
    ON wa1.id_news <> wa2.id_news AND wa1.word = wa2.word
GROUP BY wa1.id_news, wa2.id_news
HAVING count(wa2.word) >= 3
ORDER BY wa1.id_news, wa2.id_news;
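To check the self-join against the sample data without the fiddle, here is a sketch that runs the same query on an in-memory SQLite database through Python's `sqlite3` module. The engine used by the original fiddle may differ, so this assumes the query behaves the same way in SQLite. The column types in `CREATE TABLE`, along with the names `ROWS`, `QUERY` and `run_query`, are also assumptions for illustration.

```python
import sqlite3

# Sample rows copied from the table in the question: (id, id_news, word)
ROWS = [
    (1, 6, "superman"), (2, 6, "movie"), (3, 6, "review"), (4, 6, "excellent"),
    (5, 7, "review"), (6, 7, "guardians of the galaxy"), (7, 7, "great"),
    (8, 8, "review"), (9, 8, "superman"), (10, 8, "movie"), (11, 8, "great"),
]

# The self-join from the answer: match rows with the same word but different
# news items, then keep news pairs with at least 3 matching words.
QUERY = """
SELECT
    wa1.id_news AS id_news_1,
    wa2.id_news AS id_news_2,
    COUNT(wa2.word) AS cnt_words
FROM word_analysis wa1
INNER JOIN word_analysis wa2
    ON wa1.id_news <> wa2.id_news AND wa1.word = wa2.word
GROUP BY wa1.id_news, wa2.id_news
HAVING COUNT(wa2.word) >= 3
ORDER BY wa1.id_news, wa2.id_news
"""


def run_query(rows):
    """Load `rows` into a throwaway SQLite table and run QUERY on it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE word_analysis ("
            "id INTEGER PRIMARY KEY, id_news INTEGER, word TEXT)"
        )
        conn.executemany("INSERT INTO word_analysis VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


print(run_query(ROWS))  # [(6, 8, 3), (8, 6, 3)]
```

Because the join condition is `wa1.id_news <> wa2.id_news`, each related pair is reported twice, once in each direction. Changing it to `wa1.id_news < wa2.id_news` would report each pair only once. Note also that `COUNT(wa2.word)` counts matched rows, so a word repeated within the same news item would be counted more than once.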