How can I find the most frequently occurring words in MySQL?

Use*_*ser 14 mysql sql denormalization

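I have a table called results with 5 columns.

I want to use the title column to find the rows that match WHERE title like '%for sale%', and then list the most common words in that column. One would be for and another would be sale, but I want to see which other words are associated with them.

Sample data: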

title
cheap cars for sale
house for sale
cats and dogs for sale
iphones and androids for sale
cheap phones for sale
house furniture for sale

Result (individual words):

for    6
sale    6
cheap    2
and    2
house    2
furniture 1
cars    1
etc...

Gor*_*off 7

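You can extract the words with some string manipulation. Assuming you have a numbers table and that the words are separated by single spaces: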

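-- Join each title to the numbers 1..(word count), then take the n-th word
select substring_index(substring_index(r.title, ' ', n.n), ' ', -1) as word,
       count(*) as cnt
from results r join
     numbers n
     on n.n <= length(r.title) - length(replace(r.title, ' ', '')) + 1  -- spaces + 1 = words
group by word;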
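To see why the nested substring_index call picks out one word, it may help to run the two expressions on a single literal title. This is only an illustrative sketch: the user variable @title is an assumption introduced here and is not part of the original answer.

```sql
-- Illustration only: a literal title instead of the results table.
SET @title = 'cheap cars for sale';

-- Word count = number of spaces + 1 (assumes single-space separators)
SELECT LENGTH(@title) - LENGTH(REPLACE(@title, ' ', '')) + 1 AS word_count;
-- 4

-- n-th word: keep the first n words, then take the last word of that prefix
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(@title, ' ', 2), ' ', -1) AS second_word;
-- 'cars'
```

The join condition in the query repeats the second expression once for every n from 1 up to the word count, so each title contributes one row per word.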

If you do not have a numbers table, you can build one by hand with a subquery:

from results r join
     (select 1 as n union all select 2 union all select 3 union all . . .
     ) n
     . . .
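As an alternative to the inline subquery, here is a minimal sketch that creates a small permanent numbers table and also applies the LIKE filter and ordering from the question. The ten-row size and the cnt alias are assumptions chosen for illustration. The table needs at least as many rows as the longest title has words, otherwise the trailing words of long titles are silently left out.

```sql
-- Hypothetical helper table; the names `numbers` and `n` match the query above.
CREATE TABLE numbers (n INT PRIMARY KEY);

-- Ten rows is an arbitrary choice; use at least the maximum word count per title.
INSERT INTO numbers (n) VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);

-- Count words only in titles that contain 'for sale', most frequent first.
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(r.title, ' ', n.n), ' ', -1) AS word,
       COUNT(*) AS cnt
FROM results r
JOIN numbers n
  ON n.n <= LENGTH(r.title) - LENGTH(REPLACE(r.title, ' ', '')) + 1
WHERE r.title LIKE '%for sale%'
GROUP BY word
ORDER BY cnt DESC, word;
```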

An SQL Fiddle (courtesy of @GrzegorzAdamKowalski) is here.

  • It does not seem to work correctly. Check this: http://sqlfiddle.com/#!9/b0749/2 (2 upvotes)

Grz*_*ski 6

You can use ExtractValue in an interesting way. See the SQL Fiddle here: http 45

We only need one table:

CREATE TABLE text (`title` varchar(29));

INSERT INTO text (`title`)
VALUES
    ('cheap cars for sale'),
    ('house for sale'),
    ('cats and dogs for sale'),
    ('iphones and androids for sale'),
    ('cheap phones for sale'),
    ('house furniture for sale')
;

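Now we build a series of selects that extract whole words from the text after it has been converted to XML. Each select extracts the N-th word from the text.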

select words.word, count(*) as `count` from
(select ExtractValue(CONCAT('<w>', REPLACE(title, ' ', '</w><w>'), '</w>'), '//w[1]') as word from `text`
union all
select ExtractValue(CONCAT('<w>', REPLACE(title, ' ', '</w><w>'), '</w>'), '//w[2]') from `text`
union all
select ExtractValue(CONCAT('<w>', REPLACE(title, ' ', '</w><w>'), '</w>'), '//w[3]') from `text`
union all
select ExtractValue(CONCAT('<w>', REPLACE(title, ' ', '</w><w>'), '</w>'), '//w[4]') from `text`
union all
select ExtractValue(CONCAT('<w>', REPLACE(title, ' ', '</w><w>'), '</w>'), '//w[5]') from `text`) as words
where length(words.word) > 0
group by words.word
order by `count` desc, words.word asc
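The two steps inside each select can be looked at separately on one literal title. This is a sketch for illustration only; the user variable @title is an assumption and does not appear in the original answer.

```sql
-- Illustration only: one literal title instead of the `text` table.
SET @title = 'house for sale';

-- Step 1: wrap every word in <w>...</w> by replacing each space
SELECT CONCAT('<w>', REPLACE(@title, ' ', '</w><w>'), '</w>') AS xml_fragment;
-- <w>house</w><w>for</w><w>sale</w>

-- Step 2: select the N-th <w> element with an XPath position predicate
SELECT ExtractValue(
         CONCAT('<w>', REPLACE(@title, ' ', '</w><w>'), '</w>'),
         '//w[2]'
       ) AS second_word;
```

When a title has fewer than N words, ExtractValue finds no match and returns an empty string, which is why the outer query keeps only rows where length(words.word) > 0. Two limits follow from this design: the query as written covers only the first five words of each title, so longer titles need more union all branches, and titles containing XML-special characters such as & or < would probably need escaping before the conversion.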