SQL LIKE: how do I order results by a weighted number of occurrences?

Tom*_*len 9 sql search-engine sql-order-by

I have my search terms:

"Yellow large widgets"

I split these terms into 3 words:

1 = "Yellow";
2 = "Large";
3 = "Widgets";

Then I search with:

SELECT * FROM widgets
    WHERE (description LIKE '%yellow%' OR description LIKE '%large%' OR description LIKE '%widgets%')
    OR (title LIKE '%yellow%' OR title LIKE '%large%' OR title LIKE '%widgets%')

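How can I order the results according to these biases?

  • Title takes precedence: if any term appears in the title, the result should be considered more important
  • Number of occurrences: results with a higher total number of occurrences should appear first

Ideal methodology

  • Count the occurrences in description.
  • Each occurrence is worth 1 point.
  • Count the occurrences in title.
  • Each title occurrence is worth 5 points.
  • Order by points.

But I don't know where to start doing this in SQL.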

Dam*_*ver 10

OK, let's put your search terms in a temporary table:

CREATE TABLE #SearchTerms (Term varchar(50) not null)
insert into #SearchTerms (Term)
select 'yellow' union all
select 'large' union all
select 'widgets'

And let's do something silly (a cross join of every widget with every search term):

select
    widgets.ID,
    (LEN(description) - LEN(REPLACE(description,Term,''))) / LEN(Term) as DescScore,
    (LEN(title) - LEN(REPLACE(title,Term,''))) / LEN(Term) as TitleScore
from
    widgets,#SearchTerms

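We have now counted every occurrence of each term, in both the description and the title, with one row per widget and term.

So now we can sum and weight those occurrences: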

select
    widgets.ID,
    SUM((LEN(description) - LEN(REPLACE(description,Term,''))) / LEN(Term) +
    ((LEN(title) - LEN(REPLACE(title,Term,''))) / LEN(Term) *5)) as CombinedScore
from
    widgets,#SearchTerms
group by
    Widgets.ID

If we need to do more with this, I'd recommend putting the above in a subselect:

select
    w.*,CombinedScore
from
    widgets w
       inner join
    (select
        widgets.ID,
        SUM((LEN(description) - LEN(REPLACE(description,Term,''))) / LEN(Term) +
        ((LEN(title) - LEN(REPLACE(title,Term,''))) / LEN(Term) *5)) as CombinedScore
    from
        widgets,#SearchTerms
    group by
        Widgets.ID
    ) t
        on
            w.ID = t.ID
where
    CombinedScore > 0
order by
    CombinedScore desc

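(Note that I've assumed there is an ID column in all of these examples, but this can be extended to as many columns as are needed to define the PK on the widgets table)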
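
To try the ranking end to end without a SQL Server instance, here is a rough sketch that ports the final query to SQLite through Python's built-in sqlite3 module. Everything in it beyond the query shape is an assumption for illustration: the three sample rows are made up, a regular SearchTerms table stands in for #SearchTerms, LENGTH replaces LEN, and LOWER is added because SQLite's REPLACE is case-sensitive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE widgets (ID INTEGER PRIMARY KEY, title TEXT, description TEXT);
    CREATE TABLE SearchTerms (Term TEXT NOT NULL);
    -- Hypothetical sample data, not from the original question.
    INSERT INTO widgets (ID, title, description) VALUES
        (1, 'Yellow widgets', 'A box of large widgets'),
        (2, 'Blue gadget', 'Large, yellow and large again'),
        (3, 'Red gizmo', 'Small and round');
    INSERT INTO SearchTerms (Term) VALUES ('yellow'), ('large'), ('widgets');
""")

# Same shape as the T-SQL above: cross join widgets with the terms, count
# occurrences per term, weight title hits by 5, sum per widget, then rank.
query = """
    SELECT w.ID, w.title, t.CombinedScore
    FROM widgets AS w
    INNER JOIN (
        SELECT widgets.ID AS ID,
               SUM((LENGTH(description)
                    - LENGTH(REPLACE(LOWER(description), Term, ''))) / LENGTH(Term)
                   + (LENGTH(title)
                      - LENGTH(REPLACE(LOWER(title), Term, ''))) / LENGTH(Term) * 5
               ) AS CombinedScore
        FROM widgets CROSS JOIN SearchTerms
        GROUP BY widgets.ID
    ) AS t ON w.ID = t.ID
    WHERE t.CombinedScore > 0
    ORDER BY t.CombinedScore DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With this sample data, widget 1 has two title hits (10 points) plus two description hits (2 points), widget 2 has three description hits, and widget 3 matches nothing, so it is filtered out by the CombinedScore > 0 condition.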


The real trick here is counting the occurrences of a word within a larger body of text, which is done by:

(LEN(text) - LEN(text with each occurrence of term removed)) / LEN(term)
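
As a small worked example of that formula, here is a sketch in Python rather than T-SQL. The function name count_occurrences is made up for illustration, and lowercasing both inputs is an assumption that mimics a case-insensitive collation.

```python
def count_occurrences(text: str, term: str) -> int:
    """Count non-overlapping occurrences of term in text using the length trick."""
    text, term = text.lower(), term.lower()
    if not term:
        return 0  # avoid dividing by zero for an empty search term
    removed = text.replace(term, "")  # text with every occurrence of term removed
    return (len(text) - len(removed)) // len(term)


# 36 characters before, 24 after removing two 6-character terms: (36 - 24) // 6 == 2
example = count_occurrences("Yellow widgets, large yellow widgets", "yellow")
print(example)
```

Like the LIKE '%large%' filter, this counts substring matches, so 'large' is also found inside 'enlarge'.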