Why does proximity affect ts_rank for multi-term queries?

kev*_*b11 6 postgresql search full-text-search

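When I use ts_rank with a tsquery that contains several terms joined by the & operator, the proximity of the terms affects the rank and produces results I don't expect. An example: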

select ts_rank(to_tsvector('why in the world is this not working?'), plainto_tsquery('world working'));
-- RESULT: 0.095243

select ts_rank(to_tsvector('why in the world is this not - at least as I would expect - working?'), plainto_tsquery('world working'));
-- RESULT: 0.0397712

select ts_rank(to_tsvector('why in the world is this not - at least as I would expect - working? I just do not get it'), plainto_tsquery('world working'));
-- RESULT: 0.0397712


The documentation describes ts_rank as simply measuring the frequency of matches:

ts_rank([ weights float4[], ] vector tsvector, query tsquery [, normalization integer ]) returns float4: Ranks vectors based on the frequency of their matching lexemes.

However, the examples above suggest that it measures frequency and, for multi-term queries, proximity as well.
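One way to see where the proximity comes from is to model the calculation. The sketch below is my reading of calc_rank_and() and word_distance() in PostgreSQL's src/backend/utils/adt/tsrank.c, which, as far as I can tell, ts_rank uses whenever the query's top-level operator is &; it is a simplified model, not the real implementation. The Python helpers rank_and and word_distance are hypothetical names. The positions (world at 4, work at 8 or 14) are assumptions about what to_tsvector produces for these sentences: stop words keep their slot and the "-" tokens get none. Every position is assumed to carry the default weight 0.1 (label D), with normalization 0.

```python
import math

# Default ts_rank weight for label D; plain to_tsvector() labels every position D.
WEIGHT_D = 0.1


def word_distance(dist):
    """Proximity factor, modeled on word_distance() in tsrank.c."""
    if dist > 100:
        return 1e-30
    return 1.0 / (1.005 + 0.05 * math.exp(dist / 1.5 - 2))


def rank_and(positions, terms, w=WEIGHT_D):
    """Hypothetical model of ts_rank for a top-level & query (calc_rank_and).

    positions: dict mapping lexeme -> list of positions in the tsvector
    terms: the query's distinct lexemes (at least two; tsrank.c uses the
           OR formula when there is only one)
    """
    if len(terms) < 2:
        raise ValueError("this model covers two or more distinct terms only")
    res = -1.0  # sentinel: no pair of positions combined yet
    for i, t_i in enumerate(terms):
        for t_k in terms[:i]:
            for p_i in positions.get(t_i, []):
                for p_k in positions.get(t_k, []):
                    dist = abs(p_i - p_k)
                    if dist == 0:
                        continue  # tsrank.c only counts this for position-less entries
                    # Each pair of positions contributes; closer pairs contribute more.
                    curw = math.sqrt(w * w * word_distance(dist))
                    # Combine like independent probabilities: 1 - prod(1 - curw).
                    res = curw if res < 0 else 1.0 - (1.0 - res) * (1.0 - curw)
    return res if res >= 0 else 1e-20  # "no pairs found" is floored at 1e-20


# Assumed tsvector positions for the first two example sentences.
short_doc = {"world": [4], "work": [8]}
long_doc = {"world": [4], "work": [14]}
print(round(rank_and(short_doc, ["world", "work"]), 6))  # 0.095243
print(round(rank_and(long_doc, ["world", "work"]), 6))   # 0.039771
```

Under these assumptions the model reproduces 0.095243 and 0.0397712 (PostgreSQL computes in float4, hence the last digit). The third query scores the same as the second because, without normalization, the extra words after "working" do not change the distance between the two matched lexemes.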

In the following example, this gives me unexpected results:

select ts_rank(to_tsvector('why in the world is this not - at least as I would expect - working?'), plainto_tsquery('world'));
-- RESULT: 0.0607927

select ts_rank(to_tsvector('why in the world is this not - at least as I would expect - working?'), plainto_tsquery('world working'));
-- RESULT: 0.0397712


I would expect the document to rank higher for the second query, since it matches more of the query's terms, but it ranks lower instead.
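The single-term query appears to take a different path. With only one lexeme there is no & operator, so (as I read tsrank.c) calc_rank_or() is used instead: the j-th occurrence of a term adds w/j², the sum is divided by π²/6 and then by the number of query terms, and there is no proximity factor at all. The hypothetical helpers below model that branch next to the two-term & case. They assume each word occurs once, every position has the default weight 0.1 (label D), and normalization is 0.

```python
import math

WEIGHT_D = 0.1          # default weight of a label-D position
PI2_6 = 1.64493406685   # pi^2 / 6, the normalizer used in tsrank.c


def word_distance(dist):
    """Proximity factor, modeled on word_distance() in tsrank.c."""
    if dist > 100:
        return 1e-30
    return 1.0 / (1.005 + 0.05 * math.exp(dist / 1.5 - 2))


def rank_or(positions, terms, w=WEIGHT_D):
    """Hypothetical model of calc_rank_or() when every position has weight w."""
    res = 0.0
    for t in terms:
        occ = positions.get(t, [])
        # j-th occurrence (1-based) adds w / j^2; distance between terms is ignored.
        # With equal weights, tsrank.c's (wjm + resj - wjm / (jm + 1)^2) reduces to resj.
        resj = sum(w / (j + 1) ** 2 for j in range(len(occ)))
        res += resj / PI2_6
    # Divide by the number of query terms, matched or not.
    return res / len(terms) if terms else 0.0


def rank_and_pair(dist, w=WEIGHT_D):
    """Two-term & rank when each term occurs once, `dist` positions apart."""
    return math.sqrt(w * w * word_distance(dist))


single = rank_or({"world": [4]}, ["world"])
print(round(single, 7))  # 0.0607927
for d in (4, 8, 9, 10):
    print(d, round(rank_and_pair(d), 7), rank_and_pair(d) > single)
```

In this model one occurrence of "world" scores 0.1 / (π²/6) ≈ 0.0607927, while a two-term & match scores 0.1 * sqrt(word_distance(d)). The latter stays above the single-term score up to a gap of 8 positions and drops below it from 9 onward, so the 10-position gap in the question lands on the "lower" side.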

Is there a way to prevent this behavior? Am I misunderstanding something about ts_rank or how to use it?
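If the goal is "documents matching more terms rank higher", one thing worth testing is ranking with an OR query such as to_tsquery('world | working') while still filtering with the & query in the WHERE ... @@ clause. As I read tsrank.c, only a top-level & (or phrase) operator takes the proximity path, so an OR query should be ranked by frequency alone. The sketch below models only that OR branch; rank_or and the position maps are hypothetical, and each term is assumed to occur once at the default weight 0.1.

```python
PI2_6 = 1.64493406685  # pi^2 / 6, as used in tsrank.c
WEIGHT_D = 0.1         # default weight of a label-D position


def rank_or(positions, terms, w=WEIGHT_D):
    """Hypothetical model of calc_rank_or(); all positions weighted w."""
    res = 0.0
    for t in terms:
        occ = positions.get(t, [])
        # Sum of w / j^2 over the term's occurrences; proximity plays no part.
        res += sum(w / (j + 1) ** 2 for j in range(len(occ))) / PI2_6
    return res / len(terms) if terms else 0.0


query = ["world", "work"]  # lexemes of 'world | working'
both = rank_or({"world": [4], "work": [14]}, query)
only_world = rank_or({"world": [4]}, query)
far_apart = rank_or({"world": [4], "work": [90]}, query)
print(both > only_world, both == far_apart)  # True True
```

Under this model the OR rank is divided by the number of query terms, so the same document still scores 0.0607927 for "world | working" as for "world" alone. What the OR query changes is that the distance between the terms no longer matters, while documents containing more of the terms still score higher than those containing fewer.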

Archived: 8 years, 5 months ago
Views: 267
Last activity: 8 years, 5 months ago