Why does proximity affect ts_rank for multi-term queries?

kev*_*b11 6 postgresql search full-text-search

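When I use ts_rank with a tsquery that combines multiple terms with the & operator, the proximity of the terms affects the rank and produces results I did not expect. An example: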

select ts_rank(to_tsvector('why in the world is this not working?'), plainto_tsquery('world working'));
RESULT: 0.095243

select ts_rank(to_tsvector('why in the world is this not - at least as I would expect - working?'), plainto_tsquery('world working'));
RESULT: 0.0397712

select ts_rank(to_tsvector('why in the world is this not - at least as I would expect - working? I just do not get it'), plainto_tsquery('world working'));
RESULT: 0.0397712

The documentation describes ts_rank as simply measuring the frequency of matches:

ts_rank([ weights float4[], ] vector tsvector, query tsquery [, normalization integer ]) returns float4: Ranks vectors based on the frequency of their matching lexemes.

However, the examples above suggest that it measures frequency and, for multi-term queries, proximity as well.
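
One way to check the proximity hypothesis is to look at the word positions stored in each tsvector, since those positions are the only distance information a ranking function has to work with. This is a minimal sketch that names the english configuration explicitly, which is an assumption: the queries in this question use whatever default_text_search_config is set to. Each lexeme in the output is followed by its position, so the gap between the two query terms in each document can be read off directly.

```sql
-- Assumption: the 'english' configuration is named explicitly here; the queries
-- in the question rely on default_text_search_config, which may differ.

-- Short document: the two query terms sit close together.
SELECT to_tsvector('english', 'why in the world is this not working?') AS short_doc;

-- Longer document: extra words widen the gap between the same two terms.
SELECT to_tsvector('english', 'why in the world is this not - at least as I would expect - working?') AS long_doc;
```

If the gap between the two terms grows from the first document to the second while the set of matching lexemes stays the same, that would be consistent with the drop in rank seen above.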

This gives me an unexpected result in the following example:

select ts_rank(to_tsvector('why in the world is this not - at least as I would expect - working?'), plainto_tsquery('world'));
RESULT: 0.0607927

select ts_rank(to_tsvector('why in the world is this not - at least as I would expect - working?'), plainto_tsquery('world working'));
RESULT: 0.0397712

I would expect the document to rank higher for the second query, because it matches more of the query's terms, but it ranks lower.

Is there a way to prevent this behavior? Am I misunderstanding ts_rank or how to use it?