Elasticsearch: how to prevent the score from increasing when the search term appears multiple times in a document?

Question

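When a search term appears not just once but several times in a document I am searching, the score goes up. That is probably what you want most of the time, but not in my case.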

Query:

"query": {
  "bool": {
    "should": {
      "nested": {
        "path": "editions",
        "query": {
          "match": {
            "title_author": {
              "query": "look me up",
              "operator": "and",
              "boost": 2
            }
          }
        }
      }
    },
    "must": {
      "nested": {
        "path": "editions",
        "query": {
          "match": {
            "title_author": {
              "query": "look me up",
              "operator": "and",
              "fuzziness": 0.5,
              "boost": 1
            }
          }
        }
      }
    }
  }
}

doc_1

{
  "editions": [
    {
      "editionid": 1,
      "title_author": "look me up look me up"
    },
    {
      "editionid": 2,
      "title_author": "something else"
    }
  ]
}

and doc_2

{
  "editions": [
    {
      "editionid": 3,
      "title_author": "look me up"
    },
    {
      "editionid": 4,
      "title_author": "something else"
    }
  ]
}

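Now doc_1 gets a higher score because it contains the search term twice. I don't want that. How do I turn this behavior off? I want the same score regardless of whether the search term is found once or twice in the matching document.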

Answer 1

Ale*_*kov 5

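Besides what @keety and @Sid1199 discussed, there is another way to do this: a special attribute of "text" fields called index_options. By default it is set to "positions", but you can explicitly set it to "docs", so that term frequencies are not stored in the index and Elasticsearch has no knowledge of the repetitions at search time.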

"title_author": {
    "type": "text",
    "index_options": "docs"
}
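
For context, here is a rough sketch of where that fragment could sit in a complete mapping for the documents from the question. It assumes a recent Elasticsearch version with typeless mappings, that editions is mapped as a nested field (the question's query implies this), and that editionid is an integer; none of this is confirmed by the original post, so adjust it to your actual index.

```json
{
  "mappings": {
    "properties": {
      "editions": {
        "type": "nested",
        "properties": {
          "editionid": {
            "type": "integer"
          },
          "title_author": {
            "type": "text",
            "index_options": "docs"
          }
        }
      }
    }
  }
}
```

This would be sent as the body of an index-creation request. Two caveats: as far as I know index_options cannot be changed on an existing field, so the data has to be reindexed into a new index; and because positions are no longer indexed, phrase queries (match_phrase) against title_author will stop working. Depending on the version, field-length normalization may still make the two documents score slightly differently; if it does, adding "norms": false to the field is one option to try.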

Archived:	11 years, 9 months ago
Views:	1585
Last recorded:	7 years, 10 months ago