How do I exclude a field from an Elasticsearch 6.1 search?

Ric*_*nha 4 elasticsearch elasticsearch-6

I have an index with multiple fields. I want to filter documents by whether the search string appears in any field except one, user_comments. The query I am running is:

{
    "from": offset,
    "size": limit,
    "_source": [
      "document_title"
    ],
    "query": {
      "function_score": {
        "query": {
          "bool": {
            "must":
            {
              "query_string": {
                "query": "#{query}"
              }
            }
          }
        }
      }
    }
  }

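However, the query string searches all fields, so it also returns documents where only the user_comments field contains the matching string. I want to run it against every field except user_comments. The whitelist would be a very long list, and the field names are dynamic, so listing the whitelisted fields with the fields parameter is not feasible: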

"query_string": {
                    "query": "#{query}",
                    "fields": [
                      "document_title",
                      "field2"
                    ]
                  }

Can anyone suggest a way to exclude a field from the search?

Nik*_*iev 5

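There is a way to make this work. It isn't pretty, but it does the job. You can achieve your goal by using the boost and multi-field parameters of query_string, a bool query to combine the scores, and a min_score setting: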

POST my-query-string/doc/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "query_string": {
            "query": "#{query}",
            "type": "most_fields",
            "boost": 1
          }
        },
        {
          "query_string": {
            "fields": [
              "comments"
            ],
            "query": "#{query}",
            "boost": -1
          }
        }
      ]
    }
  },
  "min_score": 0.00001
}

So what happens under the hood?

Suppose you have the following set of documents:

PUT my-query-string/doc/1
{
  "title": "Prodigy in Bristol",
  "text": "Prodigy in Bristol",
  "comments": "Prodigy in Bristol"
}
PUT my-query-string/doc/2
{
  "title": "Prodigy in Birmigham",
  "text": "Prodigy in Birmigham",
  "comments": "And also in Bristol"
}
PUT my-query-string/doc/3
{
  "title": "Prodigy in Birmigham",
  "text": "Prodigy in Birmigham and Bristol",
  "comments": "And also in Cardiff"
}
PUT my-query-string/doc/4
{
  "title": "Prodigy in Birmigham",
  "text": "Prodigy in Birmigham",
  "comments": "And also in Cardiff"
}

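In your search request you want to see only documents 1 and 3, but your original query returns 1, 2 and 3.

In Elasticsearch, search results are sorted by relevance _score; the higher the score, the better.

So let's try to down-weight the "comments" field so that its contribution to the relevance score is cancelled out. We can do this by combining two queries in a should clause and giving one of them a negative boost: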

POST my-query-string/doc/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "query_string": {
            "query": "Bristol"
          }
        },
        {
          "query_string": {
            "fields": [
              "comments"
            ],
            "query": "Bristol",
            "boost": -1
          }
        }
      ]
    }
  }
}

This gives us the following output:

{
  "hits": {
    "total": 3,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "my-query-string",
        "_type": "doc",
        "_id": "3",
        "_score": 0.2876821,
        "_source": {
          "title": "Prodigy in Birmigham",
          "text": "Prodigy in Birmigham and Bristol",
          "comments": "And also in Cardiff"
        }
      },
      {
        "_index": "my-query-string",
        "_type": "doc",
        "_id": "2",
        "_score": 0,
        "_source": {
          "title": "Prodigy in Birmigham",
          "text": "Prodigy in Birmigham",
          "comments": "And also in Bristol"
        }
      },
      {
        "_index": "my-query-string",
        "_type": "doc",
        "_id": "1",
        "_score": 0,
        "_source": {
          "title": "Prodigy in Bristol",
          "text": "Prodigy in Bristol",
          "comments": "Prodigy in Bristol",
          "discount_percent": 10
        }
      }
    ]
  }
}

Document 2 was penalized, but so was document 1, even though it is a match we want. Why did that happen?

Here is how Elasticsearch computes _score in this case:

_score = max(title:"Bristol", text:"Bristol", comments:"Bristol") - comments:"Bristol"

Document 1 matches the comments:"Bristol" part, which also happens to be its best score. According to our formula, the resulting score is 0.
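
The formula can be checked by hand with a few lines of Python. This is only a sketch of the arithmetic, not of how Elasticsearch scores documents: it assumes every matching field contributes the same score (0.2876821, taken from the output above), which real BM25 scoring does not guarantee, and the helper names are made up for illustration.

```python
# Simplified model of the first query (best field minus comments).
FIELD_SCORE = 0.2876821  # assumption: every matching field scores the same

DOCS = {
    1: {"title": "Prodigy in Bristol", "text": "Prodigy in Bristol",
        "comments": "Prodigy in Bristol"},
    2: {"title": "Prodigy in Birmigham", "text": "Prodigy in Birmigham",
        "comments": "And also in Bristol"},
    3: {"title": "Prodigy in Birmigham", "text": "Prodigy in Birmigham and Bristol",
        "comments": "And also in Cardiff"},
    4: {"title": "Prodigy in Birmigham", "text": "Prodigy in Birmigham",
        "comments": "And also in Cardiff"},
}


def field_score(doc, field, term):
    """FIELD_SCORE if `term` is one of the words in the field, else 0."""
    return FIELD_SCORE if term.lower() in doc[field].lower().split() else 0.0


def score_best_field(doc, term):
    """max(all fields) - comments; None when no field matches (not a hit)."""
    per_field = {f: field_score(doc, f, term) for f in doc}
    if not any(per_field.values()):
        return None
    return max(per_field.values()) - per_field["comments"]


scores = {doc_id: score_best_field(doc, "Bristol") for doc_id, doc in DOCS.items()}
for doc_id, score in scores.items():
    print(doc_id, score)
```

Under these assumptions documents 1 and 2 both end up at 0 and only document 3 keeps a positive score, which matches the ordering in the output above.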

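What we would actually like is to boost the first clause (the one over "all" fields) more when more fields match.

Can we boost a query_string that matches more fields?

We can: in multi-field mode, query_string has a type parameter that does exactly that. The query will look like this: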

POST my-query-string/doc/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "query_string": {
            "type": "most_fields",
            "query": "Bristol"
          }
        },
        {
          "query_string": {
            "fields": [
              "comments"
            ],
            "query": "Bristol",
            "boost": -1
          }
        }
      ]
    }
  }
}

This gives us the following output:

{
  "hits": {
    "total": 3,
    "max_score": 0.57536423,
    "hits": [
      {
        "_index": "my-query-string",
        "_type": "doc",
        "_id": "1",
        "_score": 0.57536423,
        "_source": {
          "title": "Prodigy in Bristol",
          "text": "Prodigy in Bristol",
          "comments": "Prodigy in Bristol",
          "discount_percent": 10
        }
      },
      {
        "_index": "my-query-string",
        "_type": "doc",
        "_id": "3",
        "_score": 0.2876821,
        "_source": {
          "title": "Prodigy in Birmigham",
          "text": "Prodigy in Birmigham and Bristol",
          "comments": "And also in Cardiff"
        }
      },
      {
        "_index": "my-query-string",
        "_type": "doc",
        "_id": "2",
        "_score": 0,
        "_source": {
          "title": "Prodigy in Birmigham",
          "text": "Prodigy in Birmigham",
          "comments": "And also in Bristol"
        }
      }
    ]
  }
}

As you can see, the unwanted document 2 is at the bottom with a score of 0. Here is how the score was computed this time:

_score = sum(title:"Bristol", text:"Bristol", comments:"Bristol") - comments:"Bristol"

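So every document matching "Bristol" in any field was selected. The relevance score contributed by comments:"Bristol" was cancelled out, and only documents matching title:"Bristol" or text:"Bristol" got a _score > 0.

Can we filter out the results with an undesired score?

Yes, we can, using min_score: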

POST my-query-string/doc/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "query_string": {
            "query": "Bristol",
            "type": "most_fields",
            "boost": 1
          }
        },
        {
          "query_string": {
            "fields": [
              "comments"
            ],
            "query": "Bristol",
            "boost": -1
          }
        }
      ]
    }
  },
  "min_score": 0.00001
}

This works (in our case) because a document's score is 0 if and only if "Bristol" matched the "comments" field alone and no other field.
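
The same reasoning can be traced in plain Python. This is a rough sketch that only mimics the arithmetic of the most_fields formula and the min_score cut-off; it assumes each matching field scores the same 0.2876821, and `score_most_fields` and `MIN_SCORE` are illustrative names, not Elasticsearch APIs.

```python
# Simplified model of the final query: sum(all fields) - comments, then min_score.
FIELD_SCORE = 0.2876821  # assumption: every matching field scores the same
MIN_SCORE = 0.00001      # same threshold as "min_score" in the request

DOCS = {
    1: {"title": "Prodigy in Bristol", "text": "Prodigy in Bristol",
        "comments": "Prodigy in Bristol"},
    2: {"title": "Prodigy in Birmigham", "text": "Prodigy in Birmigham",
        "comments": "And also in Bristol"},
    3: {"title": "Prodigy in Birmigham", "text": "Prodigy in Birmigham and Bristol",
        "comments": "And also in Cardiff"},
    4: {"title": "Prodigy in Birmigham", "text": "Prodigy in Birmigham",
        "comments": "And also in Cardiff"},
}


def field_score(doc, field, term):
    """FIELD_SCORE if `term` is one of the words in the field, else 0."""
    return FIELD_SCORE if term.lower() in doc[field].lower().split() else 0.0


def score_most_fields(doc, term):
    """sum(all fields) - comments; None when no field matches (not a hit)."""
    per_field = {f: field_score(doc, f, term) for f in doc}
    if not any(per_field.values()):
        return None
    return sum(per_field.values()) - per_field["comments"]


# Keep only hits whose score reaches the threshold.
hits = {}
for doc_id, doc in DOCS.items():
    score = score_most_fields(doc, "Bristol")
    if score is not None and score >= MIN_SCORE:
        hits[doc_id] = score

print(sorted(hits))
```

Document 2 matches only in "comments", so its two contributions cancel exactly and it falls below the threshold, while documents 1 and 3 survive.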

The output will be:

{
  "hits": {
    "total": 2,
    "max_score": 0.57536423,
    "hits": [
      {
        "_index": "my-query-string",
        "_type": "doc",
        "_id": "1",
        "_score": 0.57536423,
        "_source": {
          "title": "Prodigy in Bristol",
          "text": "Prodigy in Bristol",
          "comments": "Prodigy in Bristol",
          "discount_percent": 10
        }
      },
      {
        "_index": "my-query-string",
        "_type": "doc",
        "_id": "3",
        "_score": 0.2876821,
        "_source": {
          "title": "Prodigy in Birmigham",
          "text": "Prodigy in Birmigham and Bristol",
          "comments": "And also in Cardiff"
        }
      }
    ]
  }
}

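Could it be done in a different way?

Sure. I would not actually recommend tweaking _score, because scoring is a fairly complex matter.

I would instead fetch the existing mapping and build the list of fields to query beforehand, which makes the code much simpler and more straightforward.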
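
A minimal sketch of that idea in Python follows. It assumes the mapping has already been fetched (for example with GET my-query-string/_mapping) and has the 6.x shape index → mappings → type → properties. The helper `searchable_fields`, the `SEARCHABLE_TYPES` set, and the sample `author.name` field are hypothetical and only serve the illustration.

```python
# Hypothetical helper: derive a query_string "fields" whitelist from a mapping.
SEARCHABLE_TYPES = {"text", "keyword"}  # assumption: only string fields are wanted


def searchable_fields(properties, exclude, prefix=""):
    """Recursively collect full names of string fields, skipping `exclude`."""
    names = []
    for name, spec in properties.items():
        full_name = prefix + name
        if full_name in exclude:
            continue
        if "properties" in spec:  # object field: descend into its sub-fields
            names.extend(
                searchable_fields(spec["properties"], exclude, full_name + ".")
            )
        elif spec.get("type") in SEARCHABLE_TYPES:
            names.append(full_name)
    return names


# Sample data shaped like a 6.x mapping response (field set is illustrative).
mapping = {
    "my-query-string": {
        "mappings": {
            "doc": {
                "properties": {
                    "title": {"type": "text"},
                    "text": {"type": "text"},
                    "comments": {"type": "text"},
                    "discount_percent": {"type": "long"},
                    "author": {"properties": {"name": {"type": "text"}}},
                }
            }
        }
    }
}

properties = mapping["my-query-string"]["mappings"]["doc"]["properties"]
fields = searchable_fields(properties, exclude={"comments"})

# Request body to send to the _search endpoint.
query_body = {"query": {"query_string": {"query": "Bristol", "fields": fields}}}
print(fields)
```

Because the list is rebuilt from the mapping each time, dynamically added fields are picked up without maintaining a whitelist by hand, and numeric fields such as discount_percent never reach query_string.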

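Original solution proposed in the answer (kept for history)

Originally this kind of query was suggested, with exactly the same intent as the solution above: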

POST my-query-string/doc/_search
{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": {
            "query_string": {
              "fields" : ["*", "comments^0"],
              "query": "#{query}"
            }
          }
        }
      }
    }
  },
  "min_score": 0.00001
}

The only problem is that if the index contains any numeric values, this part:

"fields": ["*"]

raises an error, because a textual query string cannot be applied to a number.