Ric*_*nha 4 elasticsearch elasticsearch-6
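I have an index with multiple fields. I want to filter documents by whether the search string is present in any field except one: user_comments. The search query I am running is: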
{
"from": offset,
"size": limit,
"_source": [
"document_title"
],
"query": {
"function_score": {
"query": {
"bool": {
"must":
{
"query_string": {
"query": "#{query}"
}
}
}
}
}
}
}
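The query string searches across all fields, so it also returns documents whose only match is in the user_comments field. I want to run it against every field except user_comments. The whitelist would be very long and the field names are dynamic, so listing the allowed fields with the fields parameter, as below, is not feasible: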
"query_string": {
"query": "#{query}",
"fields": [
"document_title",
"field2"
]
}
Can anybody suggest how to exclude a field from being searched?
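There is a way to make it work. It isn't pretty, but it does the job. You can achieve your goal by combining the boost and multi-field type parameters of query_string, a bool query to combine the scores, and min_score: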
POST my-query-string/doc/_search
{
"query": {
"bool": {
"should": [
{
"query_string": {
"query": "#{query}",
"type": "most_fields",
"boost": 1
}
},
{
"query_string": {
"fields": [
"comments"
],
"query": "#{query}",
"boost": -1
}
}
]
}
},
"min_score": 0.00001
}
Suppose you have the following set of documents:
PUT my-query-string/doc/1
{
"title": "Prodigy in Bristol",
"text": "Prodigy in Bristol",
"comments": "Prodigy in Bristol"
}
PUT my-query-string/doc/2
{
"title": "Prodigy in Birmigham",
"text": "Prodigy in Birmigham",
"comments": "And also in Bristol"
}
PUT my-query-string/doc/3
{
"title": "Prodigy in Birmigham",
"text": "Prodigy in Birmigham and Bristol",
"comments": "And also in Cardiff"
}
PUT my-query-string/doc/4
{
"title": "Prodigy in Birmigham",
"text": "Prodigy in Birmigham",
"comments": "And also in Cardiff"
}
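In your search request you would like to see only documents 1 and 3, but your original query returns 1, 2 and 3.
In Elasticsearch, search results are sorted by relevance _score; the higher, the better.
So let's try to down-weight the "comments" field so that its contribution to the relevance score is cancelled out. We can do this by combining two queries in a should clause and giving the second one a negative boost: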
POST my-query-string/doc/_search
{
"query": {
"bool": {
"should": [
{
"query_string": {
"query": "Bristol"
}
},
{
"query_string": {
"fields": [
"comments"
],
"query": "Bristol",
"boost": -1
}
}
]
}
}
}
This gives us the following output:
{
"hits": {
"total": 3,
"max_score": 0.2876821,
"hits": [
{
"_index": "my-query-string",
"_type": "doc",
"_id": "3",
"_score": 0.2876821,
"_source": {
"title": "Prodigy in Birmigham",
"text": "Prodigy in Birmigham and Bristol",
"comments": "And also in Cardiff"
}
},
{
"_index": "my-query-string",
"_type": "doc",
"_id": "2",
"_score": 0,
"_source": {
"title": "Prodigy in Birmigham",
"text": "Prodigy in Birmigham",
"comments": "And also in Bristol"
}
},
{
"_index": "my-query-string",
"_type": "doc",
"_id": "1",
"_score": 0,
"_source": {
"title": "Prodigy in Bristol",
"text": "Prodigy in Bristol",
"comments": "Prodigy in Bristol",
"discount_percent": 10
}
}
]
}
}
Document 2 was penalized, but so was document 1, even though it is a match we want. Why did that happen?
Here is how Elasticsearch computed _score in this case:
_score = max(title:"Bristol", text:"Bristol", comments:"Bristol") - comments:"Bristol"
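Document 1 matches the comments:"Bristol" part, and that also happens to be its best score. According to our formula the resulting score is 0.
What we actually want is to boost the first clause (the one over "all" fields) more when more fields match.
Can we boost query_string when it matches more fields? We can: query_string in multi-field mode has a type parameter that does exactly that. The query will look like this: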
POST my-query-string/doc/_search
{
"query": {
"bool": {
"should": [
{
"query_string": {
"type": "most_fields",
"query": "Bristol"
}
},
{
"query_string": {
"fields": [
"comments"
],
"query": "Bristol",
"boost": -1
}
}
]
}
}
}
This gives us the following output:
{
"hits": {
"total": 3,
"max_score": 0.57536423,
"hits": [
{
"_index": "my-query-string",
"_type": "doc",
"_id": "1",
"_score": 0.57536423,
"_source": {
"title": "Prodigy in Bristol",
"text": "Prodigy in Bristol",
"comments": "Prodigy in Bristol",
"discount_percent": 10
}
},
{
"_index": "my-query-string",
"_type": "doc",
"_id": "3",
"_score": 0.2876821,
"_source": {
"title": "Prodigy in Birmigham",
"text": "Prodigy in Birmigham and Bristol",
"comments": "And also in Cardiff"
}
},
{
"_index": "my-query-string",
"_type": "doc",
"_id": "2",
"_score": 0,
"_source": {
"title": "Prodigy in Birmigham",
"text": "Prodigy in Birmigham",
"comments": "And also in Bristol"
}
}
]
}
}
As you can see, the undesired document 2 is at the bottom with a score of 0. Here is how the score was computed this time:
_score = sum(title:"Bristol", text:"Bristol", comments:"Bristol") - comments:"Bristol"
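So every document matching "Bristol" in any field was selected. The relevance score contributed by comments:"Bristol" was cancelled out, and only documents matching title:"Bristol" or text:"Bristol" ended up with a _score > 0.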
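The arithmetic behind the two formulas can be checked by hand. The sketch below is plain Python, not Elasticsearch code. It assumes that every single-field match on "Bristol" scores 0.2876821 (the value seen in the output above), and the names FIELD_SCORE, MATCHES and combined_score are made up for illustration.

```python
# Assumption: every field that matches "Bristol" contributes the same score,
# taken from the search output above.
FIELD_SCORE = 0.2876821

# Which fields contain "Bristol" in each sample document.
MATCHES = {
    "1": {"title", "text", "comments"},
    "2": {"comments"},
    "3": {"text"},
}


def combined_score(matched_fields, combine):
    """Score of the all-fields clause minus the score of the comments clause."""
    per_field = [FIELD_SCORE for _ in matched_fields]
    all_fields = combine(per_field) if per_field else 0.0
    comments = FIELD_SCORE if "comments" in matched_fields else 0.0
    return all_fields - comments


for doc_id, fields in sorted(MATCHES.items()):
    best = combined_score(fields, max)  # default mode: the best field wins
    most = combined_score(fields, sum)  # "type": "most_fields": scores add up
    print(doc_id, round(best, 7), round(most, 7))
```

Under max, document 1 drops to 0 because its best field score equals its comments score. Under sum, the title and text matches survive the subtraction, while document 2 (comments only) stays at 0 in both modes.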
Can we filter out the results with the unwanted score of 0? Yes, we can, using min_score:
POST my-query-string/doc/_search
{
"query": {
"bool": {
"should": [
{
"query_string": {
"query": "Bristol",
"type": "most_fields",
"boost": 1
}
},
{
"query_string": {
"fields": [
"comments"
],
"query": "Bristol",
"boost": -1
}
}
]
}
},
"min_score": 0.00001
}
This works (in our case) because a document's score is 0 if and only if "Bristol" matched the "comments" field and no other field.
The output will be:
{
"hits": {
"total": 2,
"max_score": 0.57536423,
"hits": [
{
"_index": "my-query-string",
"_type": "doc",
"_id": "1",
"_score": 0.57536423,
"_source": {
"title": "Prodigy in Bristol",
"text": "Prodigy in Bristol",
"comments": "Prodigy in Bristol",
"discount_percent": 10
}
},
{
"_index": "my-query-string",
"_type": "doc",
"_id": "3",
"_score": 0.2876821,
"_source": {
"title": "Prodigy in Birmigham",
"text": "Prodigy in Birmigham and Bristol",
"comments": "And also in Cardiff"
}
}
]
}
}
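The request above hard-codes the search string, and the first version of it uses a `#{query}` template placeholder. When the body is built in code, the search string and the excluded field can be passed as parameters instead. A minimal sketch in Python, assuming the body is then sent with whatever HTTP or Elasticsearch client you already use; the function name build_search_body and its parameters are made up for illustration:

```python
import json


def build_search_body(query, excluded_field="comments", min_score=0.00001):
    """Build the bool/should request shown above for an arbitrary search string."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Matches in any field; scores of matching fields are added up.
                    {"query_string": {"query": query, "type": "most_fields", "boost": 1}},
                    # Same search restricted to the excluded field, subtracted again.
                    {"query_string": {"fields": [excluded_field], "query": query, "boost": -1}},
                ]
            }
        },
        # Drops documents whose remaining score is 0.
        "min_score": min_score,
    }


print(json.dumps(build_search_body("Bristol"), indent=2))
```

Keep in mind that negative boosts were deprecated in Elasticsearch 6.x and are rejected from 7.0 on, so this body only applies to the 6.x line that the question is tagged with.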
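Could it be done in a different way? Sure. I would not actually recommend tweaking _score, since scoring is a fairly complex matter.
I would recommend fetching the existing mapping and building the list of fields to query beforehand; that makes the code much simpler and more straightforward.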
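As a rough illustration of that advice, the sketch below walks a mapping and collects the names of searchable fields, skipping the excluded one. It is a sketch under assumptions, not a definitive implementation: the helper searchable_fields, the choice of text and keyword as the searchable types, and the sample mapping are all hypothetical. In practice the properties dictionary would come from the response of GET <index>/_mapping.

```python
def searchable_fields(properties, excluded, types=("text", "keyword"), prefix=""):
    """Collect dotted field names of the given types, skipping excluded ones."""
    names = []
    for name, spec in sorted(properties.items()):
        path = prefix + name
        if path in excluded:
            continue
        if "properties" in spec:
            # Field with sub-fields: descend and build dotted paths.
            names.extend(searchable_fields(spec["properties"], excluded, types, path + "."))
        elif spec.get("type") in types:
            names.append(path)
    return names


# Hypothetical mapping, shaped like the "properties" section of a
# GET <index>/_mapping response.
properties = {
    "title": {"type": "text"},
    "text": {"type": "text"},
    "comments": {"type": "text"},
    "discount_percent": {"type": "long"},
    "author": {"properties": {"name": {"type": "keyword"}}},
}

fields = searchable_fields(properties, excluded={"comments"})
query = {"query_string": {"query": "Bristol", "fields": fields}}
print(fields)  # ['author.name', 'text', 'title']
```

Filtering by type also keeps numeric fields such as discount_percent out of the list, so a textual query string is never applied to a number.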
Originally, this kind of query was proposed, with exactly the same intent as the solution above:
POST my-query-string/doc/_search
{
"query": {
"function_score": {
"query": {
"bool": {
"must": {
"query_string": {
"fields" : ["*", "comments^0"],
"query": "#{query}"
}
}
}
}
}
},
"min_score": 0.00001
}
The only problem is that if the index contains any numeric values, this part:
"fields": ["*"]
throws an error, since a textual query string cannot be applied to a number.