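Elasticsearch v7.1.1

I don't understand the difference between a query_string whose query text contains "AND" and a query_string that sets "default_operator": "AND".

I thought the two should produce the same results, but they don't: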

    HTTP PUT http://localhost:9200/umlautsuche

    {
      "settings": {
        "analysis": {
          "char_filter": {
            "my_char_filter": {
              "type": "mapping",
              "mappings": ["ph => f"]
            }
          },
          "filter": {
            "my_ngram": {
              "type": "edge_ngram",
              "min_gram": 3,
              "max_gram": 10
            }
          },
          "analyzer": {
            "my_name_analyzer": {
              "tokenizer": "standard",
              "char_filter": [
                "my_char_filter"
              ],
              "filter": [
                "lowercase",
                "german_normalization"
              ]
            }
          }
        }
      },
      "mappings": {
        "date_detection": false,
        "dynamic_templates": [
          {
            "string_fields_german": {
              "match_mapping_type": "string",
              "match": "*",
              "mapping": {
                "type": "text",
                "analyzer": "my_name_analyzer"
              }
            }
          },
          {
            "dates": {
              "match": "lastModified",
              "match_pattern": "regex",
              "mapping": {
                "type": "date",
                "ignore_malformed": true
              }
            }
          }
        ]
      }
    }

    HTTP POST http://localhost:9200/_bulk

    { "index" : { "_index" : "umlautsuche", "_id" : "1" } }
    {"vorname": "Stephan-Jörg", "nachname": "Müller", "ort": "Hollabrunn"}

    { "index" : { "_index" : "umlautsuche", "_id" : "2" } }
    {"vorname": "Stephan-Joerg", "nachname": "Mueller", "ort": "Hollabrunn"}

    { "index" : { "_index" : "umlautsuche", "_id" : "3" } }
    {"vorname": "Stephan-Jörg", "nachname": "Müll", "ort": "Hollabrunn"}

This search returns no results, which I did not expect:

    HTTP POST http://localhost:9200/umlautsuche/_search

    {
      "query": {
        "query_string": {
          "query": "Stefan Müller Jör*",
          "analyze_wildcard": true,
          "default_operator": "AND",
          "fields": ["vorname", "nachname"]
        }
      }
    }

This query gives the results I expected:

    HTTP POST http://localhost:9200/umlautsuche/_search

    {
      "query": {
        "query_string": {
          "query": "Stefan AND Müller AND Jör*",
          "analyze_wildcard": true,
          "default_operator": "AND",
          "fields": ["vorname", "nachname"]
        }
      }
    }

How can I configure the query or the analyzer so that I don't need these "AND"s between my search terms?
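
What you are running into is an ambiguity in the boolean logic of query_string's boolean operators, and possibly undocumented behavior. Because of this ambiguity, I think it is better to use a bool query with explicit logic, or to use copy_to.

Let me explain in more detail what is happening and how to fix it.

To see how the query is executed, let's set profile: true: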

    POST /umlautsuche/_search
    {
      "query": {
        "query_string": {
          "query": "Stefan Müller Jör*",
          "analyze_wildcard": true,
          "default_operator": "AND",
          "fields": [
            "vorname",
            "nachname"
          ]
        }
      },
      "profile": true
    }

In the ES response we will see:

      "profile": {
        "shards": [
          {
            "id": "[QCANVs5gR0GOiiGCmEwj7w][umlautsuche][0]",
            "searches": [
              {
                "query": [
                  {
                    "type": "BooleanQuery",
                    "description": "+((+nachname:stefan +nachname:muller) | (+vorname:stefan +vorname:muller)) +(nachname:jor* | vorname:jor*)",
                    "time_in_nanos": 17787641,
                    "breakdown": {
                      "set_min_competitive_score_count": 0,

We are interested in this part:
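
    "+((+nachname:stefan +nachname:muller) | (+vorname:stefan +vorname:muller)) +(nachname:jor* | vorname:jor*)"

Without going into a deep analysis, we can see that the first clause requires stefan and muller to occur in the same field: both in nachname (surname) or both in vorname (first name). No indexed document satisfies that, because stefan never appears in a surname.

I think what we actually want is to "find a person whose full name is Stefan Müller Jör*". That is not what the query generated by Elasticsearch does.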
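
To see where tokens such as stefan and muller come from, here is a rough Python approximation of my_name_analyzer. It is only a sketch under stated assumptions: the function names are made up for illustration, a regex stands in for the standard tokenizer, and normalize_german imitates the german_normalization filter rather than reproducing Lucene's code. Running the _analyze API against the real index remains the authoritative check.

```python
import re

# States of the (approximated) normalization pass.
NORMAL, VOWEL, UMLAUT = "N", "V", "U"


def normalize_german(token):
    """Approximate german_normalization: fold umlauts, turn ß into ss,
    and drop the 'e' in ae/oe/ue (for 'ue' only after a non-vowel)."""
    out = []
    state = NORMAL
    for ch in token:
        if ch in "ao":
            state = UMLAUT
        elif ch == "u":
            state = UMLAUT if state == NORMAL else VOWEL
        elif ch == "e":
            drop = state == UMLAUT
            state = VOWEL
            if drop:
                continue  # 'ae' -> 'a', 'oe' -> 'o', 'ue' -> 'u'
        elif ch in "iqy":
            state = VOWEL
        elif ch in "äöü":
            ch = {"ä": "a", "ö": "o", "ü": "u"}[ch]
            state = VOWEL
        elif ch == "ß":
            ch = "ss"
            state = NORMAL
        else:
            state = NORMAL
        out.append(ch)
    return "".join(out)


def analyze(text):
    """Char filter (ph => f), word split, lowercase, German normalization."""
    text = text.replace("ph", "f")       # my_char_filter
    words = re.findall(r"\w+", text)     # stand-in for the standard tokenizer
    return [normalize_german(w.lower()) for w in words]


for value in ("Stephan-Jörg", "Stephan-Joerg", "Müller", "Mueller", "Müll"):
    print(value, "->", analyze(value))
```

Under these assumptions both spellings of the first name reduce to the tokens stefan and jorg, "Müller" and "Mueller" both become muller, and "Müll" becomes mull, which is why the profile output talks about stefan and muller rather than the original spellings.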

Let's do the same, again with profile: true, for the second query (the one with explicit ANDs). The response will contain:

    "profile": {
      "shards": [
        {
          "id": "[QCANVs5gR0GOiiGCmEwj7w][umlautsuche][0]",
          "searches": [
            {
              "query": [
                {
                  "type": "BooleanQuery",
                  "description": "+(nachname:stefan | vorname:stefan) +(nachname:muller | vorname:muller) +(nachname:jor* | vorname:jor*)",
                  "time_in_nanos": 17970342,
                  "breakdown": {

We can see that the query was interpreted as follows:

    "+(nachname:stefan | vorname:stefan) +(nachname:muller | vorname:muller) +(nachname:jor* | vorname:jor*)"

We can roughly read this as "find a person for whom each of the three terms matches either the first name or the surname", which is what we expect it to do.
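
The documentation of the query_string query says that default_operator: AND should make whitespace be interpreted as ANDs:

"The default operator used if no explicit operator is specified. For example, with a default operator of OR, the query capital of Hungary is translated to capital OR of OR Hungary, and with a default operator of AND, the same query is translated to capital AND of AND Hungary. The default value is OR."

From what we have just seen, however, this does not seem to hold, at least when several fields are queried.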
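
The difference between the two profiled descriptions can be replayed without a cluster. The sketch below is an illustration only: the token sets are hand-derived assumptions about what my_name_analyzer produces for the three sample documents, and the helper names are hypothetical. It evaluates both boolean shapes against those tokens.

```python
# Tokens per field, assumed from the analyzer chain (ph => f, lowercase,
# german_normalization). They are not read from a real index.
DOCS = {
    "1": {"vorname": {"stefan", "jorg"}, "nachname": {"muller"}},
    "2": {"vorname": {"stefan", "jorg"}, "nachname": {"muller"}},
    "3": {"vorname": {"stefan", "jorg"}, "nachname": {"mull"}},
}
FIELDS = ("vorname", "nachname")


def has(doc, field, term):
    """True if the field holds the term; a trailing * means prefix match."""
    if term.endswith("*"):
        return any(tok.startswith(term[:-1]) for tok in doc[field])
    return term in doc[field]


def implicit_and(doc):
    # +((+nachname:stefan +nachname:muller) | (+vorname:stefan +vorname:muller))
    # +(nachname:jor* | vorname:jor*)
    same_field = any(has(doc, f, "stefan") and has(doc, f, "muller") for f in FIELDS)
    prefix = any(has(doc, f, "jor*") for f in FIELDS)
    return same_field and prefix


def explicit_and(doc):
    # +(nachname:stefan | vorname:stefan) +(nachname:muller | vorname:muller)
    # +(nachname:jor* | vorname:jor*)
    terms = ("stefan", "muller", "jor*")
    return all(any(has(doc, f, t) for f in FIELDS) for t in terms)


implicit_hits = [doc_id for doc_id, doc in DOCS.items() if implicit_and(doc)]
explicit_hits = [doc_id for doc_id, doc in DOCS.items() if explicit_and(doc)]
print(implicit_hits)  # []
print(explicit_hits)  # ['1', '2']
```

With these assumed tokens the first shape matches nothing, because no single field contains both stefan and muller, while the second shape matches documents 1 and 2.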

So what can we do?

Use bool with explicit logic

This query seems to work:

    POST /umlautsuche/_search
    {
      "query": {
        "bool": {
          "must": [
            {
              "query_string": {
                "query": "Stefan Müller Jör*",
                "analyze_wildcard": true,
                "fields": [
                  "vorname"
                ]
              }
            },
            {
              "query_string": {
                "query": "Stefan Müller Jör*",
                "analyze_wildcard": true,
                "fields": [
                  "nachname"
                ]
              }
            }
          ]
        }
      }
    }

This query is not exactly equivalent; treat it as an example only. For instance, suppose we had another record like this one, without "Jörg":

    {"vorname": "Stephan", "nachname": "Müll", "ort": "Hollabrunn"}

The bool query above would still match it, even though "Jörg" is missing. To get around this you can write a more complex bool query, but that will not work if you want to avoid parsing the user input.
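
As an illustration of what such a "more complex bool query" could look like, the sketch below builds one must clause per search term, each searching both fields, which mirrors the shape Elasticsearch produced for the explicit-AND query. The function name build_all_terms_query is hypothetical, and the whitespace split is exactly the kind of naive input parsing mentioned above: it does not escape characters that are reserved in query_string syntax.

```python
import json


def build_all_terms_query(user_input, fields):
    """Build a search body in which every whitespace-separated term must
    match in at least one of the given fields."""
    must = [
        {
            "query_string": {
                "query": term,
                "analyze_wildcard": True,
                "fields": list(fields),
            }
        }
        for term in user_input.split()  # naive split, no escaping (assumption)
    ]
    return {"query": {"bool": {"must": must}}}


body = build_all_terms_query("Stefan Müller Jör*", ["vorname", "nachname"])
print(json.dumps(body, indent=2, ensure_ascii=False))
```

The printed body can be sent to POST /umlautsuche/_search. Whether a plain split is acceptable depends on what users are allowed to type.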
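
How can we still use a plain, unparsed query string?

Use a copy_to field

We can try the copy_to capability. It copies the contents of several fields into another field, where they are analyzed together.

We have to modify the mapping configuration (unfortunately, the existing index has to be recreated):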

      "mappings": {
        "date_detection": false,
        "dynamic_templates": [
          {
            "name_fields_german": {
              "match_mapping_type": "string",
              "match": "*name",
              "mapping": {
                "type": "text",
                "analyzer": "my_name_analyzer",
                "copy_to": "full_name"
              }
            }
          },
          {
            "string_fields_german": {
              "match_mapping_type": "string",
              "match": "*",
              "mapping": {
                "type": "text",
                "analyzer": "my_name_analyzer"
              }
            }
          },
          {
            "dates": {
              "match": "lastModified",
              "match_pattern": "regex",
              "mapping": {
                "type": "date",
                "ignore_malformed": true
              }
            }
          }
        ]
      }

Then we can populate the index in exactly the same way as before.

Now we can query the new field full_name with the following query:

    POST /umlautsuche/_search
    {
      "query": {
        "bool": {
          "must": [
            {
              "query_string": {
                "query": "Stefan Müller Jör*",
                "analyze_wildcard": true,
                "default_operator": "AND",
                "fields": [
                  "full_name"
                ]
              }
            }
          ]
        }
      }
    }

This query returns the same 2 documents as the second query (the one with explicit ANDs). So in this case default_operator: AND behaves as we expect and requires every token in the query to match.
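
Why the single-field query behaves differently can again be sketched offline. The snippet below is an illustration under assumptions: the token sets are hand-derived rather than read from an index, full_name is assumed to be analyzed with the same analyzer as the name fields, and the helper names are hypothetical. It imitates copy_to by merging the tokens of both name fields and then requires every term to match in that one field.

```python
# Assumed tokens per field for the three sample documents.
DOCS = {
    "1": {"vorname": {"stefan", "jorg"}, "nachname": {"muller"}},
    "2": {"vorname": {"stefan", "jorg"}, "nachname": {"muller"}},
    "3": {"vorname": {"stefan", "jorg"}, "nachname": {"mull"}},
}


def with_full_name(doc):
    """Imitate copy_to: full_name sees the tokens of both name fields."""
    return {**doc, "full_name": doc["vorname"] | doc["nachname"]}


def matches(tokens, term):
    """Exact token match, or prefix match when the term ends with *."""
    if term.endswith("*"):
        return any(tok.startswith(term[:-1]) for tok in tokens)
    return term in tokens


def all_terms_in_full_name(doc, terms):
    # +full_name:stefan +full_name:muller +full_name:jor*
    tokens = with_full_name(doc)["full_name"]
    return all(matches(tokens, term) for term in terms)


terms = ("stefan", "muller", "jor*")
hits = [doc_id for doc_id, doc in DOCS.items() if all_terms_in_full_name(doc, terms)]
print(hits)  # ['1', '2']
```

Because only one field is involved, "all terms in the same field" and "all terms somewhere in the name" coincide, so the ambiguity seen with two fields does not arise.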

Hope that helps!