Why does an ElasticSearch match query return all results?

The*_*boy 6 node.js elasticsearch elasticsearch-plugin

I have the following ElasticSearch query, which I thought would return all matches where the email field equals myemail@gmail.com:

{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "email": "myemail@gmail.com"
          }
        }
      ]
    }
  }
}

The mapping for the user type being searched is as follows:

{
  "users": {
    "mappings": {
      "user": {
        "properties": {
          "email": {
            "type": "string"
          },
          "name": {
            "type": "string",
            "fields": {
              "raw": {
                "type": "string",
                "index": "not_analyzed"
              }
            }
          },
          "nickname": {
            "type": "string"
          }
        }
      }
    }
  }
}

Here is a sample of the results returned from ElasticSearch:

[
  {
    "_index": "users",
    "_type": "user",
    "_id": "54b19c417dcc4fe40d728e2c",
    "_score": 0.23983537,
    "_source": {
      "email": "johnsmith@gmail.com",
      "name": "John Smith",
      "nickname": "jsmith"
    }
  },
  {
    "_index": "users",
    "_type": "user",
    "_id": "9c417dcc4fe40d728e2c54b1",
    "_score": 0.23983537,
    "_source": {
      "email": "myemail@gmail.com",
      "name": "Walter White",
      "nickname": "wwhite"
    }
  },
  {
    "_index": "users",
    "_type": "user",
    "_id": "4fe40d728e2c54b19c417dcc",
    "_score": 0.23983537,
    "_source": {
      "email": "JimmyFallon@gmail.com",
      "name": "Jimmy Fallon",
      "nickname": "jfallon"
    }
  }
]

From the query above, I expected that a hit would require the email property to match 'myemail@gmail.com' exactly.

How can I change the ElasticSearch DSL query so that it returns only exact matches on email?

Vin*_*han 11

The email field is tokenized, and that is the cause of this anomaly. When you index a document, this is what happens:

"myemail@gmail.com"=> ["myemail","gmail.com"]

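This way, if you search for myemail or for gmail.com, you get a correct match. When you search for john@gmail.com, the analyzer is applied to the search query as well, so the query is broken into: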

"john@gmail.com"=> ["john","gmail.com"]

Here the "gmail.com" token is common to both the search term and the indexed term, so you get a match.
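The behaviour described above can be reproduced without a cluster. What follows is only a minimal Python sketch: it assumes a simplified stand-in for the analyzer that lowercases the text and splits it on "@" (the real analyzer does more than this), and `analyze` and `match_any` are hypothetical helper names. The addresses are the ones from the question.

```python
def analyze(text):
    """Simplified stand-in for the analyzer: lowercase, then split on '@'."""
    return [token for token in text.lower().split("@") if token]


def match_any(query, field_value):
    """Mimic a match query with the default OR operator:
    a document matches if it shares at least one token with the query."""
    return bool(set(analyze(query)) & set(analyze(field_value)))


emails = ["johnsmith@gmail.com", "myemail@gmail.com", "JimmyFallon@gmail.com"]

# Every address shares the "gmail.com" token with the query,
# so all three documents come back as hits.
hits = [email for email in emails if match_any("myemail@gmail.com", email)]
print(hits)
```

Because one shared token is enough, the query cannot tell the three addresses apart.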

To overcome this behavior, declare the email field as not_analyzed. Tokenization then does not happen, and the whole string is indexed as a single term.
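As a rough illustration of the difference, the following Python sketch models a not_analyzed field as one that stores the raw string as its only term. The helper names `index_not_analyzed` and `exact_match` are hypothetical, and the addresses are the ones from the question.

```python
def index_not_analyzed(value):
    """A not_analyzed field keeps the whole string as a single term."""
    return [value]


def exact_match(query, field_value):
    """The query text is compared against the single stored term as-is."""
    return query in index_not_analyzed(field_value)


emails = ["johnsmith@gmail.com", "myemail@gmail.com", "JimmyFallon@gmail.com"]

# Only the document whose stored term equals the query text is a hit.
hits = [email for email in emails if exact_match("myemail@gmail.com", email)]
print(hits)
```

Since nothing is lowercased in this model, the comparison is also case-sensitive.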

With "not_analyzed":

"john@gmail.com"=> ["john@gmail.com"]

So change the mapping to this, and you should be good:

{
  "users": {
    "mappings": {
      "user": {
        "properties": {
          "email": {
            "type": "string",
            "index": "not_analyzed"
          },
          "name": {
            "type": "string",
            "fields": {
              "raw": {
                "type": "string",
                "index": "not_analyzed"
              }
            }
          },
          "nickname": {
            "type": "string"
          }
        }
      }
    }
  }
}
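With the field stored as a single term, an exact lookup can also be written as a `term` query, which does not analyze the search text. The sketch below only builds the request body as a Python dict. The helper name `exact_email_query` is hypothetical, and sending the body to the cluster is deliberately left out.

```python
import json


def exact_email_query(email):
    """Build a request body that looks up one exact term in the email field."""
    return {"query": {"term": {"email": email}}}


body = exact_email_query("myemail@gmail.com")
print(json.dumps(body, indent=2))
```

This assumes the index was created with the not_analyzed mapping; against the original analyzed field, a `term` query for the full address would find nothing, because no single indexed token equals the whole string.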

I have described this problem more precisely, along with how to solve it, here.