ElasticSearch Analyzer and Tokenizer for Emails

LYu*_*LYu 26 email lucene tokenize analyzer elasticsearch

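I could not find a perfect solution for the following case, either on Google or in the ES docs. I hope someone can help here.

Suppose five documents are stored with the following values under the "email" field: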

1. {"email": "john.doe@gmail.com"}
2. {"email": "john.doe@gmail.com, john.doe@outlook.com"}
3. {"email": "hello-john.doe@outlook.com"}
4. {"email": "john.doe@outlook.com"}
5. {"email": "john@yahoo.com"}

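I want to support the following search scenarios:

[Search -> Receive]

"john.doe@gmail.com" -> 1,2

"john.doe@outlook.com" -> 2,4

"john@yahoo.com" -> 5

"john.doe" -> 1,2,3,4

"john" -> 1,2,3,4,5

"gmail.com" -> 1,2

"outlook.com" -> 2,3,4

The first three matches are a MUST; for the rest, the more precise the better. I have already tried different combinations of index/search analyzers, tokenizers and filters, and also tried adjusting the conditions of match queries, but did not find an ideal solution. Any ideas are welcome, and there are no restrictions on the mapping, the analyzer or which kind of query to use. Thanks.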

And*_*fan 37

Mapping:

PUT /test
{
  "settings": {
    "analysis": {
      "filter": {
        "email": {
          "type": "pattern_capture",
          "preserve_original": true,
          "patterns": [
            "([^@]+)",
            "(\\p{L}+)",
            "(\\d+)",
            "@(.+)",
            "([^-@]+)"
          ]
        }
      },
      "analyzer": {
        "email": {
          "tokenizer": "uax_url_email",
          "filter": [
            "email",
            "lowercase",
            "unique"
          ]
        }
      }
    }
  },
  "mappings": {
    "emails": {
      "properties": {
        "email": {
          "type": "string",
          "analyzer": "email"
        }
      }
    }
  }
}
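To see what each pattern in the "email" filter contributes, here is a rough Python sketch that applies the five regexes to a single address. It only approximates the filter: Python's `re` has no `\p{L}`, so `[^\W\d_]+` is used as a stand-in for "one or more letters", and the helper name `captures_per_pattern` is made up for this illustration. The real filter also emits the original token (because of `preserve_original`) and then lowercases and de-duplicates the result.

```python
import re

# The five patterns from the "email" filter above.
# Assumption: [^\W\d_]+ stands in for \p{L}+, which Python's re does not support.
PATTERNS = [
    r"([^@]+)",       # local part and domain, split at "@"
    r"([^\W\d_]+)",   # every run of letters
    r"(\d+)",         # every run of digits
    r"@(.+)",         # everything after "@"
    r"([^-@]+)",      # pieces separated by "-" or "@"
]


def captures_per_pattern(token):
    """Map each pattern to the list of group-1 captures it finds in the token."""
    return {p: [m.group(1) for m in re.finditer(p, token)] for p in PATTERNS}


if __name__ == "__main__":
    for pattern, found in captures_per_pattern("hello-john.doe@outlook.com").items():
        print(pattern, "->", found)
```

The last pattern is the one that lets document 3 ("hello-john.doe@outlook.com") be found by "john.doe": it splits on the hyphen as well as on "@".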

Test data:

POST /test/emails/_bulk
{"index":{"_id":"1"}}
{"email": "john.doe@gmail.com"}
{"index":{"_id":"2"}}
{"email": "john.doe@gmail.com, john.doe@outlook.com"}
{"index":{"_id":"3"}}
{"email": "hello-john.doe@outlook.com"}
{"index":{"_id":"4"}}
{"email": "john.doe@outlook.com"}
{"index":{"_id":"5"}}
{"email": "john@yahoo.com"}

Query to use:

GET /test/emails/_search
{
  "query": {
    "term": {
      "email": "john.doe@gmail.com"
    }
  }
}
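As a sanity check on why a `term` query against this analyzer gives the results asked for in the question, the following Python sketch simulates the whole chain on the five test documents. It is an approximation, not Elasticsearch itself: splitting on commas and whitespace stands in for the `uax_url_email` tokenizer (an assumption that holds for these inputs), `[^\W\d_]+` stands in for `\p{L}+`, and the names `analyze`, `INDEX` and `term_search` are invented for the example.

```python
import re

# Patterns from the "email" filter; [^\W\d_]+ approximates \p{L}+ (assumption).
PATTERNS = [r"([^@]+)", r"([^\W\d_]+)", r"(\d+)", r"@(.+)", r"([^-@]+)"]

DOCS = {
    1: "john.doe@gmail.com",
    2: "john.doe@gmail.com, john.doe@outlook.com",
    3: "hello-john.doe@outlook.com",
    4: "john.doe@outlook.com",
    5: "john@yahoo.com",
}


def analyze(text):
    """Approximate the "email" analyzer: tokenize, capture, lowercase, dedupe."""
    tokens = set()  # a set plays the role of the "unique" filter
    # Assumption: comma/whitespace splitting stands in for uax_url_email.
    for address in re.split(r"[,\s]+", text.strip()):
        if not address:
            continue
        tokens.add(address.lower())  # preserve_original keeps the full address
        for pattern in PATTERNS:
            for match in re.finditer(pattern, address):
                tokens.add(match.group(1).lower())
    return tokens


# Per-document token sets, standing in for the inverted index.
INDEX = {doc_id: analyze(text) for doc_id, text in DOCS.items()}


def term_search(term):
    """A term query is not analyzed: it matches only an exact indexed token."""
    return sorted(doc_id for doc_id, tokens in INDEX.items() if term in tokens)


if __name__ == "__main__":
    for query in ["john.doe@gmail.com", "john.doe@outlook.com", "john@yahoo.com",
                  "john.doe", "john", "gmail.com", "outlook.com"]:
        print(query, "->", term_search(query))
```

Because a `term` query skips analysis, the search text has to be given in lowercase, matching what the `lowercase` filter stored at index time.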

  • Just went through the docs and really can't understand why I didn't find this myself. Thanks again, no further explanation needed. In case anyone needs it: http://www.elastic.co/guide/en/elasticsearch/reference/1.5/analysis-pattern-capture-tokenfilter.html (3 upvotes)