Analyzer for spelling mistakes

sha*_*nuo 2 elasticsearch

I have saved user input directly in Elasticsearch. The name field for the same student appears in several spelling variants:

PrabhuNath Prasad
PrabhuNathPrasad
Prabhu NathPrasad

Prabhu Nath Prashad
PrabhuNath Prashad
PrabhuNathPrashad
Prabhu NathPrashad

The student's real name is "Prabhu Nath Prasad", and when I search with this name I should get all of the results above. Is there an analyzer in Elasticsearch that can handle this?

Chi*_*h25 5

You can do this with a custom analyzer. Here are my settings:

POST name_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "char_filter": [
            "space_removal"
          ],
          "tokenizer": "keyword",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      },
      "char_filter": {
        "space_removal": {
          "type": "pattern_replace",
          "pattern": "\\s+",
          "replacement": ""
        }
      }
    }
  },
  "mappings": {
    "your_type": {
      "properties": {
        "name": {
          "type": "string",
          "fields": {
            "variation": {
              "type": "string",
              "analyzer": "my_custom_analyzer"
            }
          }
        }
      }
    }
  }
}

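I have mapped name with both the standard analyzer and my_custom_analyzer (on the name.variation sub-field). The custom analyzer uses the keyword tokenizer and the lowercase filter, together with a char_filter that removes whitespace and joins the string into a single token. This char_filter is what lets us query the different variants efficiently.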
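
To make the effect of the analyzer concrete, here is a rough Python sketch that imitates the same chain outside Elasticsearch. It is only an approximation for illustration: the function name analyze_variation is made up, and the NFKD-based folding stands in for the asciifolding filter rather than reproducing it exactly.

```python
import re
import unicodedata


def analyze_variation(text: str) -> str:
    """Approximate my_custom_analyzer: one token, no spaces, lowercase, ASCII."""
    no_spaces = re.sub(r"\s+", "", text)      # char_filter: space_removal
    lowered = no_spaces.lower()               # filter: lowercase
    # Rough stand-in for asciifolding: drop accents via Unicode decomposition.
    decomposed = unicodedata.normalize("NFKD", lowered)
    return decomposed.encode("ascii", "ignore").decode("ascii")


names = [
    "PrabhuNath Prasad",
    "PrabhuNathPrasad",
    "Prabhu NathPrasad",
    "Prabhu Nath Prashad",
    "PrabhuNath Prashad",
    "PrabhuNathPrashad",
    "Prabhu NathPrashad",
]

# The keyword tokenizer emits the whole input as a single token,
# so each name maps to exactly one term.
tokens = {analyze_variation(name) for name in names}
print(sorted(tokens))  # ['prabhunathprasad', 'prabhunathprashad']
```

All seven spellings collapse to just two terms, which differ by a single letter. That remaining difference is what the fuzzy match in the query has to absorb.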

I indexed all 7 combinations you gave. Here is my query:

GET name_index/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "name": "Prabhu Nath Prasad"
          }
        },
        {
          "match": {
            "name.variation": {
              "query": "Prabhu Nath Prasad",
              "fuzziness": "AUTO"
            }
          }
        }
      ]
    }
  }
}

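This handles all of your variants. Because the first clause is a plain match on name, it will also return documents that contain only part of the name, such as prabhu, prasad, etc.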
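
As a sanity check on why "fuzziness": "AUTO" is enough for the Prashad spellings, the sketch below computes the edit distance between the two terms that name.variation can contain. It is a standalone illustration, not Elasticsearch code: edit_distance and auto_max_edits are hypothetical helper names, and plain Levenshtein distance is used here (Elasticsearch also counts a transposition as one edit by default, which makes no difference for this pair). The thresholds in auto_max_edits follow the documented AUTO defaults: exact match for terms of 0-2 characters, one edit for 3-5, two edits for longer terms.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance using a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def auto_max_edits(term: str) -> int:
    """Edits allowed by fuzziness AUTO, based on term length."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


# The query text is analyzed by my_custom_analyzer too, so it becomes one term.
query_term = "prabhunathprasad"

for indexed_term in ("prabhunathprasad", "prabhunathprashad"):
    distance = edit_distance(query_term, indexed_term)
    print(indexed_term, distance, distance <= auto_max_edits(query_term))
```

The Prashad form is one insertion away from the query term, well inside the two edits that AUTO allows for a 16-character term, so both groups of spellings match on name.variation.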

Hope this helps!