Elasticsearch: searching for Turkish characters

Kur*_*lar 2 elasticsearch

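I have some documents that I am indexing with Elasticsearch. Some of them are written entirely in capital letters, and in those the Turkish characters have been replaced. For example, "kürşat" is written as "KURSAT".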

I want to find this document by searching for "kürşat". How can I do that?

Thanks

Byr*_*ach 7

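Take a look at the asciifolding token filter. It converts characters outside the basic ASCII range (such as "ü" and "ş") into their closest ASCII equivalents ("u" and "s"), so "kürşat" and a lowercased "KURSAT" end up as the same term.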
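To see why this helps, here is a minimal Python sketch of what a `lowercase` + `asciifolding` chain does to each token. It is an approximation under stated assumptions: the real filter is Lucene's `ASCIIFoldingFilter`, which uses its own mapping table, whereas this sketch uses Unicode NFKD decomposition and drops combining marks (plus an explicit mapping for the dotless `ı`, which has no decomposition). Whitespace splitting stands in for the `standard` tokenizer, and the names `fold` and `analyze` are hypothetical.

```python
import unicodedata

# Assumption: NFKD + dropping combining marks approximates asciifolding.
# It covers ü -> u and ş -> s; dotless ı (U+0131) does not decompose,
# so it is mapped explicitly here.
EXTRA_FOLDS = {"ı": "i"}


def fold(token: str) -> str:
    """Approximate asciifolding: strip diacritics from one token."""
    decomposed = unicodedata.normalize("NFKD", token)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join(EXTRA_FOLDS.get(ch, ch) for ch in stripped)


def analyze(text: str) -> list[str]:
    """Hypothetical analyzer: split on whitespace (stand-in for the
    standard tokenizer), lowercase each token, then fold it to ASCII."""
    return [fold(tok.lower()) for tok in text.split()]


print(analyze("kürşat"))
print(analyze("KURSAT"))
```

Ignoring `preserve_original` for now, both spellings reduce to the same term `kursat`, which is why a search for either form can reach both documents. Note the order: lowercasing runs before folding, matching the order of the `filter` list in the index settings.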

Here is a small example you can try in Sense:

Index:

DELETE test
PUT test
{
  "settings": {
    "analysis": {
      "filter": {
        "my_ascii_folding": {
          "type": "asciifolding",
          "preserve_original": true
        }
      },
      "analyzer": {
        "turkish_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_ascii_folding"
          ]
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "name": {
          "type": "string",
          "analyzer": "turkish_analyzer"
        }
      }
    }
  }
}

POST test/test/1
{
  "name": "kürşat"
}

POST test/test/2
{
  "name": "KURSAT"
}

Query:

GET test/_search
{
  "query": {
    "match": {
      "name": "kursat"
    }
  }
}

Response:

 "hits": {
    "total": 2,
    "max_score": 0.30685282,
    "hits": [
      {
        "_index": "test",
        "_type": "test",
        "_id": "2",
        "_score": 0.30685282,
        "_source": {
          "name": "KURSAT"
        }
      },
      {
        "_index": "test",
        "_type": "test",
        "_id": "1",
        "_score": 0.30685282,
        "_source": {
          "name": "kürşat"
        }
      }
    ]
  }

Query:

GET test/_search
{
  "query": {
    "match": {
      "name": "kürşat"
    }
  }
}

Response:

 "hits": {
    "total": 2,
    "max_score": 0.4339554,
    "hits": [
      {
        "_index": "test",
        "_type": "test",
        "_id": "1",
        "_score": 0.4339554,
        "_source": {
          "name": "kürşat"
        }
      },
      {
        "_index": "test",
        "_type": "test",
        "_id": "2",
        "_score": 0.09001608,
        "_source": {
          "name": "KURSAT"
        }
      }
    ]
  }

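The 'preserve_original' flag makes the filter keep the original token alongside the folded one. As a result, if the user types 'kürşat', the document containing that exact spelling ranks higher than the one containing 'kursat' (note the difference in scores between the two query responses: the 'kursat' query scores both documents equally, while the 'kürşat' query scores document 1 higher).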

If you want the scores to be equal, set that flag to false.
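The effect of `preserve_original` can be sketched in a hypothetical Python model. When the flag is on and folding changes a token, the unfolded token is emitted as well, so `kürşat` is indexed as both `kursat` and `kürşat`, while `KURSAT` only yields `kursat`. The folding here is again an NFKD approximation rather than Lucene's real mapping table, and `matching_terms` is a deliberately crude stand-in for relevance, not Elasticsearch's scoring formula; it only illustrates why the exact-spelling document can pull ahead.

```python
import unicodedata


def fold(token: str) -> str:
    """Approximate asciifolding (NFKD + drop combining marks);
    an illustration only, not Lucene's actual mapping table."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text: str, preserve_original: bool) -> list[str]:
    """Lowercase and fold each token; if preserve_original is set and
    folding changed the token, also emit the unfolded token."""
    tokens = []
    for tok in text.lower().split():
        folded = fold(tok)
        tokens.append(folded)
        if preserve_original and folded != tok:
            tokens.append(tok)
    return tokens


def matching_terms(query: str, doc: str, preserve_original: bool) -> int:
    """Count query terms present in the document's terms
    (a crude proxy for relevance, not a real scoring function)."""
    doc_terms = set(analyze(doc, preserve_original))
    return sum(1 for term in analyze(query, preserve_original) if term in doc_terms)


for flag in (True, False):
    print(flag,
          matching_terms("kürşat", "kürşat", flag),
          matching_terms("kürşat", "KURSAT", flag))
```

With `preserve_original=True`, a query for `kürşat` shares two terms with document 1 but only one with document 2; with `False`, both documents share exactly one term with the query, which is consistent with the advice that turning the flag off makes the scores equal.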

Hope I understood your question correctly!