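I have some documents that I am indexing with Elasticsearch. Some of them are written in upper case, with the Turkish characters replaced by their ASCII equivalents. For example, "kürşat" is written as "KURSAT".
I want to find this document by searching for "kürşat". How can I do that?
Thanks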
Here is a small example you can try in Sense:
Index:
DELETE test
PUT test
{
  "settings": {
    "analysis": {
      "filter": {
        "my_ascii_folding": {
          "type": "asciifolding",
          "preserve_original": true
        }
      },
      "analyzer": {
        "turkish_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_ascii_folding"
          ]
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "name": {
          "type": "string",
          "analyzer": "turkish_analyzer"
        }
      }
    }
  }
}
POST test/test/1
{
  "name": "kürşat"
}
POST test/test/2
{
  "name": "KURSAT"
}
Query:
GET test/_search
{
  "query": {
    "match": {
      "name": "kursat"
    }
  }
}
Response:
"hits": {
  "total": 2,
  "max_score": 0.30685282,
  "hits": [
    {
      "_index": "test",
      "_type": "test",
      "_id": "2",
      "_score": 0.30685282,
      "_source": {
        "name": "KURSAT"
      }
    },
    {
      "_index": "test",
      "_type": "test",
      "_id": "1",
      "_score": 0.30685282,
      "_source": {
        "name": "kürşat"
      }
    }
  ]
}
Query:
GET test/_search
{
  "query": {
    "match": {
      "name": "kürşat"
    }
  }
}
Response:
"hits": {
  "total": 2,
  "max_score": 0.4339554,
  "hits": [
    {
      "_index": "test",
      "_type": "test",
      "_id": "1",
      "_score": 0.4339554,
      "_source": {
        "name": "kürşat"
      }
    },
    {
      "_index": "test",
      "_type": "test",
      "_id": "2",
      "_score": 0.09001608,
      "_source": {
        "name": "KURSAT"
      }
    }
  ]
}
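The 'preserve_original' flag makes the filter index both the original token and its ASCII-folded form. As a result, if the user types 'kürşat', a document containing that exact spelling ranks higher than one containing only 'kursat' (note the score difference between the two query responses above).
If you want the scores to be equal, set the flag to false.
Hope I understood your question correctly!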
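To make the score difference easier to see, here is a rough Python sketch of what the analyzer chain roughly does to each value. This is not Elasticsearch's code: `fold_ascii`, `analyze` and `shared_terms` are hypothetical helpers, whitespace splitting stands in for the standard tokenizer, and Unicode decomposition only approximates asciifolding (it will not fold the Turkish dotless ı, for example).

```python
import unicodedata
from typing import List


def fold_ascii(token: str) -> str:
    """Approximate asciifolding: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text: str, preserve_original: bool = True) -> List[str]:
    """Mimic lowercase + asciifolding on whitespace-separated tokens."""
    tokens = []
    for word in text.split():
        lowered = word.lower()
        folded = fold_ascii(lowered)
        tokens.append(folded)
        # With preserve_original, the unfolded token is kept as well.
        if preserve_original and folded != lowered:
            tokens.append(lowered)
    return tokens


def shared_terms(query: str, document: str) -> int:
    """Count distinct terms produced by both the query and the document."""
    return len(set(analyze(query)) & set(analyze(document)))


print(analyze("kürşat"))                 # ['kursat', 'kürşat']
print(analyze("KURSAT"))                 # ['kursat']
print(shared_terms("kürşat", "kürşat"))  # 2
print(shared_terms("kürşat", "KURSAT"))  # 1
print(shared_terms("kursat", "KURSAT"))  # 1
```

In this simplified picture, a 'kürşat' query shares two terms with the 'kürşat' document but only one with 'KURSAT', which is consistent with document 1 scoring higher, while a 'kursat' query shares exactly one term with each. Calling `analyze(..., preserve_original=False)` leaves only the folded term on both sides.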