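Some characters, such as #, are treated as delimiters, so they never match in a query. What custom analyzer configuration, as close to standard as possible, would allow these characters to be matched?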
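1) One option is the whitespace tokenizer with a lowercase filter, which splits on whitespace only and so keeps every special character:
curl -XGET 'localhost:9200/_analyze?tokenizer=whitespace&filters=lowercase&pretty' -d 'new year #celebration vegas'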
This will give you
{
"tokens" : [ {
"token" : "new",
"start_offset" : 0,
"end_offset" : 3,
"type" : "word",
"position" : 1
}, {
"token" : "year",
"start_offset" : 4,
"end_offset" : 8,
"type" : "word",
"position" : 2
}, {
"token" : "#celebration",
"start_offset" : 9,
"end_offset" : 21,
"type" : "word",
"position" : 3
}, {
"token" : "vegas",
"start_offset" : 22,
"end_offset" : 27,
"type" : "word",
"position" : 4
} ]
}
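To make the difference concrete, here is a rough Python sketch (not Elasticsearch code) that compares splitting on whitespace only with a crude stand-in for the standard tokenizer. The function names are hypothetical, and the `\w+` regex is only an approximation: the real standard tokenizer follows Unicode text segmentation rules.

```python
import re

def whitespace_lowercase(text):
    """Mimic the whitespace tokenizer plus lowercase filter: split on whitespace only."""
    return [tok.lower() for tok in text.split()]

def rough_standard(text):
    """Crude approximation of the standard tokenizer (assumption): keep runs of
    word characters and drop punctuation such as '#'."""
    return [tok.lower() for tok in re.findall(r"\w+", text)]

text = "new year #celebration vegas"
print(whitespace_lowercase(text))  # ['new', 'year', '#celebration', 'vegas']
print(rough_standard(text))        # ['new', 'year', 'celebration', 'vegas']
```

The whitespace version keeps `#celebration` as a single token, which is why a query for `#celebration` can match it, while a query for plain `celebration` would not.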
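2) If you only want to keep some special characters, you can map them with a char filter, so that your text is converted to something else before tokenization happens. This stays closer to the standard analyzer. For example, you can create the index like this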
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "special_analyzer": {
          "char_filter": [
            "special_mapping"
          ],
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      },
      "char_filter": {
        "special_mapping": {
          "type": "mapping",
          "mappings": [
            "#=>hashtag\\u0020"
          ]
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "tweet": {
          "type": "string",
          "analyzer": "special_analyzer"
        }
      }
    }
  }
}
Now run
curl -XPOST 'localhost:9200/my_index/_analyze?analyzer=special_analyzer&pretty' -d 'new year #celebration vegas'
The custom analyzer will produce the following tokens
{
"tokens" : [ {
"token" : "new",
"start_offset" : 0,
"end_offset" : 3,
"type" : "<ALPHANUM>",
"position" : 1
}, {
"token" : "year",
"start_offset" : 4,
"end_offset" : 8,
"type" : "<ALPHANUM>",
"position" : 2
}, {
"token" : "hashtag",
"start_offset" : 9,
"end_offset" : 9,
"type" : "<ALPHANUM>",
"position" : 3
}, {
"token" : "celebration",
"start_offset" : 10,
"end_offset" : 21,
"type" : "<ALPHANUM>",
"position" : 4
}, {
"token" : "vegas",
"start_offset" : 22,
"end_offset" : 27,
"type" : "<ALPHANUM>",
"position" : 5
} ]
}
So you can search like this
GET my_index/_search
{
  "query": {
    "match": {
      "tweet": "#celebration"
    }
  }
}
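You can also search for just celebration, because I mapped # to hashtag followed by a space (the \\u0020 Unicode escape); otherwise we would always have to include the # in the search.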
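The effect of that trailing space can be sketched in plain Python. This is only an illustration under stated assumptions: `analyze` and `matches` are hypothetical helpers, the `\w+` regex is a crude stand-in for the standard tokenizer, and `matches` models a match query with its default `or` operator.

```python
import re

def analyze(text, mapping):
    """Apply a char-filter-style replacement, then split into lowercase word tokens."""
    for src, dst in mapping.items():
        text = text.replace(src, dst)
    return [tok.lower() for tok in re.findall(r"\w+", text)]

def matches(query, doc, mapping):
    """Model a match query (default OR): hit if any query token is a document token."""
    doc_tokens = set(analyze(doc, mapping))
    return any(tok in doc_tokens for tok in analyze(query, mapping))

with_space = {"#": "hashtag "}     # what "#=>hashtag\\u0020" expresses
without_space = {"#": "hashtag"}   # hypothetical variant without the trailing space

doc = "new year #celebration vegas"
print(analyze(doc, with_space))     # ['new', 'year', 'hashtag', 'celebration', 'vegas']
print(analyze(doc, without_space))  # ['new', 'year', 'hashtagcelebration', 'vegas']

print(matches("celebration", doc, with_space))     # True
print(matches("celebration", doc, without_space))  # False
```

Without the space, `#celebration` collapses into the single token `hashtagcelebration`, so only a query that includes the `#` would find it. With the space, a query for `#celebration` is analyzed into `hashtag` and `celebration`, so under the default `or` operator it can also match documents that contain either token alone.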
Hope this helps!