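I'm trying to get Elasticsearch to ignore hyphens. I don't want it to split the text on either side of a hyphen into separate words. It seems simple, but I'm banging my head against the wall.
I want the string "Roland JD-Xi" to produce the following terms: [roland jd-xi, roland, jd-xi, jdxi, roland jdxi]
I haven't been able to achieve this easily. Most people will just type 'jdxi', so my initial thought was simply to strip the hyphen. So I'm using the following definition: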
"name": {
  "type": "string",
  "analyzer": "language",
  "include_in_all": true,
  "boost": 5,
  "fields": {
    "my_standard": {
      "type": "string",
      "analyzer": "my_standard"
    },
    "my_prefix": {
      "type": "string",
      "analyzer": "my_text_prefix",
      "search_analyzer": "my_standard"
    },
    "my_suffix": {
      "type": "string",
      "analyzer": "my_text_suffix",
      "search_analyzer": "my_standard"
    }
  }
}
The relevant analyzers and filters are defined as:
{
  "number_of_replicas": 0,
  "number_of_shards": 1,
  "analysis": {
    "analyzer": {
      "std": {
        "tokenizer": "standard",
        "char_filter": "html_strip",
        "filter": ["standard", "elision", "asciifolding", "lowercase", "stop", "length", "strip_hyphens"]
      ...
      "my_text_prefix": {
        "tokenizer": "whitespace",
        "char_filter": "my_filter",
        "filter": ["standard", "elision", "asciifolding", "lowercase", "stop", "edge_ngram_front"]
      },
      "my_text_suffix": {
        "tokenizer": "whitespace",
        "char_filter": "my_filter",
        "filter": ["standard", "elision", "asciifolding", "lowercase", "stop", "edge_ngram_back"]
      },
      "my_standard": {
        "type": "custom",
        "tokenizer": "whitespace",
        "char_filter": "my_filter",
        "filter": ["standard", "elision", "asciifolding", "lowercase"]
      }
    },
    "char_filter": {
      "my_filter": {
        "type": "mapping",
        "mappings": ["- => ", ". => "]
      }
    },
    "filter": {
      "edge_ngram_front": {
        "type": "edgeNGram",
        "min_gram": 1,
        "max_gram": 20,
        "side": "front"
      },
      "edge_ngram_back": {
        "type": "edgeNGram",
        "min_gram": 1,
        "max_gram": 20,
        "side": "back"
      },
      "strip_spaces": {
        "type": "pattern_replace",
        "pattern": "\\s",
        "replacement": ""
      },
      "strip_dots": {
        "type": "pattern_replace",
        "pattern": "\\.",
        "replacement": ""
      },
      "strip_hyphens": {
        "type": "pattern_replace",
        "pattern": "-",
        "replacement": ""
      },
      "stop": {
        "type": "stop",
        "stopwords": "_none_"
      },
      "length": {
        "type": "length",
        "min": 1
      }
    }
  }
}
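For readers unfamiliar with the edge_ngram_front and edge_ngram_back filters in the settings above, here is a rough Python sketch of the prefixes and suffixes they would be expected to emit for a single token. The function name `edge_ngrams` is made up for illustration, and this is only an approximation of the filter's behaviour, not Elasticsearch's own code.

```python
def edge_ngrams(token, min_gram=1, max_gram=20, side="front"):
    """Approximate an edge n-gram token filter for one token (illustrative only)."""
    upper = min(max_gram, len(token))
    if side == "front":
        # Prefixes of increasing length, anchored at the start of the token.
        return [token[:n] for n in range(min_gram, upper + 1)]
    # Suffixes of increasing length, anchored at the end of the token.
    return [token[-n:] for n in range(min_gram, upper + 1)]


print(edge_ngrams("jdxi", side="front"))  # ['j', 'jd', 'jdx', 'jdxi']
print(edge_ngrams("jdxi", side="back"))   # ['i', 'xi', 'dxi', 'jdxi']
```

Because the my_prefix and my_suffix subfields use my_standard as their search analyzer, a query term such as jdxi is not n-grammed itself; it is compared against these indexed fragments.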
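I've been able to test this (i.e. with _analyze), and the string "Roland JD-Xi" is tokenized as [roland, jdxi].
It's not exactly what I want, but it's close enough that it should match 'jdxi'.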
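To make that _analyze result easier to reason about, here is a minimal Python sketch of what the my_standard chain does to the input: apply the my_filter character mappings, split on whitespace, then lowercase. It is a toy approximation under those assumptions (the elision and asciifolding filters are left out because they don't affect this input), and the name `my_standard_sketch` is hypothetical.

```python
# Mirrors the "my_filter" mapping char_filter: "-" and "." are mapped to nothing.
CHAR_MAPPINGS = {"-": "", ".": ""}


def my_standard_sketch(text):
    """Toy model of the my_standard analyzer; not Elasticsearch's implementation."""
    # 1. char_filter runs before tokenization, so "JD-Xi" becomes "JDXi".
    for src, dst in CHAR_MAPPINGS.items():
        text = text.replace(src, dst)
    # 2. whitespace tokenizer: split on runs of whitespace only.
    tokens = text.split()
    # 3. lowercase token filter.
    return [t.lower() for t in tokens]


print(my_standard_sketch("Roland JD-Xi"))  # ['roland', 'jdxi']
```

Since the hyphen is removed before the tokenizer runs, the whitespace tokenizer never sees it, which is why a single jdxi token comes out.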
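But here's my problem. If I do a simple "index/_search?q=jdxi", it doesn't bring back the document. However, if I do "index/_search?q=roland+jdxi", it does bring back the document.
So at least I know the hyphen is being removed, but if the tokens "roland" and "jdxi" are being created, why doesn't "index/_search?q=jdxi" match the document?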
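I reproduced your case on ES 6, and searching with index/_search?q=jdxi returned the document.
The problem might be that when you search with index/_search?q=jdxi without specifying a field, it basically searches _all, which includes whatever is in the name field (basically the same as index/_search?q=name:jdxi). Since that field is not analyzed with your my_standard analyzer, you won't get any results.
What you should do is search on the my_standard subfield, i.e. index/_search?q=name.my_standard:jdxi, and I'm pretty sure you'll get the document.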
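If you build that request from code, a small sketch like the following produces the field-qualified URI search. The helper name `build_search_path` is made up for this example, and the index name is simply the placeholder "index" from the question; only the Python standard library is used.

```python
from urllib.parse import urlencode


def build_search_path(index, field, term):
    """Build a URI-search path that targets one field (hypothetical helper)."""
    # The q parameter uses query-string syntax: "<field>:<term>".
    query = urlencode({"q": f"{field}:{term}"})
    return f"{index}/_search?{query}"


# Targets the subfield analyzed with my_standard instead of the default field.
print(build_search_path("index", "name.my_standard", "jdxi"))
# index/_search?q=name.my_standard%3Ajdxi
```

The colon is percent-encoded as %3A by urlencode, which decodes back to name.my_standard:jdxi on the server side.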