I want to implement the following T-SQL query with Elasticsearch:
declare @searchstring nvarchar (max)
set @searchstring = 'tn-241'
set @searchstring = replace(replace('%'+@searchstring+'%', '-', ''), ' ', '')
SELECT *
FROM [dbo].[Product]
where
replace(replace(shortdescription, '-', ''), ' ', '') like @searchstring or
replace(replace(name, '-', ''), ' ', '') like @searchstring or
replace(replace(number, '-', ''), ' ', '') like @searchstring
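For reference, the matching rule in that T-SQL can be sketched in Python roughly as follows. This is only an illustration of the intended semantics, not part of the original setup: the function names and the sample product dictionary are hypothetical, and it assumes a case-insensitive collation (the SQL Server default in many installations) and a search string that contains no LIKE metacharacters of its own.

```python
def normalize(text):
    # Mirrors replace(replace(x, '-', ''), ' ', '') from the T-SQL above.
    return text.replace("-", "").replace(" ", "")


def product_matches(search, product):
    # LIKE '%...%' is a substring test; lower() stands in for an assumed
    # case-insensitive collation.
    needle = normalize(search).lower()
    fields = ("shortdescription", "name", "number")
    return any(needle in normalize(product.get(f) or "").lower() for f in fields)


# Hypothetical sample row, for illustration only.
product = {
    "name": "Original Brother TN-241C Toner Cyan",
    "number": "TN-241C",
    "shortdescription": None,
}
print(product_matches("tn-241", product))   # True
print(product_matches("tn 24 1", product))  # True
print(product_matches("tn-242", product))   # False
```

In other words, hyphens and spaces are ignored on both sides, and any of the three fields may contain the remaining characters as a substring.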
To do this, I created an analyzer that uses the keyword tokenizer and a word_delimiter filter with catenate_all, as shown below:
"search_delimiter": {
"split_on_numerics": "false",
"generate_word_parts": "false",
"preserve_original": "false",
"generate_number_parts": "false",
"catenate_all": "true",
"split_on_case_change": "false",
"type": "word_delimiter",
"stem_english_possessive": "false"
}
"analyzer": {
"searchanalyzer": {
"filter": [
"lowercase",
"search_delimiter"
],
"type": "custom",
"tokenizer": "keyword"
},
"Name": {
"analyzer": "searchanalyzer",
"type": "string",
"fields": {
"raw": {
"analyzer": "searchanalyzer",
"type": "string"
}
}
},
"Number": {
"analyzer": "searchanalyzer",
"type": "string",
"fields": {
"raw": {
"analyzer": "searchanalyzer",
"type": "string"
}
}
},
"ShortDescription": {
"analyzer": "searchanalyzer",
"type": "string",
"fields": {
"raw": {
"analyzer": "searchanalyzer",
"type": "string"
}
}
},
This produces the following result:
curl -XGET "Index/_analyze?analyzer=searchanalyzer&pretty=true" -d "Original Brother TN-241C Toner Cyan"
{
"tokens" : [ {
"token" : "originalbrothertn241ctonercyan",
"start_offset" : 0,
"end_offset" : 35,
"type" : "word",
"position" : 0
} ]
}
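The effect of this analyzer chain can be approximated in a few lines of Python. This is a rough simulation under simplifying assumptions, not how Elasticsearch implements it: the keyword tokenizer is modelled as "keep the whole input as one token", and word_delimiter with only catenate_all enabled is modelled as "drop every non-alphanumeric character and join what is left". The function name is hypothetical.

```python
import re


def simulate_searchanalyzer(text):
    # keyword tokenizer: the whole input is a single token.
    token = text
    # lowercase filter.
    token = token.lower()
    # word_delimiter with catenate_all only (approximation): split on
    # non-alphanumeric characters and concatenate the pieces.
    joined = "".join(re.findall(r"[^\W_]+", token))
    return [joined] if joined else []


print(simulate_searchanalyzer("Original Brother TN-241C Toner Cyan"))
# ['originalbrothertn241ctonercyan']
```

The single concatenated token is why only a wildcard that spans the whole normalized string can match.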
So basically I need to apply the same analyzer to the search string and look it up with a query_string or wildcard query.
So if I search like this:
"query": {
"query_string" : {
"fields" : ["Name", "Number", "ShortDescription"],
"query" : "*TonerCyan*"
}
}
it works fine, but if I search for
"query": {
"query_string" : {
"fields" : ["Name", "Number", "ShortDescription"],
"query" : "*Toner Cyan*"
}
}
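it returns no results. That suggests the searchanalyzer is not applied before query_string runs, because I expected the second query to search for TonerCyan rather than for Toner and Cyan separately. My first question is: why doesn't this work? My second question is: what is the best way to implement the T-SQL query above? It needs to search across multiple fields.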
You could try putting the search string inside double quotes, as shown below; this should work:
{
"query": {
"query_string": {
"fields": [
"Name",
"Number",
"ShortDescription"
],
"query": "*\"Toner Cyan\"*"
}
}
}
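As for why the unquoted query finds nothing, one plausible explanation can be sketched in Python. It rests on assumptions about the older Elasticsearch versions that use `"type": "string"` mappings: the query_string parser splits the input on whitespace before any analysis, wildcard terms are lowercased but not run through the field analyzer, and the resulting clauses are OR'ed. The function name is hypothetical, and `fnmatchcase` merely stands in for Lucene's wildcard matching.

```python
from fnmatch import fnmatchcase

# The single token the searchanalyzer produced for the sample product.
INDEXED_TOKENS = ["originalbrothertn241ctonercyan"]


def simulate_query_string(query, tokens):
    # Assumption: whitespace splits the query into separate clauses first,
    # each wildcard clause is only lowercased (never analyzed), and the
    # clauses are combined with OR.
    clauses = [part.lower() for part in query.split()]
    return any(fnmatchcase(tok, clause) for clause in clauses for tok in tokens)


print(simulate_query_string("*TonerCyan*", INDEXED_TOKENS))   # True
print(simulate_query_string("*Toner Cyan*", INDEXED_TOKENS))  # False
```

Under these assumptions, `*Toner Cyan*` becomes the two clauses `*toner` and `cyan*`. The indexed token neither ends with "toner" nor starts with "cyan", so neither clause matches.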
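Also, be aware that searching with a leading wildcard can have a catastrophic performance impact, depending on how much data you have. For that reason I still strongly believe you should index ngrams.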
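To show what indexing ngrams buys you, here is a minimal in-memory sketch in Python. It is not Elasticsearch configuration: the helper names, the fixed gram size of 3, and the two sample products are assumptions made for illustration. The idea is that every character trigram of the normalized text becomes an indexed term, so a substring search turns into exact term lookups instead of a scan with a leading wildcard.

```python
import re


def normalize(text):
    # Lowercase and drop everything that is not a letter or digit.
    return "".join(re.findall(r"[^\W_]+", text.lower()))


def char_ngrams(text, n=3):
    # All contiguous character n-grams; empty if the text is shorter than n.
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def build_index(docs, n=3):
    # Inverted index: n-gram -> set of document ids containing it.
    index = {}
    for doc_id, text in docs.items():
        for gram in char_ngrams(normalize(text), n):
            index.setdefault(gram, set()).add(doc_id)
    return index


def search(index, query, n=3):
    grams = char_ngrams(normalize(query), n)
    if not grams:
        return set()  # query shorter than n characters: nothing to look up
    result = None
    for gram in grams:
        ids = index.get(gram, set())
        result = set(ids) if result is None else result & ids
    return result


# Hypothetical sample data.
docs = {
    1: "Original Brother TN-241C Toner Cyan",
    2: "Original Brother TN-245M Toner Magenta",
}
index = build_index(docs)
print(search(index, "tn-241"))      # {1}
print(search(index, "Toner Cyan"))  # {1}
print(search(index, "toner"))       # {1, 2}
```

Requiring every query gram to be present can, in principle, return a document in which the grams occur but not contiguously, so a real setup may still need a verification step or a range of gram sizes. In Elasticsearch the equivalent would be an analyzer built on an ngram tokenizer or token filter, at the cost of a larger index.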