Rrr*_*Rrr 6 elasticsearch elasticsearch-percolate
Index:
{
"settings": {
"index.percolator.map_unmapped_fields_as_text": true
},
"mappings": {
"properties": {
"query": {
"type": "percolator"
}
}
}
}
This test percolator query works:
{
"query": {
"match": {
"message": "blah"
}
}
}
This query does not work:
{
"query": {
"simple_query_string": {
"query": "bl*"
}
}
}
Result:
{"took":15,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":{"value":1,"relation":"eq"},"max_score":0.13076457,"hits":[{"_index":"my-index","_type":"_doc","_id":"1","_score":0.13076457,"_source":{"query":{"match":{"message":"blah"}}},"fields":{"_percolator_document_slot":[0]}}]}}
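Why does this simple_query_string query not match the document?
I don't quite understand what you are asking either. Maybe you don't fully understand the percolator? Here is an example I just tried.
Suppose you have an index (let's call it test) in which you want to index some documents. That index has the following mapping (just a random test index from my test setup):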
{
"settings": {
"analysis": {
"filter": {
"email": {
"type": "pattern_capture",
"preserve_original": true,
"patterns": [
"([^@]+)",
"(\\p{L}+)",
"(\\d+)",
"@(.+)",
"([^-@]+)"
]
}
},
"analyzer": {
"email": {
"tokenizer": "uax_url_email",
"filter": [
"email",
"lowercase",
"unique"
]
}
}
}
},
"mappings": {
"properties": {
"code": {
"type": "long"
},
"date": {
"type": "date"
},
"part": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"val": {
"type": "long"
},
"email": {
"type": "text",
"analyzer": "email"
}
}
}
}
Notice that it has a custom email analyzer, which splits something like foo@bar.com into the following tokens: foo@bar.com, foo, bar.com, bar, com.
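To check this token list yourself, you could run the analyzer through the _analyze API. This is only a sketch: it assumes the test index has already been created with the settings above, and I am not showing any output here.

```
GET /test/_analyze
{
  "analyzer": "email",
  "text": "foo@bar.com"
}
```

The response lists every token the email analyzer emits for that text, which you can compare against the five tokens named above.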
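As the documentation says, you can create a separate percolator index that holds only your percolator queries and not the documents themselves. Even though the percolator index contains no documents, it should still carry the mapping of the index that is meant to hold them (test in our case).
Here is the mapping of the percolator index (which I call percolator_index); it also has the special analyzer used to split the email field: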
{
"settings": {
"analysis": {
"filter": {
"email": {
"type": "pattern_capture",
"preserve_original": true,
"patterns": [
"([^@]+)",
"(\\p{L}+)",
"(\\d+)",
"@(.+)",
"([^-@]+)"
]
}
},
"analyzer": {
"email": {
"tokenizer": "uax_url_email",
"filter": [
"email",
"lowercase",
"unique"
]
}
}
}
},
"mappings": {
"properties": {
"query": {
"type": "percolator"
},
"code": {
"type": "long"
},
"date": {
"type": "date"
},
"part": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"val": {
"type": "long"
},
"email": {
"type": "text",
"analyzer": "email"
}
}
}
}
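Its mapping and settings are almost identical to those of my original index; the only difference is the additional query field of type percolator in the mapping.
The query you are interested in, the simple_query_string, should go inside a document in percolator_index. Like this: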
PUT /percolator_index/_doc/1?refresh
{
"query": {
"simple_query_string" : {
"query" : "month foo@bar.com",
"fields": ["part", "email"]
}
}
}
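To make it more interesting, I listed the email field among the fields the query should specifically search (by default, all fields are searched).
Now, the goal is to test a document that should eventually go into the test index against this simple_query_string query stored in the percolator index. For example: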
GET /percolator_index/_search
{
"query": {
"percolate": {
"field": "query",
"document": {
"date":"2004-07-31T11:57:52.000Z","part":"month","code":109,"val":0,"email":"foo@bar.com"
}
}
}
}
Obviously, what goes under document is your future (not yet existing) document. It is matched against the simple_query_string defined above and produces a hit:
{
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 0.39324823,
"hits": [
{
"_index": "percolator_index",
"_type": "_doc",
"_id": "1",
"_score": 0.39324823,
"_source": {
"query": {
"simple_query_string": {
"query": "month foo@bar.com",
"fields": [
"part",
"email"
]
}
}
},
"fields": {
"_percolator_document_slot": [
0
]
}
}
]
}
}
What if I percolate this document instead:
{
"query": {
"percolate": {
"field": "query",
"document": {
"date":"2004-07-31T11:57:52.000Z","part":"month","code":109,"val":0,"email":"foo"
}
}
}
}
(Note that the email is just foo.) Here is the result:
{
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 0.26152915,
"hits": [
{
"_index": "percolator_index",
"_type": "_doc",
"_id": "1",
"_score": 0.26152915,
"_source": {
"query": {
"simple_query_string": {
"query": "month foo@bar.com",
"fields": [
"part",
"email"
]
}
}
},
"fields": {
"_percolator_document_slot": [
0
]
}
}
]
}
}
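Notice that the score is a bit lower than for the first percolated document. This is probably because foo (my email) matches only one of the terms produced by analyzing foo@bar.com, whereas foo@bar.com would match all of them (thus giving a better score).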
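If you want to compare the two cases side by side, the percolate query also accepts a documents array, so both candidate documents can go into one request. A sketch only, reusing the two documents from above; I have not run it in this form, so no output is shown.

```
GET /percolator_index/_search
{
  "query": {
    "percolate": {
      "field": "query",
      "documents": [
        {"date":"2004-07-31T11:57:52.000Z","part":"month","code":109,"val":0,"email":"foo@bar.com"},
        {"date":"2004-07-31T11:57:52.000Z","part":"month","code":109,"val":0,"email":"foo"}
      ]
    }
  }
}
```

In the response, the _percolator_document_slot field of each hit tells you which positions in the documents array matched that stored query.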
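I'm not sure which analyzer you were talking about, though. I think the example above covers the only "analyzer" question/unknown that I can imagine being a bit confusing.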
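To bring this back to the wildcard query from the question, one thing you could try is registering it with the fields named explicitly, as in my query above. This is a sketch under assumptions: the document id 2 and the term mon* are made up for illustration, the part field comes from my test mapping, and I have not verified that the missing fields list is what breaks your bl* query.

```
PUT /percolator_index/_doc/2?refresh
{
  "query": {
    "simple_query_string": {
      "query": "mon*",
      "fields": ["part"]
    }
  }
}

GET /percolator_index/_search
{
  "query": {
    "percolate": {
      "field": "query",
      "document": {
        "part": "month"
      }
    }
  }
}
```

If the stored query with id 2 shows up in the hits here but your original query still does not, the difference between the two requests (explicit fields versus none) would be the first place to look.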