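I've been trying to use facets to get the term frequencies of a field. My query returns only a single hit, so I'd like the facet to return the terms that occur most frequently in a particular field.
My mapping: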
{
  "mappings": {
    "document": {
      "properties": {
        "tags": {
          "type": "object",
          "properties": {
            "title": {
              "fields": {
                "partial": {
                  "search_analyzer": "main",
                  "index_analyzer": "partial",
                  "type": "string",
                  "index": "analyzed"
                },
                "title": {
                  "type": "string",
                  "analyzer": "main",
                  "index": "analyzed"
                }
              },
              "type": "multi_field"
            }
          }
        }
      }
    }
  },
  "settings": {
    "analysis": {
      "filter": {
        "name_ngrams": {
          "side": "front",
          "max_gram": 50,
          "min_gram": 2,
          "type": "edgeNGram"
        }
      },
      "analyzer": {
        "main": {
          "filter": ["standard", "lowercase", "asciifolding"],
          "type": "custom",
          "tokenizer": "standard"
        },
        "partial": {
          "filter": ["standard", "lowercase", "asciifolding", "name_ngrams"],
          "type": "custom",
          "tokenizer": "standard"
        }
      }
    }
  }
}
Test data:
curl -XPOST localhost:9200/testindex/document -d '{"tags": {"title": "people also kill people"}}'
Query:
curl -XGET 'localhost:9200/testindex/document/_search?pretty=1' -d '
{
  "query": {
    "term": { "tags.title": "people" }
  },
  "facets": {
    "popular_tags": { "terms": { "field": "tags.title" } }
  }
}'
This returns:
"hits" : {
  "total" : 1,
  "max_score" : 0.99381393,
  "hits" : [ {
    "_index" : "testindex",
    "_type" : "document",
    "_id" : "uI5k0wggR9KAvG9o7S7L2g",
    "_score" : 0.99381393, "_source" : {"tags": {"title": "people also kill people"}}
  } ]
},
"facets" : {
  "popular_tags" : {
    "_type" : "terms",
    "missing" : 0,
    "total" : 3,
    "other" : 0,
    "terms" : [ {
      "term" : "people",
      "count" : 1 // I expect this to be 2
    }, {
      "term" : "kill",
      "count" : 1
    }, {
      "term" : "also",
      "count" : 1
    } ]
  }
}
The result above is not what I want. I want the frequency count for "people" to be 2:
"hits" : {
  "total" : 1,
  "max_score" : 0.99381393,
  "hits" : [ {
    "_index" : "testindex",
    "_type" : "document",
    "_id" : "uI5k0wggR9KAvG9o7S7L2g",
    "_score" : 0.99381393, "_source" : {"tags": {"title": "people also kill people"}}
  } ]
},
"facets" : {
  "popular_tags" : {
    "_type" : "terms",
    "missing" : 0,
    "total" : 3,
    "other" : 0,
    "terms" : [ {
      "term" : "people",
      "count" : 2
    }, {
      "term" : "kill",
      "count" : 1
    }, {
      "term" : "also",
      "count" : 1
    } ]
  }
}
How can I achieve this? Are facets the wrong way to go?
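Facets count documents, not the terms that occur in them. You get 1 because only one document contains the term; how many times the term occurs in that document doesn't matter. I don't know of an out-of-the-box way to return term frequencies, and facets are not a good fit for this.
That information can be stored in the index if you enable term vectors, but right now there is no way to read term vectors back from elasticsearch.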
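To make the distinction concrete, here is a minimal Python sketch that computes both numbers for the test document on the client side. It does not call elasticsearch at all: the `tokenize` helper is only a rough stand-in for the `main` analyzer (lowercasing and splitting on non-word characters), and the function names are made up for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    # Rough approximation of the "main" analyzer: lowercase, split on non-word chars.
    # This is an assumption for illustration, not elasticsearch's real analysis chain.
    return [token for token in re.split(r"\W+", text.lower()) if token]


def document_counts(docs):
    # What a terms facet reports: the number of documents containing each term.
    counts = Counter()
    for text in docs:
        counts.update(set(tokenize(text)))  # each term counted once per document
    return counts


def term_frequencies(docs):
    # What the question asks for: the total number of occurrences of each term.
    counts = Counter()
    for text in docs:
        counts.update(tokenize(text))  # every occurrence counted
    return counts


docs = ["people also kill people"]
doc_counts = document_counts(docs)
term_freqs = term_frequencies(docs)

print(doc_counts["people"])  # 1, matching the facet output
print(term_freqs["people"])  # 2, the count the question expected
```

Counting like this only makes sense when the query returns a small number of hits, as in the single-hit case described in the question, since every `_source` has to be fetched and tokenized by the client.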