I have the following sample mapping:
{
"book" : {
"properties" : {
"author" : { "type" : "string" },
"title" : { "type" : "string" },
"reviews" : {
"properties" : {
"url" : { "type" : "string" },
"score" : { "type" : "integer" }
}
},
"chapters" : {
"include_in_root" : 1,
"type" : "nested",
"properties" : {
"name" : { "type" : "string" }
}
}
}
}
}
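I would like to get a facet on the number of reviews, i.e. the length of the "reviews" array. For instance, put in words, the results I need are: "100 documents with 10 reviews, 20 documents with 5 reviews, ..."
I'm trying the following statistical facet: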
{
"query" : {
"match_all" : {}
},
"facets" : {
"stat1" : {
"statistical" : {"script" : "doc['reviews.score'].values.size()"}
}
}
}
But it keeps failing with:
{
"error" : "SearchPhaseExecutionException[Failed to execute phase [query_fetch], total failure; shardFailures {[mDsNfjLhRIyPObaOcxQo2w][facettest][0]: QueryPhaseExecutionException[[facettest][0]: query[ConstantScore(NotDeleted(cache(org.elasticsearch.index.search.nested.NonNestedDocsFilter@a2a5984b)))],from[0],size[10]: Query Failed [Failed to execute main query]]; nested: PropertyAccessException[[Error: could not access: reviews; in class: org.elasticsearch.search.lookup.DocLookup]
[Near : {... doc[reviews.score].values.size() ....}]
^
[Line: 1, Column: 5]]; }]",
"status" : 500
}
How can I achieve my goal?
The ElasticSearch version is 0.19.9.
Here is my sample data:
{
"author" : "Mark Twain",
"title" : "The Adventures of Tom Sawyer",
"reviews" : [
{
"url" : "amazon.com",
"score" : 10
},
{
"url" : "www.barnesandnoble.com",
"score" : 9
}
],
"chapters" : [
{ "name" : "Chapter 1" }, { "name" : "Chapter 2" }
]
}
{
"author" : "Jack London",
"title" : "The Call of the Wild",
"reviews" : [
{
"url" : "amazon.com",
"score" : 8
},
{
"url" : "www.barnesandnoble.com",
"score" : 9
},
{
"url" : "www.books.com",
"score" : 5
}
],
"chapters" : [
{ "name" : "Chapter 1" }, { "name" : "Chapter 2" }
]
}
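It looks like you are using curl to execute your query, and the curl command looks something like this: curl localhost:9200/my-index/book -d '{....}'
The problem here is that, because you wrap the request body in apostrophes (single quotes), you need to escape every apostrophe it contains. Otherwise the shell strips them, which is why the error message shows doc[reviews.score] with no quotes around the field name. So your script should become: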
{"script" : "doc['\''reviews.score'\''].values.size()"}
or
{"script" : "doc[\"reviews.score\"].values.size()"}
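The quoting can be checked without running curl. The sketch below is only an illustration using Python's standard `shlex` module, whose `split` function tokenizes text the way a POSIX shell would; the variable names are made up for this example. It confirms that the `'\''` form above reaches the server with its apostrophes intact, and shows `shlex.quote` generating an equivalent single-quoted argument (it uses the `'"'"'` form, which is another way to embed an apostrophe).

```python
import json
import shlex

# The facet script as Elasticsearch should receive it, apostrophes included.
script = "doc['reviews.score'].values.size()"

# The manually escaped argument suggested above, exactly as typed after `-d`.
manual = r"""'{"script" : "doc['\''reviews.score'\''].values.size()"}'"""

# shlex.split applies POSIX quoting rules, so this is what curl would be handed.
manual_body = shlex.split(manual)[0]

# Alternatively, build the body with json.dumps and let shlex.quote escape it.
body = json.dumps({"script": script})
quoted = shlex.quote(body)
recovered = shlex.split(quoted)[0]

print(manual_body)
print(quoted)
```

Both routes should hand Elasticsearch a script that still contains the quoted field name.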
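The second issue is that, from your description, it looks like you are looking for a histogram facet or a range facet rather than a statistical facet. So I would suggest trying something like this: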
curl "localhost:9200/test-idx/book/_search?search_type=count&pretty" -d '{
"query" : {
"match_all" : {}
},
"facets" : {
"histo1" : {
"histogram" : {
"key_script" : "doc[\"reviews.score\"].values.size()",
"value_script" : "doc[\"reviews.score\"].values.size()",
"interval" : 1
}
}
}
}'
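To make the intended buckets concrete, here is a small offline sketch that counts documents per number of reviews for the two sample documents from the question. It does not call Elasticsearch and is not the facet's implementation; `review_count_histogram` is a hypothetical helper that only illustrates what "N documents with M reviews" means for this data.

```python
from collections import Counter

# The two sample documents from the question, reduced to the relevant fields.
books = [
    {
        "title": "The Adventures of Tom Sawyer",
        "reviews": [
            {"url": "amazon.com", "score": 10},
            {"url": "www.barnesandnoble.com", "score": 9},
        ],
    },
    {
        "title": "The Call of the Wild",
        "reviews": [
            {"url": "amazon.com", "score": 8},
            {"url": "www.barnesandnoble.com", "score": 9},
            {"url": "www.books.com", "score": 5},
        ],
    },
]


def review_count_histogram(docs):
    """Map each distinct number of reviews to how many documents have it."""
    return Counter(len(doc.get("reviews", [])) for doc in docs)


histogram = review_count_histogram(books)
for n_reviews, n_docs in sorted(histogram.items()):
    print(f"{n_docs} document(s) with {n_reviews} review(s)")
```

With an interval of 1, each distinct review count gets its own bucket, which is the shape the question asks for.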
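The third issue is that the script in the facet is called for every record in the result set, which can take a very long time if you have many results. So I would suggest indexing an additional field, number_of_reviews, populated by your client with the number of reviews. Your query then simply becomes: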
curl "localhost:9200/test-idx/book/_search?search_type=count&pretty" -d '{
"query" : {
"match_all" : {}
},
"facets" : {
"histo1" : {
"histogram" : {
"field" : "number_of_reviews",
"interval" : 1
}
}
}
}'
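A minimal sketch of that client-side step, assuming documents are assembled as Python dictionaries before they are indexed. `with_review_count` is a hypothetical helper introduced here for illustration; how the enriched document is sent to Elasticsearch is left out on purpose.

```python
import json


def with_review_count(book):
    """Return a copy of the document with number_of_reviews derived from reviews."""
    enriched = dict(book)  # shallow copy so the caller's document is not modified
    enriched["number_of_reviews"] = len(book.get("reviews", []))
    return enriched


book = {
    "author": "Jack London",
    "title": "The Call of the Wild",
    "reviews": [
        {"url": "amazon.com", "score": 8},
        {"url": "www.barnesandnoble.com", "score": 9},
        {"url": "www.books.com", "score": 5},
    ],
    "chapters": [{"name": "Chapter 1"}, {"name": "Chapter 2"}],
}

doc = with_review_count(book)

# This JSON is what would be indexed in place of the original document.
print(json.dumps(doc, indent=2))
```

The field has to be recomputed whenever the reviews array changes, since Elasticsearch will not keep it in sync on its own.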