Aggregating over a filtered, nested inner_hits query in ElasticSearch

Question

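I'm only a few days into ElasticSearch. As a learning exercise I implemented a basic job scraper that aggregates jobs from a few job listing sites, and I populated an index with some data to play with.

My index contains one document for each website that lists jobs. Each document has a "jobs" property, which is an array containing an object for every job that exists on that site. I'm considering indexing each job as its own document (especially since the ElasticSearch documentation says inner_hits is an experimental feature), but for now I'm trying to see whether I can accomplish what I want with ElasticSearch's inner_hits and nested features.

I'm able to query, filter, and return only the matching jobs. However, I'm not sure how to apply the same inner_hits constraints to an aggregation.

Here is my mapping: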

{
  "jobsitesIdx" : {
    "mappings" : {
      "sites" : {
        "properties" : {
          "createdAt" : {
            "type" : "date",
            "format" : "dateOptionalTime"
          },
          "jobs" : {
            "type" : "nested",
            "properties" : {
              "company" : {
                "type" : "string"
              },
              "engagement" : {
                "type" : "string"
              },
              "link" : {
                "type" : "string",
                "index" : "not_analyzed"
              },
              "location" : {
                "type" : "string",
                "fields" : {
                  "raw" : {
                    "type" : "string",
                    "index" : "not_analyzed"
                  }
                }
              },
              "title" : {
                "type" : "string"
              }
            }
          },
          "jobscount" : {
            "type" : "long"
          },
          "sitename" : {
            "type" : "string"
          },
          "url" : {
            "type" : "string"
          }
        }
      }
    }
  }
}


Here is the query and aggregation I'm trying (from Node.js):

client.search({
  "index": 'jobsitesIdx',
  "type": 'sites',
  "body": {


    "aggs" : {
            "jobs" : {
                "nested" : {
                    "path" : "jobs"
                },
                "aggs" : {
                    "location" : { "terms" : { "field" : "jobs.location.raw", "size": 25 } },
                    "company" : { "terms" : { "field" : "jobs.company.raw", "size": 25 } }
                }
            }
        },


    "query": {
        "filtered": {
          "query": {"match_all": {}},
          "filter": {
            "nested": {
              "inner_hits" : { "size": 1000 },
              "path": "jobs",
              "query":{
                "filtered": {
                  "query": { "match_all": {}},
                  "filter": {
                    "and": [
                      {"term": {"jobs.location": "york"}},
                      {"term": {"jobs.location": "new"}}
                    ]
                  }
                }
              }
            }
          }
        }
      }
  }
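}, function (error, response) {
  if (error) {
    console.error(error);
    return;
  }

  // Print every matching nested job for each job site.
  response.hits.hits.forEach(function (jobsite) {
    var jobs = jobsite.inner_hits.jobs.hits.hits;

    jobs.forEach(function (job) {
      console.log(job);
    });
  });

  // Print the location buckets from the nested aggregation.
  console.log(response.aggregations.jobs.location.buckets);
});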


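This gets me back all the inner hits for jobs in New York, but the aggregation shows counts for every location and company, not just those that match the inner_hits.

Any suggestions on how to get aggregations only over the data contained in the matching inner_hits?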

Edit: I'm updating this to include an export of the mapping and index data, as requested. I exported it using Taskrabbit's elasticdump tool, which can be found here: https://github.com/taskrabbit/elasticsearch-dump

Index: http://pastebin.com/WaZwBwn4 Mapping: http://pastebin.com/ZkGnYN94

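The linked data differs from the sample code in my original question in that the index is named jobsites6 in the data, rather than jobsitesIdx as in the question. Also, the type in the data is "job", whereas in the code above it is "sites".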

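I've filled in the callback in the code above to display the response data. As expected, the forEach loop over the inner_hits shows only jobs in New York, but I see this for the location aggregation: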

[ { key: 'New York, NY', doc_count: 243 },
  { key: 'San Francisco, CA', doc_count: 92 },
  { key: 'Chicago, IL', doc_count: 43 },
  { key: 'Boston, MA', doc_count: 39 },
  { key: 'Berlin, Germany', doc_count: 22 },
  { key: 'Seattle, WA', doc_count: 22 },
  { key: 'Los Angeles, CA', doc_count: 20 },
  { key: 'Austin, TX', doc_count: 18 },
  { key: 'Anywhere', doc_count: 16 },
  { key: 'Cupertino, CA', doc_count: 15 },
  { key: 'Washington D.C.', doc_count: 14 },
  { key: 'United States', doc_count: 11 },
  { key: 'Atlanta, GA', doc_count: 10 },
  { key: 'London, UK', doc_count: 10 },
  { key: 'Ulm, Deutschland', doc_count: 10 },
  { key: 'Riverton, UT', doc_count: 9 },
  { key: 'San Diego, CA', doc_count: 9 },
  { key: 'Charlotte, NC', doc_count: 8 },
  { key: 'Irvine, CA', doc_count: 8 },
  { key: 'London', doc_count: 8 },
  { key: 'San Mateo, CA', doc_count: 8 },
  { key: 'Boulder, CO', doc_count: 7 },
  { key: 'Houston, TX', doc_count: 7 },
  { key: 'Palo Alto, CA', doc_count: 7 },
  { key: 'Sydney, Australia', doc_count: 7 } ]


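Since my inner_hits are limited to those in New York, I can tell that the aggregation is not running over my inner_hits, because it gives me counts for all locations.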

Answer 1

Val*_*Val 13

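You can achieve this by adding the same filter to the aggregation, so that it only includes New York jobs. Also note that your second aggregation uses company.raw, but in your mapping the jobs.company field has no not_analyzed sub-field named raw, so you may need to add one if you want to aggregate on the unanalyzed company name.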

{
  "_source": [
    "sitename"
  ],
  "query": {
    "filtered": {
      "filter": {
        "nested": {
          "inner_hits": {
            "size": 1000
          },
          "path": "jobs",
          "query": {
            "filtered": {
              "filter": {
                "terms": {
                  "jobs.location": [
                    "new",
                    "york"
                  ]
                }
              }
            }
          }
        }
      }
    }
  },
  "aggs": {
    "jobs": {
      "nested": {
        "path": "jobs"
      },
      "aggs": {
        "only_loc": {
          "filter": {            <----- add this filter
            "terms": {
              "jobs.location": [
                "new",
                "york"
              ]
            }
          },
          "aggs": {
            "location": {
              "terms": {
                "field": "jobs.location.raw",
                "size": 25
              }
            },
            "company": {
              "terms": {
                "field": "jobs.company",
                "size": 25
              }
            }
          }
        }
      }
    }
  }
}
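
From Node.js, one way to keep the query filter and the aggregation filter from drifting apart is to define the filter once and reuse it in both places. The following is only a sketch under assumptions: buildSearchBody and locationBuckets are hypothetical helper names that do not appear in the question or the answer, and the body simply mirrors the JSON above.

```javascript
// Hypothetical helper (not from the original post): builds the search body so
// that the nested query and the nested aggregation share one filter object.
function buildSearchBody(jobFilter) {
  return {
    _source: ['sitename'],
    query: {
      filtered: {
        filter: {
          nested: {
            path: 'jobs',
            inner_hits: { size: 1000 },
            query: { filtered: { filter: jobFilter } }
          }
        }
      }
    },
    aggs: {
      jobs: {
        nested: { path: 'jobs' },
        aggs: {
          // Same filter as the query, applied to the nested job documents.
          only_loc: {
            filter: jobFilter,
            aggs: {
              location: { terms: { field: 'jobs.location.raw', size: 25 } },
              company: { terms: { field: 'jobs.company', size: 25 } }
            }
          }
        }
      }
    }
  };
}

// Hypothetical helper: the buckets now sit one level deeper, under only_loc.
function locationBuckets(response) {
  return response.aggregations.jobs.only_loc.location.buckets;
}

// The filter used in the answer above.
var newYorkFilter = { terms: { 'jobs.location': ['new', 'york'] } };
var body = buildSearchBody(newYorkFilter);

// The body would then be passed to the client as in the question, e.g.
// client.search({ index: 'jobsitesIdx', type: 'sites', body: body }, callback);
```

Note that the callback in the question reads response.aggregations.jobs.location.buckets, which would need the extra only_loc level after this change. Also, a terms filter matches a job whose location contains any of the listed tokens, whereas the question combines two term filters with and. Passing the question's and filter to buildSearchBody instead should keep the aggregation aligned with that stricter query.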

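For the remark about company.raw, a multi-field modeled on the jobs.location mapping from the question is one possible shape. This is an assumed fragment, not something the answer spells out: the variable name companyMapping is hypothetical, and the sub-field name raw is taken from the question's aggregation.

```javascript
// Assumed mapping fragment for jobs.company, mirroring how jobs.location is
// mapped in the question: an analyzed string plus a not_analyzed "raw" sub-field.
var companyMapping = {
  type: 'string',
  fields: {
    raw: {
      type: 'string',
      index: 'not_analyzed'
    }
  }
};

// It would replace the "company" entry under jobs.properties in the mapping.
console.log(JSON.stringify({ company: companyMapping }, null, 2));
```

With a sub-field like this in place, the company aggregation could target jobs.company.raw as the question intended. Documents indexed before the mapping change would most likely need to be reindexed before the new sub-field contains any values.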
