In Spring Data Elasticsearch, can aggregation queries be put in a repository implementation?

Question

Dam*_*mon 3 elasticsearch spring-data-elasticsearch

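This is my first time using spring-boot-elasticsearch. I have now figured out how to describe my serial difference pipeline query with the Elasticsearch Java API. As you will see below, the query is fairly large and returns several buckets for each object, along with the serial difference between each bucket. The search examples I have seen for Spring Data repositories all seem to spell out the query's JSON body in a @Query annotation, like this: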

@Repository
public interface SonarMetricRepository extends ElasticsearchRepository<Article, String> {

    @Query("{\"bool\": {\"must\": {\"match\": {\"authors.name\": \"?0\"}}, \"filter\": {\"term\": {\"tags\": \"?1\" }}}}")
    Page<Article> findByAuthorsNameAndFilteredTagQuery(String name, String tag, Pageable pageable);
}


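This looks elegant for basic CRUD operations, but how can I put the query below into a repository object without using the raw query syntax of @Query? If you have a similar example showing what the model object for the result of a serial difference query, or of any pipeline aggregation, looks like, that would be even more helpful. Basically, I want a search method like this in my repository: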

Page<Serial Difference Result Object> getCodeCoverageMetrics(String projectKey, Date start, Date end, String interval, int lag);

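The question asks what a model object for a serial difference result could look like. The following is only a sketch under assumptions: the class name CoverageChange and all of its fields are hypothetical, chosen to mirror the project_key, coverage_history, avg_coverage and Percent_Change aggregations in the query further down, with one instance per project and histogram bucket. It uses plain Java and no Spring or Elasticsearch types, and the values in main are made-up demo data.

```java
import java.time.Instant;

// Hypothetical result type: one row per project and date-histogram bucket.
final class CoverageChange {
    private final String projectKey;   // key of the terms bucket
    private final Instant bucketStart; // start of the date-histogram bucket
    private final double avgCoverage;  // value of the avg sub-aggregation
    private final Double change;       // serial difference; null when no earlier bucket exists

    CoverageChange(String projectKey, Instant bucketStart, double avgCoverage, Double change) {
        this.projectKey = projectKey;
        this.bucketStart = bucketStart;
        this.avgCoverage = avgCoverage;
        this.change = change;
    }

    String getProjectKey() { return projectKey; }
    Instant getBucketStart() { return bucketStart; }
    double getAvgCoverage() { return avgCoverage; }
    Double getChange() { return change; }

    // True when the bucket has a serial difference value.
    boolean hasChange() { return change != null; }

    public static void main(String[] args) {
        // Made-up demo values, not real query output.
        CoverageChange row = new CoverageChange(
                "my-sample-sonar-project", Instant.parse("2020-07-13T00:00:00Z"), 81.5, 1.25);
        System.out.println(row.getProjectKey() + " " + row.getBucketStart()
                + " avg=" + row.getAvgCoverage() + " change=" + row.getChange());
    }
}
```

A repository method could then return a page or list of such rows instead of the raw aggregation tree.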

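I should mention that part of the reason I want to use this object is that I will have other CRUD queries here as well, and I think it will handle pagination for me, which is appealing.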

Here is my query, which shows the serial difference in code coverage for a Sonar project over a one-week period:

    SerialDiffPipelineAggregationBuilder serialDiffPipelineAggregationBuilder =
            PipelineAggregatorBuilders
                    .diff("Percent_Change", "avg_coverage")
                    .lag(1);

    AvgAggregationBuilder averageCoverageAggregationBuilder = AggregationBuilders
            .avg("avg_coverage")
            .field("coverage");

    AggregationBuilder coverageHistoryAggregationBuilder = AggregationBuilders
            .dateHistogram("coverage_history")
            .field("@timestamp")
            .calendarInterval(DateHistogramInterval.WEEK)
            .subAggregation(averageCoverageAggregationBuilder)
            .subAggregation(serialDiffPipelineAggregationBuilder);

    TermsAggregationBuilder sonarProjectKeyAggregationBuilder = AggregationBuilders
            .terms("project_key")
            .field("key.keyword")
            .subAggregation(coverageHistoryAggregationBuilder);

    BoolQueryBuilder searchQuery = new BoolQueryBuilder()
            .filter(matchAllQuery())
            .filter(matchPhraseQuery("name.keyword", "my-sample-sonar-project"))
            .filter(rangeQuery("@timestamp")
                    .format("strict_date_optional_time")
                    .gte("2020-07-08T19:29:12.054Z")
                    .lte("2020-07-15T19:29:12.055Z"));

    // Join query and aggregation together
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder()
            .query(searchQuery)
            .aggregation(sonarProjectKeyAggregationBuilder);

    SearchRequest searchRequest = new SearchRequest("sonarmetrics").source(searchSourceBuilder);
    SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);

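To make clear what the Percent_Change pipeline aggregation contributes, here is a library-free sketch of a serial difference over the per-bucket avg_coverage values. The names SerialDiffSketch and serialDiff are illustrative and not part of any Elasticsearch API. The sketch assumes that buckets without an earlier counterpart (the first lag buckets) simply get no value, represented here as null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SerialDiffSketch {

    // Returns value[i] - value[i - lag] for each bucket. The first `lag` entries
    // are null because there is no earlier bucket to compare against.
    static List<Double> serialDiff(List<Double> bucketValues, int lag) {
        if (lag < 1) {
            throw new IllegalArgumentException("lag must be >= 1");
        }
        List<Double> diffs = new ArrayList<>(bucketValues.size());
        for (int i = 0; i < bucketValues.size(); i++) {
            Double current = bucketValues.get(i);
            Double previous = i >= lag ? bucketValues.get(i - lag) : null;
            if (current == null || previous == null) {
                diffs.add(null); // gap: nothing to subtract
            } else {
                diffs.add(current - previous);
            }
        }
        return diffs;
    }

    public static void main(String[] args) {
        // Made-up weekly averages, one per date-histogram bucket.
        List<Double> weeklyAvgCoverage = Arrays.asList(70.0, 72.5, 71.0, 75.0);
        System.out.println(serialDiff(weeklyAvgCoverage, 1)); // prints [null, 2.5, -1.5, 4.0]
    }
}
```

Note that a serial difference is an absolute difference between buckets, not a percentage, despite the name Percent_Change used in the query.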

Answer 1

P.J*_*sch 10

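OK, so if I understood correctly, you want to add aggregations to a repository query. This is not possible with the methods that Spring Data Elasticsearch creates automatically, but it is not too hard to implement.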

To show you how to do this, I use a simpler example in which we define a Person entity:

@Document(indexName = "person")
public class Person {

    @Id
    @Nullable
    private Long id;

    @Field(type = FieldType.Text, fielddata = true)
    @Nullable
    private String lastName;

    @Field(type = FieldType.Text, fielddata = true)
    @Nullable
    private String firstName;

    // getter/setter
}


And a corresponding repository:

public interface PersonRepository extends ElasticsearchRepository<Person, Long>{
}


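We now want to extend this repository so that we can search for persons by first name and, for those persons, return the top 10 last names with their counts (a terms aggregation on lastName).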

The first thing to do is to define a custom repository interface that describes the method you need:

interface PersonCustomRepository {
    SearchPage<Person> findByFirstNameWithLastNameCounts(String firstName, Pageable pageable);
}


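We want to pass in a Pageable so that the method returns pages of the data. We return a SearchPage object (check the documentation on return types), which contains the paging information together with a SearchHits<Person>. That SearchHits object in turn holds the aggregations information and the result data.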

We then change PersonRepository to extend this new interface:

public interface PersonRepository extends ElasticsearchRepository<Person, Long>, PersonCustomRepository {
}


Of course, we now need to provide an implementation in a class named PersonCustomRepositoryImpl (it must be named like the interface with Impl appended):

public class PersonCustomRepositoryImpl implements PersonCustomRepository {

    private final ElasticsearchOperations operations;

    public PersonCustomRepositoryImpl(ElasticsearchOperations operations) { // let Spring inject an operations which we use to do the work
        this.operations = operations;
    }

    @Override
    public SearchPage<Person> findByFirstNameWithLastNameCounts(String firstName, Pageable pageable) {

        // terms(...) is a static import of AggregationBuilders.terms
        Query query = new NativeSearchQueryBuilder()                       // we build an Elasticsearch native query
            .addAggregation(terms("lastNames").field("lastName").size(10)) // add the aggregation
            .withQuery(QueryBuilders.matchQuery("firstName", firstName))   // add the query part
            .withPageable(pageable)                                        // add the requested page
            .build();

        SearchHits<Person> searchHits = operations.search(query, Person.class);  // send it off and get the result

        return SearchHitSupport.searchPageFor(searchHits, pageable);  // convert the result to a SearchPage
    }
}


That is all there is to implementing the search. The repository now has this additional method. How do you use it?

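For this demo, I assume we have a REST controller that takes a name and returns a pair of:

1. the persons found, as a list of SearchHit<Person> objects
2. a Map<String, Long> containing the last names and their counts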

This can be implemented as follows; the comments describe what is done:

@GetMapping("persons/firstNameWithLastNameCounts/{firstName}")
public Pair<List<SearchHit<Person>>, Map<String, Long>> firstNameWithLastNameCounts(@PathVariable("firstName") String firstName) {

    // helper function to get the lastName counts from an Elasticsearch Aggregations
    // Spring Data Elasticsearch does not have functions for that, so we need to know what is coming back
    Function<Aggregations, Map<String, Long>> getLastNameCounts = aggregations -> {
        if (aggregations != null) {
            Aggregation lastNames = aggregations.get("lastNames");
            if (lastNames != null) {
                List<? extends Terms.Bucket> buckets = ((Terms) lastNames).getBuckets();
                if (buckets != null) {
                    return buckets.stream().collect(Collectors.toMap(Terms.Bucket::getKeyAsString, Terms.Bucket::getDocCount));
                }
            }
        }
        return Collections.emptyMap();
    };

    // the parts of the returned object
    Map<String, Long> lastNameCounts = null;
    List<SearchHit<Person>> searchHits = new ArrayList<>();

    // request pages of size 1000
    Pageable pageable = PageRequest.of(0, 1000);
    boolean fetchMore = true;
    while (fetchMore) {
        // call the custom method implementation
        SearchPage<Person> searchPage = personRepository.findByFirstNameWithLastNameCounts(firstName, pageable);

        // get the aggregations on the first call, will be the same on the other pages
        if (lastNameCounts == null) {
            Aggregations aggregations = searchPage.getSearchHits().getAggregations();
            lastNameCounts = getLastNameCounts.apply(aggregations);
        }

        // collect the returned data
        if (searchPage.hasContent()) {
            searchHits.addAll(searchPage.getContent());
        }

        pageable = searchPage.nextPageable();
        fetchMore = searchPage.hasNext();
    }

    // return the collected stuff
    return Pair.of(searchHits, lastNameCounts);
}

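The loop in the controller follows a general pattern: request page 0, collect its content, and continue while the page reports a successor. A stripped-down sketch of just that control flow, without Spring, might look like this. Page, collectAll and slice are hypothetical stand-ins (Page plays the role of SearchPage, slice that of the repository call) and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntFunction;

final class PagingLoopSketch {

    // Hypothetical stand-in for SearchPage: one page of results plus a "has next" flag.
    static final class Page<T> {
        final List<T> content;
        final boolean hasNext;

        Page(List<T> content, boolean hasNext) {
            this.content = content;
            this.hasNext = hasNext;
        }
    }

    // Calls fetchPage with page numbers 0, 1, 2, ... until a page reports no successor,
    // and accumulates the content of every page in memory.
    static <T> List<T> collectAll(IntFunction<Page<T>> fetchPage) {
        List<T> all = new ArrayList<>();
        int pageNumber = 0;
        Page<T> page;
        do {
            page = fetchPage.apply(pageNumber++);
            all.addAll(page.content);
        } while (page.hasNext);
        return all;
    }

    // Hypothetical in-memory page source, used only for the demo.
    static <T> Page<T> slice(List<T> data, int pageNumber, int pageSize) {
        int from = Math.min(pageNumber * pageSize, data.size());
        int to = Math.min(from + pageSize, data.size());
        return new Page<>(new ArrayList<>(data.subList(from, to)), to < data.size());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Smith", "Jones", "Miller", "Brown", "Davis");
        List<String> all = collectAll(n -> slice(names, n, 2)); // three pages: 2 + 2 + 1
        System.out.println(all); // prints [Smith, Jones, Miller, Brown, Davis]
    }
}
```

Because every page is accumulated in memory, this approach suits result sets of moderate size.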

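I hope this gives you an idea of how you can implement custom repository functions and add functionality that is not provided out of the box.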

Archived:	5 years, 10 months ago
Views:	3,879
Last activity:	5 years, 10 months ago