Posts by thi*_*ava

Spring Data Elasticsearch bulk index/delete - millions of records

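I am using Spring Data Elasticsearch 4.2.5, and we have a job that performs ETL (extract, transform, and load) on a specific database table. While the job runs, I index this data with Elasticsearch. The data will reach millions of records or more. Currently, I index one document per iteration. I have read that indexing into Elasticsearch on every iteration can take some time. I would like to use something like bulk-index, but for that I need to add the indexQuery objects to a List. Adding millions of records to a list and then bulk-indexing them could cause memory problems.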

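I need to apply a similar process for deletion. When records are deleted based on some common ID, I need to delete the related Elasticsearch documents, which will also number in the millions or more.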

Is there any way to index/delete very quickly for this requirement? Any help is much appreciated, and please correct me if my understanding is wrong.

Indexing

// Current approach: one index request per map entry.
for (Map.Entry<Integer, ObjectDetails> key : objectDetailsHashMap.entrySet()) {
    indexDocument(elasticsearchOperations, key, oPath);
    // other code to insert data in db table...
}

private void indexDocument(ElasticsearchOperations elasticsearchOperations,
                           Map.Entry<Integer, ObjectDetails> key, String oPath) {
    // Document ID: catalog ID followed directly by object ID (no separator).
    String docId = "" + key.getValue().getCatalogId() + key.getValue().getObjectId();

    // Text fields are stored as raw bytes on ObjectDetails.
    byte[] nameBytes = key.getValue().getName();
    byte[] physicalNameBytes = key.getValue().getPhysicalName();
    byte[] definitionBytes = key.getValue().getDefinition();
    byte[] commentBytes = key.getValue().getComment();

    // Build the index request for a single MetadataSearch document.
    IndexQuery indexQuery = new IndexQueryBuilder()
            .withId(docId)
            .withObject(new MetadataSearch(
                    key.getValue().getObjectId(),
                    key.getValue().getCatalogId(),
                    key.getValue().getParentId(),
                    key.getValue().getTypeCode(),
                    key.getValue().getStartVersion(), …

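The memory concern described above (holding millions of IndexQuery objects in one List) might be avoided by flushing in fixed-size chunks. The following is a minimal sketch of that idea in plain Java, not a confirmed solution. `ChunkedBuffer`, its `sink` callback, and the chunk size are hypothetical names introduced for illustration. With Spring Data Elasticsearch the sink would presumably wrap `ElasticsearchOperations.bulkIndex(List<IndexQuery>, IndexCoordinates)`, and the index name in the comment is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical helper: collects items and hands them to a sink in
 * fixed-size chunks, so at most chunkSize items are buffered at a time.
 */
final class ChunkedBuffer<T> implements AutoCloseable {
    private final int chunkSize;
    private final Consumer<List<T>> sink;
    private final List<T> pending;

    ChunkedBuffer(int chunkSize, Consumer<List<T>> sink) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
        this.sink = sink;
        this.pending = new ArrayList<>(chunkSize);
    }

    /** Buffers one item and flushes as soon as a full chunk is available. */
    void add(T item) {
        pending.add(item);
        if (pending.size() >= chunkSize) {
            flush();
        }
    }

    /** Sends whatever is buffered to the sink; does nothing when empty. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        // Pass a copy so the sink may keep the list after the buffer is cleared.
        sink.accept(new ArrayList<>(pending));
        pending.clear();
    }

    /** Flushes the final, possibly partial, chunk. */
    @Override
    public void close() {
        flush();
    }

    public static void main(String[] args) {
        // Assumption: in the real job the sink would be something like
        //   batch -> elasticsearchOperations.bulkIndex(batch, IndexCoordinates.of("metadata-search"))
        // where "metadata-search" is a made-up index name.
        try (ChunkedBuffer<Integer> buffer =
                 new ChunkedBuffer<>(3, batch -> System.out.println("flushing " + batch.size() + " items"))) {
            for (int i = 0; i < 7; i++) {
                buffer.add(i);
            }
        }
    }
}
```

In the loop from the question, `indexDocument` would then build the IndexQuery and call `buffer.add(indexQuery)` instead of indexing immediately. A chunk size in the low thousands is a common starting point, but the right value depends on document size and cluster capacity. For the deletion case, a single delete-by-query on the common ID (in 4.x, `ElasticsearchOperations` exposes `delete(Query, Class<?>, IndexCoordinates)`) may be worth trying before per-document deletes; this has not been verified against 4.2.5 here.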

java elasticsearch spring-data-elasticsearch

thi*_*ava

2021-09-29

4
votes

1
answer

3119
views
