Spring JPA bulk upsert is slow (1,000 entities took 20 seconds)

hyn*_*iia 5 spring jpa bulkinsert batch-processing

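When I tried to upsert test data (1,000 entities), it took 1m 5s.

So I read many articles and got the processing time down to 20 seconds.

But that is still slow for me, and I believe there are better solutions than the ones I am using. Does anyone have a good practice for handling this?

I would also like to know which part makes it slow:

  1. The persistence context
  2. The additional SELECT

Thank you!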

@Entity class

This entity class holds health data: the number of steps a user has walked, collected from the user's phone.

The PK is userId and recorded_at (the recorded_at part of the PK comes from the request data).

@Getter
@NoArgsConstructor
@IdClass(StepId.class)
@Entity
public class StepRecord {
    @Id
    @ManyToOne(targetEntity = User.class, fetch = FetchType.LAZY)
    @JoinColumn(name = "user_id", referencedColumnName = "id", insertable = false, updatable = false)
    private User user;

    @Id
    private ZonedDateTime recordedAt;

    @Column
    private Long count;

    @Builder
    public StepRecord(User user, ZonedDateTime recordedAt, Long count) {
        this.user = user;
        this.recordedAt = recordedAt;
        this.count = count;
    }
}

Id class

The user field in the Id class (here) is of type UUID, while in the Entity class user is of type User entity. It works fine, but could this be a problem?

@NoArgsConstructor
@AllArgsConstructor
@EqualsAndHashCode
public class StepId implements Serializable {
    @Type(type = "uuid-char")
    private UUID user;
    private ZonedDateTime recordedAt;
}

Request data sample

// I'll get user_id from the logged-in user
// user_id (UUID) looks like 'a167d363-bfa4-48ae-8d7b-2f6fc84337f0'

[{
    "count": 356,
    "recorded_at": "2020-09-16T04:02:34.822Z"
},
{
    "count": 3912,
    "recorded_at": "2020-09-16T08:02:34.822Z"
},
{
    "count": 8912,
    "recorded_at": "2020-09-16T11:02:34.822Z"
},
{
    "count": 9004,
    "recorded_at": "2020-09-16T11:02:34.822Z" // <-- if duplicated, update
}
]

Database data sample

|user_id (same user here)            |recorded_at        |count|
|------------------------------------|-------------------|-----|
|a167d363-bfa4-48ae-8d7b-2f6fc84337f0|2020-09-16 04:02:34|356  | <-insert
|a167d363-bfa4-48ae-8d7b-2f6fc84337f0|2020-09-16 08:21:34|3912 | <-insert
|a167d363-bfa4-48ae-8d7b-2f6fc84337f0|2020-09-16 11:02:34|9004 | <-update

Solution 1: saveAll() with batching

  1. Application properties

spring:
  jpa:
    properties:
      hibernate:
        jdbc.batch_size: 20
        jdbc.batch_versioned_data: true
        order_inserts: true
        order_updates: true
        generate_statistics: true

  2. Service

public void saveBatch(User user, List<StepRecordDto.SaveRequest> requestList) {
    List<StepRecord> chunk = new ArrayList<>();

    for (int i = 0; i < requestList.size(); i++) {
        chunk.add(requestList.get(i).toEntity(user));

        // Save every BATCH_SIZE entities
        if (((i + 1) % BATCH_SIZE) == 0 && i > 0) {
            repository.saveAll(chunk);
            chunk.clear();
            //entityManager.flush(); // doesn't help
            //entityManager.clear(); // doesn't help
        }
    }

    // Save the remaining entities
    if (chunk.size() > 0) {
        repository.saveAll(chunk);
        chunk.clear();
    }
}

I read an article saying that adding a @Version field to the entity class would help, but the additional SELECT still happens, and it took almost the same time (20 seconds).

Link here ⇒ https://persistencelayer.wixsite.com/springboot-hibernate/post/the-best-way-to-batch-inserts-via-saveall-iterable-s-entities

But it did not help me. I think that because I pass the PK with the data, it always calls merge().

(If I have misunderstood @Version, please tell me.)

Solution 2: MySQL native query (INSERT INTO ~ ON DUPLICATE KEY UPDATE ~)

I thought a MySQL native INSERT INTO ~ ON DUPLICATE KEY UPDATE ~ query might be faster than merge() <- select/insert.

The MySQL native query may also do a select to check for duplicate keys, but I guess the MySQL engine is well optimized.

  1. Repository
  2. Service

Both approaches take 20 seconds for 1,000 entities.

\n

hyn*_*iia 4

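I am answering my own question, but I am still waiting for your opinions.

Time taken to upsert using a native query:

  • 1,000 entities => 0.8 seconds
  • 10,000 entities => 2.5 ~ 4.2 seconds

This is faster than both approaches in the question above, because the data is written directly to the database without going through the persistence context.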
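To make the difference more concrete, here is a rough back-of-the-envelope sketch of how many statements each approach might send. It assumes that merge() issues one SELECT per entity and that writes are then flushed in JDBC batches of 20 (the batch size from the question's properties), versus one multi-row statement per chunk of 100. The class and method names are hypothetical, and real counts depend on the driver and Hibernate settings.

```java
class RoundTripEstimate {
    // Ceiling division for positive ints.
    static int ceilDiv(int a, int b) {
        return (a + b - 1) / b;
    }

    // Assumption: merge() issues one SELECT per entity, then writes are flushed in JDBC batches.
    static int mergeRoundTrips(int entities, int jdbcBatchSize) {
        return entities + ceilDiv(entities, jdbcBatchSize);
    }

    // One multi-row INSERT ... ON DUPLICATE KEY UPDATE statement per chunk.
    static int nativeUpsertRoundTrips(int entities, int chunkSize) {
        return ceilDiv(entities, chunkSize);
    }

    public static void main(String[] args) {
        System.out.println(mergeRoundTrips(1000, 20));         // 1050
        System.out.println(nativeUpsertRoundTrips(1000, 100)); // 10
    }
}
```

Under these assumptions the native upsert sends about two orders of magnitude fewer statements, which is at least consistent with the timings above.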

Pros

  • No additional SELECT
  • No need to think about the persistence context

Cons

  • Not readable?
  • Too raw?

How to

Service

@RequiredArgsConstructor
@Service
public class StepRecordService {
    private final StepRecordRepository repository;

    @Transactional
    public void save(User user, List<StepRecordDto.SaveRequest> requestList) {
        int chunkSize = 100;
        Iterator<List<StepRecordDto.SaveRequest>> chunkList = StreamUtils.chunk(requestList.stream(), chunkSize);
        chunkList.forEachRemaining(x-> repository.upsert(user, x));
    }
}

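The chunk function in StreamUtils

import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StreamUtils {
    // Groups the stream's elements into lists of at most chunkSize elements.
    public static <T> Iterator<List<T>> chunk(Stream<T> iterable, int chunkSize) {
        AtomicInteger counter = new AtomicInteger();
        return iterable.collect(Collectors.groupingBy(x -> counter.getAndIncrement() / chunkSize))
                .values()
                .iterator();
    }
}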
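The groupingBy version uses a side-effecting counter inside the collector, and the returned map gives no guarantee about the order of the chunks. As an alternative, here is a minimal sketch of a chunker based on List.subList that keeps the input order explicitly. ListChunks and partition are hypothetical names, not part of the original code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ListChunks {
    // Splits the list into consecutive chunks of at most chunkSize elements, preserving order.
    static <T> List<List<T>> partition(List<T> items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, items.size());
            // Copy the subList view so each chunk is independent of the source list.
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        System.out.println(partition(data, 3)); // [[1, 2, 3], [4, 5, 6], [7]]
    }
}
```

For the upsert itself the chunk order does not matter, so this is mainly a readability choice.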

Repository

@RequiredArgsConstructor
public class StepRecordRepositoryImpl implements StepRecordRepositoryCustom {
    private final EntityManager entityManager;

    @Override
    public void upsert(User user, List<StepRecordDto.SaveRequest> requestList) {
        String insertSql = "INSERT INTO step_record(user_id, recorded_at, count) VALUES ";
        String onDupSql = " ON DUPLICATE KEY UPDATE count = VALUES(count)";
        StringBuilder paramBuilder = new StringBuilder();

        // Append one "(user_id, recorded_at, count)" value group per request
        for (int i = 0; i < requestList.size(); i++) {
            if (paramBuilder.length() > 0)
                paramBuilder.append(",");

            paramBuilder.append("(");
            paramBuilder.append(StringUtils.quote(user.getId().toString()));
            paramBuilder.append(",");
            paramBuilder.append(StringUtils.quote(requestList.get(i).getRecordedAt().toLocalDateTime().toString()));
            paramBuilder.append(",");
            paramBuilder.append(requestList.get(i).getCount());
            paramBuilder.append(")");
        }

        // Run the whole chunk as a single multi-row statement
        Query query = entityManager.createNativeQuery(insertSql + paramBuilder + onDupSql);
        query.executeUpdate();
    }
}
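Because the statement above concatenates values into the SQL text, one possible refinement is to generate ? placeholders and bind the values instead. The sketch below only builds the SQL string and the flat parameter list, so it runs without a database. UpsertSql, Row, and their members are hypothetical names, and binding the parameters (for example with a JDBC PreparedStatement) is left to the caller.

```java
import java.util.ArrayList;
import java.util.List;

class UpsertSql {
    // Hypothetical row holder; the original code reads these values from the request DTO.
    static final class Row {
        final String userId;
        final String recordedAt;
        final long count;

        Row(String userId, String recordedAt, long count) {
            this.userId = userId;
            this.recordedAt = recordedAt;
            this.count = count;
        }
    }

    // Builds one multi-row upsert statement with a "(?, ?, ?)" group per row.
    static String buildSql(int rowCount) {
        if (rowCount <= 0) {
            throw new IllegalArgumentException("rowCount must be positive");
        }
        StringBuilder sql = new StringBuilder(
                "INSERT INTO step_record(user_id, recorded_at, count) VALUES ");
        for (int i = 0; i < rowCount; i++) {
            if (i > 0) {
                sql.append(", ");
            }
            sql.append("(?, ?, ?)");
        }
        sql.append(" ON DUPLICATE KEY UPDATE count = VALUES(count)");
        return sql.toString();
    }

    // Flattens the rows into the bind order expected by buildSql.
    static List<Object> flattenParams(List<Row> rows) {
        List<Object> params = new ArrayList<>();
        for (Row row : rows) {
            params.add(row.userId);
            params.add(row.recordedAt);
            params.add(row.count);
        }
        return params;
    }

    public static void main(String[] args) {
        System.out.println(buildSql(2));
    }
}
```

With placeholders, the driver handles quoting, so the StringUtils.quote calls are no longer needed.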