hyn*_*iia 5 spring jpa bulkinsert batch-processing
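When I tried to upsert test data (1,000 entities), it took 1m 5s.
So I read many articles and got the processing time down to 20 seconds.
That is still slow for me, and I believe there are better solutions than the ones I used. Does anyone have a good practice for handling this?
I would also like to know which part makes it slow.
Thank you!

The entity class holds health data: the user's walking step counts collected from the user's phone.
The PK is userId and recorded_at (the recorded_at part of the PK comes from the request data).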

@Getter
@NoArgsConstructor
@IdClass(StepId.class)
@Entity
public class StepRecord {
    @Id
    @ManyToOne(targetEntity = User.class, fetch = FetchType.LAZY)
    @JoinColumn(name = "user_id", referencedColumnName = "id", insertable = false, updatable = false)
    private User user;

    @Id
    private ZonedDateTime recordedAt;

    @Column
    private Long count;

    @Builder
    public StepRecord(User user, ZonedDateTime recordedAt, Long count) {
        this.user = user;
        this.recordedAt = recordedAt;
        this.count = count;
    }
}

The user field in the Id class (here) is of type UUID, while in the Entity class user is of the User entity type. It works fine, but could this be a problem?

@NoArgsConstructor
@AllArgsConstructor
@EqualsAndHashCode
public class StepId implements Serializable {
    @Type(type = "uuid-char")
    private UUID user;
    private ZonedDateTime recordedAt;
}

// I'll get user_id from the logged-in user
// user_id (UUID) looks like 'a167d363-bfa4-48ae-8d7b-2f6fc84337f0'

[{
    "count": 356,
    "recorded_at": "2020-09-16T04:02:34.822Z"
},
{
    "count": 3912,
    "recorded_at": "2020-09-16T08:02:34.822Z"
},
{
    "count": 8912,
    "recorded_at": "2020-09-16T11:02:34.822Z"
},
{
    "count": 9004,
    "recorded_at": "2020-09-16T11:02:34.822Z" // <-- if duplicated, update
}
]

|user_id (same user here)            |recorded_at        |count|
|------------------------------------|-------------------|-----|
|a167d363-bfa4-48ae-8d7b-2f6fc84337f0|2020-09-16 04:02:34|356  | <-insert
|a167d363-bfa4-48ae-8d7b-2f6fc84337f0|2020-09-16 08:21:34|3912 | <-insert
|a167d363-bfa4-48ae-8d7b-2f6fc84337f0|2020-09-16 11:02:34|9004 | <-update

I read an article saying that adding a @Version field to the entity class avoids the extra SELECT, but the additional SELECTs were still issued, and it took almost the same time (20 s).
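The link is here ⇒ https://persistencelayer.wixsite.com/springboot-hibernate/post/the-best-way-to-batch-inserts-via-saveall-iterable-s-entities
But it did not help me. I think that because I pass the PK values along with the data, it always calls merge().
(If I have misunderstood @Version, please tell me.)
I think a native MySQL query, INSERT INTO ~ ON DUPLICATE KEY UPDATE ~, may be faster than merge(), which does a select followed by an insert.
The native MySQL query may also do a lookup to check for the duplicate key, but I guess the MySQL engine is well optimized for that.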

spring:
  jpa:
    properties:
      hibernate:
        jdbc.batch_size: 20
        jdbc.batch_versioned_data: true
        order_inserts: true
        order_updates: true
        generate_statistics: true

public void saveBatch(User user, List<StepRecordDto.SaveRequest> requestList) {
    List<StepRecord> chunk = new ArrayList<>();

    for (int i = 0; i < requestList.size(); i++) {
        chunk.add(requestList.get(i).toEntity(user));

        // Save once every BATCH_SIZE entities.
        if (((i + 1) % BATCH_SIZE) == 0 && i > 0) {
            repository.saveAll(chunk);
            chunk.clear();
            //entityManager.flush(); // doesn't help
            //entityManager.clear(); // doesn't help
        }
    }

    // Save whatever is left over after the last full chunk.
    if (chunk.size() > 0) {
        repository.saveAll(chunk);
        chunk.clear();
    }
}

For 1,000 entities, both approaches take 20 seconds.
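I have answered this myself, but I am still waiting for your opinions.
This is faster than the two approaches in the question above, because the data is written directly to the database without going through the persistence context.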
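A rough way to see why this might be faster is to count statements. The sketch below is only back-of-the-envelope arithmetic under a stated assumption: that the merge() path costs one SELECT plus one INSERT or UPDATE per entity, as the question suspects. JDBC batching may group the writes, so the real number can be lower. The class and method names are made up for illustration.

```java
public class RoundTripEstimate {
    /** Statements if every entity costs one SELECT plus one INSERT or UPDATE (assumption). */
    public static int selectThenWrite(int rows) {
        return rows * 2;
    }

    /** Statements if rows are grouped into multi-row upserts of chunkSize rows each. */
    public static int chunkedUpsert(int rows, int chunkSize) {
        return (rows + chunkSize - 1) / chunkSize; // ceiling division
    }

    public static void main(String[] args) {
        int rows = 1_000;
        System.out.println("select-then-write: " + selectThenWrite(rows));    // 2000
        System.out.println("chunked upsert:    " + chunkedUpsert(rows, 100)); // 10
    }
}
```

With a chunk size of 100, as in the service below, 1,000 rows become 10 statements.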
Pros
Cons
Service
@RequiredArgsConstructor
@Service
public class StepRecordService {
    private final StepRecordRepository repository;

    @Transactional
    public void save(User user, List<StepRecordDto.SaveRequest> requestList) {
        int chunkSize = 100;
        Iterator<List<StepRecordDto.SaveRequest>> chunkList = StreamUtils.chunk(requestList.stream(), chunkSize);
        chunkList.forEachRemaining(x -> repository.upsert(user, x));
    }
}
The chunk function in StreamUtils
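import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StreamUtils {
    // Groups the elements of a sequential stream into lists of at most chunkSize elements.
    public static <T> Iterator<List<T>> chunk(Stream<T> stream, int chunkSize) {
        AtomicInteger counter = new AtomicInteger();
        return stream.collect(Collectors.groupingBy(x -> counter.getAndIncrement() / chunkSize))
                .values()
                .iterator();
    }
}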
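The groupingBy version relies on a stateful counter and on the iteration order of the map it returns, which the collector does not specify. If the chunks need to come back in input order, a plain loop over subList is one alternative. This is a minimal sketch, not the author's code; ListChunks is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

public class ListChunks {
    /** Splits source into consecutive lists of at most chunkSize elements, preserving order. */
    public static <T> List<List<T>> chunk(List<T> source, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int from = 0; from < source.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, source.size());
            // Copy the subList view so later changes to source do not affect the chunk.
            chunks.add(new ArrayList<>(source.subList(from, to)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        System.out.println(chunk(List.of(1, 2, 3, 4, 5), 2)); // [[1, 2], [3, 4], [5]]
    }
}
```

Because it takes a List rather than a Stream, the service could pass requestList directly instead of requestList.stream().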
Repository
import java.util.List;
import javax.persistence.EntityManager;
import javax.persistence.Query;
import lombok.RequiredArgsConstructor;
import org.springframework.util.StringUtils;

@RequiredArgsConstructor
public class StepRecordRepositoryImpl implements StepRecordRepositoryCustom {
    private final EntityManager entityManager;

    @Override
    public void upsert(User user, List<StepRecordDto.SaveRequest> requestList) {
        String insertSql = "INSERT INTO step_record(user_id, recorded_at, count) VALUES ";
        // Leading space keeps the clause separate from the last value group.
        String onDupSql = " ON DUPLICATE KEY UPDATE count = VALUES(count)";
        StringBuilder paramBuilder = new StringBuilder();

        // Append one "(user_id, recorded_at, count)" group per request.
        for (int i = 0; i < requestList.size(); i++) {
            if (paramBuilder.length() > 0) {
                paramBuilder.append(",");
            }
            paramBuilder.append("(");
            paramBuilder.append(StringUtils.quote(user.getId().toString()));
            paramBuilder.append(",");
            paramBuilder.append(StringUtils.quote(requestList.get(i).getRecordedAt().toLocalDateTime().toString()));
            paramBuilder.append(",");
            paramBuilder.append(requestList.get(i).getCount());
            paramBuilder.append(")");
        }

        Query query = entityManager.createNativeQuery(insertSql + paramBuilder + onDupSql);
        query.executeUpdate();
    }
}
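The repository above concatenates values into the SQL text. That is probably safe here because the values are a UUID, a timestamp, and a number, but the same statement can be built with placeholders and bound values. The following is a sketch under assumptions, not part of the original answer: UpsertSql, build, flatten, and bind are hypothetical names, and it targets a plain JDBC PreparedStatement rather than the EntityManager.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class UpsertSql {
    /** Builds a multi-row upsert with one "(?, ?, ?)" group per row. */
    public static String build(int rowCount) {
        if (rowCount <= 0) {
            throw new IllegalArgumentException("rowCount must be positive");
        }
        StringBuilder sql = new StringBuilder(
                "INSERT INTO step_record(user_id, recorded_at, count) VALUES ");
        for (int i = 0; i < rowCount; i++) {
            if (i > 0) {
                sql.append(", ");
            }
            sql.append("(?, ?, ?)");
        }
        sql.append(" ON DUPLICATE KEY UPDATE count = VALUES(count)");
        return sql.toString();
    }

    /** Flattens rows of {userId, recordedAt, count} into placeholder order. */
    public static List<Object> flatten(List<Object[]> rows) {
        List<Object> params = new ArrayList<>();
        for (Object[] row : rows) {
            if (row.length != 3) {
                throw new IllegalArgumentException("each row needs exactly 3 values");
            }
            for (Object value : row) {
                params.add(value);
            }
        }
        return params;
    }

    /** Binds the flattened values to a statement prepared from build(...). */
    public static void bind(PreparedStatement ps, List<Object> params) throws SQLException {
        for (int i = 0; i < params.size(); i++) {
            ps.setObject(i + 1, params.get(i)); // JDBC parameter indexes start at 1
        }
    }

    public static void main(String[] args) {
        System.out.println(build(2));
        // INSERT INTO step_record(user_id, recorded_at, count) VALUES (?, ?, ?), (?, ?, ?) ON DUPLICATE KEY UPDATE count = VALUES(count)
    }
}
```

The SQL text then depends only on the number of rows in the chunk, so full chunks of the same size share one statement shape.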