I ran into the scenario described in the title on this cluster.
Here is the sequence of events:
OPTIMIZE TABLE .... PARTITION .... FINAL was executed once, on one node of each shard. The partition is fairly large (120 GB), so the operation takes well over an hour. The part 90-20220530_0_1210623_1731 is indeed covered by the merge generated by the OPTIMIZE statement.
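For illustration, the statement was of this shape (the table name below is a placeholder, since I omitted the real identifiers; the partition ID is taken from the part names in the queue output further down):

OPTIMIZE TABLE errors_local PARTITION ID '90-20220530' FINAL

While this merge was running, the replication queue looked like this: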
SELECT
    replica_name,
    postpone_reason,
    type
FROM system.replication_queue
(formatted)
replica_name: snuba-errors-tiger-4-4
postpone_reason: Not executing log entry queue-0055035589 for part 90-20220530_0_1210420_1730 because it is covered by part 90-20220530_0_1210623_1731 that is currently executing.
type: GET_PART
replica_name: snuba-errors-tiger-4-4
postpone_reason: Not executing log entry queue-0055035590 for part 90-20220530_1210421_1210598_37 because it is covered by part 90-20220530_0_1210623_1731 that is currently executing.
type: GET_PART
replica_name: snuba-errors-tiger-4-4
postpone_reason: Not executing log entry queue-0055035591 for part 90-20220530_1210599_1210623_6 because it is covered by part 90-20220530_0_1210623_1731 that is currently executing.
type: GET_PART
replica_name: snuba-errors-tiger-4-4
postpone_reason:
type: MERGE_PARTS
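The merge itself can be watched through system.merges; a query like the following (a sketch, using columns I believe exist in recent ClickHouse versions) shows its progress, and the part named in the postpone reasons above should presumably appear as result_part_name:

SELECT database, table, elapsed, progress, result_part_name
FROM system.merges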
Is this normal behavior? If it is, is there a way to prevent a long merge from blocking replication on restart?
max_replica_delay_for_distributed_queries is set to 300 seconds on the cluster. I would have expected the 1:30 delay to be ignored, but that does not seem to be the case, since no queries are being routed to the affected node. Is there another way to tell ClickHouse to ignore the replication delay?
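For reference, this is how I understand the knobs involved (a sketch; exact placement in the server config is an assumption about our setup): each replica's current lag can be read from system.replicas, and the threshold can also be raised per session rather than cluster-wide:

-- current replication delay of each replicated table on this node
SELECT database, table, absolute_delay
FROM system.replicas

-- raise the threshold for this session only
SET max_replica_delay_for_distributed_queries = 600

There is also a fallback_to_stale_replicas_for_distributed_queries setting which, as I understand it, lets a query fall back to a lagging replica when no in-sync replica is available; I have not confirmed whether it applies in this situation.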
Thanks,
Filippo