MongoDB replication: going into maintenance mode with 10333 other maintenance mode tasks in progress

Question

Jos*_*ine 5 mongodb maintenance data-synchronization mongodb-3.2

I have a MongoDB instance that needs to be resynced.

2016-11-07T11:59:23.330+0000 I REPL     [ReplicationExecutor] syncing from: x.x.x.x:27017
2016-11-07T11:59:23.354+0000 W REPL     [rsBackgroundSync] we are too stale to use x.x.x.x:27017 as a sync source
2016-11-07T11:59:23.354+0000 I REPL     [ReplicationExecutor] could not find member to sync from
2016-11-07T11:59:23.354+0000 E REPL     [rsBackgroundSync] too stale to catch up -- entering maintenance mode
2016-11-07T11:59:23.354+0000 I REPL     [rsBackgroundSync] our last optime : (term: 20, timestamp: Oct  4 07:41:29:1)
2016-11-07T11:59:23.354+0000 I REPL     [rsBackgroundSync] oldest available is (term: 20, timestamp: Oct 17 02:13:33:5)
2016-11-07T11:59:23.354+0000 I REPL     [rsBackgroundSync] See http://dochub.mongodb.org/core/resyncingaverystalereplicasetmember
2016-11-07T11:59:23.355+0000 I REPL     [ReplicationExecutor] going into maintenance mode with 10333 other maintenance mode tasks in progress

What does this line mean?

[ReplicationExecutor] going into maintenance mode with 10333 other maintenance mode tasks in progress

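What is a maintenance mode task? There is no documentation about it from MongoDB. Why are there 10333 of them queued? How can I see (list) them? Searching the web, I also found log entries reading "with 0 other maintenance mode tasks in progress".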

Answer 1

Ste*_*nie 4

What is a maintenance mode task?

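The "maintenance mode tasks" message refers to a counter of calls to the replSetMaintenance command and (as of MongoDB 3.4) is not associated with specific queued tasks. The replSetMaintenance command is used to keep a secondary in RECOVERING state while some maintenance work is done. A RECOVERING member stays online and may keep syncing, but it is excluded from normal read operations (for example, reads using a driver's secondary read preference). Each call to replSetMaintenance increments the task counter (if true) or decrements it (if false). When the counter reaches 0, the member transitions from RECOVERING back to SECONDARY state (assuming it is healthy).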
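The counter semantics just described can be modelled in a few lines of JavaScript. This is a hypothetical sketch, not mongod's implementation: the class MaintenanceModeCounter and its fields are made up, and the log strings simply mirror the messages quoted in this answer's annotated log. It assumes, as described above, that true increments, false decrements, and reaching 0 returns a healthy member to SECONDARY.

```javascript
// Hypothetical model of the replSetMaintenance task counter (not MongoDB source code).
class MaintenanceModeCounter {
  constructor() {
    this.tasks = 0;           // outstanding "maintenance mode tasks"
    this.state = "SECONDARY"; // assumes the member starts healthy
    this.log = [];            // collected log-style messages
  }

  // Stand-in for db.adminCommand({replSetMaintenance: enable})
  replSetMaintenance(enable) {
    if (enable) {
      // The message reports the count *before* this call is added.
      this.log.push(`going into maintenance mode with ${this.tasks} other maintenance mode tasks in progress`);
      this.tasks += 1;
      if (this.state !== "RECOVERING") {
        this.state = "RECOVERING";
        this.log.push("transition to RECOVERING");
      }
      return;
    }
    if (this.tasks === 0) {
      // Decrementing with nothing outstanding is rejected.
      this.log.push("Attempted to leave maintenance mode but it is not currently active");
      return;
    }
    this.tasks -= 1;
    this.log.push(`leaving maintenance mode (${this.tasks} other maintenance mode tasks ongoing)`);
    if (this.tasks === 0) {
      this.state = "SECONDARY"; // assumes the member is otherwise healthy
      this.log.push("transition to SECONDARY");
    }
  }
}

// Replay two enables followed by three disables.
const member = new MaintenanceModeCounter();
[true, true, false, false, false].forEach((flag) => member.replSetMaintenance(flag));
console.log(member.log.join("\n"));
```

The replayed sequence ends with the "not currently active" message, because the third false call finds the counter already at 0.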

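As of MongoDB 3.4, maintenance mode changes are only reported in the MongoDB log. The command is normally used only internally by mongod, but you can also invoke it manually.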

Below is a set of annotated log lines and the corresponding mongo shell commands, showing how the task counter changes:

// db.adminCommand({replSetMaintenance: 1})
[ReplicationExecutor] going into maintenance mode with 0 other maintenance mode tasks in progress
[ReplicationExecutor] transition to RECOVERING

// db.adminCommand({replSetMaintenance: 1})
[ReplicationExecutor] going into maintenance mode with 1 other maintenance mode tasks in progress

// db.adminCommand({replSetMaintenance: 0})
[ReplicationExecutor] leaving maintenance mode (1 other maintenance mode tasks ongoing)

// db.adminCommand({replSetMaintenance: 0})
[ReplicationExecutor] leaving maintenance mode (0 other maintenance mode tasks ongoing)
[ReplicationExecutor] transition to SECONDARY

// db.adminCommand({replSetMaintenance: 0})
[ReplicationExecutor] Attempted to leave maintenance mode but it is not currently active

Why are there 10333 queued?

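In MongoDB 3.2, a replica set member that has become "too stale" (that is, it has no oplog entries in common with another healthy member of the replica set) stays in RECOVERING mode and periodically checks whether a new valid sync source has become available. Currently each of these checks increments the "maintenance task" counter, so if a member has been stale for a long time, the number does not indicate a meaningful count of tasks.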
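To show why the number grows without meaning much, here is a hypothetical JavaScript model of the 3.2 behaviour described above. As a simplifying assumption, every failed sync-source check acts like one replSetMaintenance(true) call that is never matched by a false call; the function name simulateStaleChecks is made up, and the real interval between checks is not modelled.

```javascript
// Hypothetical model: while the member stays too stale, each periodic check
// enters maintenance mode again and nothing ever decrements the counter.
function simulateStaleChecks(numChecks) {
  let tasks = 0;
  let lastMessage = null;
  for (let i = 0; i < numChecks; i++) {
    // Each failed check behaves like replSetMaintenance(true).
    lastMessage = `going into maintenance mode with ${tasks} other maintenance mode tasks in progress`;
    tasks += 1;
  }
  return { tasks, lastMessage };
}

const result = simulateStaleChecks(10334);
console.log(result.lastMessage);
// going into maintenance mode with 10333 other maintenance mode tasks in progress
```

Under this model the "10333" in the question's log would simply mean the member had already failed 10333 earlier checks, not that 10333 tasks were waiting in a queue.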

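In theory "too stale" is not a terminal state, since a member with a larger oplog could conceivably be temporarily offline and come back as a valid sync source; in practice, a "too stale to catch up" error usually means a manual resync is required.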
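The "too stale" condition comes down to comparing optimes: a source is only usable if its oldest oplog entry is not newer than the stale member's last optime. The sketch below uses the two timestamps from the question's log. The helpers compareTimestamps, ts and chooseSyncSource, the second member and its hostname are all hypothetical; the term is ignored because both optimes are in term 20, and real sync-source selection also weighs health, ping time and other factors that are not modelled here.

```javascript
// Compare {t: seconds, i: increment} timestamps: -1, 0 or 1.
function compareTimestamps(a, b) {
  if (a.t !== b.t) return a.t < b.t ? -1 : 1;
  if (a.i !== b.i) return a.i < b.i ? -1 : 1;
  return 0;
}

// Build a timestamp; the year 2016 comes from the log, months are 1-based here.
function ts(month, day, h, m, s, inc) {
  return { t: Date.UTC(2016, month - 1, day, h, m, s) / 1000, i: inc };
}

// Return the first candidate whose oplog still reaches back to ourLastOptime, or null.
function chooseSyncSource(ourLastOptime, candidates) {
  for (const c of candidates) {
    if (compareTimestamps(c.oldestOplogEntry, ourLastOptime) <= 0) return c.host;
  }
  return null; // corresponds to "could not find member to sync from"
}

const ourLast = ts(10, 4, 7, 41, 29, 1); // "our last optime" from the log
const members = [
  { host: "x.x.x.x:27017", oldestOplogEntry: ts(10, 17, 2, 13, 33, 5) }, // "oldest available"
];
console.log(chooseSyncSource(ourLast, members)); // null

// A hypothetical member with a larger oplog comes back online.
members.push({ host: "bigoplog.example:27017", oldestOplogEntry: ts(9, 20, 0, 0, 0, 1) });
console.log(chooseSyncSource(ourLast, members)); // bigoplog.example:27017
```

The second call only succeeds because the made-up member's oplog still covers Oct 4; with only the real sync source from the log, the member has nowhere to sync from.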

2016-11-07T11:59:23.354+0000 I REPL     [rsBackgroundSync] our last optime : (term: 20, timestamp: Oct  4 07:41:29:1)
2016-11-07T11:59:23.354+0000 I REPL     [rsBackgroundSync] oldest available is (term: 20, timestamp: Oct 17 02:13:33:5)

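In this case, the member's last optime (Oct 4) is roughly two weeks older than the oldest entry still available on its sync source (Oct 17), so it has been stale for a long time and the maintenance mode counter has kept increasing. There is a related issue in the MongoDB Jira that you can watch/upvote: SERVER-23899: Reset maintenance mode when transitioning from too-stale to valid sync source.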
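To put numbers on the two optimes quoted in the log, this hypothetical helper (describeGap is a made-up name) computes the gaps involved. It assumes all three timestamps are UTC (the log lines carry a +0000 offset, while the optime strings do not state one) and takes the year 2016 from the log prefix.

```javascript
// Format the difference between two dates as days/hours/minutes/seconds.
function describeGap(from, to) {
  let secs = Math.round((to - from) / 1000);
  const days = Math.floor(secs / 86400);
  secs -= days * 86400;
  const hours = Math.floor(secs / 3600);
  secs -= hours * 3600;
  const minutes = Math.floor(secs / 60);
  secs -= minutes * 60;
  return `${days}d ${hours}h ${minutes}m ${secs}s`;
}

// Date.UTC months are 0-based: 9 = October, 10 = November.
const ourLastOptime = new Date(Date.UTC(2016, 9, 4, 7, 41, 29));    // Oct  4 07:41:29
const oldestAvailable = new Date(Date.UTC(2016, 9, 17, 2, 13, 33)); // Oct 17 02:13:33
const logTime = new Date(Date.UTC(2016, 10, 7, 11, 59, 23));        // 2016-11-07T11:59:23

console.log(describeGap(ourLastOptime, oldestAvailable)); // 12d 18h 32m 4s
console.log(describeGap(ourLastOptime, logTime));         // 34d 4h 17m 54s
```

The first figure is the "roughly two weeks" of missing oplog; the second shows the member had not applied anything for over a month when the log line was written.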

Archived: 9 years, 7 months ago
Views: 2,529
Last activity: 9 years, 6 months ago