I have a mongo cluster with 6 replica sets. 5 of them are fine; one is not. Each replica set has three members. Here is rs.status()
for the broken one:
{
"set" : "rs_5",
"date" : ISODate("2015-12-16T02:37:39Z"),
"myState" : 5,
"members" : [
{
"_id" : 0,
"name" : "mongo_rs_5_member_1:27018",
"health" : 1,
"state" : 5,
"stateStr" : "STARTUP2",
"uptime" : 33600,
"optime" : Timestamp(0, 0),
"optimeDate" : ISODate("1970-01-01T00:00:00Z"),
"lastHeartbeat" : ISODate("2015-12-16T02:37:38Z"),
"lastHeartbeatRecv" : ISODate("2015-12-16T02:37:37Z"),
"pingMs" : 0,
"lastHeartbeatMessage" : "initial sync need a member to be primary or secondary to do our initial sync"
},
{
"_id" : 1,
"name" : "mongo_rs_5_member_2:27019",
"health" : 1,
"state" : 3,
"stateStr" : "RECOVERING",
"uptime" : 33842,
"optime" : Timestamp(1449898728, 18),
"optimeDate" : ISODate("2015-12-12T05:38:48Z"),
"lastHeartbeat" : ISODate("2015-12-16T02:37:37Z"),
"lastHeartbeatRecv" : ISODate("2015-12-16T02:37:37Z"),
"pingMs" : 3,
"lastHeartbeatMessage" : "still syncing, not yet to minValid optime 566bb328:3"
},
{
"_id" : 2,
"name" : "mongo_rs_5_member_3:27020",
"health" : 1,
"state" : 5,
"stateStr" : "STARTUP2",
"uptime" : 33845,
"optime" : Timestamp(1449898728, 18),
"optimeDate" : ISODate("2015-12-12T05:38:48Z"),
"errmsg" : "still syncing, not yet to minValid optime 566bb327:1",
"self" : true
}
],
"ok" : 1
}
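The "not yet to minValid optime" messages appear to encode minValid as `<seconds>:<increment>` in hexadecimal (an assumption based on the shape of these messages, not a documented API). The minimal sketch below decodes them and compares them with the `optime` seconds shown for each member. The function names are hypothetical.

```javascript
// Sketch: decode the hex "minValid optime" strings from rs.status() and
// compare them with a member's last applied optime (Timestamp seconds).
// Assumption: the "<hex seconds>:<hex increment>" format seen above.

function parseHexOptime(text) {
  // Take the trailing "xxxxxxxx:y" token and read both parts as hex.
  const match = /([0-9a-f]+):([0-9a-f]+)\s*$/i.exec(text);
  if (!match) {
    throw new Error("no optime found in: " + text);
  }
  return { secs: parseInt(match[1], 16), inc: parseInt(match[2], 16) };
}

// Seconds of oplog the member still has to apply before reaching minValid.
function secondsBehindMinValid(memberOptimeSecs, message) {
  return parseHexOptime(message).secs - memberOptimeSecs;
}

// Values copied from the rs.status() output above.
const member2 = {
  optimeSecs: 1449898728, // Timestamp(1449898728, 18)
  message: "still syncing, not yet to minValid optime 566bb328:3",
};
const member3 = {
  optimeSecs: 1449898728, // Timestamp(1449898728, 18)
  message: "still syncing, not yet to minValid optime 566bb327:1",
};

for (const m of [member2, member3]) {
  const target = parseHexOptime(m.message);
  console.log(
    new Date(target.secs * 1000).toISOString(),
    secondsBehindMinValid(m.optimeSecs, m.message) + "s behind"
  );
}
```

This prints `2015-12-12T05:39:52.000Z 64s behind` and `2015-12-12T05:39:51.000Z 63s behind`: both members with data seem to be only about a minute short of a consistent point, while member_1 (optime `Timestamp(0, 0)`) has nothing applied at all.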
In the logs I see entries like:
Wed Dec 16 02:40:34.033 [rsMgr] replSet I don't see a primary and I can't elect myself
and
Tue Dec 15 21:41:27.686 [rsSync] replSet initial sync need a member to be primary or secondary to do our initial sync
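The two log messages follow from the member states in rs.status(): initial sync needs a member that is PRIMARY (state 1) or SECONDARY (state 2) to copy from, and, as far as I understand MongoDB elections, only a SECONDARY can be elected primary. The sketch below checks an rs.status()-like object for both conditions; it keeps only the fields it uses, and the helper names are hypothetical.

```javascript
// Sketch: check an rs.status()-like document for sync sources and health.
// Only name/state/stateStr are used; values are copied from the output above.

// Subset of replica set member state codes.
const STATE = { PRIMARY: 1, SECONDARY: 2, RECOVERING: 3, STARTUP2: 5 };

// Members an initial sync could copy from (PRIMARY or SECONDARY).
function syncSourceCandidates(status) {
  return status.members
    .filter((m) => m.state === STATE.PRIMARY || m.state === STATE.SECONDARY)
    .map((m) => m.name);
}

// Target layout: exactly one PRIMARY and every other member SECONDARY.
function isHealthy(status) {
  const primaries = status.members.filter((m) => m.state === STATE.PRIMARY);
  const secondaries = status.members.filter((m) => m.state === STATE.SECONDARY);
  return (
    primaries.length === 1 &&
    primaries.length + secondaries.length === status.members.length
  );
}

const rs5 = {
  set: "rs_5",
  members: [
    { name: "mongo_rs_5_member_1:27018", state: 5, stateStr: "STARTUP2" },
    { name: "mongo_rs_5_member_2:27019", state: 3, stateStr: "RECOVERING" },
    { name: "mongo_rs_5_member_3:27020", state: 5, stateStr: "STARTUP2" },
  ],
};

console.log(syncSourceCandidates(rs5)); // []
console.log(isHealthy(rs5)); // false
```

With no candidate sync source and no SECONDARY to elect, the set appears to be deadlocked, which would match the absence of CPU and network activity.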
Here is rs.conf():
{
"_id" : "rs_5",
"version" : 125967,
"members" : [
{
"_id" : 0,
"host" : "mongo_rs_5_member_1:27018",
"priority" : 3
},
{
"_id" : 1,
"host" : "mongo_rs_5_member_2:27019",
"priority" : 2
},
{
"_id" : 2,
"host" : "mongo_rs_5_member_3:27020"
}
]
}
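For reference, the configuration above implies that an election needs 2 of the 3 votes and that member_1 (priority 3) is the preferred primary; member_3 has no `priority` field, so it gets the default of 1. The sketch below derives both from an rs.conf()-like object, assuming a default of 1 for any missing `votes` or `priority`. Priority only ranks members that are already electable, so it does not help while none of them is SECONDARY.

```javascript
// Sketch: majority size and priority order from an rs.conf()-like document.
// Assumption: a member without "votes" or "priority" uses the default of 1.

function votesNeeded(conf) {
  // Sum the votes of all members (default 1 each).
  const totalVotes = conf.members.reduce(
    (sum, m) => sum + (m.votes === undefined ? 1 : m.votes),
    0
  );
  // A strict majority of the votes is required to elect a primary.
  return Math.floor(totalVotes / 2) + 1;
}

function byPriority(conf) {
  // Highest priority first; ties keep their configuration order (stable sort).
  return conf.members
    .map((m) => ({ host: m.host, priority: m.priority === undefined ? 1 : m.priority }))
    .sort((a, b) => b.priority - a.priority)
    .map((m) => m.host + " (priority " + m.priority + ")");
}

// Copied from the rs.conf() output above.
const rs5conf = {
  _id: "rs_5",
  version: 125967,
  members: [
    { _id: 0, host: "mongo_rs_5_member_1:27018", priority: 3 },
    { _id: 1, host: "mongo_rs_5_member_2:27019", priority: 2 },
    { _id: 2, host: "mongo_rs_5_member_3:27020" },
  ],
};

console.log(votesNeeded(rs5conf)); // 2
console.log(byPriority(rs5conf));
```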
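It has been like this for several days. CPU and network show no real activity, which suggests nothing is happening. Obviously I don't want to lose data. What do I need to do to get this back to a healthy PRIMARY/SECONDARY/SECONDARY replica set?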
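I was able to resolve this by Breaking the Mirror. Essentially, I picked one of the members, shut it down, deleted the /data/local* files, started it back up, and ran rs.initiate(). At that point it was a replica set of 1 (itself) and PRIMARY (obviously). Then I shut down the other two members, wiped their entire /data/* and started them back up. From the original member I simply added the two new ones with rs.add("mongo_rs_5_member_1:27018") and rs.add("mongo_rs_5_member_2:27019"). The primary then synced everything to the others (this took many hours), and the replica set was healthy. There were no more errors in the related application.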
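The shell part of that procedure can be sketched as below. The function name is hypothetical, and the `shell` parameter stands for the mongo shell's global `rs` helper object, passed in so the function can also be tried with a stub. It assumes the file-level steps have already been done: the kept member was restarted without its local* files and the other two were restarted with empty data directories. Because member_1 and member_2 were the ones added, the kept member was presumably mongo_rs_5_member_3:27020.

```javascript
// Sketch of the "rebuild from one member" steps described above.
// `shell` is assumed to be the mongo shell's `rs` object; each entry of
// `emptyMembers` is passed straight to rs.add().

function rebuildFromSurvivor(shell, emptyMembers) {
  const results = [];
  // 1. Start a fresh one-member replica set on the kept member;
  //    it becomes PRIMARY with the data it already holds.
  results.push(shell.initiate());
  // 2. Add each wiped member; each then runs a full initial sync
  //    from the new primary (this is the part that took many hours).
  for (const member of emptyMembers) {
    results.push(shell.add(member));
  }
  return results;
}

// In the mongo shell, connected to the kept member:
//   rebuildFromSurvivor(rs, ["mongo_rs_5_member_1:27018",
//                            "mongo_rs_5_member_2:27019"]);
```

Note that rs.initiate() without a configuration and rs.add() with a bare host string create members with the default priority of 1, so the original priorities 3 and 2 from rs.conf() would have to be reapplied (for example with rs.reconfig()) if that preference still matters.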