Upgrading Cassandra 3.11 to 4.0 fails with "A node with address ... already exists"

evy*_*ars 3 cassandra

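We are trying to upgrade Apache Cassandra 3.11.12 to 4.0.2. This is the first node (a seed node) we are upgrading in this cluster. We drained the node and stopped the service before replacing the version.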

System log:

INFO  [RMI TCP Connection(16)-IP] 2022-03-03 15:50:18,811 StorageService.java:1568 - DRAINED
....
....
INFO  [main] 2022-03-03 15:58:02,970 QueryProcessor.java:167 - Preloaded 0 prepared statements
INFO  [main] 2022-03-03 15:58:02,970 StorageService.java:735 - Cassandra version: 4.0.2
INFO  [main] 2022-03-03 15:58:02,971 StorageService.java:736 - CQL version: 3.4.5
INFO  [main] 2022-03-03 15:58:02,971 StorageService.java:737 - Native protocol supported versions: 3/v3, 4/v4, 5/v5, 6/v6-beta (default: 5/v5)
...
...
WARN  [main] 2022-03-03 15:58:03,328 SystemKeyspace.java:1130 - No host ID found, created d78ab047-f1f9-4a07-8118-2fa83f4571ef (Note: This should happen exactly once per node).
....
...
ERROR [main] 2022-03-03 15:58:04,543 CassandraDaemon.java:911 - Exception encountered during startup
java.lang.RuntimeException: A node with address /HOST_IP:7001 already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.
        at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:660)
        at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:935)
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:785)
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:730)
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:420)
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:765)
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:889)
INFO  [StorageServiceShutdownHook] 2022-03-03 15:58:04,558 HintsService.java:222 - Paused hints dispatch
WARN  [StorageServiceShutdownHook] 2022-03-03 15:58:04,561 Gossiper.java:2032 - No local state, state is in silent shutdown, or node hasn't joined, not announcing shutdown
INFO  [StorageServiceShutdownHook] 2022-03-03 15:58:04,561 MessagingService.java:441 - Waiting for messaging service to quiesce
...
..
INFO  [StorageServiceShutdownHook] 2022-03-03 15:58:06,956 HintsService.java:222 - Paused hints dispatch

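After draining the node and before starting the new Cassandra version, do we need to delete the system* data directories (\rm -rf system*)? We did not do this. How can we resolve this issue?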

Eri*_*rez 5

During startup, Cassandra tries to retrieve the host ID by querying the local system table with:

SELECT host_id FROM system.local WHERE key = 'local'

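But if the system.local table is empty, or the SSTables are missing from the system/local-*/ data subdirectory, Cassandra assumes it is a brand-new node and assigns a new host ID. In your case, however, when Cassandra gossiped with the other nodes it found that another node with the same IP address is already part of the cluster. This matches your log: the "No host ID found, created ..." warning appears shortly before the startup exception.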
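As a quick way to check the second condition, the sketch below tests whether a data directory still contains SSTable data files for system.local. The function name has_local_sstables is hypothetical, and the argument must be whatever data_file_directories points to in your cassandra.yaml; the only thing it relies on is that SSTable data components end in -Data.db.

```shell
# Hypothetical helper (not part of Cassandra): succeeds if the given data
# directory holds at least one SSTable data file for the system.local table.
has_local_sstables() {
    data_dir="$1"
    # SSTable data components are named <version>-<generation>-big-Data.db.
    for f in "$data_dir"/system/local-*/*-Data.db; do
        # When the glob matches nothing, the literal pattern is returned,
        # so -e fails and the function falls through to "return 1".
        [ -e "$f" ] && return 0
    done
    return 1
}
```

For example, assuming the data directory is /var/lib/cassandra/data, running `has_local_sstables /var/lib/cassandra/data || echo "system.local SSTables missing"` on the stopped node would show whether the files are gone or whether Cassandra is simply looking in the wrong place.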

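You need to figure out why Cassandra cannot access the local system.local table. If someone deleted system/local-*/ from the data directory, you will not be able to start the node again. If that is the case, you need to start from scratch, which involves: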

  • wiping all the contents of data/, commitlog/ and saved_caches/
  • uninstalling C* 4.0
  • reinstalling C* 3.11

Then you need to replace the node "with itself" using the replace_address method. Cheers!
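A minimal sketch of that last step, assuming you set JVM options through cassandra-env.sh: the property name cassandra.replace_address comes from the error message above, and HOST_IP is the same placeholder used in the log, to be replaced with the node's own address.

```shell
# Appended to cassandra-env.sh on the reinstalled 3.11 node (assumed location
# for JVM options). HOST_IP is a placeholder for this node's own IP address.
JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=HOST_IP"
```

Once the node has finished joining, the line should be removed again so that later restarts do not attempt another replacement.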