Kafka主题具有leader = -1(Kafka Leader Election)的分区,而节点已启动并正在运行

Spi*_*Xel 8 apache-kafka apache-zookeeper kafka-topic

我有一个3成员的kafka-cluster设置,__ consumer_offsets主题有50个分区.

以下是describe命令的结果:

root@kafka-cluster-0:~# kafka-topics.sh --zookeeper localhost:2181 --describe
Topic:__consumer_offsets    PartitionCount:50   ReplicationFactor:1 Configs:segment.bytes=104857600,cleanup.policy=compact,compression.type=producer
    Topic: __consumer_offsets   Partition: 0    Leader: 1   Replicas: 1 Isr: 1
    Topic: __consumer_offsets   Partition: 1    Leader: -1  Replicas: 2 Isr: 2
    Topic: __consumer_offsets   Partition: 2    Leader: 0   Replicas: 0 Isr: 0
    Topic: __consumer_offsets   Partition: 3    Leader: 1   Replicas: 1 Isr: 1
    Topic: __consumer_offsets   Partition: 4    Leader: -1  Replicas: 2 Isr: 2
    Topic: __consumer_offsets   Partition: 5    Leader: 0   Replicas: 0 Isr: 0
    ...
    ...
Run Code Online (Sandbox Code Playgroud)

成员是节点0,1和2.

很明显,replica = 2中的分区没有为它们设置领导,并且它们的领导者= -1

我想知道是什么导致了这个问题,我重新启动了第二个成员kafka服务,但我从未想过它会产生这种副作用.

现在,所有节点都已经运行了几个小时,这是ls broker/ids的结果:

/home/kafka/bin/zookeeper-shell.sh localhost:2181 <<< "ls /brokers/ids"
Connecting to localhost:2181
Welcome to ZooKeeper!
JLine support is disabled

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
[0, 1, 2]
Run Code Online (Sandbox Code Playgroud)

此外,群集中有许多主题,并且节点2不是其中任何一个的领导者,只要它只有数据(replication-factor = 1,并且在此节点上托管分区),leader = -1,显而易见从下面.

Here, node 2 is in the ISR, but never a leader, since replication-factor=2.
Topic:upstream-t2   PartitionCount:20   ReplicationFactor:2 Configs:retention.ms=172800000,retention.bytes=536870912
    Topic: upstream-t2  Partition: 0    Leader: 1   Replicas: 1,2   Isr: 1,2
    Topic: upstream-t2  Partition: 1    Leader: 0   Replicas: 2,0   Isr: 0
    Topic: upstream-t2  Partition: 2    Leader: 0   Replicas: 0,1   Isr: 0
    Topic: upstream-t2  Partition: 3    Leader: 0   Replicas: 1,0   Isr: 0
    Topic: upstream-t2  Partition: 4    Leader: 1   Replicas: 2,1   Isr: 1,2
    Topic: upstream-t2  Partition: 5    Leader: 0   Replicas: 0,2   Isr: 0
    Topic: upstream-t2  Partition: 6    Leader: 1   Replicas: 1,2   Isr: 1,2


Here, node 2 is the only partition some chunks of data are hosted on, but leader=-1.
Topic:upstream-t20  PartitionCount:10   ReplicationFactor:1 Configs:retention.ms=172800000,retention.bytes=536870912
    Topic: upstream-t20 Partition: 0    Leader: 1   Replicas: 1 Isr: 1
    Topic: upstream-t20 Partition: 1    Leader: -1  Replicas: 2 Isr: 2
    Topic: upstream-t20 Partition: 2    Leader: 0   Replicas: 0 Isr: 0
    Topic: upstream-t20 Partition: 3    Leader: 1   Replicas: 1 Isr: 1
    Topic: upstream-t20 Partition: 4    Leader: -1  Replicas: 2 Isr: 2
Run Code Online (Sandbox Code Playgroud)

非常感谢任何有关如何解决领导者未被选举的帮助.

同样很高兴知道这可能会对我的经纪人的行为产生什么影响.

编辑---

Kafka版本:1.1.0(2.12-1.1.0)空间可用,如800GB的可用磁盘.日志文件非常正常,在节点2上,下面是日志文件的最后10行.请告诉我,如果有什么特别的东西我应该寻找.

[2018-12-18 10:31:43,828] INFO [Log partition=upstream-t14-1, dir=/var/lib/kafka] Rolled new log segment at offset 79149636 in 2 ms. (kafka.log.Log)
[2018-12-18 10:32:03,622] INFO Updated PartitionLeaderEpoch. New: {epoch:10, offset:6435}, Current: {epoch:8, offset:6386} for Partition: upstream-t41-8. Cache now contains 7 entries. (kafka.server.epoch.LeaderEpochFileCache)
[2018-12-18 10:32:03,693] INFO Updated PartitionLeaderEpoch. New: {epoch:10, offset:6333}, Current: {epoch:8, offset:6324} for Partition: upstream-t41-3. Cache now contains 7 entries. (kafka.server.epoch.LeaderEpochFileCache)
[2018-12-18 10:38:38,554] INFO [GroupMetadataManager brokerId=2] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2018-12-18 10:40:04,831] INFO Updated PartitionLeaderEpoch. New: {epoch:10, offset:6354}, Current: {epoch:8, offset:6340} for Partition: upstream-t41-9. Cache now contains 7 entries. (kafka.server.epoch.LeaderEpochFileCache)
[2018-12-18 10:48:38,554] INFO [GroupMetadataManager brokerId=2] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2018-12-18 10:58:38,554] INFO [GroupMetadataManager brokerId=2] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2018-12-18 11:05:50,770] INFO [ProducerStateManager partition=upstream-t4-17] Writing producer snapshot at offset 3086815 (kafka.log.ProducerStateManager)
[2018-12-18 11:05:50,772] INFO [Log partition=upstream-t4-17, dir=/var/lib/kafka] Rolled new log segment at offset 3086815 in 2 ms. (kafka.log.Log)
[2018-12-18 11:07:16,634] INFO [ProducerStateManager partition=upstream-t4-11] Writing producer snapshot at offset 3086497 (kafka.log.ProducerStateManager)
[2018-12-18 11:07:16,635] INFO [Log partition=upstream-t4-11, dir=/var/lib/kafka] Rolled new log segment at offset 3086497 in 1 ms. (kafka.log.Log)
[2018-12-18 11:08:15,803] INFO [ProducerStateManager partition=upstream-t4-5] Writing producer snapshot at offset 3086616 (kafka.log.ProducerStateManager)
[2018-12-18 11:08:15,804] INFO [Log partition=upstream-t4-5, dir=/var/lib/kafka] Rolled new log segment at offset 3086616 in 1 ms. (kafka.log.Log)
[2018-12-18 11:08:38,554] INFO [GroupMetadataManager brokerId=2] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
Run Code Online (Sandbox Code Playgroud)

编辑2 ----

好吧,我已经停止了领导者zookeeper实例,现在第二个zookeeper实例被选为领导者!有了这个,未选择的领导者问题现在已经解决了!

我不知道可能出了什么问题,所以任何关于" 为什么要改变动物园管理员领导者修复未选择的领导者问题 "的想法都非常受欢迎!

谢谢!

Den*_*din 0

尽管根本原因从未确定,但提问者似乎确实找到了解决方案:

我已经停止了领导者 Zookeeper 实例,现在第二个 Zookeeper 实例被选为领导者!至此,未选领导者的问题就解决了!