Rep*_*pox 9 sharding elasticsearch
I'm getting this error in my ES logs. I'm running three nodes.
Caused by: java.lang.ArrayIndexOutOfBoundsException
[2014-09-08 13:53:56,167][WARN ][cluster.action.shard ] [Dancing Destroyer] [events][3] sending failed shard for [events][3], node[RDZy21y7SRep7n6oWT8ogg], [P], s[INITIALIZING], indexUUID [gzj1aHTnQX6XDc0SxkvxDQ], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[events][3] failed recovery]; nested: FlushFailedEngineException[[events][3] Flush failed]; nested: ArrayIndexOutOfBoundsException; ]]
[2014-09-08 13:53:56,357][WARN ][indices.cluster ] [Dancing Destroyer] [events][3] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [events][3] failed recovery
	at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:185)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
Caused by: org.elasticsearch.index.engine.FlushFailedEngineException: [events][3] Flush failed
	at org.elasticsearch.index.engine.internal.InternalEngine.flush(InternalEngine.java:805)
	at org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryFinalization(InternalIndexShard.java:726)
	at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:249)
	at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:132)
	... 3 more
Caused by: java.lang.ArrayIndexOutOfBoundsException
[2014-09-08 13:53:56,381][WARN ][cluster.action.shard ] [Dancing Destroyer] [events][3] sending failed shard for [events][3], node[RDZy21y7SRep7n6oWT8ogg], [P], s[INITIALIZING], indexUUID [gzj1aHTnQX6XDc0SxkvxDQ], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[events][3] failed recovery]; nested: FlushFailedEngineException[[events][3] Flush failed]; nested: ArrayIndexOutOfBoundsException; ]]
This means ES is in a red state, and I've lost almost 10 million documents. What does this error mean, and how can I recover?
Rep*_*pox 12
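It seems I had a corrupted shard that needed to be repaired. This is a Lucene issue: you tell Lucene to repair the shard.
On Ubuntu, the fix is to go to the /usr/share/elasticsearch/lib directory and find out which Lucene core version is running (running ls will show a file named something like lucene-core-4.8.1.jar), then type: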
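The shard to repair can be read straight from the log (here [events][3], the primary `[P]`). On a live cluster, a minimal sketch for listing every shard that is not STARTED might look like the following. It assumes bash, a node reachable on localhost:9200, and the `_cat/shards` API with its `h=` column selector (present in the Elasticsearch 1.x line as far as I know; check your version). The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: find shards that are not STARTED, to fill in the
# <index> and <shard> placeholders of the CheckIndex command.
# Assumptions: bash, curl, a node on localhost:9200, and support for
# _cat/shards?h=... on your Elasticsearch version.

# Read "index shard prirep state" rows on stdin and print every row
# whose state column is anything other than STARTED.
not_started_shards() {
  awk '$4 != "STARTED" { print $1, $2, $3, $4 }'
}

# Fetch those four columns from a running node (hypothetical helper).
fetch_shard_rows() {
  local host=${1:-localhost:9200}
  curl -s "http://$host/_cat/shards?h=index,shard,prirep,state"
}

# Usage: fetch_shard_rows | not_started_shards
```

The filtering is kept separate from the HTTP call so it can be checked against saved `_cat/shards` output without a cluster.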
java -cp lucene-core-x.x.x.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex /var/lib/elasticsearch/<clustername>/nodes/0/indices/<index>/<shard>/index/ -fix
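Replace x.x.x with the Lucene core version, and <clustername>, <index> and <shard> with your cluster name, the index name and, of course, the number of the failed shard.
This may lose documents, but it solved our problem.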
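The steps above (find the lucene-core jar, then plug the version, cluster name, index and shard into the CheckIndex call) can be sketched as a few bash helpers. This is a minimal sketch assuming the Ubuntu package layout from the answer (/usr/share/elasticsearch/lib, /var/lib/elasticsearch, a single `nodes/0` data directory); the function names are hypothetical, and the last helper only prints the command so you can review it before running it.

```shell
#!/usr/bin/env bash
# Sketch only: assemble the CheckIndex command for one shard.
# Assumptions: bash, the Ubuntu/Debian package paths used in the answer,
# and a single data node directory "nodes/0". Function names are hypothetical.

# Print the path of the single lucene-core-*.jar in a lib directory.
find_lucene_core_jar() {
  local libdir=$1
  local jars=("$libdir"/lucene-core-*.jar)
  # With no match the glob stays literal, so check that the file exists;
  # also refuse to guess if more than one lucene-core jar is present.
  if [ ! -e "${jars[0]}" ] || [ "${#jars[@]}" -ne 1 ]; then
    echo "expected exactly one lucene-core jar in $libdir" >&2
    return 1
  fi
  printf '%s\n' "${jars[0]}"
}

# Print the Lucene index directory of one shard.
shard_index_dir() {
  local datadir=$1 cluster=$2 index=$3 shard=$4
  printf '%s/%s/nodes/0/indices/%s/%s/index/\n' "$datadir" "$cluster" "$index" "$shard"
}

# Print (do not run) the CheckIndex command line from the answer.
checkindex_command() {
  local jar=$1 dir=$2
  printf 'java -cp %s -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex %s -fix\n' "$jar" "$dir"
}
```

For the shard in the log, with a hypothetical cluster name `my_cluster`, that would be `checkindex_command "$(find_lucene_core_jar /usr/share/elasticsearch/lib)" "$(shard_index_dir /var/lib/elasticsearch my_cluster events 3)"`. Running the printed command without the trailing `-fix` first gives a read-only report of what CheckIndex would drop.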
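I've run into this problem many times. Since my setup ingests clickstream data (12-20M clicks per day), I can't afford to lose data.
So here is my solution, and it has worked beautifully:
Solution:
Root cause of the problem
Shards fail for various reasons, in particular when a shard cannot serve Kibana requests.
Lucene is not directly involved in this process, so when there is a problem, Elasticsearch cannot reliably pick up the shard value from the Lucene segment references stored in segments.gen.
On the next run Lucene sets this value again, so Elasticsearch can reference it correctly and the shard problem is resolved.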
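The segment references mentioned here live in the shard's index directory. In the Lucene 4.x file layout each commit point is a `segments_N` file whose generation N is written in base 36, and `segments.gen` records the latest generation as a fallback. A minimal sketch for seeing which commit a shard directory currently points at, assuming that layout and bash (the function name is hypothetical, and it only looks at file names, it does not parse `segments.gen`):

```shell
#!/usr/bin/env bash
# Sketch only: print the segments_N file with the highest generation in a
# Lucene index directory. Assumes the Lucene 4.x naming scheme, where N is
# the commit generation in base 36. segments.gen does not match the glob.
latest_segments_file() {
  local dir=$1 best="" best_gen=-1 f gen
  for f in "$dir"/segments_*; do
    [ -e "$f" ] || continue          # glob matched nothing
    gen=${f##*/segments_}            # base-36 generation suffix
    # bash arithmetic accepts base-36 literals written as 36#<digits>
    if (( 36#$gen > best_gen )); then
      best_gen=$((36#$gen))
      best=${f##*/}
    fi
  done
  [ -n "$best" ] || return 1
  printf '%s\n' "$best"
}

# Usage (path as in the CheckIndex command):
#   latest_segments_file /var/lib/elasticsearch/<clustername>/nodes/0/indices/<index>/<shard>/index
```

Comparing generations numerically matters because a plain alphabetical sort would place `segments_9` after `segments_1z` even though 1z (71) is the newer commit.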