Elasticsearch: how to read the output of _nodes/hot_threads

Dan*_*iel 7 elasticsearch

I have a data node in my Elasticsearch cluster that is running at very high CPU (99%), and searches are slow. Running top shows that the elasticsearch process is the one using all the CPU.

I ran the _nodes/hot_threads API against that node and got the output below, but I don't know how to interpret it. Can someone explain it?

::: {warm-xxx}{XXXXXXX}{YYYYYYYY}{10.10.10.10}{10.10.10.10:9300}{aws_availability_zone=us-west-2b, data_type=warm, ml.machine_memory=64388997120, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}
   Hot threads at 2019-12-30T23:22:24.304Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

   44.0% (220.2ms out of 500ms) cpu usage by thread 'elasticsearch[warm-xxx][management][T#1]'
     3/10 snapshots sharing following 57 elements
       org.elasticsearch.index.engine.Engine.segmentsStats(Engine.java:831)
       org.elasticsearch.index.shard.IndexShard.segmentStats(IndexShard.java:1051)
       org.elasticsearch.action.admin.indices.stats.CommonStats.<init>(CommonStats.java:213)
       org.elasticsearch.indices.IndicesService.indexShardStats(IndicesService.java:403)
       org.elasticsearch.indices.IndicesService.statsByShard(IndicesService.java:357)
       org.elasticsearch.indices.IndicesService.stats(IndicesService.java:348)
...

   42.7% (213.4ms out of 500ms) cpu usage by thread 'elasticsearch[warm-xxx][search][T#2]'
     10/10 snapshots sharing following 21 elements
       org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:263)
       org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:214)
       org.apache.lucene.search.BulkScorer.score(BulkScorer.java:39)
       org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:670)
       org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:191)
       org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:471)
...
   41.8% (208.9ms out of 500ms) cpu usage by thread 'elasticsearch[warm-xxx][search][T#7]'
     10/10 snapshots sharing following 21 elements
       org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:263)
       org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:214)
       org.apache.lucene.search.BulkScorer.score(BulkScorer.java:39)
       org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:670)
       org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:191)
       org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:471)


I am running Elasticsearch 6.8.

Ami*_*wal -1

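It has been a while since I commented on this. Since there is still no answer and no detailed documentation, I plan to write something up and will share it once it is done.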

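In short, though: Elasticsearch uses separate thread pools for different kinds of operations (search, index, management, analyze, and so on). The official documentation lists all the thread pools and their configuration, that is, the number of threads and the queue size.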

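Now, if you look closely at the API output, it tells you the node name, the thread pool name (management and search in your case), and the thread number (T#1, T#7), along with what that thread was doing and how much time it spent doing it, so you can see where the hot spots in your cluster are.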
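To make that reading concrete, here is a minimal Python sketch that pulls those fields out of the text. It is only an illustration, not an official parser: the names `HEADER`, `SNAPSHOTS`, `parse_hot_threads`, and `cpu_by_pool` are hypothetical, and the regexes assume thread names of the form `elasticsearch[node][pool][T#n]` and times in `ms`, as in the output in the question. Lines that do not match that shape are skipped.

```python
import re
from collections import defaultdict

# Header line, e.g.
# 44.0% (220.2ms out of 500ms) cpu usage by thread 'elasticsearch[warm-xxx][management][T#1]'
# Assumption: times are reported in ms and the thread name has three bracketed parts.
HEADER = re.compile(
    r"(?P<pct>[\d.]+)% \((?P<busy>[\d.]+)ms out of (?P<interval>[\d.]+)ms\) "
    r"cpu usage by thread "
    r"'elasticsearch\[(?P<node>[^\]]+)\]\[(?P<pool>[^\]]+)\]\[T#(?P<num>\d+)\]'"
)

# Snapshot line, e.g. "10/10 snapshots sharing following 21 elements"
SNAPSHOTS = re.compile(r"(\d+)/(\d+) snapshots sharing following (\d+) elements")


def parse_hot_threads(text):
    """Return one dict per thread header found in the hot_threads text."""
    threads = []
    for line in text.splitlines():
        header = HEADER.search(line)
        if header:
            threads.append({
                "node": header["node"],
                "pool": header["pool"],
                "thread": int(header["num"]),
                "cpu_pct": float(header["pct"]),
                "busy_ms": float(header["busy"]),
                "interval_ms": float(header["interval"]),
                "stacks": [],
            })
            continue
        snap = SNAPSHOTS.search(line)
        if snap and threads:
            # Attach the snapshot summary to the most recent thread header.
            threads[-1]["stacks"].append({
                "seen": int(snap[1]),
                "of": int(snap[2]),
                "frames": int(snap[3]),
            })
    return threads


def cpu_by_pool(threads):
    """Sum the reported CPU percentages per thread pool."""
    totals = defaultdict(float)
    for t in threads:
        totals[t["pool"]] += t["cpu_pct"]
    return dict(totals)


# Header and snapshot lines copied from the question (stack frames omitted).
SAMPLE = """
   44.0% (220.2ms out of 500ms) cpu usage by thread 'elasticsearch[warm-xxx][management][T#1]'
     3/10 snapshots sharing following 57 elements
   42.7% (213.4ms out of 500ms) cpu usage by thread 'elasticsearch[warm-xxx][search][T#2]'
     10/10 snapshots sharing following 21 elements
   41.8% (208.9ms out of 500ms) cpu usage by thread 'elasticsearch[warm-xxx][search][T#7]'
     10/10 snapshots sharing following 21 elements
"""

if __name__ == "__main__":
    parsed = parse_hot_threads(SAMPLE)
    for t in parsed:
        print(t["pool"], t["thread"], t["cpu_pct"], t["stacks"])
    print(cpu_by_pool(parsed))
```

On the sample from the question, this groups the two search threads together (roughly 84.5% combined) against 44.0% for the single management thread. It also shows that both search threads had the same 21 frames in all 10 of 10 snapshots, while the management thread shared its stack in only 3 of 10.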

As your example shows, this particular search is very expensive:

41.8% (208.9ms out of 500ms) cpu usage by thread 'elasticsearch[warm-xxx][search][T#7]'
     10/10 snapshots sharing following 21 elements
       org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:263)
       org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:214)
       org.apache.lucene.search.BulkScorer.score(BulkScorer.java:39)
       org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:670)
       org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:191)
       org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:471)

So you need to work out which search in your application ends up in org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:191) and try to improve it.