Elasticsearch drops too many requests -- would a buffer improve things?

Question

We have a cluster of workers that send indexing requests to a 4-node Elasticsearch cluster. Documents are indexed as they are generated, and because the workers run with a high degree of concurrency, Elasticsearch is having trouble handling all the requests. To give some numbers: the workers process up to 3,200 tasks at the same time, and each task usually generates about 13 indexing requests. This produces an instantaneous rate of 60 to 250 indexing requests per second.
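A quick back-of-the-envelope check of these numbers, assuming (as a simplification) that every one of the 3,200 concurrent tasks emits exactly 13 requests. The 200 and 300 batch sizes are the ones proposed later in the question; the function names are illustrative only.

```python
import math

CONCURRENT_TASKS = 3_200   # from the question
REQUESTS_PER_TASK = 13     # "usually", so this is an approximation


def total_requests(tasks: int, per_task: int) -> int:
    """Single-document index requests produced by one full wave of tasks."""
    return tasks * per_task


def bulk_calls(requests: int, batch_size: int) -> int:
    """Bulk requests needed if documents are grouped into batches of batch_size."""
    return math.ceil(requests / batch_size)


if __name__ == "__main__":
    n = total_requests(CONCURRENT_TASKS, REQUESTS_PER_TASK)
    print(f"single-document requests: {n}")
    for size in (200, 300):
        print(f"batch size {size}: {bulk_calls(n, size)} bulk requests")
```

Under these assumptions, one wave is 41,600 individual requests, versus 208 bulk requests at a batch size of 200 or 139 at 300.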

From the start, Elasticsearch had problems and requests were timing out or returning 429. To get around this, we increased the timeout on our workers to 200 seconds and increased the write thread pool queue size on our nodes to 700.

That's not a satisfactory long-term solution, though, and I was looking for alternatives. I noticed that when I copied an index within the same cluster with elasticdump, the write thread pool was almost empty. I attributed that to the fact that elasticdump batches indexing requests and (probably) uses the bulk API to communicate with Elasticsearch.

That gave me the idea of writing a buffer that receives requests from the workers, groups them into batches of 200-300 requests, and then sends each batch to Elasticsearch as a single bulk request.
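A minimal sketch of such a buffer, assuming a thread-safe in-process object that the workers share. `BulkBuffer`, `flush_fn`, `max_docs` and `max_wait_s` are hypothetical names; `flush_fn` stands in for whatever code actually sends one bulk request.

```python
import threading
import time
from typing import Any, Callable, Dict, List, Optional

Doc = Dict[str, Any]


class BulkBuffer:
    """Collect documents and hand them off in groups instead of one at a time."""

    def __init__(self, flush_fn: Callable[[List[Doc]], None],
                 max_docs: int = 250, max_wait_s: float = 1.0) -> None:
        self._flush_fn = flush_fn        # hypothetical: sends one bulk request
        self._max_docs = max_docs
        self._max_wait_s = max_wait_s
        self._docs: List[Doc] = []
        self._first_added: Optional[float] = None
        self._lock = threading.Lock()

    def add(self, doc: Doc) -> None:
        """Buffer one document; flush when the batch is full or its oldest doc is too old."""
        batch: Optional[List[Doc]] = None
        with self._lock:
            if not self._docs:
                self._first_added = time.monotonic()
            self._docs.append(doc)
            too_old = time.monotonic() - self._first_added >= self._max_wait_s
            if len(self._docs) >= self._max_docs or too_old:
                batch, self._docs = self._docs, []
        if batch:
            # Send outside the lock so other workers can keep adding documents.
            self._flush_fn(batch)

    def flush(self) -> None:
        """Send whatever is left in the buffer (e.g. on shutdown)."""
        with self._lock:
            batch, self._docs = self._docs, []
        if batch:
            self._flush_fn(batch)
```

The age check only runs when a document is added, so a production version would also need a background timer to flush a buffer that has gone quiet. With `max_docs=250`, adding 600 documents and then calling `flush()` produces batches of 250, 250 and 100.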

Does something like this already exist, and does it sound like a good idea?

Answer 1

Ami*_*wal 7

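First, to solve the problem or find its root cause, it is important to understand what happens behind the scenes when you send indexing requests to Elasticsearch.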

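Elasticsearch has multiple thread pools, but indexing requests (single and bulk) use the write thread pool. Check this against your Elasticsearch version, because Elastic keeps changing the thread pools (previously there were separate thread pools for single and bulk requests, with different queue capacities).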
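One way to watch the write pool is the `_cat/thread_pool` API. The sketch below assumes you have fetched `GET _cat/thread_pool/write?h=node_name,name,active,queue,queue_size,rejected&format=json` yourself; `SAMPLE` is made-up data, not output from a real cluster, and `pressured_nodes` with its 80% threshold is a hypothetical helper. On older releases the pool had a different name (for example `bulk`), so adjust the path to your version.

```python
import json
from typing import List

# Made-up example of the JSON returned by the _cat request above.
# _cat APIs return every value as a string.
SAMPLE = """[
  {"node_name": "node-1", "name": "write", "active": "8", "queue": "650", "queue_size": "700", "rejected": "1520"},
  {"node_name": "node-2", "name": "write", "active": "3", "queue": "0", "queue_size": "700", "rejected": "0"}
]"""


def pressured_nodes(cat_json: str, fill_ratio: float = 0.8) -> List[str]:
    """Return nodes that have rejected writes or whose write queue is nearly full."""
    flagged = []
    for row in json.loads(cat_json):
        queue = int(row["queue"])
        capacity = int(row["queue_size"])
        rejected = int(row["rejected"])
        nearly_full = capacity > 0 and queue / capacity >= fill_ratio
        if rejected > 0 or nearly_full:
            flagged.append(row["node_name"])
    return flagged


if __name__ == "__main__":
    print(pressured_nodes(SAMPLE))
```

The `rejected` counter is cumulative since the node started, so in practice you would compare it between two samples rather than against zero.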

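In the latest ES version (7.9), the queue capacity of the write thread pool was increased significantly, from 200 (in earlier releases) to 10000, probably for the following reasons: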

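Elasticsearch now prefers to buffer more indexing requests rather than reject them.
A larger queue means more latency, but it is a trade-off: it reduces data loss when clients have no retry mechanism.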
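On the client side, the usual complement is to retry rejected requests with backoff instead of dropping them. A minimal sketch with exponential backoff and full jitter; `send` is a hypothetical zero-argument callable that performs one request and returns its HTTP status code, and the delays are illustrative defaults, not Elastic recommendations.

```python
import random
import time
from typing import Callable


def send_with_retry(send: Callable[[], int], max_retries: int = 5,
                    base_delay_s: float = 0.5, max_delay_s: float = 30.0,
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Call send() again while it returns HTTP 429, backing off exponentially."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    status = send()
    attempt = 0
    while status == 429 and attempt < max_retries:
        # Cap the exponential delay, then pick a random point below it ("full jitter")
        # so that thousands of workers do not all retry at the same moment.
        delay = min(max_delay_s, base_delay_s * 2 ** attempt)
        sleep(random.uniform(0, delay))
        attempt += 1
        status = send()
    return status  # last status seen: success, another error, or 429 after giving up
```

Note that a bulk request can return an overall 200 while individual items inside it were rejected with 429, so a bulk client would need to inspect each item in the response and resend only the failed ones.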

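I'm sure you won't migrate to ES 7.9 just because of the larger queue, but you can slowly increase the size of this queue, and allocate more processors (if you have spare capacity), with the configuration changes shown in this official example. This is a much-debated topic, and many people consider it a band-aid rather than a proper fix, but now that Elastic itself has increased the queue size, you can try it too. If your traffic spikes are short-lived, it makes even more sense.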
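Why a bigger queue (the static `thread_pool.write.queue_size` node setting) helps most with short bursts can be seen in a toy model. This is a deliberately simplified discrete-time simulation, not how Elasticsearch schedules work: it ignores the requests already being executed by threads, and the rates (250 requests/s arriving for 10 s, 150 requests/s served) are hypothetical.

```python
from collections import deque
from typing import Deque, Tuple


def simulate_burst(queue_capacity: int, arrival_rate: int, burst_seconds: int,
                   service_rate: int, total_seconds: int) -> Tuple[int, int]:
    """Toy FIFO model of one node's write queue.

    Each second, arrival_rate requests arrive while the burst lasts; arrivals that
    do not fit in the queue are rejected (the 429 case); then up to service_rate
    queued requests are completed. Returns (rejected, longest_wait_seconds).
    """
    queue: Deque[int] = deque()  # arrival second of each waiting request
    rejected = 0
    longest_wait = 0
    for t in range(total_seconds):
        arrivals = arrival_rate if t < burst_seconds else 0
        for _ in range(arrivals):
            if len(queue) < queue_capacity:
                queue.append(t)
            else:
                rejected += 1
        for _ in range(min(service_rate, len(queue))):
            longest_wait = max(longest_wait, t - queue.popleft())
    return rejected, longest_wait


if __name__ == "__main__":
    for capacity in (200, 10_000):
        print(capacity, simulate_burst(capacity, 250, 10, 150, 60))
```

In this model, a queue of 200 rejects 950 of the 2,500 burst requests but keeps waits around 1 s, while a queue of 10,000 rejects none but the last requests wait about 7 s: fewer rejections in exchange for more latency. If the overload never ends, no queue size is enough.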

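Another key point is to find the root cause of why the ES nodes are queuing more requests. It may be legitimate, for example increased indexing traffic with the infrastructure reaching its limits. If it isn't, you can look at my short tips for improving one-time indexing performance and overall indexing performance; implementing them should give you a better indexing rate and therefore less pressure on the write thread pool queue.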

Edit: As @Val mentioned in the comments, if you are still indexing documents one by one, moving to the bulk indexing API will give you the biggest boost.
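For reference, a `_bulk` request body is newline-delimited JSON: an action line followed by the document source, repeated, with a trailing newline; it is sent with `POST /_bulk` and `Content-Type: application/x-ndjson`. A minimal sketch of building that body by hand; `build_bulk_body` and `id_field` are hypothetical names, and the official clients already ship bulk helpers (for example `elasticsearch.helpers.bulk` in the Python client) that do this for you.

```python
import json
from typing import Any, Dict, Iterable, Optional


def build_bulk_body(index: str, docs: Iterable[Dict[str, Any]],
                    id_field: Optional[str] = None) -> str:
    """Serialize documents into the NDJSON body of a _bulk request."""
    lines = []
    for doc in docs:
        action: Dict[str, Any] = {"_index": index}
        if id_field is not None:
            action["_id"] = doc[id_field]   # reuse a document field as the _id
        lines.append(json.dumps({"index": action}))  # action line
        lines.append(json.dumps(doc))                # source line
    # The bulk API requires the body to end with a newline; send nothing for no docs.
    return "\n".join(lines) + "\n" if lines else ""
```

Each batch produced by a buffer like the one the question describes would become one such body, so 250 documents cost one HTTP request and one write-pool task instead of 250.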

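I think the key point is that the OP is not using the bulk API but indexing each document one by one, which is not optimal in terms of threads and network. With smart use of the bulk API, there is hardly any need to tune the thread pool. (2 upvotes)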

Archived: 6 years, 10 months ago
Views: 6059
Last activity: 5 years, 6 months ago