Tags: broken-pipe, rabbitmq, celery, gevent, eventlet
I'm seeing Broken pipe errors on the connection between the Celery worker and RabbitMQ when the worker runs with the gevent pool. There is no problem when the worker runs in process pool (prefork) mode, i.e. without gevent and without monkey patching.
After the error occurs, the Celery worker stops fetching task messages from RabbitMQ until it is restarted.
The problem always happens when the Celery worker consumes task messages more slowly than the Django application produces them and roughly 3,000 messages have piled up in RabbitMQ.
Gevent version: 1.1.0
Celery version: 3.1.22
====== Celery log ======
[2016-08-08 13:52:06,913: CRITICAL/MainProcess] Couldn't ack 293, reason:error(32, 'Broken pipe')
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/kombu/message.py", line 93, in ack_log_error
    self.ack()
  File "/usr/local/lib/python2.7/site-packages/kombu/message.py", line 88, in ack
    self.channel.basic_ack(self.delivery_tag)
  File "/usr/local/lib/python2.7/site-packages/amqp/channel.py", line 1584, in basic_ack
    self._send_method((60, 80), args)
  File "/usr/local/lib/python2.7/site-packages/amqp/abstract_channel.py", line 56, in _send_method
    self.channel_id, method_sig, args, content,
  File "/usr/local/lib/python2.7/site-packages/amqp/method_framing.py", line 221, in write_method
    write_frame(1, channel, payload)
  File "/usr/local/lib/python2.7/site-packages/amqp/transport.py", line 182, in write_frame
    frame_type, channel, size, payload, 0xce,
  File "/usr/local/lib/python2.7/site-packages/gevent/_socket2.py", line 412, in sendall
    timeleft = self.__send_chunk(chunk, flags, timeleft, end)
  File "/usr/local/lib/python2.7/site-packages/gevent/_socket2.py", line 351, in __send_chunk
    data_sent += self.send(chunk, flags)
  File "/usr/local/lib/python2.7/site-packages/gevent/_socket2.py", line 320, in send
    return sock.send(data, flags)
error: [Errno 32] Broken pipe
======= RabbitMQ log ======
=ERROR REPORT==== 8-Aug-2016::14:28:33 ===
closing AMQP connection <0.15928.4> (10.26.39.183:60732 -> 10.26.39.183:5672):
{writer,send_failed,{error,enotconn}}
=ERROR REPORT==== 8-Aug-2016::14:29:03 ===
closing AMQP connection <0.15981.4> (10.26.39.183:60736 -> 10.26.39.183:5672):
{writer,send_failed,{error,enotconn}}
=ERROR REPORT==== 8-Aug-2016::14:29:03 ===
closing AMQP connection <0.15955.4> (10.26.39.183:60734 -> 10.26.39.183:5672):
{writer,send_failed,{error,enotconn}}
A similar problem also occurs when the Celery worker uses eventlet.
[2016-08-09 17:41:37,952: CRITICAL/MainProcess] Couldn't ack 583, reason:error(32, 'Broken pipe')
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/kombu/message.py", line 93, in ack_log_error
    self.ack()
  File "/usr/local/lib/python2.7/site-packages/kombu/message.py", line 88, in ack
    self.channel.basic_ack(self.delivery_tag)
  File "/usr/local/lib/python2.7/site-packages/amqp/channel.py", line 1584, in basic_ack
    self._send_method((60, 80), args)
  File "/usr/local/lib/python2.7/site-packages/amqp/abstract_channel.py", line 56, in _send_method
    self.channel_id, method_sig, args, content,
  File "/usr/local/lib/python2.7/site-packages/amqp/method_framing.py", line 221, in write_method
    write_frame(1, channel, payload)
  File "/usr/local/lib/python2.7/site-packages/amqp/transport.py", line 182, in write_frame
    frame_type, channel, size, payload, 0xce,
  File "/usr/local/lib/python2.7/site-packages/eventlet/greenio/base.py", line 385, in sendall
    tail = self.send(data, flags)
  File "/usr/local/lib/python2.7/site-packages/eventlet/greenio/base.py", line 379, in send
    return self._send_loop(self.fd.send, data, flags)
  File "/usr/local/lib/python2.7/site-packages/eventlet/greenio/base.py", line 366, in _send_loop
    return send_method(data, *args)
error: [Errno 32] Broken pipe
Added setup and load-test information:
We start Celery via supervisor with the following options:
celery worker -A celerytasks.celery_worker_init -Q default -P gevent -c 1000 --loglevel=info
Celery uses RabbitMQ as the broker.
By specifying numprocs=4 in the supervisor configuration, we run 4 Celery worker processes.
We use JMeter to simulate the web access load; the Django application then produces tasks for the Celery workers to consume. The tasks mostly access a MySQL database to fetch or update some data, roughly as sketched below.
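For context, a minimal sketch of the kind of app and task involved; the broker credentials, names and the task body are hypothetical placeholders, not our actual code (the real tasks go through the Django ORM):

from celery import Celery

# Sketch only: broker credentials and module/task names are assumptions.
app = Celery('celerytasks', broker='amqp://guest:guest@10.26.39.183:5672//')

@app.task
def update_record(record_id, value):
    # Placeholder for the real work: fetch/update a row in MySQL
    # (done through the Django ORM in our project).
    pass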
According to RabbitMQ's web management page, tasks are produced at about 50 per second while they are consumed at about 20 per second. After roughly one minute of load testing, the log files show that many connections between RabbitMQ and Celery hit the Broken pipe error.
We noticed this problem is also caused by a combination of a high prefetch count and high concurrency.
We had concurrency set to 500 and the prefetch multiplier set to 100, which means each worker ends up prefetching 500 * 100 = 50,000 messages. We had about 100k tasks piled up, and with this configuration a single worker reserved all of the tasks for itself while the other workers were not even used. That worker kept hitting the Broken pipe error and never acknowledged any task, so the tasks were never cleared from the queue.
We then changed the prefetch multiplier to 3 and restarted all the workers, which solved the problem: after lowering the prefetch we have seen zero instances of the Broken pipe error, whereas we used to see it all the time.
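For reference, the change expressed as Celery 3.1 configuration is sketched below; this assumes the settings live in the worker's config module (concurrency can equally be passed with -c on the command line):

CELERYD_PREFETCH_MULTIPLIER = 3   # was 100; each worker reserves roughly concurrency * multiplier messages
CELERYD_CONCURRENCY = 500         # before: 500 * 100 = 50,000 reserved per worker; after: 500 * 3 = 1,500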