multiprocessing.Queue fails intermittently. Bug in Python?

cha*_*umQ 7 python ipc pipe multiprocessing

Python's multiprocessing.Queue fails intermittently, and I don't know why. Is this a bug in Python or in my script?

Minimal failing script

import multiprocessing
import time
import logging
import multiprocessing.util
multiprocessing.util.log_to_stderr(level=logging.DEBUG)

queue = multiprocessing.Queue(maxsize=10)

def worker(queue):
    queue.put('abcdefghijklmnop')

    # "Indicate that no more data will be put on this queue by the
    # current process." --Documentation
    # time.sleep(0.01)
    queue.close()

proc = multiprocessing.Process(target=worker, args=(queue,))
proc.start()

# "Indicate that no more data will be put on this queue by the current
# process." --Documentation
# time.sleep(0.01)
queue.close()

proc.join()

I am testing this with CPython 3.6.6 on Debian. It also fails with the docker image python:3.7.0-alpine.

docker run --rm -v "${PWD}/test.py:/test.py" \
    python:3-alpine python3 /test.py

The script above sometimes fails with a BrokenPipeError.

Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/queues.py", line 240, in _feed
    send_bytes(obj)
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header + buf)
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe

Test harness

Because the failure is intermittent, I wrote a shell script that runs the test many times and counts how often it succeeds.

#!/bin/sh
total=10

successes=0
for i in `seq ${total}`
do
    if ! docker run --rm -v "${PWD}/test.py:/test.py" python:3-alpine \
         python3 test.py 2>&1 \
         | grep --silent BrokenPipeError
    then
        successes=$(expr ${successes} + 1)
    fi
done
python3 -c "print(${successes} / ${total})"

This usually prints some fraction, perhaps 0.2, which indicates intermittent failures.
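
The same counting can be done from Python instead of the shell. This is only a sketch: `count_successes` is a hypothetical helper, not part of the original harness, and it takes the command to run as an argument rather than assuming docker is available.

```python
import subprocess
import sys


def count_successes(cmd, total=10):
    """Run cmd `total` times and count the runs whose combined
    stdout/stderr does not mention BrokenPipeError."""
    successes = 0
    for _ in range(total):
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into stdout, like 2>&1
            universal_newlines=True,   # decode output as text (works on 3.6)
        )
        if "BrokenPipeError" not in result.stdout:
            successes += 1
    return successes
```

For example, `count_successes([sys.executable, "test.py"], total=10) / 10` would give the same kind of fraction as the shell script, assuming test.py is in the current directory.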

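Timing adjustments

If I insert time.sleep(0.01) before either queue.close(), it works consistently. I noticed in the source code that the write happens in its own thread. My guess is that the error occurs when the writer thread is still trying to write data after every process has already closed the queue.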
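
The underlying pipe behaviour can be shown without multiprocessing at all. This is a minimal sketch, assuming a POSIX system and assuming (from a reading of the CPython 3.6 source) that Queue.close() closes the calling process's read end of the queue's pipe. `write_with_no_readers` is a hypothetical helper name.

```python
import errno
import os


def write_with_no_readers():
    """Write to a pipe whose only read end is already closed.

    Returns the errno of the resulting BrokenPipeError, or None if the
    write unexpectedly succeeds.
    """
    read_fd, write_fd = os.pipe()
    # Stands in for every process having closed its copy of the reader.
    os.close(read_fd)
    try:
        os.write(write_fd, b"abcdefghijklmnop")
    except BrokenPipeError as exc:
        return exc.errno
    finally:
        os.close(write_fd)
    return None
```

On Linux the returned value equals `errno.EPIPE`, the same `[Errno 32] Broken pipe` that appears in the traceback. A sleep before either queue.close() would keep one read end open long enough for the feeder thread's write to go through.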

Debug logging

By uncommenting the first few lines, I can trace failing and successful runs.

Failure:

[DEBUG/MainProcess] created semlock with handle 140480257941504
[DEBUG/MainProcess] created semlock with handle 140480257937408
[DEBUG/MainProcess] created semlock with handle 140480257933312
[DEBUG/MainProcess] Queue._after_fork()
[DEBUG/Process-1] Queue._after_fork()
[INFO/Process-1] child process calling self.run()
[DEBUG/Process-1] Queue._start_thread()
[DEBUG/Process-1] doing self._thread.start()
[DEBUG/Process-1] starting thread to feed data to pipe
[DEBUG/Process-1] ... done self._thread.start()
[DEBUG/Process-1] telling queue thread to quit
[INFO/Process-1] process shutting down
[DEBUG/Process-1] running all "atexit" finalizers with priority >= 0
[DEBUG/Process-1] running the remaining "atexit" finalizers
[DEBUG/Process-1] joining queue thread
Traceback (most recent call last):
  File "/usr/lib/python3.7/multiprocessing/queues.py", line 242, in _feed
    send_bytes(obj)
  File "/usr/lib/python3.7/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/usr/lib/python3.7/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header + buf)
  File "/usr/lib/python3.7/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
[DEBUG/Process-1] feeder thread got sentinel -- exiting
[DEBUG/Process-1] ... queue thread joined
[INFO/Process-1] process exiting with exitcode 0
[INFO/MainProcess] process shutting down
[DEBUG/MainProcess] running all "atexit" finalizers with priority >= 0
[DEBUG/MainProcess] running the remaining "atexit" finalizers

"Success" (really a silent failure, only reproducible with Python 3.6):

[DEBUG/MainProcess] created semlock with handle 139710276231168
[DEBUG/MainProcess] created semlock with handle 139710276227072
[DEBUG/MainProcess] created semlock with handle 139710276222976
[DEBUG/MainProcess] Queue._after_fork()
[DEBUG/Process-1] Queue._after_fork()
[INFO/Process-1] child process calling self.run()
[DEBUG/Process-1] Queue._start_thread()
[DEBUG/Process-1] doing self._thread.start()
[DEBUG/Process-1] starting thread to feed data to pipe
[DEBUG/Process-1] ... done self._thread.start()
[DEBUG/Process-1] telling queue thread to quit
[INFO/Process-1] process shutting down
[INFO/Process-1] error in queue thread: [Errno 32] Broken pipe
[DEBUG/Process-1] running all "atexit" finalizers with priority >= 0
[DEBUG/Process-1] running the remaining "atexit" finalizers
[DEBUG/Process-1] joining queue thread
[DEBUG/Process-1] ... queue thread joined
[INFO/Process-1] process exiting with exitcode 0
[INFO/MainProcess] process shutting down
[DEBUG/MainProcess] running all "atexit" finalizers with priority >= 0
[DEBUG/MainProcess] running the remaining "atexit" finalizers

True success (with either time.sleep(0.01) uncommented):

[DEBUG/MainProcess] created semlock with handle 140283921616896
[DEBUG/MainProcess] created semlock with handle 140283921612800
[DEBUG/MainProcess] created semlock with handle 140283921608704
[DEBUG/MainProcess] Queue._after_fork()
[DEBUG/Process-1] Queue._after_fork()
[INFO/Process-1] child process calling self.run()
[DEBUG/Process-1] Queue._start_thread()
[DEBUG/Process-1] doing self._thread.start()
[DEBUG/Process-1] starting thread to feed data to pipe
[DEBUG/Process-1] ... done self._thread.start()
[DEBUG/Process-1] telling queue thread to quit
[INFO/Process-1] process shutting down
[DEBUG/Process-1] feeder thread got sentinel -- exiting
[DEBUG/Process-1] running all "atexit" finalizers with priority >= 0
[DEBUG/Process-1] running the remaining "atexit" finalizers
[DEBUG/Process-1] joining queue thread
[DEBUG/Process-1] ... queue thread joined
[INFO/Process-1] process exiting with exitcode 0
[INFO/MainProcess] process shutting down
[DEBUG/MainProcess] running all "atexit" finalizers with priority >= 0
[DEBUG/MainProcess] running the remaining "atexit" finalizers

The difference seems to be that in the truly successful case, the feeder thread receives the sentinel object before the atexit handlers run.

fjl*_*fob 0

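The main problem with your code is that nobody consumes what your worker process puts on the queue. Python queues expect the data on the queue to be consumed ("flushed to the pipe") before the process that put it there is terminated.

From that point of view your example does not make much sense, but if you want to make it work:

The key is queue.cancel_join_thread() - https://docs.python.org/3/library/multiprocessing.html

Warning: As mentioned above, if a child process has put items on a queue (and it has not used JoinableQueue.cancel_join_thread), then that process will not terminate until all buffered items have been flushed to the pipe. This means that if you try joining that process you may get a deadlock unless you are sure that all items which have been put on the queue have been consumed. Similarly, if the child process is non-daemonic then the parent process may hang on exit when it tries to join all its non-daemonic children.

Note that a queue created using a manager does not have this issue.

^ relevant bit. The problem is that items are being put on the queue from the child process but are not being consumed by anyone. In this case, cancel_join_thread must be called in the child process before asking it to join. This code sample will get rid of the error.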

import multiprocessing
import time
import logging
import multiprocessing.util
multiprocessing.util.log_to_stderr(level=logging.DEBUG)

queue = multiprocessing.Queue(maxsize=10)

def worker(queue):
    queue.put('abcdefghijklmnop')

    # "Indicate that no more data will be put on this queue by the
    # current process." --Documentation
    # time.sleep(0.01)
    queue.close()

    # Ideally this call would not be made unconditionally here, but in
    # response to a signal (or other IPC message) sent from the main process.
    queue.cancel_join_thread()


proc = multiprocessing.Process(target=worker, args=(queue,))
proc.start()

# "Indicate that no more data will be put on this queue by the current
# process." --Documentation
# time.sleep(0.01)
queue.close()

proc.join()

I did not bother with IPC here because there is no consumer at all, but I hope the idea is clear.
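
For comparison, here is a minimal sketch of the same script with a consumer. It assumes a POSIX system where the fork start method is available (as on Debian and Alpine). `run_with_consumer` is a hypothetical name. The parent reads the item before it joins the worker and closes the queue, so the pipe still has a reader while the worker's feeder thread writes.

```python
import multiprocessing


def worker(queue):
    queue.put('abcdefghijklmnop')
    # No more data will be put on this queue by the worker.
    queue.close()


def run_with_consumer():
    # Assumption: request the "fork" start method explicitly so the sketch
    # behaves like the original script on Linux.
    ctx = multiprocessing.get_context("fork")
    queue = ctx.Queue(maxsize=10)

    proc = ctx.Process(target=worker, args=(queue,))
    proc.start()

    # Consume the item first; the parent's read end stays open until
    # the worker's data has arrived.
    item = queue.get(timeout=5)

    proc.join()
    queue.close()
    return item, proc.exitcode
```

With a consumer in place, neither time.sleep() nor cancel_join_thread() should be needed, because the worker's buffered item is always read before the parent closes its end.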