Vik*_*tor 5 python multiprocessing python-3.x python-multiprocessing
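We noticed a bunch of defunct (zombie) processes left behind in one of our deployments, and managed to produce a very small program that shows the problem:
multi.py: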
from multiprocessing import Pool, set_start_method

def f(x):
    return x*x

if __name__ == '__main__':
    set_start_method('spawn')
    with Pool(5) as p:
        print(p.map(f, [1, 2, 3]))
        p.close()
        p.join()
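The program appears to leave zombie processes behind, but this is hard to catch because running it from a regular shell causes the shell to reap the zombie.
In our deployment we run it from another Python program, so to simulate that we have this:
main.py: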
from subprocess import run
from time import sleep

while True:
    result = run(["python", "multi.py"], capture_output=True)
    print(result.stdout.decode('utf-8'))
    result = run(["ps", "-ef", "--forest"], capture_output=True)
    print(result.stdout.decode('utf-8'), flush=True)
    sleep(1)
Running main.py produces the following output:
[1, 4, 9]
UID PID PPID C STIME TTY TIME CMD
root 1 0 11 11:33 pts/0 00:00:00 python main.py
root 8 1 0 11:33 pts/0 00:00:00 [python] <defunct>
root 17 1 0 11:33 pts/0 00:00:00 ps -ef --forest
[1, 4, 9]
UID PID PPID C STIME TTY TIME CMD
root 1 0 6 11:33 pts/0 00:00:00 python main.py
root 8 1 3 11:33 pts/0 00:00:00 [python] <defunct>
root 19 1 0 11:33 pts/0 00:00:00 [python] <defunct>
root 28 1 0 11:33 pts/0 00:00:00 ps -ef --forest
[1, 4, 9]
UID PID PPID C STIME TTY TIME CMD
root 1 0 4 11:33 pts/0 00:00:00 python main.py
root 8 1 1 11:33 pts/0 00:00:00 [python] <defunct>
root 19 1 3 11:33 pts/0 00:00:00 [python] <defunct>
root 30 1 0 11:33 pts/0 00:00:00 [python] <defunct>
root 39 1 0 11:33 pts/0 00:00:00 ps -ef --forest
[1, 4, 9]
UID PID PPID C STIME TTY TIME CMD
root 1 0 3 11:33 pts/0 00:00:00 python main.py
root 8 1 1 11:33 pts/0 00:00:00 [python] <defunct>
root 19 1 1 11:33 pts/0 00:00:00 [python] <defunct>
root 30 1 4 11:33 pts/0 00:00:00 [python] <defunct>
root 41 1 0 11:33 pts/0 00:00:00 [python] <defunct>
root 50 1 0 11:33 pts/0 00:00:00 ps -ef --forest
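The defunct rows in this output can also be picked out programmatically rather than by eye. The following is only a minimal sketch: it assumes the `ps -ef` column layout shown above (UID, PID, PPID, C, STIME, TTY, TIME, CMD), and `find_defunct` is a hypothetical helper name that is not part of main.py.

```python
def find_defunct(ps_output):
    """Return (pid, ppid) pairs for rows of `ps -ef` output marked <defunct>."""
    zombies = []
    for line in ps_output.splitlines():
        fields = line.split()
        # Assumed columns: UID PID PPID C STIME TTY TIME CMD...
        if len(fields) < 8:
            continue  # not a ps row, e.g. the "[1, 4, 9]" line from multi.py
        if not (fields[1].isdigit() and fields[2].isdigit()):
            continue  # header row
        if fields[-1] == "<defunct>":
            zombies.append((int(fields[1]), int(fields[2])))
    return zombies
```

Feeding it `result.stdout.decode('utf-8')` from the `ps` call in main.py would give the PID and parent PID of each zombie, which makes it easier to see that the list grows by one on every iteration.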
On the other hand, the following program does not produce defunct processes:
main-sig.py:
from os import wait
import signal
from subprocess import run
from time import sleep

def chld_handler(_signum, _frame):
    wait()

signal.signal(signal.SIGCHLD, chld_handler)

while True:
    result = run(["python", "multi.py"], capture_output=True)
    print(result.stdout.decode('utf-8'))
    result = run(["ps", "-ef", "--forest"], capture_output=True)
    print(result.stdout.decode('utf-8'), flush=True)
    sleep(1)
In addition, a simple shell script that runs multi.py does not produce zombies either.
Is this a bug in Python, or are you expected to handle any zombies from your subprocesses yourself (like Bash seems to be doing)?
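If the parent does turn out to be responsible for reaping, a single `wait()` per signal may not be enough, because several children exiting close together can be reported by just one SIGCHLD. A minimal sketch of a non-blocking reaping loop follows. It assumes a POSIX system, and `reap_children` and `install_sigchld_reaper` are hypothetical names used only for illustration.

```python
import os
import signal


def reap_children():
    """Reap every child that has already exited, without blocking.

    Returns a list of (pid, status) tuples for the children collected.
    """
    reaped = []
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children left at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append((pid, status))
    return reaped


def install_sigchld_reaper():
    """Install a SIGCHLD handler that drains all exited children."""
    signal.signal(signal.SIGCHLD, lambda _signum, _frame: reap_children())
```

One caveat with this approach: waiting on pid -1 can also collect the child that `subprocess.run` is itself waiting for, so the return code it reports may not be reliable while such a handler is installed.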
The Dockerfile and all the code needed to easily reproduce the issue are available here: https://github.com/viktorvia/python-multi-issue
The issue is reproducible in Python 3.9.6, 3.7.4 and 3.7.11.