Using Python pool.map to have multiple processes perform operations on a list

Question

I'm trying to start 6 threads, each of which takes one item from the list files, removes it, and then prints the value.

from multiprocessing import Pool

files = ['a','b','c','d','e','f']

def convert(file):
    process_file = files.pop()
    print(process_file)

if __name__ == '__main__':

    pool = Pool(processes=6)
    pool.map(convert,range(6))


The expected output is:

a
b
c
d
e
f


Instead, the output is:

f
f
f
f
f
f


What is going on here? Thanks in advance.

Answer 1

Cor*_*hin 5

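Part of the problem is that you are not accounting for the multiprocess nature of the pool: the workers are separate processes, not threads (note that in Python, multithreading generally gives no speedup for CPU-bound code because of the Global Interpreter Lock).

Is there a reason you need to mutate the original list? Your current code ignores the iterable that is passed in and instead edits a shared mutable object, which is dangerous in a concurrent world. A simple solution looks like this: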

from multiprocessing import Pool

files = ['a','b','c','d','e','f']

def convert(aFile):
    # Work only on the item handed to this call by pool.map
    print(aFile)

if __name__ == '__main__':

    pool = Pool()  # by default, Pool() starts one worker per available CPU
    pool.map(convert, files)

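If the values are needed back in the parent rather than just printed, a minimal sketch along the same lines is to return them from the worker function and let pool.map collect them. The uppercase conversion here is only a stand-in for real work and is not part of the original question.

```python
from multiprocessing import Pool

files = ['a', 'b', 'c', 'd', 'e', 'f']

def convert(name):
    # Use only the argument passed to this call; no shared state is touched.
    # str.upper() is a placeholder for whatever the real conversion would be.
    return name.upper()

if __name__ == '__main__':
    with Pool(processes=6) as pool:
        # map returns the results in the same order as the input iterable
        results = pool.map(convert, files)
    print(results)  # ['A', 'B', 'C', 'D', 'E', 'F']
```

Because pool.map preserves input order, the result list lines up with files even though the calls may run in any order across the workers.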

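Your question got me thinking, so I dug a little further to understand why Python behaves this way. Each worker process appears to get its own copy of the list, and the copy keeps the same id as the original, which looks like black magic at first. With the fork start method this is expected: a forked child starts with a copy of the parent's memory, so the object sits at the same address, and in CPython id() is that address. You can see the per-process copies by changing the number of processes used: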

from multiprocessing import Pool

files = ['d','e','f','a','b','c',]

a = sorted(files)
def convert(_):
    # True only if this process's copy of files has already been sorted
    print(a == files)
    files.sort()
    # print(id(files))  # note this is the same for every process, which is interesting

if __name__ == '__main__':

    pool = Pool(processes=1)
    pool.map(convert, range(6))


==> Every call except the first prints "True", as expected.

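If you set the number of processes to 2, the output is less deterministic: each worker prints False on its first call, and the order depends on which process actually executes its statement first.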
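To make the per-process copies easier to see, here is a hedged variation on the experiment above that returns the worker's process id together with the comparison result instead of printing from inside the worker. The names check_and_sort and expected are introduced here for illustration only.

```python
import os
from multiprocessing import Pool

files = ['d', 'e', 'f', 'a', 'b', 'c']
expected = sorted(files)

def check_and_sort(_):
    # Report which process ran this call and whether its copy was already sorted.
    was_sorted = (files == expected)
    files.sort()  # sorts only the copy that lives in the current process
    return os.getpid(), was_sorted

if __name__ == '__main__':
    with Pool(processes=2) as pool:
        for pid, was_sorted in pool.map(check_and_sort, range(6)):
            print(pid, was_sorted)
    # The workers sorted their own copies; the parent's list is unchanged.
    print(files == expected)  # False
```

Each distinct pid should report False exactly once, on that worker's first call, and True afterwards. How the six calls are split between the two workers is up to the pool, so the mix can differ from run to run.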
