How can I solve the Python multiprocessing problem with matplotlib savefig()?

big*_*bug 3 matplotlib multiprocessing

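I want to speed up matplotlib.savefig() for many figures by using the multiprocessing module, and I am trying to benchmark parallel against sequential performance.

Here is the code: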

# -*- coding: utf-8 -*-
"""
Compare the time of matplotlib savefig() in parallel and sequence
"""

import numpy as np
import matplotlib.pyplot as plt
import multiprocessing
import time


def gen_fig_list(n):
    ''' generate a list to contain n demo scatter figure object '''
    plt.ioff()
    fig_list = []
    for i in range(n):
        plt.figure();
        dt = np.random.randn(5, 4);
        fig = plt.scatter(dt[:,0], dt[:,1], s=abs(dt[:,2]*1000), c=abs(dt[:,3]*100)).get_figure()
        fig.FM_figname = "img"+str(i)
        fig_list.append(fig)
    plt.ion()
    return fig_list


def savefig_worker(fig, img_type, folder):
    file_name = folder+"\\"+fig.FM_figname+"."+img_type
    fig.savefig(file_name, format=img_type, dpi=fig.dpi)
    return file_name


def parallel_savefig(fig_list, folder):
    proclist = []
    for fig in fig_list:
        print fig.FM_figname,
        p = multiprocessing.Process(target=savefig_worker, args=(fig, 'png', folder)) # cause error
        proclist.append(p)
        p.start()

    for i in proclist:
        i.join()



if __name__ == '__main__':
    folder_1, folder_2 = 'Z:\\A1', 'Z:\\A2'
    fig_list = gen_fig_list(10)

    t1 = time.time()
    parallel_savefig(fig_list,folder_1)
    t2 = time.time()
    print '\nMulprocessing time    : %0.3f'%((t2-t1))

    t3 = time.time()
    for fig in fig_list:
        savefig_worker(fig, 'png', folder_2)
    t4 = time.time()
    print 'Non_Mulprocessing time: %0.3f'%((t4-t3))

I ran into the error "This application has requested the Runtime to terminate it in an unusual way. Please contact the application's support team for more information.", which is triggered by the line p = multiprocessing.Process(target=savefig_worker, args=(fig, 'png', folder)).

Why does this happen, and how can I fix it?

(Windows XP + Python:2.6.1 + Numpy:1.6.2 + Matplotlib:1.2.0)

Edit: (added the error message from Python 2.7.3)

When run in IDLE on Python 2.7.3, it gives the following error message:

>>> 
img0

Traceback (most recent call last):
  File "C:\Documents and Settings\Administrator\desktop\mulsavefig_pilot.py", line 61, in <module>
    proc.start()
  File "d:\Python27\lib\multiprocessing\process.py", line 130, in start

  File "d:\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "d:\Python27\lib\pickle.py", line 748, in save_global
    (obj, module, name))
PicklingError: Can't pickle <function notify_axes_change at 0x029F5030>: it's not found as matplotlib.backends.backend_qt4.notify_axes_change

Edit: (demo of my solution)

Inspired by "Matplotlib: simultaneous plotting in multiple threads"

# -*- coding: utf-8 -*-
"""
Compare the time of matplotlib savefig() in parallel and sequence
"""

import numpy as np
import matplotlib.pyplot as plt
import multiprocessing
import time


def gen_data(fig_qty, bubble_qty):
    ''' generate data for fig drawing '''
    dt = np.random.randn(fig_qty, bubble_qty, 4)
    return dt


def parallel_savefig(draw_data, folder):
    ''' prepare data and pass to worker '''

    pool = multiprocessing.Pool()

    fig_qty = len(draw_data)
    fig_para = zip(range(fig_qty), draw_data, [folder]*fig_qty)

    pool.map(fig_draw_save_worker, fig_para)
    pool.close()  # no more tasks will be submitted
    pool.join()   # wait for the worker processes to exit
    return None


def fig_draw_save_worker(args):
    seq, dt, folder = args
    plt.figure()
    fig = plt.scatter(dt[:,0], dt[:,1], s=abs(dt[:,2]*1000), c=abs(dt[:,3]*100), alpha=0.7).get_figure()
    plt.title('Plot of a scatter of %i' % seq)
    fig.savefig(folder+"\\"+'fig_%02i.png' % seq)
    plt.close()
    return None


if __name__ == '__main__':
    folder_1, folder_2 = 'A1', 'A2'
    fig_qty, bubble_qty =  500, 100
    draw_data = gen_data(fig_qty, bubble_qty)

    print 'Mulprocessing  ...   ',
    t1 = time.time()
    parallel_savefig(draw_data, folder_1)
    t2 = time.time()
    print 'Time : %0.3f'%((t2-t1))

    print 'Non_Mulprocessing .. ', 
    t3 = time.time()
    for para in zip(range(fig_qty), draw_data, [folder_2]*fig_qty):
        fig_draw_save_worker(para)
    t4 = time.time()
    print 'Time : %0.3f'%((t4-t3))

    print 'Speed Up: %0.1fx'%(((t4-t3)/(t2-t1)))

Hid*_*ame 8

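You can try moving all of the matplotlib code (including the imports) into a function.

  1. Make sure there is no import matplotlib or import matplotlib.pyplot as plt at the top of your code.

  2. Create a function that does all of the matplotlib work, including the import.

Example: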

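import numpy as np
from multiprocessing import Pool

def graphing_function(graph_data):
    # Import matplotlib inside the worker so each process sets it up itself
    import matplotlib.pyplot as plt
    plt.figure()
    plt.hist(graph_data.data)
    plt.savefig(graph_data.filename)
    plt.close()
    return

if __name__ == '__main__':
    # data_list is assumed to be defined elsewhere: a list of objects
    # that each have .data and .filename attributes
    pool = Pool(4)
    pool.map(graphing_function, data_list)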

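  • Why does this work?!? On Windows or Linux machines running Python 3.5.2, matplotlib 2.1 and numpy 1.13, I can keep matplotlib imported globally and everything works fine. When I run the code on OS X, I get a complaint about "__THE_PROCESS_HAS_FORKED_AND_YOU_CANNOT_USE_THIS_COREFOUNDATION_FUNCTIONALITY___YOU_MUST_EXEC__". Putting the import inside the called function makes it work. (2 upvotes)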

tac*_*ell 3

This is not really a bug so much as a limitation.

The explanation is in the last line of the error message:

PicklingError: Can't pickle <function notify_axes_change at 0x029F5030>: it's not found as matplotlib.backends.backend_qt4.notify_axes_change

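It tells you that elements of the figure object cannot be pickled, and pickling is how multiprocessing passes data between processes. The objects are pickled in the main process, shipped as pickles, and then reconstructed on the other side. Even if you fixed this exact issue (perhaps by using a different backend, or by stripping off the offending function, which might break things in other ways), I am pretty sure there are core parts of the Figure, Axes, or Canvas objects that cannot be pickled.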
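The pickling constraint can be seen without matplotlib at all. The following is a minimal sketch using only the standard library; make_figure_like and can_pickle are hypothetical helpers written for illustration, and the dictionary is only a stand-in for a figure that holds a backend callback, not matplotlib's real internals.

```python
import pickle


def make_figure_like():
    """Hypothetical stand-in for a figure object that holds a GUI callback."""
    def notify_axes_change():
        # A function defined inside another function cannot be found by
        # module-level name, so pickle refuses to serialize it.
        pass
    return {"name": "img0", "callback": notify_axes_change}


def can_pickle(obj):
    """Return True if pickle.dumps(obj) succeeds, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# Plain numeric data, similar in spirit to the rows passed to scatter().
plain_data = [(0.1, 0.2, 0.3, 0.4), (0.5, 0.6, 0.7, 0.8)]

print(can_pickle(plain_data))          # True
print(can_pickle(make_figure_like()))  # False
```

Plain data crosses the process boundary without trouble, while an object that carries a function pickle cannot locate by name does not, which is the same kind of failure the traceback reports for notify_axes_change.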

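As @bigbug points out, an example of how to get around this limitation is "Matplotlib: simultaneous plotting in multiple threads". The basic idea is to push the entire plotting routine off to the sub-process, so that you only push numpy arrays and maybe some configuration information across the process boundary.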
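A rough Python 3 sketch of that idea follows. It assumes matplotlib is installed; save_scatter and save_all are hypothetical names, and selecting the non-interactive "Agg" backend is an extra assumption made here so the workers need no GUI toolkit. It is not part of the original answers.

```python
import os
import tempfile
from multiprocessing import Pool


def save_scatter(task):
    """Worker: build the figure from plain data and save it in this process."""
    seq, points, folder = task
    # Imported inside the worker so every process initializes matplotlib itself.
    import matplotlib
    matplotlib.use("Agg")  # assumption: non-interactive backend, no GUI needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter([p[0] for p in points], [p[1] for p in points])
    ax.set_title("Scatter %d" % seq)
    path = os.path.join(folder, "fig_%02d.png" % seq)
    fig.savefig(path)
    plt.close(fig)  # release the figure so memory does not grow with each task
    return path


def save_all(tasks, processes=2):
    """Only (index, data, folder) tuples cross the process boundary."""
    with Pool(processes) as pool:
        return pool.map(save_scatter, tasks)


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()
    data = [[(i, i * j) for i in range(5)] for j in range(4)]
    tasks = [(seq, points, out_dir) for seq, points in enumerate(data)]
    print(save_all(tasks))
```

The worker returns only the file path, so nothing matplotlib-related ever has to be pickled in either direction.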