Can I use multiprocessing.Pool in a method of a class?

Pal*_*ron 9 python multiprocessing python-3.x python-multiprocessing

I want to use multiprocessing in my code to get better performance.

However, I get an error like this:

Traceback (most recent call last):
  File "D:\EpubBuilder\TinyEpub.py", line 49, in <module>
    e.epub2txt()
  File "D:\EpubBuilder\TinyEpub.py", line 43, in epub2txt
    tempread = self.get_text()
  File "D:\EpubBuilder\TinyEpub.py", line 29, in get_text
    txtlist = pool.map(self.char2text,charlist)
  File "C:\Python34\lib\multiprocessing\pool.py", line 260, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "C:\Python34\lib\multiprocessing\pool.py", line 599, in get
    raise self._value
  File "C:\Python34\lib\multiprocessing\pool.py", line 383, in _handle_tasks
    put(task)
  File "C:\Python34\lib\multiprocessing\connection.py", line 206, in send
    self._send_bytes(ForkingPickler.dumps(obj))
  File "C:\Python34\lib\multiprocessing\reduction.py", line 50, in dumps
    cls(buf, protocol).dump(obj)
TypeError: cannot serialize '_io.BufferedReader' object

I tried another approach and got this error instead:

TypeError: cannot serialize '_io.TextIOWrapper' object

My code looks like this:

from multiprocessing import Pool
class Book(object):
    def __init__(self, arg):
        self.namelist = arg
    def format_char(self,char):
        char = char + "a"
        return char
    def format_book(self):
        self.tempread = ""
        charlist = [f.read() for f in self.namelist] #list of char
        with Pool() as pool:
            txtlist = pool.map(self.format_char,charlist)
        self.tempread = "".join(txtlist)
        return self.tempread

if __name__ == '__main__':
    import os
    b = Book([open(f) for f in os.listdir()])
    t = b.format_book()
    print(t)

I think the error is raised because I am not using Pool in the main function.

Is my guess right? How can I modify my code to fix the error?

dan*_*ano 24

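The problem is that your Book instance has an unpicklable instance variable, namelist. Because you are passing an instance method (self.format_char) to pool.map, the whole instance has to be pickled so that it can be sent to the child processes; the traceback shows this happening in _handle_tasks, when each task is put on the queue. Book.namelist is a list of open file objects (_io.BufferedReader in your traceback), and open file objects cannot be pickled. You can work around this in a couple of ways. Based on your sample code, it looks like you could simply make format_char a top-level function: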

from multiprocessing import Pool


def format_char(char):
    char = char + "a"
    return char


class Book(object):
    def __init__(self, arg):
        self.namelist = arg

    def format_book(self):
        self.tempread = ""
        charlist = [f.read() for f in self.namelist] #list of char
        with Pool() as pool:
            txtlist = pool.map(format_char,charlist)
        self.tempread = "".join(txtlist)
        return self.tempread
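To see why moving format_char to module level helps, here is a small sketch that calls pickle.dumps directly, which is roughly what Pool does to each task before sending it to a worker. It assumes os.devnull as a stand-in for the real input files, and can_pickle is a hypothetical helper written only for this demo:

```python
import os
import pickle


def format_char(char):
    # A top-level function is pickled by reference (module and name only).
    return char + "a"


class Book(object):
    def __init__(self, arg):
        self.namelist = arg

    def format_char(self, char):
        return char + "a"


def can_pickle(obj):
    """Return True if pickle.dumps(obj) succeeds, False on TypeError."""
    try:
        pickle.dumps(obj)
        return True
    except TypeError:
        return False


# os.devnull stands in for the real files opened in the question.
with open(os.devnull) as f:
    book = Book([f])
    print(can_pickle(format_char))       # True
    # A bound method is pickled together with its instance, and the
    # instance's namelist holds an open file object.
    print(can_pickle(book.format_char))  # False
```

Run as a script, the first check prints True and the second prints False, which matches the TypeError in the question.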

However, if you really do need format_char to be an instance method, you can make Book picklable with __getstate__/__setstate__, removing namelist from the instance's state before it is pickled:

from multiprocessing import Pool


class Book(object):
    def __init__(self, arg):
        self.namelist = arg

    def __getstate__(self):
        """ This is called before pickling. """
        state = self.__dict__.copy()
        del state['namelist']
        return state

    def __setstate__(self, state):
        """ This is called while unpickling. """
        self.__dict__.update(state)

    def format_char(self,char):
        char = char + "a"
        return char

    def format_book(self):
        self.tempread = ""
        charlist = [f.read() for f in self.namelist] #list of char
        with Pool() as pool:
            txtlist = pool.map(self.format_char,charlist)
        self.tempread = "".join(txtlist)
        return self.tempread
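A quick way to check the __getstate__ version without starting a Pool is to round-trip the bound method through pickle yourself. This sketch again uses os.devnull as a stand-in for the real files, and uses state.pop("namelist", None) instead of del so that re-pickling a copy that already lacks namelist does not raise KeyError (a small deviation from the answer's code):

```python
import os
import pickle


class Book(object):
    def __init__(self, arg):
        self.namelist = arg

    def __getstate__(self):
        """Called before pickling: copy the state and drop namelist."""
        state = self.__dict__.copy()
        state.pop("namelist", None)
        return state

    def __setstate__(self, state):
        """Called while unpickling: restore everything that was kept."""
        self.__dict__.update(state)

    def format_char(self, char):
        return char + "a"


with open(os.devnull) as f:
    book = Book([f])
    book.tempread = "kept"
    # Pickling the bound method pickles `book`, which now goes through
    # __getstate__, so the open file is never serialized.
    clone = pickle.loads(pickle.dumps(book.format_char)).__self__

print(hasattr(clone, "namelist"))  # False
print(clone.tempread)              # kept
print(clone.format_char("x"))      # xa
```

The original instance keeps its namelist; only the copy that a worker would receive goes without it.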

This works fine as long as you don't need to access namelist in the child processes.
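If the workers do need the files, one option (not part of the original answer) is to store file paths in namelist instead of open file objects and let each worker open its own file. A minimal sketch under that assumption; read_and_format is a hypothetical name:

```python
from multiprocessing import Pool


def read_and_format(path):
    # Hypothetical worker: each child process opens and reads its own file,
    # so only the path string (which pickles fine) is sent to it.
    with open(path) as f:
        return f.read() + "a"


class Book(object):
    def __init__(self, paths):
        # Store file paths instead of open file objects.
        self.namelist = paths

    def format_book(self):
        with Pool() as pool:
            txtlist = pool.map(read_and_format, self.namelist)
        return "".join(txtlist)
```

Call it from inside the `if __name__ == '__main__':` block, for example `Book([f for f in os.listdir() if os.path.isfile(f)]).format_book()`; the guard is required on platforms that spawn worker processes, such as Windows.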

  • This is the real answer: self has to be picklable. Man, I searched way too long for a clear, correct answer. (2 upvotes)