Why does multiprocessing.pool.map raise PicklingError (encoding)?

Che*_* A. 3 python multithreading multiprocessing python-2.7

Why does the code below work when it uses threads, but throw an exception when it uses multiprocessing?

from multiprocessing import Pool
from multiprocessing.dummy import Pool as ThreadsPool
import urllib2

urls = [
  'http://www.python.org',
  'http://www.python.org/about/',
  'http://www.python.org/doc/',
  'http://www.python.org/download/']

def use_threads():

    pool = ThreadsPool(4)
    results = pool.map(urllib2.urlopen, urls)
    pool.close()
    pool.join()

    print [len(x.read()) for x in results]

def use_procs():

    p_pool = Pool(4)
    p_results = p_pool.map(urllib2.urlopen, urls)
    p_pool.close()
    p_pool.join()

    print 'using procs instead of threads'
    print [len(x.read()) for x in p_results]

if __name__ == '__main__':
    use_procs()

With the process pool, the exception is a PicklingError.
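The word "encoding" in the title probably comes from the exception class itself: in CPython's multiprocessing.pool, a worker that cannot pickle its return value wraps that failure in MaybeEncodingError before sending it to the parent. Below is a minimal, network-free sketch that reproduces this under Python 3. The function names open_socket and reproduce are hypothetical, and a bare socket is used as an assumed stand-in for the response object urlopen() returns.

```python
import socket
from multiprocessing import Pool
from multiprocessing.pool import MaybeEncodingError


def open_socket(_):
    # Hypothetical stand-in for urllib2.urlopen(): returns an object that
    # is (or wraps) a live socket, which the worker cannot pickle.
    return socket.socket()


def reproduce():
    with Pool(2) as pool:
        try:
            pool.map(open_socket, range(4))
        except MaybeEncodingError as exc:
            # The worker caught the pickling failure while sending the
            # result and re-raised it in the parent as MaybeEncodingError;
            # str(exc) includes the underlying reason.
            return str(exc)
    return "no error"


if __name__ == "__main__":
    print(reproduce())
```

The real cause is in the "Reason" part of that message, which is why changing any text encoding cannot help here.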

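I know there is a difference in how processes and threads communicate with each other. Why does pickling the website content fail? How can I set the encoding to fix this?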

mar*_*eau 5

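The problem isn't an encoding error. It's a pickling error. Worker processes send their return values back to the parent by pickling them. Threads share memory, so multiprocessing.dummy never has to pickle anything. urllib2.urlopen() returns an object that can't be pickled: the error message I got when running your code names _ssl._SSLSocket, although the reason it shows differs slightly from yours. To work around this, confine use of the returned object to the child process itself by reading the data right after opening the URL, as shown below. The trade-off is that more data may have to be passed between the processes.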
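The pickling step can be checked in isolation. The sketch below assumes Python 3 and uses a hypothetical helper, is_picklable, that runs pickle.dumps the way a pool worker does with its result. Page bytes pass, but a live socket (assumed here as a stand-in for what sits inside a urlopen() response) does not.

```python
import pickle
import socket


def is_picklable(obj):
    """Hypothetical helper: attempt the same pickling step a Pool worker
    performs on its return value before sending it to the parent."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError):
        return False
    return True


if __name__ == "__main__":
    sock = socket.socket()  # stand-in for the socket inside a response object
    try:
        print(is_picklable(b"<html>page body</html>"))  # True: plain bytes
        print(is_picklable(sock))                        # False: live socket
    finally:
        sock.close()
```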

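# Added: open *and* read the URL inside the worker process, so only the
# page contents (a plain str, which pickles fine) go back to the parent.
def get_data(url):

    soc = urllib2.urlopen(url)
    try:
        return soc.read()
    finally:
        soc.close()  # release the connection inside the worker

def use_procs():

    p_pool = Pool(4)
#    p_results = p_pool.map(urllib2.urlopen, urls)
    p_results = p_pool.map(get_data, urls)
    p_pool.close()
    p_pool.join()

    print 'using procs instead of threads'
#    print [len(x.read()) for x in p_results]
    print [len(x) for x in p_results]  # results are already strings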
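In Python 3, urllib2 became urllib.request, and the same fix might look like the sketch below. The names fetch_all and URLS and the 10-second timeout are assumptions for illustration, not part of the original answer. The worker still returns only bytes, so nothing unpicklable crosses the process boundary.

```python
from multiprocessing import Pool
from urllib.request import urlopen

URLS = [
    "http://www.python.org",
    "http://www.python.org/about/",
    "http://www.python.org/doc/",
    "http://www.python.org/download/",
]


def get_data(url):
    # Runs in the worker: open, read and close the response here, and
    # return only the body bytes, which pickle cleanly.
    with urlopen(url, timeout=10) as resp:  # timeout value is an assumption
        return resp.read()


def fetch_all(urls):
    # Hypothetical wrapper: the Pool context manager terminates the
    # workers on exit; map() has already collected every result by then.
    with Pool(4) as pool:
        return pool.map(get_data, urls)


if __name__ == "__main__":
    pages = fetch_all(URLS)
    print([len(page) for page in pages])
```

Keeping get_data at module level matters too: the pool pickles the function by reference, so it must be importable by the workers.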
