Multiprocessing script slower than a normal script

luk*_*kik 1 python multiprocessing

I can't seem to get my head around multiprocessing. I'm trying to do some basic operations, but the multiprocessing script seems to take forever.

import multiprocessing, time

class Consumer(multiprocessing.Process):

    def __init__(self, task_queue, result_queue):
        multiprocessing.Process.__init__(self)
        self.task_queue = task_queue
        self.result_queue = result_queue

    def run(self):
        proc_name = self.name
        while True:
            next_task = self.task_queue.get()
            if next_task is None:
                print ('Tasks Complete')
                self.task_queue.task_done()
                break            
            answer = next_task()
            self.task_queue.task_done()
            self.result_queue.put(answer)
        return

class Task(object):
    def __init__(self, a):
        self.a = a

    def __call__(self):
        #Some more work will go in here but for now just return the value
        return self.a

    def __str__(self):
        return 'ARC'
    def run(self):
        print ('IN')


if __name__ == '__main__':
    start_time = time.time()
    numberList = []

    for x in range(1000000):
        numberList.append(x) 

    result = []
    counter = 0
    total = 0
    for id in numberList:
        total += id
        counter += 1
    print(counter)
    print("Finished in Seconds: %s" %(time.time()-start_time))
    ###############################################################################################################################
    #Mutliprocessing starts here....
    ###############################################################################################################################        
    start_time = time.time()
    tasks = multiprocessing.JoinableQueue()
    results = multiprocessing.Queue()

    num_consumers = multiprocessing.cpu_count() 
    consumers = [Consumer(tasks, results) for i in range(num_consumers)]
    for w in consumers:
        w.start()

    num_jobs = len(numberList)

    for i in range(num_jobs):
        tasks.put(Task(numberList[i]))

    for i in range(num_consumers):
        tasks.put(None)

    print("So far: %s" %(time.time()-start_time))
    result = []
    while num_jobs:
        result.append(results.get())
        num_jobs -= 1
    print (len(result))
    print("Finished in Seconds: %s" %(time.time()-start_time))

The original script came from here

The first, plain loop averages 0.4 seconds, while the multiprocessing version takes 56 seconds to finish. I expected it to be the other way around.

Is there some logic missing, or is it really that much slower? And how would I structure this so it's faster than the standard loop?

Jan*_*ila 5

Passing each object over a queue from one process to another for handling adds overhead. You have now measured that overhead as 56 seconds for a million objects. Passing fewer, larger objects reduces the overhead but doesn't eliminate it. To benefit from multiprocessing, the computation each task performs should be relatively heavy compared to the amount of data that has to be transferred.


dan*_*ano 5

Your multiprocessing code is over-engineered, and it doesn't actually do the work it's supposed to do. I rewrote it to be simpler, to actually do what it's supposed to, and now it's faster than the plain loop:

import multiprocessing
import time


def add_list(l):
    total = 0 
    counter = 0 
    for ent in l:
        total += ent 
        counter += 1
    return (total, counter)

def split_list(l, n): 
    # Split `l` into `n` equal lists.
    # Borrowed from http://stackoverflow.com/a/2136090/2073595
    return [l[i::n] for i in range(n)]

if __name__ == '__main__':
    start_time = time.time()
    numberList = range(1000000)

    counter = 0 
    total = 0 
    for id in numberList:
        total += id
        counter += 1
    print(counter)
    print(total)
    print("Finished in Seconds: %s" %(time.time()-start_time))
    start_time = time.time()

    num_consumers = multiprocessing.cpu_count() 
    # Split the list up so that each consumer can add up a subsection of the list.
    lists = split_list(numberList, num_consumers)
    p = multiprocessing.Pool(num_consumers)
    results = p.map(add_list, lists)
    total = 0 
    counter = 0 
    # Combine the results each worker returned.
    for t, c in results:
        total += t
        counter += c
    print(counter)
    print(total)

    print("Finished in Seconds: %s" %(time.time()-start_time))

Here's the output:

Standard:
1000000
499999500000
Finished in Seconds: 0.272150039673
Multiprocessing:
1000000
499999500000
Finished in Seconds: 0.238755941391

As @aruisdante pointed out, your per-task workload is so small that the benefits of multiprocessing can't really shine here. If you were doing heavier processing, you would see a bigger difference.