What is some example code that demonstrates multicore speedup in Python on Windows?

Question

Abu*_*fia 3 python multicore

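I'm using Python 3 on Windows and trying to construct a toy example that demonstrates how using multiple CPU cores can speed up a computation. The toy example is rendering the Mandelbrot fractal.

So far:

I have avoided threading, since the Global Interpreter Lock prevents multicore execution in this context
I'm discarding example code that doesn't work on Windows because it lacks Linux's forking capability
I'm trying the "multiprocessing" package. I declare p=Pool(8) (8 is my number of cores) and use p.starmap(..) to delegate the work. This is supposed to spawn multiple "subprocesses", which Windows will automatically delegate to different CPUs

However, I have not been able to demonstrate any speedup, whether because of overhead or because no actual multiprocessing is taking place. Pointers to toy examples with a demonstrable speedup would therefore be very helpful :-)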
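One way to rule out the "no actual multiprocessing" possibility is to have each task report the ID of the process that ran it. The following is only a minimal sketch: the function name `report_pid`, the worker count of 4, and the 40 tasks are illustrative assumptions, not part of the original code. On Windows, child processes are started as fresh interpreters that re-import the main module, so the worker function sits at module level and the pool is created under the `if __name__ == "__main__":` guard.

```python
import os
import time
from multiprocessing import Pool


def report_pid(task_id):
    """Runs in a worker process and reports which process handled the task."""
    time.sleep(0.01)  # brief pause so that one worker is unlikely to grab every task
    return task_id, os.getpid()


if __name__ == "__main__":  # needed on Windows, where children re-import this module
    with Pool(processes=4) as pool:  # 4 is an arbitrary worker count for illustration
        results = pool.map(report_pid, range(40))
    worker_pids = sorted({pid for _, pid in results})
    print("parent PID:", os.getpid())
    print("worker PIDs:", worker_pids)
```

If the printed worker PIDs differ from the parent PID and from each other, the tasks really were handled by separate processes, and any missing speedup is more likely down to overhead.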

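Edit: Thank you! This pushed me in the right direction, and I now have a working example that demonstrates a doubling of speed on a CPU with 4 cores.
A copy of my code with "lecture notes" is here: https://pastebin.com/c9HZ2vAV

I decided to use Pool(), but will later try the "Process" alternative that @16num pointed out. Below is a code sample for Pool():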

    from functools import partial
    from multiprocessing import Pool, cpu_count

    p = Pool(cpu_count())

    # Unlike starmap, map passes only one argument per call. starmap unpacks each
    # (j, k) tuple into positional arguments; "partial" pre-binds the shared dataarray.
    partial_calculatePixel = partial(calculatePixel, dataarray=data)
    koord = []
    for j in range(height):
        for k in range(width):
            koord.append((j, k))

    # Runs the calls to calculatePixel in the pool. "hmm" collects the output
    hmm = p.starmap(partial_calculatePixel, koord)

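The snippet above submits one task per pixel, so every pixel is pickled, sent to a worker, and sent back individually. A coarser unit of work, such as a whole image row, may reduce that overhead. The sketch below is not the code from the pastebin link; `mandelbrot_row`, the image size, the iteration cap, and the coordinate window are all assumptions chosen for illustration.

```python
import time
from multiprocessing import Pool, cpu_count

# Hypothetical image size and iteration cap, kept small so the demo runs quickly.
WIDTH, HEIGHT, MAX_ITER = 300, 200, 300


def mandelbrot_row(j):
    """Return the escape-iteration count for every pixel in row j."""
    y = -1.0 + 2.0 * j / (HEIGHT - 1)      # imaginary part, from -1 to 1
    row = []
    for k in range(WIDTH):
        x = -2.0 + 3.0 * k / (WIDTH - 1)   # real part, from -2 to 1
        c = complex(x, y)
        z = 0j
        n = 0
        while n < MAX_ITER and abs(z) <= 2.0:
            z = z * z + c
            n += 1
        row.append(n)
    return row


if __name__ == "__main__":
    start = time.perf_counter()
    serial = [mandelbrot_row(j) for j in range(HEIGHT)]
    t_serial = time.perf_counter() - start

    with Pool(cpu_count()) as pool:
        # Timing starts after the pool exists, so process start-up is excluded.
        start = time.perf_counter()
        parallel = pool.map(mandelbrot_row, range(HEIGHT))
        t_parallel = time.perf_counter() - start

    print("results identical:", parallel == serial)
    print("serial:   {:.2f} s".format(t_serial))
    print("parallel: {:.2f} s".format(t_parallel))
```

Because `pool.map` returns results in input order, the parallel image can be compared directly with the serial one. Whether and how much the parallel run wins depends on the machine and on the image size; larger images give the workers more to do per task.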

Answer 1

zwe*_*wer 5

Demonstrating a multiprocessing speedup is very simple:

import multiprocessing
import time

# high-resolution, cross-platform clock (time.clock was removed in Python 3.8)
get_timer = time.perf_counter

def cube_function(num):
    time.sleep(0.01)  # let's simulate it takes ~10ms for the CPU core to cube the number
    return num**3

if __name__ == "__main__":  # multiprocessing guard
    # we'll test multiprocessing with pools from one to the number of CPU cores on the system
    # it won't show significant improvements after that and it will soon start going
    # downhill due to the underlying OS thread context switches
    for workers in range(1, multiprocessing.cpu_count() + 1):
        pool = multiprocessing.Pool(processes=workers)
        # let's 'warm up' our pool so it doesn't affect our measurements
        pool.map(cube_function, range(multiprocessing.cpu_count()))
        # now to the business, we'll have 10000 numbers to cube via our expensive function
        print("Cubing 10000 numbers over {} processes:".format(workers))
        timer = get_timer()  # time measuring starts now
        results = pool.map(cube_function, range(10000))  # map our range to the cube_function
        timer = get_timer() - timer  # get our delta time as soon as it finishes
        print("\tTotal: {:.2f} seconds".format(timer))
        print("\tAvg. per process: {:.2f} seconds".format(timer / workers))
        pool.close()  # let's clear out our pool for the next run
        pool.join()  # wait for the worker processes to exit
        time.sleep(1)  # waiting for a second to make sure everything is cleaned up


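Of course, we're just simulating a 10 ms computation per number here; you can replace cube_function with anything CPU-taxing for a real-world demonstration. The results are as expected: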

Cubing 10000 numbers over 1 processes:
        Total: 100.01 seconds
        Avg. per process: 100.01 seconds
Cubing 10000 numbers over 2 processes:
        Total: 50.02 seconds
        Avg. per process: 25.01 seconds
Cubing 10000 numbers over 3 processes:
        Total: 33.36 seconds
        Avg. per process: 11.12 seconds
Cubing 10000 numbers over 4 processes:
        Total: 25.00 seconds
        Avg. per process: 6.25 seconds
Cubing 10000 numbers over 5 processes:
        Total: 20.00 seconds
        Avg. per process: 4.00 seconds
Cubing 10000 numbers over 6 processes:
        Total: 16.68 seconds
        Avg. per process: 2.78 seconds
Cubing 10000 numbers over 7 processes:
        Total: 14.32 seconds
        Avg. per process: 2.05 seconds
Cubing 10000 numbers over 8 processes:
        Total: 12.52 seconds
        Avg. per process: 1.57 seconds


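Now, why isn't it 100% linear? Well, first of all, it takes some time to map/distribute the data to the subprocesses and to get it back, there is some cost to context switching, there are other tasks that use my CPUs from time to time, and time.sleep() is not exactly precise (nor could it be on a non-RT OS)... But the results are roughly in the ballpark expected for parallel processing.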
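To put a number on "roughly linear", the totals from the output above can be turned into a speedup (one-process time divided by N-process time) and a parallel efficiency (speedup divided by N). This is just a small sketch; the helper name `speedup_table` is made up for illustration, and the timings are copied from the listing above.

```python
# Total wall-clock seconds from the output above, keyed by number of processes.
totals = {1: 100.01, 2: 50.02, 3: 33.36, 4: 25.00,
          5: 20.00, 6: 16.68, 7: 14.32, 8: 12.52}


def speedup_table(totals):
    """Return (workers, speedup, efficiency) tuples relative to the 1-process run."""
    baseline = totals[1]
    rows = []
    for workers in sorted(totals):
        speedup = baseline / totals[workers]   # how many times faster than 1 process
        efficiency = speedup / workers         # 1.0 would be perfectly linear scaling
        rows.append((workers, speedup, efficiency))
    return rows


for workers, speedup, efficiency in speedup_table(totals):
    print("{} processes: speedup {:.2f}x, efficiency {:.1%}".format(
        workers, speedup, efficiency))
```

For these particular numbers the efficiency stays within about half a percent of 100%, with 8 processes reaching a speedup of roughly 7.99x. Keep in mind that the simulated work is a `time.sleep()` call, which does not occupy a core, so a genuinely CPU-bound function would likely scale less cleanly.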
