How to show a progress bar (tqdm) when using multiprocessing in Python?

cat*_*s25 4 python multiprocessing progress-bar tqdm

I have the following code, in which create_data() refers to a function I have already defined.

%%time
from tqdm import tqdm
from multiprocessing import Pool
import pandas as pd
import os

with Pool(processes=os.cpu_count()) as pool:
    results = pool.map(create_data, date)
    data = [ent for sublist in results for ent in sublist]
    data = pd.DataFrame(data, columns = cols)
    data.to_csv("%s"%str(date), index=False)

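I basically want to call create_data() while passing it the date argument. All of the results are then collected in the results variable. After that I merge them into a single list and convert it to a DataFrame. The create_data function is computationally heavy and takes a long time to run, which is why I need a progress bar to see how far along it is.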

I tried changing that line to the following.

results = list(tqdm(pool.map(create_od, date), total = os.cpu_count()))

But it doesn't seem to work. I waited for quite a long time and no progress bar appeared. What should I do here?

Len*_*mju 8

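cf. multiprocessing.Pool.map:

"It blocks until the result is ready."

and tqdm.tqdm:

"Decorate an iterable object, returning an iterator which acts exactly like the original iterable, but prints a dynamically updating progressbar every time a value is requested."

So the mapping has already run to completion before tqdm is even called.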

I reproduced this with the following code:

from time import sleep
from tqdm import tqdm
from multiprocessing import Pool


def crunch(numbers):
    print(numbers)
    sleep(2)


if __name__ == "__main__":
    with Pool(processes=4) as pool:
        print("mapping ...")
        results = tqdm(pool.map(crunch, range(40)), total=40)
        print("done")

It prints:

mapping ...
0
3
6
[...]
37
38
  0%|          | 0/40 [00:00<?, ?it/s]done

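Instead, you should use the lazy version, multiprocessing.Pool.imap: it returns an iterator immediately, which you have to iterate over to get the actual results, and that iterator can be wrapped in tqdm.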

from time import sleep
from multiprocessing import Pool

from tqdm import tqdm


def crunch(numbers):
    # print(numbers)  # commented out to not mess the tqdm output
    sleep(2)


if __name__ == "__main__":
    with Pool(processes=4) as pool:
        print("mapping ...")
        results = tqdm(pool.imap(crunch, range(40)), total=40)
        print("running ...")
        tuple(results)  # fetch the lazy results
        print("done")

This prints "mapping ..." and "running ..." right away, then the progress bar advances as results come in, and "done" appears at the end.

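(When I ran it, the progress bar was spread over several lines, because PyCharm's terminal on Windows does not support \r, but it should work fine in your terminal.)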
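
Applied to the code in the question, a minimal sketch might look like the following. The create_data body and the date values here are hypothetical stand-ins, since the real function is not shown, and run_with_progress and flatten are helper names made up for illustration. Note that total should be the number of items being processed rather than os.cpu_count(), otherwise the bar counts against the wrong target.

```python
from itertools import chain
from multiprocessing import Pool

from tqdm import tqdm


def create_data(date):
    # Hypothetical stand-in for the expensive function in the question:
    # returns a list of rows for a single date.
    return [(date, i) for i in range(3)]


def run_with_progress(func, items, processes=None):
    """Apply func to every item in a process pool while showing a tqdm bar."""
    items = list(items)  # materialise so len() gives the real total
    # processes=None lets Pool fall back to os.cpu_count()
    with Pool(processes=processes) as pool:
        # imap yields results lazily and in input order, so the bar
        # advances each time one result is fetched.
        return list(tqdm(pool.imap(func, items), total=len(items)))


def flatten(results):
    # Same effect as the nested list comprehension in the question.
    return list(chain.from_iterable(results))


if __name__ == "__main__":
    dates = ["2021-01-01", "2021-01-02", "2021-01-03"]  # illustrative values
    results = run_with_progress(create_data, dates)
    data = flatten(results)
    print(len(data))
```

The flattened data list can then be passed to pd.DataFrame(data, columns=cols) and written out with to_csv, as in the question.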
