Merging Pandas DataFrames when using multiprocessing

The*_*man 4 python multiprocessing pandas

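I am using multiprocessing and generating a pandas DataFrame in each process. I want to merge them together and write out the data. The following approach almost seems to work, but when I read the data back in with pd.read_csv(), only the first name is used as a column header.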

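import pandas as pd
from multiprocessing import Process, Lock

def foo(name, lock):
    # Each process builds a one-column DataFrame named after its argument.
    d = {f'{name}': [1, 2]}
    df = pd.DataFrame(data=d)

    # Hold the lock so the processes do not interleave their writes.
    # mode='a' appends, but every call also writes its own header and index.
    with lock:
        df.to_csv('output.txt', mode='a')

if __name__ == '__main__':
    lock = Lock()

    processes = []
    for name in ['bob', 'steve']:
        p = Process(target=foo, args=(name, lock))
        p.start()
        processes.append(p)

    # Wait for every process, not only the last one started.
    for p in processes:
        p.join()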
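Why only the first name appears: each to_csv(..., mode='a') call writes a complete CSV fragment (header line, index column, rows), so output.txt ends up as two fragments stacked vertically instead of columns side by side. The sketch below reproduces this in a single process; as an assumption for illustration, it joins the to_csv() output strings in memory rather than appending to output.txt.

```python
import io

import pandas as pd

# The same two single-column frames the workers in the question produce.
frames = [pd.DataFrame({name: [1, 2]}) for name in ['bob', 'steve']]

# Appending each frame's CSV text is what mode='a' does to the file:
# every fragment carries its own header line (",bob", then ",steve").
appended_text = ''.join(df.to_csv() for df in frames)
print(appended_text)

# read_csv uses only the first line as the header; the second header
# line is parsed as an ordinary data row, so "steve" lands in the bob column.
result = pd.read_csv(io.StringIO(appended_text))
print(result)
```

Writing the header only once would not help here either, because the frames have different column names and appending always stacks rows, never columns.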

Cor*_*ien 6

You can use multiprocessing.Pool:

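import multiprocessing
import pandas as pd

def foo(name):
    # Build and return the DataFrame instead of writing it from the worker.
    d = {f'{name}': [1, 2]}
    df = pd.DataFrame(data=d)
    return df

if __name__ == '__main__':
    data = ['bob', 'steve']
    with multiprocessing.Pool(2) as pool:
        # pool.map returns the workers' DataFrames in the same order as the input list.
        data = pool.map(foo, data)
    # Join the frames side by side (axis=1) and write the file once, from the parent.
    pd.concat(data, axis=1).to_csv('output.csv')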

Output:

>>> pd.concat(data, axis=1)
   bob  steve
0    1      1
1    2      2
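Since to_csv() also writes the index as an unnamed first column, reading output.csv back with pd.read_csv picks up an extra "Unnamed: 0" column unless you pass index_col=0 (or write the file with index=False). A short sketch of the round trip, assuming the frames are built directly instead of through a pool and an in-memory buffer stands in for output.csv:

```python
import io

import pandas as pd

# Stand-ins for the DataFrames that pool.map returns in the answer.
data = [pd.DataFrame({name: [1, 2]}) for name in ['bob', 'steve']]

# axis=1 aligns the frames on their shared index (0, 1) and places them side by side.
combined = pd.concat(data, axis=1)

# index_col=0 turns the written index column back into the index,
# so only the bob and steve columns remain after reading.
csv_text = combined.to_csv()
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(restored)
```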