Posts by Rob*_*ler

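Slow performance when replacing strings in a pandas DataFrame using a dict

The following code works but needs to run faster. The dictionary has ~25K keys and the DataFrame has ~3M rows. Is there a way to produce the same result with Python code that runs faster? (Without multiprocessing, it runs 8x slower.)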

import datetime
import multiprocessing

import numpy as np
import pandas as pd

miscdict = {" isn't ": " is not ", " aren't ": " are not ", " wasn't ": " was not ", " snevada ": " Sierra Nevada "}

df = pd.DataFrame({"q1": ["beer is ok", "beer isn't ok", "beer wasn't available", " snevada is good"]})

def parse_text(data):
    # One full pass over the column per dictionary key
    for key, replacement in miscdict.items():
        # regex=False assumes the keys are literal strings, not patterns
        data['q1'] = data['q1'].str.replace(key, replacement, regex=False)
    return data

if __name__ == '__main__':
    t1_1 = datetime.datetime.now()
    # Split the frame into 8 chunks and process them in parallel
    p = multiprocessing.Pool(processes=8)
    split_dfs = np.array_split(df, 8)
    pool_results = p.map(parse_text, split_dfs)
    p.close()
    p.join()
    # Reassemble the chunks in their original row order
    df = pd.concat(pool_results, axis=0)
    t2_1 = datetime.datetime.now()
    print("done " + str(t2_1 - t1_1))
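The loop in the question scans the whole column once per dictionary key, so ~25K keys means ~25K passes over ~3M rows. One commonly suggested alternative is to build a single alternation pattern from all the keys and look up each match in the dict, so every string is scanned once. The following is only a sketch under some assumptions, not a measured fix: it assumes every value in `q1` is a string (no NaN), that the keys are literal text, and the helper name `replace_all` is made up for illustration. It is also not strictly equivalent to the loop: matches cannot overlap, so two keys that share a boundary space (for example " isn't wasn't ") are not both replaced, and the output of one replacement is never re-scanned for other keys.

```python
import re

import pandas as pd

miscdict = {" isn't ": " is not ", " aren't ": " are not ",
            " wasn't ": " was not ", " snevada ": " Sierra Nevada "}

df = pd.DataFrame({"q1": ["beer is ok", "beer isn't ok",
                          "beer wasn't available", " snevada is good"]})

# Escape each key so it is matched literally, and put longer keys first
# so that a longer key is preferred over a shorter key that is its prefix.
keys = sorted(miscdict, key=len, reverse=True)
pattern = re.compile("|".join(re.escape(k) for k in keys))

def replace_all(text):
    # Hypothetical helper: a single scan of the string, where each
    # match is swapped for its replacement via a dict lookup.
    return pattern.sub(lambda m: miscdict[m.group(0)], text)

# Apply the one-pass replacement to every row of the column.
df["q1"] = df["q1"].map(replace_all)
print(df["q1"].tolist())
```

Whether this is actually faster with 25K keys depends on the data and on how the regex engine handles a very large alternation, so it is worth timing on a sample before dropping multiprocessing. The two approaches can also be combined by passing a function like `replace_all` to `Pool.map` over the split chunks.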

python dictionary pandas

6 votes, 2 answers, 2307 views
