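The following code works, but it needs to run faster. The dictionary has ~25K keys and the DataFrame has ~3M rows. Is there a way to produce the same result with Python code that runs faster? (Without multiprocessing, it runs 8x slower.)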
import datetime
import multiprocessing

import numpy as np
import pandas as pd

miscdict = {" isn't ": ' is not ', " aren't ": ' are not ', " wasn't ": ' was not ', " snevada ": ' Sierra Nevada '}
df = pd.DataFrame({"q1": ["beer is ok", "beer isn't ok", "beer wasn't available", " snevada is good"]})

def parse_text(data):
    # Apply every replacement in turn to the whole column.
    for key, replacement in miscdict.items():
        data['q1'] = data['q1'].str.replace(key, replacement)
    return data

if __name__ == '__main__':
    t1_1 = datetime.datetime.now()
    p = multiprocessing.Pool(processes=8)
    split_dfs = np.array_split(df, 8)  # one chunk per worker
    pool_results = p.map(parse_text, split_dfs)
    p.close()
    p.join()
    parts = pd.concat(pool_results, axis=0)
    df = pd.concat([parts], axis=1)
    t2_1 = datetime.datetime.now()
    print("done" + str(t2_1 - t1_1))
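For comparison, one direction that might be worth timing against the loop above is a single pass over the column: all keys are joined into one compiled regular expression, and each match is looked up in the dictionary. This is only a minimal sketch on the sample data, and the helper names `build_pattern` and `replace_single_pass` are hypothetical. Whether it is actually faster with ~25K keys would have to be measured. It is also not guaranteed to give identical output to the sequential loop: a single pass never re-scans replaced text, and two keys that share a boundary space cannot both match at that space.

```python
import re

import pandas as pd

miscdict = {" isn't ": ' is not ', " aren't ": ' are not ',
            " wasn't ": ' was not ', " snevada ": ' Sierra Nevada '}
df = pd.DataFrame({"q1": ["beer is ok", "beer isn't ok",
                          "beer wasn't available", " snevada is good"]})


def build_pattern(mapping):
    # Hypothetical helper: one alternation of all keys, escaped so they
    # are matched literally. Longer keys go first so that a longer key
    # wins over a shorter key that is its prefix.
    keys = sorted(mapping, key=len, reverse=True)
    return re.compile("|".join(re.escape(k) for k in keys))


def replace_single_pass(series, mapping):
    # Hypothetical helper: scan each string once and look up the
    # replacement for whichever key matched.
    pattern = build_pattern(mapping)
    return series.str.replace(pattern, lambda m: mapping[m.group(0)], regex=True)


result = replace_single_pass(df["q1"], miscdict)
print(result.tolist())
```

On the four sample rows this gives the same strings as the loop, but that agreement is only checked for this sample, not for the full 25K-key dictionary.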