Ale*_*ana 4 python dataframe python-3.x pandas
I have the following Pandas DataFrame with the columns city and arr (final_target shows the result I want):
    city    arr  final_target
    paris   11   paris_11
    paris   12   paris_12
    dallas  22   dallas
    miami   15   miami
    paris   16   paris_16
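My goal is to fill the final_target column with the city name joined to the arr number when the city is paris, and with just the city name otherwise.
What is the most Pythonic way to do this?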
"What is the most Pythonic way to do this?"
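It depends on the definition. If "Pythonic" means the preferred, most common, and fastest approach, then np.where is the most Pythonic solution here.
Use numpy.where. It is vectorized, so it should be preferred over something like apply, which loops under the hood: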
    import numpy as np

    # Where city is 'paris', use city + '_' + arr; otherwise keep city as-is
    df['final_target'] = np.where(df['city'].eq('paris'),
                                  df['city'] + '_' + df['arr'].astype(str),
                                  df['city'])

Pandas alternatives:
    # Series.mask replaces values where the condition is True
    df['final_target'] = df['city'].mask(df['city'].eq('paris'),
                                         df['city'] + '_' + df['arr'].astype(str))

    # Series.where keeps values where the condition is True, so the test is inverted
    df['final_target'] = df['city'].where(df['city'].ne('paris'),
                                          df['city'] + '_' + df['arr'].astype(str))
    print(df)
        city  arr final_target
    0  paris   11     paris_11
    1  paris   12     paris_12
    2 dallas   22       dallas
    3  miami   15        miami
    4  paris   16     paris_16

Performance:
    # 50k rows
    df = pd.concat([df] * 10000, ignore_index=True)

    In [157]: %%timeit
         ...: df['final_target'] = np.where(df['city'].eq('paris'),
         ...:                               df['city'] + '_' + df['arr'].astype(str),
         ...:                               df['city'])
         ...:
    48.6 ms ± 444 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

    In [158]: %%timeit
         ...: df['city'] + (df['city'] == 'paris')*('_' + df['arr'].astype(str))
         ...:
    49.2 ms ± 1.37 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

    In [159]: %%timeit
         ...: df['final_target'] = df['city']
         ...: df.loc[df['city'] == 'paris', 'final_target'] += '_' + df['arr'].astype(str)
         ...:
    63.8 ms ± 764 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

    In [160]: %%timeit
         ...: df['final_target'] = df.apply(lambda x: x.city + '_' + str(x.arr) if x.city == 'paris' else x.city, axis = 1)
         ...:
    1.33 s ± 119 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
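If you want to convince yourself that the vectorized variants and the apply version agree before timing them, a small self-contained check along these lines may help. This is only a sketch: the sample frame is rebuilt from the question's table, and the names is_paris, joined, candidates, and expected are illustrative helpers of mine, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Input frame rebuilt from the table in the question
df = pd.DataFrame({
    "city": ["paris", "paris", "dallas", "miami", "paris"],
    "arr": [11, 12, 22, 15, 16],
})

# Shared building blocks (illustrative names)
is_paris = df["city"].eq("paris")
joined = df["city"] + "_" + df["arr"].astype(str)

# Each candidate should produce the same column
candidates = {
    "np.where": pd.Series(np.where(is_paris, joined, df["city"]), index=df.index),
    "Series.mask": df["city"].mask(is_paris, joined),
    "Series.where": df["city"].where(~is_paris, joined),
    "apply": df.apply(
        lambda row: f"{row['city']}_{row['arr']}" if row["city"] == "paris" else row["city"],
        axis=1,
    ),
}

# Desired final_target column, copied from the question
expected = ["paris_11", "paris_12", "dallas", "miami", "paris_16"]
for name, result in candidates.items():
    assert result.tolist() == expected, f"{name} disagrees with the expected column"

df["final_target"] = candidates["np.where"]
print(df)
```

Computing the mask and the joined string once keeps the three vectorized variants directly comparable; only the apply version does its work row by row.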