Most Pythonic way to concatenate pandas cells conditionally

Ale*_*ana 4 python dataframe python-3.x pandas

I have the following pandas DataFrame, containing the columns city and arr:

city      arr  final_target
paris     11   paris_11
paris     12   paris_12
dallas    22   dallas
miami     15   miami
paris     16   paris_16

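My goal is to fill the final_target column with the city name joined to the arr number when the city is paris, and with just the city name otherwise.

What is the most Pythonic way to do this?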

jez*_*ael 5


"What is the most Pythonic way to do this?"


That depends on the definition. If it means the preferred, most common, and fastest approach, then the np.where solution is the most Pythonic one here.


Use numpy.where. The solution is vectorized, so it should be preferred over something like apply, which loops under the hood:

df['final_target'] = np.where(df['city'].eq('paris'),
                              df['city'] + '_' + df['arr'].astype(str),
                              df['city'])
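For readers who want to run the np.where approach end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt from the sample rows in the question; the intermediate names `is_paris` and `joined` are only illustrative and do not appear in the original answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (without the target column)
df = pd.DataFrame({
    "city": ["paris", "paris", "dallas", "miami", "paris"],
    "arr": [11, 12, 22, 15, 16],
})

# Boolean mask: True where the city is paris
is_paris = df["city"].eq("paris")

# Candidate value for every row: city + "_" + arr (arr cast to string first)
joined = df["city"] + "_" + df["arr"].astype(str)

# Pick the joined value where the mask is True, the plain city name otherwise
df["final_target"] = np.where(is_paris, joined, df["city"])
print(df)
```

Note that both candidate columns are computed for every row before np.where selects between them, so the string concatenation also runs for the non-paris rows.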

pandas alternatives:

df['final_target'] = df['city'].mask(df['city'].eq('paris'),
                                     df['city'] + '_' + df['arr'].astype(str))
df['final_target'] = df['city'].where(df['city'].ne('paris'),
                                      df['city'] + '_' + df['arr'].astype(str))
print (df)
     city  arr final_target
0   paris   11     paris_11
1   paris   12     paris_12
2  dallas   22       dallas
3   miami   15        miami
4   paris   16     paris_16
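The two pandas alternatives are mirror images of each other: Series.mask replaces values where the condition is True, while Series.where keeps values where the condition is True and replaces the rest. A small sketch on a three-row example (the Series names here are hypothetical, chosen only for illustration) shows that both give the same result once the condition is inverted:

```python
import pandas as pd

city = pd.Series(["paris", "dallas", "paris"])
arr = pd.Series([11, 22, 16])
joined = city + "_" + arr.astype(str)

# mask: replace with `joined` where the condition is True
via_mask = city.mask(city.eq("paris"), joined)

# where: keep the original where the condition is True, else use `joined`
via_where = city.where(city.ne("paris"), joined)

print(via_mask.equals(via_where))
```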

Performance

#50k rows
df = pd.concat([df] * 10000, ignore_index=True)


In [157]: %%timeit
     ...: df['final_target'] = np.where(df['city'].eq('paris'),
     ...:                               df['city'] + '_' + df['arr'].astype(str),
     ...:                               df['city'])
     ...:
48.6 ms ± 444 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [158]: %%timeit
     ...: df['city'] + (df['city'] == 'paris')*('_' + df['arr'].astype(str))
     ...:
     ...:
49.2 ms ± 1.37 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [159]: %%timeit
     ...: df['final_target'] = df['city']
     ...: df.loc[df['city'] == 'paris', 'final_target'] +=  '_' + df['arr'].astype(str)
     ...:
63.8 ms ± 764 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [160]: %%timeit
     ...: df['final_target'] = df.apply(lambda x: x.city + '_' + str(x.arr) if x.city == 'paris' else x.city, axis = 1)
     ...:
     ...:
1.33 s ± 119 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
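The timings above come from IPython's %%timeit. Outside IPython, a rough comparison can be sketched with the standard-library timeit module. This is only an assumption-laden sketch: the helper names `with_np_where` and `with_apply` are hypothetical, the frame is much smaller than the 50k rows used above so it finishes quickly, and absolute numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

base = pd.DataFrame({
    "city": ["paris", "paris", "dallas", "miami", "paris"],
    "arr": [11, 12, 22, 15, 16],
})
# 1,000 rows here; raise the multiplier to approach the 50k-row benchmark
big = pd.concat([base] * 200, ignore_index=True)


def with_np_where(frame):
    """Vectorized: build both candidates, then select per row."""
    return np.where(frame["city"].eq("paris"),
                    frame["city"] + "_" + frame["arr"].astype(str),
                    frame["city"])


def with_apply(frame):
    """Row-wise: calls the lambda once per row in a Python-level loop."""
    return frame.apply(
        lambda row: f"{row.city}_{row.arr}" if row.city == "paris" else row.city,
        axis=1,
    )


t_where = timeit.timeit(lambda: with_np_where(big), number=5)
t_apply = timeit.timeit(lambda: with_apply(big), number=5)
print(f"np.where: {t_where:.4f} s, apply: {t_apply:.4f} s (5 runs each)")
```

Both helpers return the same values, so the comparison measures only how the work is done, not what is computed.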