In Python, I have a pandas DataFrame. I want to filter it on a value of column A.
I am looking for the row where column A holds the highest value that is smaller than '5' (so if column A contains the values '1', '2', '4', '7', the result should be '4'). There is a second condition as well.
The following statement does not work.
How do I need to change the maximum condition so that it works?
df_new = df[(df['some_other_column'] < XYZ) & max(df['A'] <= '5')]
Use np.searchsorted (this assumes the column is sorted in ascending order):
df

   x
0  1
1  2
2  4
3  7

df.iloc[(np.searchsorted(df.x.values, 5) - 1).clip(0)]

   x
2  4

Timings

import numpy as np
import pandas as pd

df = pd.DataFrame({'x' : np.arange(100000)})

%%timeit
x = df.x
g = x[x <= 12345].max()
df[x == g]

1000 loops, best of 3: 1.27 ms per loop

%timeit df.iloc[(np.searchsorted(df.x.values, 12345) - 1).clip(0)]
10000 loops, best of 3: 139 µs per loop

There is no comparison: searchsorted is much faster, roughly nine times in this run.
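The question also mentions a second condition, and the column may not be sorted. A minimal sketch of a boolean-mask variant that handles both is given below. The helper name `rows_with_largest_below`, the sample data, and the threshold `35` are made up for illustration, and the sketch assumes that the maximum should be taken only among rows that pass the other condition and that "smaller than" means strictly less. It does not need sorted input, but it is the slower of the two approaches timed above.

```python
import pandas as pd


def rows_with_largest_below(df, column, limit, extra_mask=None):
    """Return the rows whose `column` equals its largest value strictly below `limit`.

    `extra_mask` is an optional boolean Series for the additional condition
    (hypothetical helper, not part of pandas).
    """
    # Apply the other condition first, so the maximum is taken among those rows only.
    candidates = df if extra_mask is None else df[extra_mask]
    # Values of the column that are strictly below the limit.
    below = candidates.loc[candidates[column] < limit, column]
    if below.empty:
        # No value qualifies: return an empty frame with the same columns.
        return candidates.iloc[0:0]
    return candidates[candidates[column] == below.max()]


# Illustrative data; the column names follow the question.
df = pd.DataFrame({"A": [1, 2, 4, 7], "some_other_column": [10, 20, 30, 40]})
result = rows_with_largest_below(df, "A", 5, df["some_other_column"] < 35)
print(result)
```

If the column holds strings such as '5', comparisons are lexicographic, so converting it to a numeric type first (for example with `pd.to_numeric`) is probably what is wanted.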