J_H*_*ads (5) · tags: python, group-by, dataframe, pandas, pandas-groupby
I'm working with a dataframe that looks like this:
id time diff
0 0 34 nan
1 0 36 2
2 1 43 7
3 1 55 12
4 1 59 4
5 2 2 -57
6 2 10 8
Effectively, I want to find the minimum 'time' within each id and then set 'diff' to NaN at those minima. I'm looking for a solution that produces:
id time diff
0 0 34 nan
1 0 36 2
2 1 43 nan
3 1 55 12
4 1 59 4
5 2 2 nan
6 2 10 8
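To try the answers below, the sample frame can be rebuilt as follows. This sketch rests on one assumption the question does not state: that 'diff' is the ungrouped `df['time'].diff()`, which reproduces every value shown (36 - 34 = 2, 2 - 59 = -57, and so on). If that holds, differencing within each id with `groupby('id')['time'].diff()` gives the desired column directly, provided rows are already sorted by 'time' within each id. The column name `diff_by_id` is hypothetical.

```python
import numpy as np
import pandas as pd

# Sample frame from the question (default RangeIndex 0..6).
df = pd.DataFrame({
    "id":   [0, 0, 1, 1, 1, 2, 2],
    "time": [34, 36, 43, 55, 59, 2, 10],
})

# Assumption: 'diff' came from an ungrouped difference over the whole column.
df["diff"] = df["time"].diff()

# Differencing within each id leaves NaN on the first row of every group.
# That row holds the group's minimum 'time' only because the rows are
# sorted by 'time' within each id, as they are in the sample.
df["diff_by_id"] = df.groupby("id")["time"].diff()
print(df)
```

Unlike the answers below, this avoids locating the minima at all, but it silently gives the wrong rows if the data is not sorted within each id.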
Use groupby('id') together with idxmin to find the index label of the minimum 'time' in each group. Then use loc to assign np.nan:
import numpy as np
# idxmin gives, per id, the index label of the row with the smallest 'time'
df.loc[df.groupby('id').time.idxmin(), 'diff'] = np.nan
df
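A self-contained version of the idxmin answer, with the sample data typed in by hand. What it illustrates is that `idxmin` returns index labels rather than positions, so the `.loc` assignment assumes the index is unique; `min_labels` is just an illustrative name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id":   [0, 0, 1, 1, 1, 2, 2],
    "time": [34, 36, 43, 55, 59, 2, 10],
    "diff": [np.nan, 2, 7, 12, 4, -57, 8],
})

# One index label per id: the first row holding that group's smallest 'time'.
min_labels = df.groupby("id")["time"].idxmin()

# .loc is label-based, so this assumes unique index labels; with a
# duplicated index, every row sharing a label would be overwritten.
df.loc[min_labels, "diff"] = np.nan
print(df)
```

If the real frame's index may contain duplicates, working on `df.reset_index(drop=True)` first is one way to make the labels unique.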
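You can group 'time' by id and compute a boolean mask that is True where the time is the minimum within its group and False otherwise, then use that mask to assign NaN to the corresponding rows: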
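import numpy as np
import pandas as pd
# transform('min') broadcasts each id's minimum 'time' back onto every row,
# so the comparison yields a boolean mask aligned with df's index
df.loc[df['time'] == df.groupby('id')['time'].transform('min'), "diff"] = np.nan
df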
# id time diff
#0 0 34 NaN
#1 0 36 2.0
#2 1 43 NaN
#3 1 55 12.0
#4 1 59 4.0
#5 2 2 NaN
#6 2 10 8.0
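The two answers behave differently when a group has a tie for its minimum 'time': `idxmin` returns only the first matching label, while a boolean mask marks every tied row. The sketch below uses a hypothetical `ties` frame (not from the question) and writes the mask approach with `transform('min')`; which behaviour is wanted depends on the data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: both rows of id 0 share the minimum time of 5.
ties = pd.DataFrame({
    "id":   [0, 0, 1],
    "time": [5, 5, 3],
    "diff": [1.0, 2.0, 3.0],
})

# idxmin approach: only the first tied row per group is set to NaN.
a = ties.copy()
a.loc[a.groupby("id")["time"].idxmin(), "diff"] = np.nan

# Mask approach: every row equal to its group's minimum is set to NaN.
b = ties.copy()
b.loc[b["time"] == b.groupby("id")["time"].transform("min"), "diff"] = np.nan

print(a["diff"].tolist())
print(b["diff"].tolist())
```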