Replace values with groupby means

Def*_*_Os 8 python pandas pandas-groupby

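I have a DataFrame with a column that contains some bad data in the form of various negative values. I would like to replace values < 0 with the mean of the group they belong to.

For missing values stored as NA, I would do: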

data = df.groupby(['GroupID']).column
data.transform(lambda x: x.fillna(x.mean()))

But how can I do this for a condition like x < 0?

Thanks!

unu*_*tbu 9

Using @AndyHayden's example, you could use groupby/transform with replace:

import pandas as pd

df = pd.DataFrame([[1,1],[1,-1],[2,1],[2,2]], columns=list('ab'))
print(df)
#    a  b
# 0  1  1
# 1  1 -1
# 2  2  1
# 3  2  2

data = df.groupby(['a'])
def replace(group):
    # Work on a copy so the data passed in is not mutated in place.
    group = group.copy()
    mask = group < 0
    # Select the values that are < 0 and replace them with
    # the mean of the values in the group that are not < 0.
    group[mask] = group[~mask].mean()
    return group
print(data.transform(replace))
#    b
# 0  1
# 1  1
# 2  1
# 3  2
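A possible alternative, closer to the fillna idiom from the question, is to turn the negative values into NaN first and then fill them with the group mean. This is only a sketch on the same toy frame; the names valid, group_mean and b_fixed are made up for illustration and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame([[1, 1], [1, -1], [2, 1], [2, 2]], columns=list("ab"))

# Turn negative values into NaN so they are excluded from the group mean.
valid = df["b"].mask(df["b"] < 0)

# Mean of the remaining valid values per group, broadcast back to each row
# (mean skips NaN by default).
group_mean = valid.groupby(df["a"]).transform("mean")

# Fill only the masked positions with their group's mean.
df["b_fixed"] = valid.fillna(group_mean)
print(df)
```

Because no Python function is called per group, this tends to scale better on large frames. Note that the result column is float, since NaN is used as the intermediate marker.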