How to do a column-wise count and change values when the frequency is less than 3?

aid*_*att 3 python replace pandas

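I have a DataFrame with many rows and some low-frequency values. I need to count values column by column and then change any value whose frequency is less than 3.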

DF - Input

Col1     Col2     Col3       Col4
 1        apple    tomato     apple
 1        apple    potato     nan
 1        apple    tomato     banana
 1        apple    tomato     banana
 1        apple    tomato     banana
 1        apple    tomato     banana
 1        grape    tomato     banana
 1        pear     tomato     banana
 1        lemon    tomato     burger

DF - Output

Col1     Col2     Col3       Col4
 1        apple    tomato     Other
 1        apple    Other      nan
 1        apple    tomato     banana
 1        apple    tomato     banana
 1        apple    tomato     banana
 1        apple    tomato     banana
 1        Other    tomato     banana
 1        Other    tomato     banana
 1        Other    tomato     Other

Sco*_*ton 5

You can use where together with per-column value counts (computed here with groupby and transform('count')):

df.where(df.apply(lambda x: x.groupby(x).transform('count')>2), 'Other')

Output:

       Col2    Col3    Col4
Col1                       
1     apple  tomato   Other
1     apple   Other  banana
1     apple  tomato  banana
1     apple  tomato  banana
1     apple  tomato  banana
1     apple  tomato  banana
1     Other  tomato  banana
1     Other  tomato  banana
1     Other  tomato   Other
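For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. It is not the exact one-liner from this answer: it maps each value to its count with value_counts instead of groupby/transform, and the sample data is an assumption rebuilt from the question (with banana in the second Col4 row and Col1 kept as an ordinary column).

```python
import pandas as pd

# Sample data rebuilt from the question (assumed: no NaN in Col4).
df = pd.DataFrame({
    "Col1": [1] * 9,
    "Col2": ["apple"] * 6 + ["grape", "pear", "lemon"],
    "Col3": ["tomato", "potato"] + ["tomato"] * 7,
    "Col4": ["apple"] + ["banana"] * 7 + ["burger"],
})

# For each column, replace every value by the number of times
# it occurs in that same column.
counts = df.apply(lambda col: col.map(col.value_counts()))

# Keep values that occur at least 3 times; everything else becomes 'Other'.
result = df.where(counts >= 3, "Other")
print(result)
```

The threshold lives in a single comparison (counts >= 3), so it is easy to change if "rare" should mean something else.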

Update: to handle NaN in the original DataFrame:

d = df.apply(lambda x: x.groupby(x).transform('count'))
df.where(d.gt(2.0).where(d.notnull()).astype(bool), 'Other')

Output:

       Col2    Col3    Col4
Col1                       
1     apple  tomato   Other
1     apple   Other     NaN
1     apple  tomato  banana
1     apple  tomato  banana
1     apple  tomato  banana
1     apple  tomato  banana
1     Other  tomato  banana
1     Other  tomato  banana
1     Other  tomato   Other
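The NaN handling can also be wrapped in a small helper. This is only a sketch under assumptions: replace_rare, min_count and other are hypothetical names that do not appear in the answer, and the mask is built with value_counts plus isna rather than the nested where/astype(bool) shown above.

```python
import numpy as np
import pandas as pd

def replace_rare(df, min_count=3, other="Other"):
    """Replace values seen fewer than min_count times per column; leave NaN alone."""
    # value_counts() skips NaN, so NaN cells map to NaN counts here.
    counts = df.apply(lambda col: col.map(col.value_counts()))
    # NaN counts compare as False, so OR with isna() to keep missing cells.
    keep = counts.ge(min_count) | df.isna()
    return df.where(keep, other)

# Sample data rebuilt from the question, including the NaN in Col4.
df_nan = pd.DataFrame({
    "Col2": ["apple"] * 6 + ["grape", "pear", "lemon"],
    "Col3": ["tomato", "potato"] + ["tomato"] * 7,
    "Col4": ["apple", np.nan] + ["banana"] * 6 + ["burger"],
})

out = replace_rare(df_nan)
print(out)
```

Because the missing cells are kept by the mask, they stay NaN instead of being turned into 'Other'.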