Ano*_*onX 3 python dataframe pandas
What is the best practice for removing all rows whose value in a given column occurs with low frequency?
DataFrame:
IN:
foo bar poo
1 a A
2 a A
3 a B
4 b B
5 b A
6 b A
7 c C
8 d B
9 e B
Example 1: remove all rows whose value in the "poo" column occurs fewer than 3 times:
OUT:
foo bar poo
1 a A
2 a A
3 a B
4 b B
5 b A
6 b A
8 d B
9 e B
Example 2: remove all rows whose value in the "bar" column occurs fewer than 3 times:
OUT:
foo bar poo
1 a A
2 a A
3 a B
4 b B
5 b A
6 b A
This should be easy to generalise. You will need groupby + transform + count, then filter on the result:
col = 'poo'  # or 'bar'
n = 3  # keep values that occur at least n times
df[df.groupby(col)[col].transform('count').ge(n)]
foo bar poo
0 1 a A
1 2 a A
2 3 a B
3 4 b B
4 5 b A
5 6 b A
7 8 d B
8 9 e B
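
To make the above reproducible, here is a minimal sketch that rebuilds the frame from the question and wraps the same groupby + transform idea in a helper. The name drop_rare and its parameters are made up for illustration; they are not part of pandas.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({
    "foo": range(1, 10),
    "bar": list("aaabbbcde"),
    "poo": list("AABBAACBB"),
})


def drop_rare(frame, col, n):
    """Keep rows whose value in `col` occurs at least `n` times (hypothetical helper)."""
    # transform("count") broadcasts each group's count back onto its rows,
    # so the result shares the frame's index and can be used as a row mask.
    counts = frame.groupby(col)[col].transform("count")
    return frame[counts.ge(n)]


by_poo = drop_rare(df, "poo", 3)  # Example 1: drops the single 'C' row
by_bar = drop_rare(df, "bar", 3)  # Example 2: drops the 'c', 'd' and 'e' rows
```

The original index is preserved, which is why the output above skips label 6.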
IIUC, you can use filter:
df.groupby('poo').filter(lambda x: x['poo'].count() >= 3)
Out[81]:
foo bar poo
0 1 a A
1 2 a A
2 3 a B
3 4 b B
4 5 b A
5 6 b A
7 8 d B
8 9 e B
Or combine value_counts with isin:
s = df.poo.value_counts().ge(3)  # ge, not gt: values occurring exactly 3 times must be kept
df.loc[df.poo.isin(s[s].index)]
Out[89]:
foo bar poo
0 1 a A
1 2 a A
2 3 a B
3 4 b B
4 5 b A
5 6 b A
7 8 d B
8 9 e B
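
A closely related variant, sketched here under the same sample data, maps the value_counts result back onto the column instead of going through isin. The variable names are only illustrative.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({
    "foo": range(1, 10),
    "bar": list("aaabbbcde"),
    "poo": list("AABBAACBB"),
})

n = 3  # minimum number of occurrences to keep

# value_counts() is indexed by the distinct values, so Series.map looks up
# each row's value and returns its frequency, aligned with df's index.
freq = df["poo"].map(df["poo"].value_counts())
result = df[freq >= n]
```

This gives a per-row frequency Series, which can be handy if you also want to inspect or store the counts.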