fut*_*eer 3 python dataframe pandas drop
I want to drop the rows whose occupation has fewer than 2 unique names:
name value occupation
a 23 mechanic
a 24 mechanic
b 30 mechanic
c 40 mechanic
c 41 mechanic
d 30 doctor
d 20 doctor
e 70 plumber
e 71 plumber
f 30 plumber
g 50 tailor
I tried:
df.groupby('occupation')['name'].nunique()
>>>>>>
occupation
mechanic 3
doctor 1
plumber 2
tailor 1
Name: name, dtype: int64
Is it possible to use something like df = df.drop(df[<some boolean condition>].index)?
Expected output:
name value occupation
a 23 mechanic
a 24 mechanic
b 30 mechanic
c 40 mechanic
c 41 mechanic
e 70 plumber
e 71 plumber
f 30 plumber
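Use GroupBy.transform with 'nunique' to get, for every row, the number of unique names in that row's occupation, then keep the rows where this count is greater than or equal to the threshold (here 2) with Series.ge: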
df = df[df.groupby('occupation')['name'].transform('nunique').ge(2)]
print (df)
name value occupation
0 a 23 mechanic
1 a 24 mechanic
2 b 30 mechanic
3 c 40 mechanic
4 c 41 mechanic
7 e 70 plumber
8 e 71 plumber
9 f 30 plumber
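
To see why this works, it can help to look at the intermediate Series. The following is a minimal, self-contained sketch that rebuilds the sample data from the question; the variable names counts, mask and result are only illustrative and do not appear in the original answer.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "name": list("aabccddeefg"),
    "value": [23, 24, 30, 40, 41, 30, 20, 70, 71, 30, 50],
    "occupation": ["mechanic"] * 5 + ["doctor"] * 2 + ["plumber"] * 3 + ["tailor"],
})

# transform('nunique') returns one value per original row (same index as df):
# the number of unique names within that row's occupation
counts = df.groupby("occupation")["name"].transform("nunique")

# Boolean mask aligned with df; True where the occupation has at least 2 unique names
mask = counts.ge(2)

result = df[mask]
print(result)
```

Because transform returns a Series aligned with the original index, the mask can be used directly for boolean indexing without merging the per-group counts back into df.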
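Your solution also works if you filter the per-occupation counts and then test the occupation column against the index of the remaining counts with Series.isin: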
s = df.groupby('occupation')['name'].nunique()
df = df[df['occupation'].isin(s[s.ge(2)].index)]
print (df)
name value occupation
0 a 23 mechanic
1 a 24 mechanic
2 b 30 mechanic
3 c 40 mechanic
4 c 41 mechanic
7 e 70 plumber
8 e 71 plumber
9 f 30 plumber
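
If you prefer the df.drop(df[<some boolean condition>].index) form from the question, the same mask can be inverted. The sketch below assumes the sample data from the question; the names too_few, dropped and filtered are illustrative only. It also shows GroupBy.filter as another possible alternative, which applies a Python function per group and so may be slower when there are many groups.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "name": list("aabccddeefg"),
    "value": [23, 24, 30, 40, 41, 30, 20, 70, 71, 30, 50],
    "occupation": ["mechanic"] * 5 + ["doctor"] * 2 + ["plumber"] * 3 + ["tailor"],
})

# True for rows whose occupation has fewer than 2 unique names
too_few = df.groupby("occupation")["name"].transform("nunique").lt(2)

# drop removes rows by index label, so this relies on df having a unique index
dropped = df.drop(df[too_few].index)

# Alternative: keep only the groups for which the function returns True
filtered = df.groupby("occupation").filter(lambda g: g["name"].nunique() >= 2)

print(dropped)
```

Both variants keep the same rows as the boolean-indexing solutions above; the direct mask is usually the simplest choice.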