Can I drop rows from a DataFrame based on nunique values?

fut*_*eer 3 python dataframe pandas drop

I want to drop the rows whose occupation has fewer than 2 unique names:

name        value      occupation
   a           23      mechanic
   a           24      mechanic
   b           30      mechanic
   c           40      mechanic
   c           41      mechanic
   d           30      doctor
   d           20      doctor
   e           70      plumber
   e           71      plumber
   f           30      plumber
   g           50      tailor

I did:

df.groupby('occupation')['name'].nunique()
>>>>>>
occupation
doctor     1
mechanic   3
plumber    2
tailor     1
Name: name, dtype: int64

Is it possible to use something like df = df.drop(df[<some boolean condition>].index)?

Desired output:

name        value      occupation
   a           23      mechanic
   a           24      mechanic
   b           30      mechanic
   c           40      mechanic
   c           41      mechanic
   e           70      plumber
   e           71      plumber
   f           30      plumber

jez*_*ael 6

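Use GroupBy.transform to broadcast each occupation's nunique count back to every row, then Series.ge to keep the rows where that count is greater than or equal to a threshold, e.g. 2: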

df = df[df.groupby('occupation')['name'].transform('nunique').ge(2)]
print (df)
  name  value occupation
0    a     23   mechanic
1    a     24   mechanic
2    b     30   mechanic
3    c     40   mechanic
4    c     41   mechanic
7    e     70    plumber
8    e     71    plumber
9    f     30    plumber
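A self-contained sketch of the transform approach, rebuilding the sample data from the question. The helper name keep_groups_with_min_unique and its parameters are hypothetical, added only to make the grouping column, the counted column and the threshold explicit:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'name': ['a', 'a', 'b', 'c', 'c', 'd', 'd', 'e', 'e', 'f', 'g'],
    'value': [23, 24, 30, 40, 41, 30, 20, 70, 71, 30, 50],
    'occupation': ['mechanic'] * 5 + ['doctor'] * 2 + ['plumber'] * 3 + ['tailor'],
})


def keep_groups_with_min_unique(frame, group_col, count_col, min_unique=2):
    """Hypothetical helper: keep rows whose group has at least
    `min_unique` distinct values in `count_col`."""
    # transform('nunique') returns one value per row (its group's count),
    # aligned with frame's index, so it can be used directly as a boolean mask.
    counts = frame.groupby(group_col)[count_col].transform('nunique')
    return frame[counts.ge(min_unique)]


result = keep_groups_with_min_unique(df, 'occupation', 'name', min_unique=2)
print(result)
```

Because transform keeps the original index, the mask lines up row by row with df; the aggregated nunique() Series, by contrast, is indexed by occupation and cannot be used as a row mask directly.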

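Your own approach also works: filter the Series of counts, then test the occupation column against the index of the filtered values with Series.isin: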

s = df.groupby('occupation')['name'].nunique()

df = df[df['occupation'].isin(s[s.ge(2)].index)]
print (df)
  name  value occupation
0    a     23   mechanic
1    a     24   mechanic
2    b     30   mechanic
3    c     40   mechanic
4    c     41   mechanic
7    e     70    plumber
8    e     71    plumber
9    f     30    plumber
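The question asked specifically about the df.drop(df[...].index) form. A minimal sketch of that, assuming the DataFrame has a unique index (drop removes every row carrying a given label, so duplicate labels could remove extra rows). It also shows GroupBy.filter, a different technique that keeps whole groups for which the function returns True; it can be slower on many groups because the lambda runs once per group:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'name': ['a', 'a', 'b', 'c', 'c', 'd', 'd', 'e', 'e', 'f', 'g'],
    'value': [23, 24, 30, 40, 41, 30, 20, 70, 71, 30, 50],
    'occupation': ['mechanic'] * 5 + ['doctor'] * 2 + ['plumber'] * 3 + ['tailor'],
})

# Number of distinct names per occupation, broadcast back to every row
counts = df.groupby('occupation')['name'].transform('nunique')

# The drop form from the question: select the rows to remove with a
# boolean condition, then drop their index labels.
dropped = df.drop(df[counts.lt(2)].index)

# Alternative: keep only the groups whose name column has >= 2 unique values.
filtered = df.groupby('occupation').filter(lambda g: g['name'].nunique() >= 2)

print(dropped)
```

Both variants return the same rows as the boolean-mask answer above; the mask version avoids the extra drop step and does not depend on the index being unique.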