Equivalent of SQL "select ... group by ... having count(1) > 1" in Python pandas?

tan*_*gkk 7 python sql dataframe pandas

I'm having a hard time filtering the results of a groupby in pandas. I want to do

select email, count(1) as cnt 
from customers 
group by email 
having count(email) > 1 
order by cnt desc

I did

email_cnt = customers.groupby('Email')['CustomerID'].size()

It correctly gives me the list of emails and their respective counts, but I can't work out the having count(email) > 1 part.

email_cnt[email_cnt.size > 1]

returns 1

email_cnt = customers.groupby('Email')
email_dup = email_cnt.filter(lambda x: len(x) > 1)

gives the full customer records for emails that occur more than once, but I want the aggregated table.

Ily*_*rov 6

Two more solutions (using the modern "method chaining" approach):

Using selection by callable

customers.groupby('Email').size().loc[lambda x: x>1].sort_values()

Using the query method

(customers.groupby('Email')['CustomerID'].
    agg([len]).query('len > 1').sort_values('len'))
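For completeness, here is a minimal sketch that tries to mirror every clause of the SQL in the question, including the descending sort and the cnt column name. The sample data and the name dup_counts are made up for illustration and are not from the original post.

```python
import pandas as pd

# Hypothetical sample data, only for illustration.
customers = pd.DataFrame({
    "Email": ["foo", "bar", "foo", "foo", "baz", "bar"],
    "CustomerID": [1, 2, 3, 4, 5, 6],
})

dup_counts = (
    customers.groupby("Email")
    .size()                          # count(1) ... group by email
    .loc[lambda s: s > 1]            # having count(email) > 1
    .sort_values(ascending=False)    # order by cnt desc
    .reset_index(name="cnt")         # select email, count(1) as cnt
)
print(dup_counts)
```

The reset_index(name="cnt") step turns the Series of counts back into a two-column DataFrame, which is probably the closest match to the "aggregated table" asked for.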


Ale*_*ley 5

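Instead of writing email_cnt[email_cnt.size > 1], just write email_cnt[email_cnt > 1] (there's no need to call .size again). This uses the Boolean Series email_cnt > 1 to return only the relevant values of email_cnt.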

For example:

>>> import pandas as pd
>>> customers = pd.DataFrame({'Email':['foo','bar','foo','foo','baz','bar'],
...                           'CustomerID':[1,2,1,2,1,1]})
>>> email_cnt = customers.groupby('Email')['CustomerID'].size()
>>> email_cnt[email_cnt > 1]
Email
bar      2
foo      3
dtype: int64
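To see why the original attempt did not filter anything, it may help to compare the two expressions side by side. This is only a sketch built on the same sample data. The names n_groups, scalar_test and mask are mine, not from the question.

```python
import pandas as pd

customers = pd.DataFrame({'Email': ['foo', 'bar', 'foo', 'foo', 'baz', 'bar'],
                          'CustomerID': [1, 2, 1, 2, 1, 1]})
email_cnt = customers.groupby('Email')['CustomerID'].size()

# On a Series, .size is an attribute holding the number of elements,
# so comparing it with 1 yields one bool for the whole Series.
n_groups = email_cnt.size
scalar_test = n_groups > 1

# Comparing the Series itself yields one bool per email,
# which is what boolean indexing needs.
mask = email_cnt > 1

print(n_groups, scalar_test)
print(mask)
print(email_cnt[mask])
```

So email_cnt.size > 1 is a single True/False about the number of groups, while email_cnt > 1 is a per-email mask that can be used for selection.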