Python pandas equivalent of SQL select ... group by ... having count(1) > 1?

Question


I'm having trouble filtering the results of a pandas groupby. I want to do the equivalent of:

select email, count(1) as cnt 
from customers 
group by email 
having count(email) > 1 
order by cnt desc


I tried:

customers.groupby('Email')['CustomerID'].size()


This correctly gives me the list of emails and their respective counts, but I can't work out the having count(email) > 1 part.

email_cnt[email_cnt.size > 1]


returns 1

email_cnt = customers.groupby('Email')
email_dup = email_cnt.filter(lambda x:len(x) > 2)


gives me the complete records of the customers whose email count is > 1, but I want the aggregated table.

Answer 1

Ily*_*rov 6

Two more solutions (using the modern "method chaining" approach):

Using selection by callable:

customers.groupby('Email').size().loc[lambda x: x>1].sort_values()


Using the query method:

(customers.groupby('Email')['CustomerID'].
    agg([len]).query('len > 1').sort_values('len'))
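Both chains above sort in ascending order and leave the count unnamed (or named len), whereas the query in the question asks for a cnt column ordered descending. The following is a minimal sketch of one way to reproduce the whole SQL statement. The sample data is hypothetical; only the Email and CustomerID column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
customers = pd.DataFrame({
    "Email": ["foo", "bar", "foo", "foo", "baz", "bar"],
    "CustomerID": [1, 2, 3, 4, 5, 6],
})

result = (
    customers.groupby("Email")
    .size()                                # rows per email, like count(1)
    .reset_index(name="cnt")               # back to a table with Email and cnt columns
    .query("cnt > 1")                      # having count(email) > 1
    .sort_values("cnt", ascending=False)   # order by cnt desc
    .reset_index(drop=True)                # renumber the rows 0..n-1
)
print(result)
```

Note that size() counts rows per group, while SQL count(email) skips NULLs; by default groupby also drops rows whose key is missing, so the two can differ when Email contains missing values.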


Answer 2

Ale*_*ley 5

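Instead of writing email_cnt[email_cnt.size > 1], just write email_cnt[email_cnt > 1] (there is no need to call .size again). This uses the boolean Series email_cnt > 1 to return only the relevant values of email_cnt.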

For example:

>>> import pandas as pd
>>> customers = pd.DataFrame({'Email':['foo','bar','foo','foo','baz','bar'],
...                           'CustomerID':[1,2,1,2,1,1]})
>>> email_cnt = customers.groupby('Email')['CustomerID'].size()
>>> email_cnt[email_cnt > 1]
Email
bar      2
foo      3
dtype: int64
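As a rough illustration of why the original attempt did not filter anything: on the Series returned by the groupby, .size is an attribute holding the number of elements, so email_cnt.size > 1 is a single boolean rather than one value per email. The sketch below reuses the sample data from this answer; the variable names n_groups, scalar_test, mask and duplicates are made up for the example.

```python
import pandas as pd

customers = pd.DataFrame({
    "Email": ["foo", "bar", "foo", "foo", "baz", "bar"],
    "CustomerID": [1, 2, 1, 2, 1, 1],
})
email_cnt = customers.groupby("Email")["CustomerID"].size()

# Series.size is an attribute: the number of elements (one per distinct email).
n_groups = email_cnt.size

# So this comparison yields one boolean for the whole Series, not a mask.
scalar_test = email_cnt.size > 1

# Comparing the Series itself yields one boolean per email.
mask = email_cnt > 1

# Boolean indexing keeps only the emails whose count exceeds 1.
duplicates = email_cnt[mask]
print(duplicates)
```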


Asked:	10 years, 10 months ago
Viewed:	5659 times
Last active:	8 years, 5 months ago