tan*_*gkk 7 python sql dataframe pandas
I'm having a hard time filtering the result of a pandas groupby. I want to do the equivalent of this SQL:
select email, count(1) as cnt
from customers
group by email
having count(email) > 1
order by cnt desc
I tried:
email_cnt = customers.groupby('Email')['CustomerID'].size()
It correctly gives me the list of emails with their respective counts, but I can't get the having count(email) > 1 part to work. This attempt:
email_cnt[email_cnt.size > 1]
returns just 1. And this:
email_cnt = customers.groupby('Email')
email_dup = email_cnt.filter(lambda x: len(x) > 1)
gives me the full customer records for every email that appears more than once, but I want the aggregated table.
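The filter attempt can still be turned into the aggregated table by counting the surviving rows per email. A minimal sketch, using a small hypothetical `customers` frame with the same Email and CustomerID columns as the question:

```python
import pandas as pd

# Hypothetical sample data with the question's column names
customers = pd.DataFrame({
    "Email": ["foo", "bar", "foo", "foo", "baz", "bar"],
    "CustomerID": [1, 2, 1, 2, 1, 1],
})

# filter() returns the original rows of every group that passes the test
# (here: emails that occur more than once), not one row per group
dup_rows = customers.groupby("Email").filter(lambda g: len(g) > 1)

# Group the surviving rows again and count them to get the aggregate,
# largest count first (ORDER BY cnt DESC)
email_cnt = dup_rows.groupby("Email").size().sort_values(ascending=False)
print(email_cnt)
```

This groups the data twice, so computing the sizes first and masking them, as in the answers, is usually the simpler choice.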
Two other solutions, using the modern "method chaining" style:
Selecting with a callable:
customers.groupby('Email').size().loc[lambda x: x > 1].sort_values(ascending=False)
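The callable version can be extended so the result looks like the SQL output: an email column plus a cnt column, largest count first. A sketch with hypothetical sample data; Series.reset_index(name=...) is standard pandas, and the column name cnt is simply copied from the SQL alias:

```python
import pandas as pd

# Hypothetical sample data with the question's column names
customers = pd.DataFrame({
    "Email": ["foo", "bar", "foo", "foo", "baz", "bar"],
    "CustomerID": [1, 2, 1, 2, 1, 1],
})

result = (
    customers.groupby("Email")
    .size()                        # rows per email, like count(1)
    .loc[lambda s: s > 1]          # HAVING count > 1
    .sort_values(ascending=False)  # ORDER BY cnt DESC
    .reset_index(name="cnt")       # back to a two-column table: Email, cnt
)
print(result)
```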
Using the query method:
(customers.groupby('Email')['CustomerID']
    .agg([len]).query('len > 1').sort_values('len', ascending=False))
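A variant of the query approach uses named aggregation (available since pandas 0.25) to call the count column cnt instead of len, so the query string does not reuse the name of a Python builtin. A sketch only, with hypothetical sample data:

```python
import pandas as pd

# Hypothetical sample data with the question's column names
customers = pd.DataFrame({
    "Email": ["foo", "bar", "foo", "foo", "baz", "bar"],
    "CustomerID": [1, 2, 1, 2, 1, 1],
})

result = (
    customers.groupby("Email")
    .agg(cnt=("CustomerID", "size"))      # named aggregation -> column "cnt"
    .query("cnt > 1")                     # HAVING cnt > 1
    .sort_values("cnt", ascending=False)  # ORDER BY cnt DESC
)
print(result)
```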
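Instead of writing email_cnt[email_cnt.size > 1], just write email_cnt[email_cnt > 1] (there's no need to call .size again). On a Series, .size is the number of elements, a single integer, so email_cnt.size > 1 is one True rather than a per-row condition. email_cnt > 1, on the other hand, is a boolean Series, and indexing with it returns only the relevant values of email_cnt.
For example: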
>>> import pandas as pd
>>> customers = pd.DataFrame({'Email':['foo','bar','foo','foo','baz','bar'],
...                           'CustomerID':[1,2,1,2,1,1]})
>>> email_cnt = customers.groupby('Email')['CustomerID'].size()
>>> email_cnt[email_cnt > 1]
Email
bar 2
foo 3
dtype: int64
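For a single-column count like this, Series.value_counts() is another option: it already sorts counts in descending order and drops missing emails by default, so only the boolean mask from the answer above is needed. A sketch reusing the example data:

```python
import pandas as pd

# Same sample data as the example above
customers = pd.DataFrame({
    "Email": ["foo", "bar", "foo", "foo", "baz", "bar"],
    "CustomerID": [1, 2, 1, 2, 1, 1],
})

# value_counts() counts each distinct email and sorts descending,
# covering GROUP BY email ... ORDER BY cnt DESC in one call
counts = customers["Email"].value_counts()

# Same boolean-mask idea as email_cnt[email_cnt > 1]
email_dup = counts[counts > 1]
print(email_dup)
```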
Views: 5659