cph*_*sto 2 python dataframe pandas
I would like to do an SQL-style aggregation in Python.
import pandas as pd

# Example DataFrame
df = pd.DataFrame({'ID': [1, 1, 2, 2, 2],
                   'revenue': [1, 3, 5, 1, 5],
                   'month': ['2012-01-01', '2012-01-01', '2012-03-01', '2014-01-01', '2012-01-01']})
print(df)
   ID       month  revenue
0   1  2012-01-01        1
1   1  2012-01-01        3
2   2  2012-03-01        5
3   2  2014-01-01        1
4   2  2012-01-01        5
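Now, for each ID, I want to compute the total revenue, the number of unique months, and the first month. I get the numbers I want, but not the column names I want, because the column labels are spread across two rows.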
df = df.groupby(['ID']).agg({'revenue':'sum','month':['nunique','first']}).reset_index()
print(df)
   ID revenue   month
          sum nunique       first
0   1       4       1  2012-01-01
1   2      11       3  2012-03-01
A plain SQL script would look something like the following pseudocode:
select ID, sum(revenue) as revenue, count(distinct month) as distinct_m, first(month) as first_m from table group by ID ...
My desired output:
   ID  revenue  distinct_m     first_m
0   1        4           1  2012-01-01
1   2       11           3  2012-03-01
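You can try this. It uses named aggregation (available since pandas 0.25), where each keyword is the output column name and each value is a (source column, aggregation function) pair.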
df.groupby('ID').agg(revenue=('revenue', 'sum'),
                     distinct_m=('month', 'nunique'),
                     first_m=('month', 'first')).reset_index()
   ID  revenue  distinct_m     first_m
0   1        4           1  2012-01-01
1   2       11           3  2012-03-01
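If you need to keep the dict-style agg call from the question (for example on a pandas version older than 0.25), one possible alternative is to flatten the two-level column labels after aggregating. The following is only a sketch of that idea, not part of the answer above; the `names` mapping and the `out` variable are names made up here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 2, 2, 2],
                   'revenue': [1, 3, 5, 1, 5],
                   'month': ['2012-01-01', '2012-01-01', '2012-03-01', '2014-01-01', '2012-01-01']})

# Dict-style aggregation: the result has two-level (MultiIndex) columns
# such as ('revenue', 'sum') and ('month', 'nunique').
out = df.groupby('ID').agg({'revenue': 'sum', 'month': ['nunique', 'first']})

# Hypothetical mapping from each (column, function) pair to a flat name.
names = {('revenue', 'sum'): 'revenue',
         ('month', 'nunique'): 'distinct_m',
         ('month', 'first'): 'first_m'}

# Iterating over MultiIndex columns yields tuples, so look each one up.
out.columns = [names[col] for col in out.columns]
out = out.reset_index()
print(out)
```

Looking the names up by tuple rather than assigning a plain list avoids depending on the order in which pandas emits the aggregated columns.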