I have a table:
country | name | medals_won | year
-----------------------------------
US | sarah | 1 | 2010
US | sarah | 2 | 2011
US | sarah | 5 | 2015
US | alice | 3 | 2010
US | alice | 4 | 2012
US | alice | 1 | 2015
AU | jones | 2 | 2013
AU | jones | 8 | 2015
I would like it to look like this:
country | name | 2010 | 2011 | 2012 | 2013 | 2014 | 2015
---------------------------------------------------------
US | sarah | 1 | 2 | 0 | 0 | 0 | 5
US | alice | 3 | 0 | 4 | 0 | 0 | 1
AU | jones | 0 | 0 | 0 | 2 | 0 | 8
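I have tinkered with df.apply and even brute-force iteration, but as you might guess, the tricky part is that the row values are not strictly sequential, so this is not a simple transpose. For example, nobody won any medals in 2014, but I would like the resulting table to show a column full of zeros for that year.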
df = df.set_index(['country','name','year'])['medals_won'].unstack(fill_value=0)
print(df)
year           2010  2011  2012  2013  2015
country name
AU      jones     0     0     0     2     8
US      alice     3     0     4     0     1
        sarah     1     2     0     0     5
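The result above has no 2014 column, because no row in the data carries that year. A minimal sketch of one way to add it: reindex the columns after unstacking. The 2010 to 2015 range is an assumption taken from the expected output in the question, and the names `wide` and `all_years` are only illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'country': ['US', 'US', 'US', 'US', 'US', 'US', 'AU', 'AU'],
    'name': ['sarah', 'sarah', 'sarah', 'alice', 'alice', 'alice', 'jones', 'jones'],
    'medals_won': [1, 2, 5, 3, 4, 1, 2, 8],
    'year': [2010, 2011, 2015, 2010, 2012, 2015, 2013, 2015],
})

# Pivot to one column per year, filling missing combinations with 0
wide = df.set_index(['country', 'name', 'year'])['medals_won'].unstack(fill_value=0)

# Assumption: the wanted columns are every year from 2010 to 2015 inclusive
all_years = list(range(2010, 2016))

# Years absent from the data (here 2014) become columns filled with 0
wide = wide.reindex(columns=all_years, fill_value=0)
print(wide)
```

If the range should follow the data rather than be hard-coded, `range(df['year'].min(), df['year'].max() + 1)` is one possible alternative.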
If there are duplicates, they need to be aggregated, e.g. with mean or sum, using pivot_table or groupby + an aggregate function + unstack:
print(df)
  country   name  medals_won  year
0      US  sarah           1  2010  <- same US sarah 2010, different 1
1      US  sarah           4  2010  <- same US sarah 2010, different 4
2      US  sarah           2  2011
3      US  sarah           5  2015
4      US  alice           3  2010
5      US  alice           4  2012
6      US  alice           1  2015
7      AU  jones           2  2013
8      AU  jones           8  2015
df = df.pivot_table(index=['country','name'],
                    columns='year',
                    values='medals_won',
                    fill_value=0,
                    aggfunc='mean')
print(df)
year           2010  2011  2012  2013  2015
country name
AU      jones   0.0     0     0     2     8
US      alice   3.0     0     4     0     1
        sarah   2.5     2     0     0     5   <- (1+4)/2 = 2.5
Or:
df = df.groupby(['country','name','year'])['medals_won'].mean().unstack(fill_value=0)
print(df)
year           2010  2011  2012  2013  2015
country name
AU      jones   0.0   0.0   0.0   2.0   8.0
US      alice   3.0   0.0   4.0   0.0   1.0
        sarah   2.5   2.0   0.0   0.0   5.0
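If the duplicates should be totalled rather than averaged, the same groupby pattern should work with sum in place of mean. A rough sketch using the duplicated sample data from above; the name `totals` is only illustrative.

```python
import pandas as pd

# Sample data with a duplicated (US, sarah, 2010) entry, as in the example above
df = pd.DataFrame({
    'country': ['US', 'US', 'US', 'US', 'US', 'US', 'US', 'AU', 'AU'],
    'name': ['sarah', 'sarah', 'sarah', 'sarah', 'alice', 'alice', 'alice', 'jones', 'jones'],
    'medals_won': [1, 4, 2, 5, 3, 4, 1, 2, 8],
    'year': [2010, 2010, 2011, 2015, 2010, 2012, 2015, 2013, 2015],
})

# Add up medals for each (country, name, year), then spread years into columns
totals = (df.groupby(['country', 'name', 'year'])['medals_won']
            .sum()
            .unstack(fill_value=0))
print(totals)
```

With this data, the two 2010 rows for sarah collapse into a single cell holding 1 + 4 = 5.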
Finally:
df = df.reset_index().rename_axis(None, axis=1)
print(df)
  country   name  2010  2011  2012  2013  2015
0      AU  jones     0     0     0     2     8
1      US  alice     3     0     4     0     1
2      US  sarah     1     2     0     0     5