python pandas: convert columns to rows

And*_*Ito 3 python pandas

I have a table:

country | name  | medals_won | year
-----------------------------------
US      | sarah |      1     | 2010
US      | sarah |      2     | 2011
US      | sarah |      5     | 2015
US      | alice |      3     | 2010
US      | alice |      4     | 2012
US      | alice |      1     | 2015
AU      | jones |      2     | 2013
AU      | jones |      8     | 2015

I would like it to look like this:

country | name  | 2010 | 2011 | 2012 | 2013 | 2014 | 2015
---------------------------------------------------------
US      | sarah | 1    | 2    | 0    | 0    | 0    | 5
US      | alice | 3    | 0    | 4    | 0    | 0    | 1
AU      | jones | 0    | 0    | 0    | 2    | 0    | 8

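I have tinkered with df.apply and even brute-force iteration, but as you might guess, the tricky part is that the row values are not strictly sequential, so this is not a simple transpose operation. For example, nobody won any medals in 2014, but I would like the resulting table to show that year as a column full of zeros.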

jez*_*ael 5

You can use set_index + unstack:

df = df.set_index(['country','name','year'])['medals_won'].unstack(fill_value=0)
print (df)
year           2010  2011  2012  2013  2015
country name                               
AU      jones     0     0     0     2     8
US      alice     3     0     4     0     1
        sarah     1     2     0     0     5
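Note that the output above has no 2014 column, because unstack only creates columns for years that occur in the data. A minimal sketch of one way to add the missing years, assuming that every year between the first and last observed year should appear (the names wide and all_years are illustrative, not from the answer):

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "country": ["US", "US", "US", "US", "US", "US", "AU", "AU"],
    "name": ["sarah", "sarah", "sarah", "alice", "alice", "alice", "jones", "jones"],
    "medals_won": [1, 2, 5, 3, 4, 1, 2, 8],
    "year": [2010, 2011, 2015, 2010, 2012, 2015, 2013, 2015],
})

# Same reshape as in the answer: one column per year present in the data.
wide = df.set_index(["country", "name", "year"])["medals_won"].unstack(fill_value=0)

# Assumption: the desired columns are all years from the earliest to the latest.
all_years = list(range(int(df["year"].min()), int(df["year"].max()) + 1))

# reindex adds the years that never occur (here 2014) as columns filled with 0.
wide = wide.reindex(columns=all_years, fill_value=0)
print(wide)
```

If the range of years is known in advance, all_years could instead be written out explicitly.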

If there are duplicates, you need to aggregate them, for example with mean or sum, using pivot_table or groupby + an aggregate function + unstack:

print (df)
  country   name  medals_won  year
0      US  sarah           1  2010 <-same US  sarah 2010, different 1
1      US  sarah           4  2010 <-same US  sarah 2010, different 4
2      US  sarah           2  2011
3      US  sarah           5  2015
4      US  alice           3  2010
5      US  alice           4  2012
6      US  alice           1  2015
7      AU  jones           2  2013
8      AU  jones           8  2015

df = df.pivot_table(index=['country','name'], 
                    columns='year', 
                    values='medals_won', 
                    fill_value=0, 
                    aggfunc='mean')
print (df)
year           2010  2011  2012  2013  2015
country name                               
AU      jones   0.0     0     0     2     8
US      alice   3.0     0     4     0     1
        sarah   2.5     2     0     0     5 <- (1+4)/2 = 2.5

Or:

df = df.groupby(['country','name','year'])['medals_won'].mean().unstack(fill_value=0)
print (df)
year           2010  2011  2012  2013  2015
country name                               
AU      jones   0.0   0.0   0.0   2.0   8.0
US      alice   3.0   0.0   4.0   0.0   1.0
        sarah   2.5   2.0   0.0   0.0   5.0
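To decide whether aggregation is needed at all, it may help to check for repeated (country, name, year) combinations first. The sketch below uses the duplicated sample data from this answer; aggregating with sum instead of mean is an assumption made here (medal counts seem more natural to add up), and the variable names are illustrative:

```python
import pandas as pd

# Sample data from the answer, with two rows for (US, sarah, 2010).
df = pd.DataFrame({
    "country": ["US", "US", "US", "US", "US", "US", "US", "AU", "AU"],
    "name": ["sarah", "sarah", "sarah", "sarah", "alice", "alice", "alice", "jones", "jones"],
    "medals_won": [1, 4, 2, 5, 3, 4, 1, 2, 8],
    "year": [2010, 2010, 2011, 2015, 2010, 2012, 2015, 2013, 2015],
})
keys = ["country", "name", "year"]

# True if any key combination occurs more than once.
has_duplicates = bool(df.duplicated(subset=keys).any())

# With duplicate keys, plain set_index + unstack cannot reshape and raises ValueError.
try:
    df.set_index(keys)["medals_won"].unstack(fill_value=0)
    unstack_failed = False
except ValueError:
    unstack_failed = True

# Assumption: duplicates are combined by summing rather than averaging.
totals = df.groupby(keys)["medals_won"].sum().unstack(fill_value=0)
print(has_duplicates, unstack_failed)
print(totals)
```

Because sum keeps integer values, the result stays an integer table, whereas mean produces floats as in the output above.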

Finally, to turn the index levels back into ordinary columns:

df = df.reset_index().rename_axis(None, axis=1)
print (df)
  country   name  2010  2011  2012  2013  2015
0      AU  jones     0     0     0     2     8
1      US  alice     3     0     4     0     1
2      US  sarah     1     2     0     0     5

  • Now that is just being greedy with all the answers. :-) (2 upvotes)