Pandas melt with multiple values

Lau*_*raF 3 python melt pandas

I have a dataset in wide format like this:

   Index Country     Variable 2000 2001 2002 2003 2004 2005
   0     Argentina   var1     12   15   18    17  23   29
   1     Argentina   var2     1    3    2     5   7    5
   2     Brazil      var1     20   23   25   29   31   32
   3     Brazil      var2     0    1    2    2    3    3

I want to reshape my data to long format so that year, var1, and var2 become new columns:

  Index Country     year   var1 var2
  0     Argentina   2000   12   1
  1     Argentina   2001   15   3
  2     Argentina   2002   18   2
  ....
  6     Brazil      2000   20   0
  7     Brazil      2001   23   1

When there is only one variable, I got my code to work by writing:

df=(pd.melt(df,id_vars='Country',value_name='Var1', var_name='year'))

I can't figure out how to do this for var1, var2, var3, etc.

ayh*_*han 8

Instead of melt, you can use a combination of stack and unstack:

(df.set_index(['Country', 'Variable'])
   .rename_axis(['Year'], axis=1)
   .stack()
   .unstack('Variable')
   .reset_index())

Variable    Country  Year  var1  var2
0         Argentina  2000    12     1
1         Argentina  2001    15     3
2         Argentina  2002    18     2
3         Argentina  2003    17     5
4         Argentina  2004    23     7
5         Argentina  2005    29     5
6            Brazil  2000    20     0
7            Brazil  2001    23     1
8            Brazil  2002    25     2
9            Brazil  2003    29     2
10           Brazil  2004    31     3
11           Brazil  2005    32     3
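
For anyone who wants to run this end to end, here is a minimal sketch. The sample frame is rebuilt from the first three year columns of the question's table, and storing the year labels as strings is an assumption on my part. The final `rename_axis(None, axis=1)` is an optional extra step (not part of the chain above) that drops the leftover `Variable` label from the columns.

```python
import pandas as pd

# Sample data copied from the question (first three years only).
# Assumption: the year column labels are strings.
df = pd.DataFrame({
    "Country": ["Argentina", "Argentina", "Brazil", "Brazil"],
    "Variable": ["var1", "var2", "var1", "var2"],
    "2000": [12, 1, 20, 0],
    "2001": [15, 3, 23, 1],
    "2002": [18, 2, 25, 2],
})

long_df = (
    df.set_index(["Country", "Variable"])  # keep only the year columns as data
      .rename_axis("Year", axis=1)         # name the column axis so stack() labels the new level
      .stack()                             # years move from columns into the row index
      .unstack("Variable")                 # var1/var2 move from the row index into columns
      .reset_index()
      .rename_axis(None, axis=1)           # optional: remove the "Variable" columns label
)

print(long_df)
```

Because `unstack("Variable")` creates one column per distinct value in `Variable`, the same chain should cover var3 and beyond without changes.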


Sco*_*ton 6

Option 1

Use melt, then unstack, for var1, var2, etc.

(df1.melt(id_vars=['Country','Variable'],var_name='Year')
    .set_index(['Country','Year','Variable'])
    .squeeze()
    .unstack()
    .reset_index())

Output:

Variable    Country  Year  var1  var2
0         Argentina  2000    12     1
1         Argentina  2001    15     3
2         Argentina  2002    18     2
3         Argentina  2003    17     5
4         Argentina  2004    23     7
5         Argentina  2005    29     5
6            Brazil  2000    20     0
7            Brazil  2001    23     1
8            Brazil  2002    25     2
9            Brazil  2003    29     2
10           Brazil  2004    31     3
11           Brazil  2005    32     3
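
It may help to look at the intermediate frame that melt produces before unstacking. The following is a rough sketch on a cut-down copy of the question's data (year labels as strings is an assumption). It selects the `value` column by name where the chain above uses `.squeeze()`; both turn the single remaining column into a Series, and this is a swapped-in variation, not the original code.

```python
import pandas as pd

# Cut-down copy of the question's table (assumption: year labels are strings).
df1 = pd.DataFrame({
    "Country": ["Argentina", "Argentina", "Brazil", "Brazil"],
    "Variable": ["var1", "var2", "var1", "var2"],
    "2000": [12, 1, 20, 0],
    "2001": [15, 3, 23, 1],
    "2002": [18, 2, 25, 2],
})

# Step 1: melt keeps Country and Variable, and turns every year column into rows.
# The cell values land in a column that melt names "value" by default.
melted = df1.melt(id_vars=["Country", "Variable"], var_name="Year")
print(melted.head())

# Step 2: index by all three keys, take the value Series, then unstack
# the innermost level (Variable) so var1/var2 become columns.
wide = (
    melted.set_index(["Country", "Year", "Variable"])["value"]
          .unstack()
          .reset_index()
)
print(wide)
```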

Option 2

Use pivot, then stack

(df1.pivot(index='Country',columns='Variable')
   .stack(0)
   .rename_axis(['Country','Year'])
   .reset_index())

Output:

Variable    Country  Year  var1  var2
0         Argentina  2000    12     1
1         Argentina  2001    15     3
2         Argentina  2002    18     2
3         Argentina  2003    17     5
4         Argentina  2004    23     7
5         Argentina  2005    29     5
6            Brazil  2000    20     0
7            Brazil  2001    23     1
8            Brazil  2002    25     2
9            Brazil  2003    29     2
10           Brazil  2004    31     3
11           Brazil  2005    32     3
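
To see why `stack(0)` is the right level here, it can help to inspect the columns after the pivot. This is only a sketch on two year columns from the question's data (year labels as strings is an assumption). The trailing `sort_values` is my addition so that the row order does not depend on which `stack` implementation your pandas version uses.

```python
import pandas as pd

# Two year columns from the question's table (assumption: labels are strings).
df1 = pd.DataFrame({
    "Country": ["Argentina", "Argentina", "Brazil", "Brazil"],
    "Variable": ["var1", "var2", "var1", "var2"],
    "2000": [12, 1, 20, 0],
    "2001": [15, 3, 23, 1],
})

# With no `values` argument, pivot uses every remaining column, so the
# result has two-level columns: level 0 is the year, level 1 is Variable.
pivoted = df1.pivot(index="Country", columns="Variable")
print(pivoted.columns.tolist())

out = (
    pivoted.stack(0)                          # move the year level (level 0) into the row index
           .rename_axis(["Country", "Year"])  # name both row index levels
           .reset_index()
           .sort_values(["Country", "Year"], ignore_index=True)  # added for a stable row order
)
print(out)
```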

Option 3 (ayhan's solution)

Use set_index, stack, and unstack

(df.set_index(['Country', 'Variable'])
   .rename_axis(['Year'], axis=1)
   .stack()
   .unstack('Variable')
   .reset_index())

Output:

Variable    Country  Year  var1  var2
0         Argentina  2000    12     1
1         Argentina  2001    15     3
2         Argentina  2002    18     2
3         Argentina  2003    17     5
4         Argentina  2004    23     7
5         Argentina  2005    29     5
6            Brazil  2000    20     0
7            Brazil  2001    23     1
8            Brazil  2002    25     2
9            Brazil  2003    29     2
10           Brazil  2004    31     3
11           Brazil  2005    32     3

  • I like how you laid out the options in a very easy-to-read way :-) (2 upvotes)