使用 Pandas 以特定格式堆叠数据框

Sta*_*tan 0 python transpose pivot-table melt pandas

我有一个 panas 数据框,如下所示:

 df 

   Prod  ProdDesc   tot    avg   qtr        val_qtr
   A      Cyl       110   8.7    202301     12
   A      Cyl       110   8.7    202302     56.9
   A      Cyl       110   8.7    202303      9
   A      Cyl       110   8.7    202304      0
Run Code Online (Sandbox Code Playgroud)

所以我想要的是堆叠/转置数据帧。我用熊猫融化,

    df_tra = df.melt(id_vars=['Prod', 'ProdDesc'], var_name='Attrib', value_name='Value')
    df_tra.drop_duplicates()
Run Code Online (Sandbox Code Playgroud)

所以我的输出如下:

df_tra


    Prod  ProdDesc  Attrib   Value
     A    Cyl       tot      110           
     A    Cyl       avg      8.7           
     A    Cyl       quarter  202301    
     A    Cyl       quarter  202302
     A    Cyl       quarter  202303
     A    Cyl       quarter  202304        
     A    Cyl       val_qtr  12    
     A    Cyl       val_qtr  56.9
     A    Cyl       val_qtr  9
     A    Cyl       val_qtr  0  
Run Code Online (Sandbox Code Playgroud)

但我想要/想要的输出是不同的。我想要的是以下内容:

 df_actual_wanted 

    Prod  ProdDesc  Attrib   Value
    A     Cyl       tot      110           
    A     Cyl       avg      8.7           
    A     Cyl       202301   12 
    A     Cyl       202302   56.9
    A     Cyl       202303    9
    A     Cyl       202304    0     
Run Code Online (Sandbox Code Playgroud)

我怎样才能做到这一点?

jez*_*ael 5

使用DataFrame.drop_duplicatesand选择多列,并使用byDataFrame.melt与 snoter 子集连接,最后如果需要按两列排序:renameconcat

df1 = (df[['Prod','ProdDesc','tot','avg']]
               .drop_duplicates()
               .melt(id_vars=['Prod', 'ProdDesc'], var_name='Attrib', value_name='Value'))
df2 = (df[['Prod','ProdDesc','qtr','val_qtr']]
               .rename(columns={'qtr':'Attrib','val_qtr':'Value'}))

out = pd.concat([df1, df2]).sort_values(['Prod','ProdDesc'], ignore_index=True)
print (out)
  Prod ProdDesc  Attrib  Value
0    A      Cyl     tot  110.0
1    A      Cyl     avg    8.7
2    A      Cyl  202301   12.0
3    A      Cyl  202302   56.9
4    A      Cyl  202303    9.0
5    A      Cyl  202304    0.0
Run Code Online (Sandbox Code Playgroud)

如果默认索引和排序需要与原始更改解决方案相同:

print (df)
   Prod ProdDesc  tot   avg     qtr  val_qtr
0     A      Cyl  110  8.70  202301     12.0
1     A      Cyl  110  8.70  202302     56.9
2     A      Cyl  110  8.70  202303      9.0
3     A      Cyl  110  8.70  202304      0.0
4     B      Cyl  133  8.76  202301     12.0
5     B      Cyl  133  8.76  202302     56.9
6     B      Cyl  133  8.76  202303      9.0
7     B      Cyl  133  8.76  202304      0.0
8     A     Cyl1  117  8.37  202301     12.0
9     A     Cyl1  117  8.37  202302     56.9
10    A     Cyl1  117  8.37  202303      9.0
11    A     Cyl1  117  8.37  202304      0.0
Run Code Online (Sandbox Code Playgroud)
df1 = (df[['Prod','ProdDesc','tot','avg']]
               .drop_duplicates()
               .melt(id_vars=['Prod', 'ProdDesc'], 
                     var_name='Attrib', 
                     value_name='Value',
                     ignore_index=False))
df2 = (df[['Prod','ProdDesc','qtr','val_qtr']]
               .rename(columns={'qtr':'Attrib','val_qtr':'Value'}))

out = pd.concat([df1, df2]).sort_index(kind='stable', ignore_index=True)
Run Code Online (Sandbox Code Playgroud)
print (out)
   Prod ProdDesc  Attrib   Value
0     A      Cyl     tot  110.00
1     A      Cyl     avg    8.70
2     A      Cyl  202301   12.00
3     A      Cyl  202302   56.90
4     A      Cyl  202303    9.00
5     A      Cyl  202304    0.00
6     B      Cyl     tot  133.00
7     B      Cyl     avg    8.76
8     B      Cyl  202301   12.00
9     B      Cyl  202302   56.90
10    B      Cyl  202303    9.00
11    B      Cyl  202304    0.00
12    A     Cyl1     tot  117.00
13    A     Cyl1     avg    8.37
14    A     Cyl1  202301   12.00
15    A     Cyl1  202302   56.90
16    A     Cyl1  202303    9.00
17    A     Cyl1  202304    0.00
Run Code Online (Sandbox Code Playgroud)

如果小数据或性能不重要:

def f(x):
    y = x[['tot','avg']].iloc[0].T.reset_index().set_axis(['Attrib', 'Value'], axis=1)
    return pd.concat([y, x[['Attrib','Value']]])

out = (df.rename(columns={'qtr':'Attrib','val_qtr':'Value'})
         .groupby(['Prod', 'ProdDesc'], sort=False)
         .apply(f)
         .droplevel(-1)
         .reset_index())
print (out)
   Prod ProdDesc  Attrib   Value
0     A      Cyl     tot  110.00
1     A      Cyl     avg    8.70
2     A      Cyl  202301   12.00
3     A      Cyl  202302   56.90
4     A      Cyl  202303    9.00
5     A      Cyl  202304    0.00
6     B      Cyl     tot  133.00
7     B      Cyl     avg    8.76
8     B      Cyl  202301   12.00
9     B      Cyl  202302   56.90
10    B      Cyl  202303    9.00
11    B      Cyl  202304    0.00
12    A     Cyl1     tot  117.00
13    A     Cyl1     avg    8.37
14    A     Cyl1  202301   12.00
15    A     Cyl1  202302   56.90
16    A     Cyl1  202303    9.00
17    A     Cyl1  202304    0.00
Run Code Online (Sandbox Code Playgroud)