Convert a pandas DataFrame with multi-level columns to long format

arv*_*arv 5 python pivot-table dataframe python-3.x pandas

|            | Var1 |      |      | Var2 |      |      |
|            | SPY  | AAPL | MSFT | SPY  | AAPL | MSFT |
|------------|------|------|------|------|------|------|
| Date       |      |      |      |      |      |      |
| 2011-01-03 | 30   | 30   | 30   | 30   | 30   | 30   |
| 2011-01-04 | 30   | 30   | 30   | 21   | 30   | 30   |
| 2011-01-05 | 30   | 30   | 30   | 30   | 30   | 30   |



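How can I convert the DataFrame above, which has multi-level columns, into the long format shown below? The expected output looks like this: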

|            | firm | Var1 | Var2 |
|------------|------|------|------|
| Date       |      |      |      |    
| 2011-01-03 | AAPL |   30 |   30 | 
| 2011-01-04 | SPY  |   30 |   30 |
| 2011-01-05 | MSFT |   30 |   30 |  

Sample data:

import pandas as pd

df = pd.DataFrame([{('Var1', 'SPY'): 30.0,
      ('Var1', 'AAPL'): 30.0,
      ('Var1', 'MSFT'): 30.0,
      ('Var2', 'SPY'): 30.0,
      ('Var2', 'AAPL'): 30.0,
      ('Var2', 'MSFT'): 30.0},
     {('Var1', 'SPY'): 30.0,
      ('Var1', 'AAPL'): 30.0,
      ('Var1', 'MSFT'): 30.0,
      ('Var2', 'SPY'): 21.0,
      ('Var2', 'AAPL'): 30.0,
      ('Var2', 'MSFT'): 30.0},
     {('Var1', 'SPY'): 30.0,
      ('Var1', 'AAPL'): 30.0,
      ('Var1', 'MSFT'): 30.0,
      ('Var2', 'SPY'): 30.0,
      ('Var2', 'AAPL'): 30.0,
      ('Var2', 'MSFT'): 30.0}])

Pyg*_*irl 2

Let's reproduce the first DataFrame.

A:

            SPL AAPL MSFT
2011-01-03  30  30  30
2011-01-04  30  30  30
2011-01-05  30  30  30

B:

            SPL AAPL MSFT
2011-01-03  30  30  30
2011-01-04  21  30  30
2011-01-05  30  30  30

Give each frame a top column level, then concatenate them along the columns:

A.columns = pd.MultiIndex.from_product([['Var1'], A.columns])
B.columns = pd.MultiIndex.from_product([['Var2'], B.columns])
df = pd.concat([A, B], axis = 1)
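
The answer shows A and B only as printed tables. A minimal sketch of how they might be built and combined follows; the string dates used as the index and the literal values are assumptions read off the tables above, not code from the original answer.

```python
import pandas as pd

# Assumed index and column labels, copied from the printed tables.
dates = ["2011-01-03", "2011-01-04", "2011-01-05"]
firms = ["SPL", "AAPL", "MSFT"]

A = pd.DataFrame([[30, 30, 30], [30, 30, 30], [30, 30, 30]],
                 index=dates, columns=firms)
B = pd.DataFrame([[30, 30, 30], [21, 30, 30], [30, 30, 30]],
                 index=dates, columns=firms)

# Prepend a top level ('Var1' / 'Var2') to each frame's columns.
A.columns = pd.MultiIndex.from_product([["Var1"], A.columns])
B.columns = pd.MultiIndex.from_product([["Var2"], B.columns])

# Concatenate side by side; the result has two column levels.
df = pd.concat([A, B], axis=1)
print(df.columns.nlevels)
print(df)
```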

Your current DataFrame df:

                Var1           Var2
            SPL AAPL MSFT   SPL AAPL MSFT
2011-01-03  30  30  30      30  30  30
2011-01-04  30  30  30      21  30  30
2011-01-05  30  30  30      30  30  30

Code:

df = df.stack().reset_index().rename(columns={'level_0':'Date', 'level_1': 'firm'})
df.set_index(['Date'], inplace=True)

Resulting df:

            firm    Var1    Var2
Date            
2011-01-03  AAPL    30      30
2011-01-03  MSFT    30      30
2011-01-03  SPL     30      30
2011-01-04  AAPL    30      30
2011-01-04  MSFT    30      30
2011-01-04  SPL     30      21
2011-01-05  AAPL    30      30
2011-01-05  MSFT    30      30
2011-01-05  SPL     30      30
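
As a possible variant of the same idea, naming the index and the firm column level up front lets you stack by name and skip the rename of the auto-generated `level_0` / `level_1` labels. This is only a sketch using the question's values; the names `wide` and `long_df`, the `DatetimeIndex`, and the explicit sort are illustrative assumptions. Recent pandas 2.x releases may emit a `FutureWarning` from `stack` about its upcoming implementation.

```python
import pandas as pd

# Wide frame with two column levels: variable on top, firm below.
# The level name "firm" and the index name "Date" are set here on purpose.
columns = pd.MultiIndex.from_product(
    [["Var1", "Var2"], ["SPY", "AAPL", "MSFT"]], names=[None, "firm"]
)
index = pd.DatetimeIndex(["2011-01-03", "2011-01-04", "2011-01-05"], name="Date")
wide = pd.DataFrame(
    [
        [30, 30, 30, 30, 30, 30],
        [30, 30, 30, 21, 30, 30],
        [30, 30, 30, 30, 30, 30],
    ],
    index=index,
    columns=columns,
)

long_df = (
    wide.stack(level="firm")      # move the firm level from columns to the index
        .sort_index()             # order by Date, then firm, regardless of pandas version
        .reset_index(level="firm")  # turn firm into a column, keep Date as the index
)
print(long_df)
```

Sorting explicitly keeps the row order stable across pandas versions, since the newer `stack` implementation no longer sorts the stacked level.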