Pandas: convert rows to columns

Ksh*_*j G 7 python dataframe pandas pandas-groupby

I have a CSV that produces a DataFrame in the following format:

--------------------------------------------------------------
|Date       | Fund | TradeGroup | LongShort | Alpha | Details|
--------------------------------------------------------------
|2018-05-22 |A     | TGG-A      | Long      | 3.99  | Misc   |
|2018-05-22 |A     | TGG-B      | Long      | 4.99  | Misc   |
|2018-05-22 |B     | TGG-A      | Long      | 5.99  | Misc   |
|2018-05-22 |B     | TGG-B      | Short     | 6.99  | Misc   |
|2018-05-22 |C     | TGG-A      | Long      | 1.99  | Misc   |
|2018-05-22 |C     | TGG-B      | Long      | 5.29  | Misc   |
--------------------------------------------------------------

What I want to do is group the rows by TradeGroup and turn the Fund values into columns. The final DataFrame should look like this:

  --------------------------------------------------------
  |TradeGroup| Date      | A         | B         | C     |
  --------------------------------------------------------
  | TGG-A    |2018-05-22 | 3.99      | 5.99      | 1.99  |
  | TGG-B    |2018-05-22 | 4.99      | 6.99      | 5.29  | 
  --------------------------------------------------------

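Also, I don't really care about the LongShort and Details columns, so it's fine if they get dropped. Thanks! I have already tried df.pivot(), but it did not give the format I need.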

jpp*_*jpp 6

Use pd.pivot_table (called here as the DataFrame method df.pivot_table):

res = df.pivot_table(index=['Date', 'TradeGroup'], columns='Fund',
                     values='Alpha', aggfunc='first').reset_index()

print(res)

Fund        Date TradeGroup     A     B     C
0     2018-05-22      TGG-A  3.99  5.99  1.99
1     2018-05-22      TGG-B  4.99  6.99  5.29
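For anyone who wants to run this end to end, here is a minimal sketch. It assumes the frame can be rebuilt by hand from the table in the question (the real data comes from a CSV), and it puts TradeGroup first in the index to match the column order the question asks for. The rename_axis call is an optional extra, not part of the answer above: it only removes the leftover "Fund" label from the column header.

```python
import pandas as pd

# Assumption: the question's table rebuilt by hand instead of read from the CSV.
df = pd.DataFrame({
    "Date": ["2018-05-22"] * 6,
    "Fund": ["A", "A", "B", "B", "C", "C"],
    "TradeGroup": ["TGG-A", "TGG-B"] * 3,
    "LongShort": ["Long", "Long", "Long", "Short", "Long", "Long"],
    "Alpha": [3.99, 4.99, 5.99, 6.99, 1.99, 5.29],
    "Details": ["Misc"] * 6,
})

res = (
    # One row per (TradeGroup, Date), one column per Fund, cells filled with Alpha.
    # aggfunc="first" keeps the first value if a combination occurs more than once.
    df.pivot_table(index=["TradeGroup", "Date"], columns="Fund",
                   values="Alpha", aggfunc="first")
      .reset_index()              # turn TradeGroup and Date back into columns
      .rename_axis(None, axis=1)  # drop the "Fund" name from the column axis
)
print(res)
```

Because only Alpha is passed as values, LongShort and Details are dropped without any extra step.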


Ant*_*vBR 4

It looks like you are trying to unstack a column from a MultiIndex.

Try this:

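import io

import pandas as pd

data = '''\
Date        Fund  TradeGroup  LongShort  Alpha  Details
2018-05-22 A      TGG-A       Long       3.99   Misc
2018-05-22 A      TGG-B       Long       4.99   Misc
2018-05-22 B      TGG-A       Long       5.99   Misc
2018-05-22 B      TGG-B       Short      6.99   Misc
2018-05-22 C      TGG-A       Long       1.99   Misc
2018-05-22 C      TGG-B       Long       5.29   Misc'''

# pd.compat.StringIO is not available in newer pandas releases;
# io.StringIO from the standard library does the same job.
fileobj = io.StringIO(data)

# Raw string, so that \s reaches the parser as a regex and not as a string escape.
df = pd.read_csv(fileobj, sep=r'\s+')

# Move the three keys into the index, unstack the innermost level (Fund)
# into columns, then keep only the Alpha values.
dfout = df.set_index(['TradeGroup','Date','Fund']).unstack()['Alpha']
print(dfout)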

This returns:

Fund                      A     B     C
TradeGroup Date                        
TGG-A      2018-05-22  3.99  5.99  1.99
TGG-B      2018-05-22  4.99  6.99  5.29

If you like, you can also apply .reset_index() afterwards, which gives:

Fund TradeGroup        Date     A     B     C
0         TGG-A  2018-05-22  3.99  5.99  1.99
1         TGG-B  2018-05-22  4.99  6.99  5.29
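A possible variation, sketched under the assumption that the frame is rebuilt by hand from the question's table: selecting Alpha before unstacking means LongShort and Details are never reshaped, and rename_axis removes the "Fund" label from the header. The duplicated row at the end is hypothetical and is there only to show that unstack needs each TradeGroup/Date/Fund combination to be unique.

```python
import pandas as pd

# Assumption: the question's table rebuilt by hand instead of read from the CSV.
df = pd.DataFrame({
    "Date": ["2018-05-22"] * 6,
    "Fund": ["A", "A", "B", "B", "C", "C"],
    "TradeGroup": ["TGG-A", "TGG-B"] * 3,
    "LongShort": ["Long", "Long", "Long", "Short", "Long", "Long"],
    "Alpha": [3.99, 4.99, 5.99, 6.99, 1.99, 5.29],
    "Details": ["Misc"] * 6,
})

keys = ["TradeGroup", "Date", "Fund"]

wide = (
    df.set_index(keys)["Alpha"]   # Series indexed by the three keys
      .unstack("Fund")            # Fund level becomes the columns
      .reset_index()
      .rename_axis(None, axis=1)  # drop the "Fund" name from the column axis
)
print(wide)

# Hypothetical duplicate: the first row appears twice, so the index is no longer unique.
dup = pd.concat([df, df.iloc[[0]]], ignore_index=True)
try:
    dup.set_index(keys)["Alpha"].unstack("Fund")
    unstack_failed = False
except ValueError:
    # unstack cannot decide which of the two values belongs in the cell.
    unstack_failed = True
print(unstack_failed)
```

If duplicates like that can occur in the real CSV, an aggregating reshape such as pivot_table with an explicit aggfunc is probably the safer choice.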