I have a data frame as shown below:
+----+------+------+-----+-----+
| id | year | sell | buy | own |
+----+------+------+-----+-----+
| 1 | 2016 | 9 | 2 | 10 |
| 1 | 2017 | 9 | 0 | 10 |
| 1 | 2018 | 0 | 2 | 10 |
| 2 | 2016 | 7 | 2 | 11 |
| 2 | 2017 | 2 | 0 | 0 |
| 2 | 2018 | 0 | 0 | 18 |
+----+------+------+-----+-----+
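I am trying to transpose the rows into columns, but instead of aggregating the values I want to keep a letter for each column that is not 0 (S for sell, B for buy, O for own). If all three columns have a value for a given year, I need S_B_O for that year; if only sell and buy have values, S_B, and so on. The expected output is: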
+----+-------+------+------+
| ID | 2016 | 2017 | 2018 |
+----+-------+------+------+
| 1 | S_B_O | S_O | B_O |
+----+-------+------+------+
| 2 | S_B_O | S | O |
+----+-------+------+------+
I am new to Python and don't know how to do this. I only know the basic aggregating pivot shown below. Is this possible? Any suggestions would be appreciated.
import pandas as pd
df = pd.read_excel('Pivot.xlsx')
# Plain aggregating pivot: sums `sell` per id and year (the column is named `id`, not `ID`)
pivot = pd.pivot_table(df, index="id", columns="year", values="sell", aggfunc="sum", fill_value=0)
Data frame as CSV:
id,year,sell,buy,own
1,2016,9,2,10
1,2017,9,0,10
1,2018,0,2,10
2,2016,7,2,11
2,2017,2,0,0
2,2018,0,0,18
You can use df.dot together with df.pivot here:
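u = df[['sell','buy','own']]
# Non-zero mask dotted with 'S_', 'B_', 'O_' builds the label; [:-1] drops the trailing '_'
# pivot takes keyword arguments (positional ones are rejected by pandas 2.0 and later)
(df.assign(v=u.ne(0).dot(u.columns.str[0].str.upper()+'_').str[:-1])
.pivot(index="id", columns="year", values="v"))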
year 2016 2017 2018
id
1 S_B_O S_O B_O
2 S_B_O S O
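To make the one-liner easier to follow, here is a sketch that splits it into named steps. The intermediate names (`csv_text`, `mask`, `labels`, `codes`, `wide`) are mine, not part of the answer, and the data is loaded from the CSV in the question rather than from the Excel file.

```python
import io
import pandas as pd

# Sample data copied from the question, read from inline CSV
csv_text = """id,year,sell,buy,own
1,2016,9,2,10
1,2017,9,0,10
1,2018,0,2,10
2,2016,7,2,11
2,2017,2,0,0
2,2018,0,0,18
"""
df = pd.read_csv(io.StringIO(csv_text))

u = df[["sell", "buy", "own"]]

# Step 1: boolean mask, True wherever the value is non-zero
mask = u.ne(0)

# Step 2: one label per column: upper-cased first letter plus a separator
labels = u.columns.str[0].str.upper() + "_"  # 'S_', 'B_', 'O_'

# Step 3: the dot product multiplies each boolean by its label and sums
# along the row. True * 'S_' is 'S_', False * 'S_' is '', and summing
# strings concatenates them. The final slice drops the trailing '_'.
codes = mask.dot(labels).str[:-1]

# Step 4: reshape so that each year becomes a column
wide = df.assign(v=codes).pivot(index="id", columns="year", values="v")
print(wide)
```

A row in which sell, buy and own are all 0 ends up with an empty string, since nothing is concatenated for it.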
Fully formatted:
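u = df[['sell','buy','own']]
# Same as above, then drop the 'year' axis name and turn `id` back into a regular column
out = (df.assign(v=u.ne(0).dot(u.columns.str[0].str.upper()+'_').str[:-1])
.pivot(index="id", columns="year", values="v").rename_axis(columns=None).reset_index())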
print(out)
id 2016 2017 2018
0 1 S_B_O S_O B_O
1 2 S_B_O S O
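If the dot-product trick feels too terse, the same labels can be built with a plain Python join applied row by row. This is a different technique from the one in the answer, shown only as a more explicit sketch; the helper `flags` and the names `value_cols` and `out2` are hypothetical, and row-wise `apply` will generally be slower on large frames.

```python
import io
import pandas as pd

# Sample data copied from the question, read from inline CSV
csv_text = """id,year,sell,buy,own
1,2016,9,2,10
1,2017,9,0,10
1,2018,0,2,10
2,2016,7,2,11
2,2017,2,0,0
2,2018,0,0,18
"""
df = pd.read_csv(io.StringIO(csv_text))

value_cols = ["sell", "buy", "own"]

def flags(row):
    # Join the upper-cased first letter of every column whose value is non-zero
    return "_".join(col[0].upper() for col in value_cols if row[col] != 0)

out2 = (
    df.assign(v=df[value_cols].apply(flags, axis=1))
      .pivot(index="id", columns="year", values="v")
      .rename_axis(columns=None)
      .reset_index()
)
print(out2)
```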