Sta*_*tan 0 python transpose pivot-table melt pandas
I have a pandas DataFrame that looks like this:
df
Prod ProdDesc tot avg qtr val_qtr
A Cyl 110 8.7 202301 12
A Cyl 110 8.7 202302 56.9
A Cyl 110 8.7 202303 9
A Cyl 110 8.7 202304 0
What I want is to stack/transpose the DataFrame. I tried pandas melt:
df_tra = df.melt(id_vars=['Prod', 'ProdDesc'], var_name='Attrib', value_name='Value')
df_tra.drop_duplicates()
So my output looks like this:
df_tra
Prod ProdDesc Attrib Value
A Cyl tot 110
A Cyl avg 8.7
A Cyl qtr 202301
A Cyl qtr 202302
A Cyl qtr 202303
A Cyl qtr 202304
A Cyl val_qtr 12
A Cyl val_qtr 56.9
A Cyl val_qtr 9
A Cyl val_qtr 0
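For reference, the behaviour described above can be reproduced with a small sketch. The frame is rebuilt from the sample rows in the question, and the names melted and deduped are only illustrative. melt turns every non-id column into its own block of Attrib/Value rows, so qtr and val_qtr are never paired up; note also that drop_duplicates returns a new frame rather than modifying df_tra in place.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "Prod": ["A"] * 4,
    "ProdDesc": ["Cyl"] * 4,
    "tot": [110] * 4,
    "avg": [8.7] * 4,
    "qtr": [202301, 202302, 202303, 202304],
    "val_qtr": [12, 56.9, 9, 0],
})

# Every non-id column becomes its own block of rows: 4 columns x 4 rows
melted = df.melt(id_vars=["Prod", "ProdDesc"], var_name="Attrib", value_name="Value")

# drop_duplicates returns a new DataFrame, so the result has to be assigned
deduped = melted.drop_duplicates()

print(len(melted), len(deduped))
print(deduped["Attrib"].value_counts().to_dict())
```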
But the output I want is different. What I want is the following:
df_actual_wanted
Prod ProdDesc Attrib Value
A Cyl tot 110
A Cyl avg 8.7
A Cyl 202301 12
A Cyl 202302 56.9
A Cyl 202303 9
A Cyl 202304 0
How can I achieve this?
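Select the tot and avg columns and apply DataFrame.drop_duplicates and DataFrame.melt to them, then use concat to join the result with another subset whose columns have been renamed with rename. Finally, sort by both columns if needed: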
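import pandas as pd

# tot/avg repeat on every row: keep one copy per product, then melt
df1 = (df[['Prod','ProdDesc','tot','avg']]
         .drop_duplicates()
         .melt(id_vars=['Prod', 'ProdDesc'], var_name='Attrib', value_name='Value'))
# qtr/val_qtr are already in long form: only rename them to the target columns
df2 = (df[['Prod','ProdDesc','qtr','val_qtr']]
         .rename(columns={'qtr':'Attrib','val_qtr':'Value'}))
out = pd.concat([df1, df2]).sort_values(['Prod','ProdDesc'], ignore_index=True)
print (out)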
Prod ProdDesc Attrib Value
0 A Cyl tot 110.0
1 A Cyl avg 8.7
2 A Cyl 202301 12.0
3 A Cyl 202302 56.9
4 A Cyl 202303 9.0
5 A Cyl 202304 0.0
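The same idea as a self-contained sketch, with the sample frame rebuilt from the question. The names summary and quarters are only illustrative, and the final astype(str) step is an optional addition, assuming a uniform string Attrib column is preferred over a mix of strings and integers.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "Prod": ["A"] * 4,
    "ProdDesc": ["Cyl"] * 4,
    "tot": [110] * 4,
    "avg": [8.7] * 4,
    "qtr": [202301, 202302, 203303 - 1000, 202304],
    "val_qtr": [12, 56.9, 9, 0],
})

# Per-product summary columns: one row per product, then melted to long form
summary = (df[["Prod", "ProdDesc", "tot", "avg"]]
             .drop_duplicates()
             .melt(id_vars=["Prod", "ProdDesc"], var_name="Attrib", value_name="Value"))

# Quarter rows are already long: qtr becomes the label, val_qtr the value
quarters = (df[["Prod", "ProdDesc", "qtr", "val_qtr"]]
              .rename(columns={"qtr": "Attrib", "val_qtr": "Value"}))

out = pd.concat([summary, quarters], ignore_index=True)

# Optional (an assumption, not part of the answer): make Attrib uniformly str
out["Attrib"] = out["Attrib"].astype(str)
print(out)
```

Without the astype step, Attrib holds both strings (tot, avg) and integers (the quarter codes), which is worth keeping in mind when filtering or merging on that column later.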
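If the DataFrame has a default index and the output must keep the same ordering as the original data, change the solution to: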
print (df)
Prod ProdDesc tot avg qtr val_qtr
0 A Cyl 110 8.70 202301 12.0
1 A Cyl 110 8.70 202302 56.9
2 A Cyl 110 8.70 202303 9.0
3 A Cyl 110 8.70 202304 0.0
4 B Cyl 133 8.76 202301 12.0
5 B Cyl 133 8.76 202302 56.9
6 B Cyl 133 8.76 202303 9.0
7 B Cyl 133 8.76 202304 0.0
8 A Cyl1 117 8.37 202301 12.0
9 A Cyl1 117 8.37 202302 56.9
10 A Cyl1 117 8.37 202303 9.0
11 A Cyl1 117 8.37 202304 0.0
df1 = (df[['Prod','ProdDesc','tot','avg']]
         .drop_duplicates()
         .melt(id_vars=['Prod', 'ProdDesc'],
               var_name='Attrib',
               value_name='Value',
               ignore_index=False))
df2 = (df[['Prod','ProdDesc','qtr','val_qtr']]
         .rename(columns={'qtr':'Attrib','val_qtr':'Value'}))
out = pd.concat([df1, df2]).sort_index(kind='stable', ignore_index=True)
print (out)
Prod ProdDesc Attrib Value
0 A Cyl tot 110.00
1 A Cyl avg 8.70
2 A Cyl 202301 12.00
3 A Cyl 202302 56.90
4 A Cyl 202303 9.00
5 A Cyl 202304 0.00
6 B Cyl tot 133.00
7 B Cyl avg 8.76
8 B Cyl 202301 12.00
9 B Cyl 202302 56.90
10 B Cyl 202303 9.00
11 B Cyl 202304 0.00
12 A Cyl1 tot 117.00
13 A Cyl1 avg 8.37
14 A Cyl1 202301 12.00
15 A Cyl1 202302 56.90
16 A Cyl1 202303 9.00
17 A Cyl1 202304 0.00
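Why this keeps the original order can be seen by looking at the index. The sketch below uses a reduced two-product frame made up for illustration (not the full data above). With ignore_index=False, melt keeps the row label of each product's first row, so a stable sort on the index places the tot and avg rows directly before that product's quarter rows.

```python
import pandas as pd

# Reduced illustrative data: two products, two quarters each
df = pd.DataFrame({
    "Prod": ["A", "A", "B", "B"],
    "ProdDesc": ["Cyl"] * 4,
    "tot": [110, 110, 133, 133],
    "avg": [8.7, 8.7, 8.76, 8.76],
    "qtr": [202301, 202302, 202301, 202302],
    "val_qtr": [12.0, 56.9, 12.0, 56.9],
})

summary = (df[["Prod", "ProdDesc", "tot", "avg"]]
             .drop_duplicates()
             .melt(id_vars=["Prod", "ProdDesc"], var_name="Attrib",
                   value_name="Value", ignore_index=False))

# Labels of each product's first row, repeated once per melted column
print(summary.index.tolist())

quarters = (df[["Prod", "ProdDesc", "qtr", "val_qtr"]]
              .rename(columns={"qtr": "Attrib", "val_qtr": "Value"}))

# A stable sort keeps concat order among equal labels: tot, avg, then the quarter row
out = pd.concat([summary, quarters]).sort_index(kind="stable", ignore_index=True)
print(out)
```

The stable sort matters here: rows sharing an index label keep the order they had after concat, which is what puts tot and avg ahead of the first quarter of each product.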
If the data is small or performance is not important:
def f(x):
    y = x[['tot','avg']].iloc[0].T.reset_index().set_axis(['Attrib', 'Value'], axis=1)
    return pd.concat([y, x[['Attrib','Value']]])

out = (df.rename(columns={'qtr':'Attrib','val_qtr':'Value'})
         .groupby(['Prod', 'ProdDesc'], sort=False)
         .apply(f)
         .droplevel(-1)
         .reset_index())
print (out)
Prod ProdDesc Attrib Value
0 A Cyl tot 110.00
1 A Cyl avg 8.70
2 A Cyl 202301 12.00
3 A Cyl 202302 56.90
4 A Cyl 202303 9.00
5 A Cyl 202304 0.00
6 B Cyl tot 133.00
7 B Cyl avg 8.76
8 B Cyl 202301 12.00
9 B Cyl 202302 56.90
10 B Cyl 202303 9.00
11 B Cyl 202304 0.00
12 A Cyl1 tot 117.00
13 A Cyl1 avg 8.37
14 A Cyl1 202301 12.00
15 A Cyl1 202302 56.90
16 A Cyl1 202303 9.00
17 A Cyl1 202304 0.00
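As a sanity check, the groupby variant and the concat variant can be compared on a small frame. This is only a sketch: the reduced data and the names via_concat, via_groupby and per_group are made up for illustration. Selecting the columns explicitly after groupby is an assumption added here so that apply does not receive the grouping columns, which some recent pandas versions warn about.

```python
import pandas as pd

# Reduced illustrative data: two products, two quarters each
df = pd.DataFrame({
    "Prod": ["A", "A", "B", "B"],
    "ProdDesc": ["Cyl"] * 4,
    "tot": [110, 110, 133, 133],
    "avg": [8.7, 8.7, 8.76, 8.76],
    "qtr": [202301, 202302, 202301, 202302],
    "val_qtr": [12.0, 56.9, 12.0, 56.9],
})
keys = ["Prod", "ProdDesc"]
long_cols = {"qtr": "Attrib", "val_qtr": "Value"}

# Variant 1: melt the summary columns, concat with the renamed quarter rows
summary = (df[keys + ["tot", "avg"]]
             .drop_duplicates()
             .melt(id_vars=keys, var_name="Attrib", value_name="Value",
                   ignore_index=False))
quarters = df[keys + ["qtr", "val_qtr"]].rename(columns=long_cols)
via_concat = (pd.concat([summary, quarters])
                .sort_index(kind="stable", ignore_index=True))

# Variant 2: build each group's block inside groupby.apply
def per_group(x):
    # First row of tot/avg as two Attrib/Value rows, then the quarter rows
    head = x[["tot", "avg"]].iloc[0].reset_index().set_axis(["Attrib", "Value"], axis=1)
    return pd.concat([head, x[["Attrib", "Value"]]])

via_groupby = (df.rename(columns=long_cols)
                 .groupby(keys, sort=False)[["tot", "avg", "Attrib", "Value"]]
                 .apply(per_group)
                 .droplevel(-1)
                 .reset_index())

# Both variants should produce the same rows in the same order
print(via_concat.values.tolist() == via_groupby.values.tolist())
```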