I have a DataFrame like this:
name tag time val
0 ABC A 1 10
0 ABC A 1 12
1 ABC B 1 12
1 ABC B 1 14
2 ABC A 2 11
3 ABC C 2 12
4 DEF B 3 10
5 DEF C 3 9
6 GHI A 4 14
7 GHI B 4 12
8 GHI C 5 10
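Each row is an observation at a timestamp, giving the value recorded for that row's name and tag.
What I want is a DataFrame in which each row shows the mean value of each tag for each timestamp, like this: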
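For anyone who wants to reproduce the thread, the input frame can be rebuilt roughly as follows. This is a sketch that copies the values from the table above; the variable name `df` matches the code used later in the thread, and the repeated index labels (0, 0, 1, 1, ...) are reproduced only because they appear in the printed frame.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame(
    {
        "name": ["ABC"] * 6 + ["DEF"] * 2 + ["GHI"] * 3,
        "tag": ["A", "A", "B", "B", "A", "C", "B", "C", "A", "B", "C"],
        "time": [1, 1, 1, 1, 2, 2, 3, 3, 4, 4, 5],
        "val": [10, 12, 12, 14, 11, 12, 10, 9, 14, 12, 10],
    },
    # Index labels as printed in the question; they play no role in the reshaping.
    index=[0, 0, 1, 1, 2, 3, 4, 5, 6, 7, 8],
)
print(df)
```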
name time A B C
0 ABC 1 11.0 13.0 NaN
1 ABC 2 11.0 NaN 12.0
2 DEF 3 NaN 10.0 9.0
3 GHI 4 14.0 12.0 NaN
4 GHI 5 NaN NaN 10.0
I can achieve this by grouping on name and time and returning a transposed Series for each group:
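import pandas as pd

# Assumption: `tags` was not defined in the original snippet; here it is every distinct tag.
tags = sorted(df['tag'].unique())

def transpose_df(observation_df):
    # Build one output row: the mean of val for each tag within this (name, time) group.
    ser = pd.Series(dtype=float)
    for tag in tags:
        ser[tag] = observation_df[observation_df['tag'] == tag]['val'].mean()
    return ser
tdf = df.groupby(['name', 'time']).apply(transpose_df).reset_index()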
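But this is slow. I feel there must be a smarter way using the built-in transpose/reshape tools, but I can't figure it out. Can anyone suggest a better option?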
In [175]: df.pivot_table(index=['name','time'], columns='tag', values='val').reset_index()
Out[175]:
tag name time A B C
0 ABC 1 11.0 13.0 NaN
1 ABC 2 11.0 NaN 12.0
2 DEF 3 NaN 10.0 9.0
3 GHI 4 14.0 12.0 NaN
4 GHI 5 NaN NaN 10.0
Use pivot_table:
df.pivot_table(values='val',index=['name','time'],columns='tag',aggfunc='mean').reset_index()
Output:
tag name time A B C
0 ABC 1 11.0 13.0 NaN
1 ABC 2 11.0 NaN 12.0
2 DEF 3 NaN 10.0 9.0
3 GHI 4 14.0 12.0 NaN
4 GHI 5 NaN NaN 10.0
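As a side note, `pivot_table` aggregates with the mean by default, so the call with and without `aggfunc='mean'` should give the same frame. The sketch below checks that on the sample data and also clears the leftover `tag` label on the columns axis so the result matches the layout asked for in the question. The names `default_agg`, `explicit_agg` and `out` are illustrative only.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["ABC"] * 6 + ["DEF"] * 2 + ["GHI"] * 3,
    "tag": ["A", "A", "B", "B", "A", "C", "B", "C", "A", "B", "C"],
    "time": [1, 1, 1, 1, 2, 2, 3, 3, 4, 4, 5],
    "val": [10, 12, 12, 14, 11, 12, 10, 9, 14, 12, 10],
})

# aggfunc defaults to the mean, so these two calls are expected to agree.
default_agg = df.pivot_table(index=["name", "time"], columns="tag", values="val")
explicit_agg = df.pivot_table(index=["name", "time"], columns="tag", values="val",
                              aggfunc="mean")
pd.testing.assert_frame_equal(default_agg, explicit_agg)

# reset_index keeps "tag" as the name of the columns axis;
# rename_axis(None, axis=1) removes that label.
out = explicit_agg.reset_index().rename_axis(None, axis=1)
print(out)
```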
Use groupby and unstack:
df.groupby(['name','time','tag']).agg('mean')['val'].unstack().reset_index()
Output:
tag name time A B C
0 ABC 1 11.0 13.0 NaN
1 ABC 2 11.0 NaN 12.0
2 DEF 3 NaN 10.0 9.0
3 GHI 4 14.0 12.0 NaN
4 GHI 5 NaN NaN 10.0
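To make the two steps in the groupby/unstack approach easier to see, here is a sketch that splits them apart. It selects `val` before aggregating rather than after, which should give the same result on this data; the names `means` and `wide` are illustrative only.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["ABC"] * 6 + ["DEF"] * 2 + ["GHI"] * 3,
    "tag": ["A", "A", "B", "B", "A", "C", "B", "C", "A", "B", "C"],
    "time": [1, 1, 1, 1, 2, 2, 3, 3, 4, 4, 5],
    "val": [10, 12, 12, 14, 11, 12, 10, 9, 14, 12, 10],
})

# Step 1: mean of val per (name, time, tag).
# The result is a Series indexed by a three-level MultiIndex.
means = df.groupby(["name", "time", "tag"])["val"].mean()

# Step 2: move the tag level from the index to the columns.
# (name, time) pairs that lack a given tag are filled with NaN.
wide = means.unstack("tag").reset_index()
print(wide)
```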
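Use set_index with a level-wise mean and unstack. The form originally posted here, mean(level=[0,1,2]), was deprecated in pandas 1.3 and removed in 2.0; groupby(level=...) is the equivalent on current versions:
df.set_index(['name','time','tag']).groupby(level=[0,1,2]).mean()['val'].unstack().reset_index()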
Output:
tag name time A B C
0 ABC 1 11.0 13.0 NaN
1 ABC 2 11.0 NaN 12.0
2 DEF 3 NaN 10.0 9.0
3 GHI 4 14.0 12.0 NaN
4 GHI 5 NaN NaN 10.0
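As a sanity check, the three approaches can be compared directly on the sample data. This is only a sketch: the `via_*` names are illustrative, and the set_index variant is written with `groupby(level=...)` so that it also runs on pandas 2.x.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["ABC"] * 6 + ["DEF"] * 2 + ["GHI"] * 3,
    "tag": ["A", "A", "B", "B", "A", "C", "B", "C", "A", "B", "C"],
    "time": [1, 1, 1, 1, 2, 2, 3, 3, 4, 4, 5],
    "val": [10, 12, 12, 14, 11, 12, 10, 9, 14, 12, 10],
})
keys = ["name", "time", "tag"]

via_pivot = df.pivot_table(index=["name", "time"], columns="tag",
                           values="val").reset_index()
via_groupby = df.groupby(keys)["val"].mean().unstack().reset_index()
# groupby(level=...) stands in for the removed mean(level=...) form.
via_set_index = (
    df.set_index(keys).groupby(level=[0, 1, 2]).mean()["val"].unstack().reset_index()
)

# Each call raises AssertionError if the two frames differ.
pd.testing.assert_frame_equal(via_pivot, via_groupby)
pd.testing.assert_frame_equal(via_pivot, via_set_index)
```

Which of the three is fastest is likely to depend on the data size and the pandas version, so it is worth timing them on the real frame.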