Transpose and aggregate a DataFrame

Dav*_*ips 3 python pandas

I have a DataFrame like this:

  name tag  time  val
0  ABC   A     1   10
0  ABC   A     1   12
1  ABC   B     1   12
1  ABC   B     1   14
2  ABC   A     2   11
3  ABC   C     2   12
4  DEF   B     3   10
5  DEF   C     3    9
6  GHI   A     4   14
7  GHI   B     4   12
8  GHI   C     5   10
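For anyone who wants to run the snippets in this thread, the frame above can be rebuilt roughly as follows. This is only a sketch copied from the printed table. The repeated index labels (0, 0, 1, 1, ...) are kept as shown, and the variable name df is taken from the code later in the question.

```python
import pandas as pd

# Sample data copied from the table in the question.
# The index labels repeat (0, 0, 1, 1, 2, ...) exactly as printed there.
df = pd.DataFrame(
    {
        "name": ["ABC"] * 6 + ["DEF"] * 2 + ["GHI"] * 3,
        "tag": ["A", "A", "B", "B", "A", "C", "B", "C", "A", "B", "C"],
        "time": [1, 1, 1, 1, 2, 2, 3, 3, 4, 4, 5],
        "val": [10, 12, 12, 14, 11, 12, 10, 9, 14, 12, 10],
    },
    index=[0, 0, 1, 1, 2, 3, 4, 5, 6, 7, 8],
)
print(df)
```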

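Each row is a timestamped observation giving the value for that row's name and tag.

What I want is a DataFrame in which each row shows, for each name and timestamp, the mean value of each tag, like this: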

  name  time     A     B     C
0  ABC     1  11.0  13.0   NaN
1  ABC     2  11.0   NaN  12.0
2  DEF     3   NaN  10.0   9.0
3  GHI     4  14.0  12.0   NaN
4  GHI     5   NaN   NaN  10.0

I can get this result by grouping on name and time and returning a transposed Series for each group:

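import pandas as pd

# Assumption: tags was not defined in the original snippet; here it is every distinct tag.
tags = sorted(df['tag'].unique())

def transpose_df(observation_df):
  # One entry per tag: the mean of val for that tag within this (name, time) group.
  ser = pd.Series(dtype=float)
  for tag in tags:
    ser[tag] = observation_df[observation_df['tag'] == tag]['val'].mean()
  return ser


tdf = df.groupby(['name', 'time']).apply(transpose_df).reset_index()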

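But this is slow. I feel there must be a smarter way using the built-in transpose/reshape tools, but I can't figure it out. Can anyone suggest a better alternative?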

Max*_*axU 6

In [175]: df.pivot_table(index=['name','time'], columns='tag', values='val').reset_index()
Out[175]:
tag name  time     A     B     C
0    ABC     1  11.0  13.0   NaN
1    ABC     2  11.0   NaN  12.0
2    DEF     3   NaN  10.0   9.0
3    GHI     4  14.0  12.0   NaN
4    GHI     5   NaN   NaN  10.0
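A short note on why the one-liner above works without naming an aggregation: pivot_table uses the mean as its default aggfunc, so duplicate (name, time, tag) rows are averaged. The "tag" in the top-left corner of the output is the name of the column axis. The sketch below rebuilds the sample data from the question and strips that label so the header matches the desired output exactly. The variable name wide is only illustrative.

```python
import pandas as pd

# Sample data from the question (default index used here for brevity).
df = pd.DataFrame({
    "name": ["ABC"] * 6 + ["DEF"] * 2 + ["GHI"] * 3,
    "tag": ["A", "A", "B", "B", "A", "C", "B", "C", "A", "B", "C"],
    "time": [1, 1, 1, 1, 2, 2, 3, 3, 4, 4, 5],
    "val": [10, 12, 12, 14, 11, 12, 10, 9, 14, 12, 10],
})

wide = (
    df.pivot_table(index=["name", "time"], columns="tag", values="val")  # aggfunc defaults to mean
      .reset_index()              # turn name/time back into ordinary columns
      .rename_axis(columns=None)  # drop the "tag" label on the column axis
)
print(wide)
```

Combinations that never occur in the data, such as tag C for ABC at time 1, come out as NaN.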


Sco*_*ton 5

Option 1

Use pivot_table:

df.pivot_table(values='val',index=['name','time'],columns='tag',aggfunc='mean').reset_index()

Output:

tag name  time     A     B     C
0    ABC     1  11.0  13.0   NaN
1    ABC     2  11.0   NaN  12.0
2    DEF     3   NaN  10.0   9.0
3    GHI     4  14.0  12.0   NaN
4    GHI     5   NaN   NaN  10.0

Option 2

Use groupby and unstack:

df.groupby(['name','time','tag']).agg('mean')['val'].unstack().reset_index()

Output:

tag name  time     A     B     C
0    ABC     1  11.0  13.0   NaN
1    ABC     2  11.0   NaN  12.0
2    DEF     3   NaN  10.0   9.0
3    GHI     4  14.0  12.0   NaN
4    GHI     5   NaN   NaN  10.0
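To make the groupby/unstack route easier to follow, here is the same idea split into two steps, as a sketch on the sample data from the question. It selects val before aggregating rather than after, which is a small variation on the line above. The names by_group and wide are illustrative.

```python
import pandas as pd

# Sample data from the question (default index used here for brevity).
df = pd.DataFrame({
    "name": ["ABC"] * 6 + ["DEF"] * 2 + ["GHI"] * 3,
    "tag": ["A", "A", "B", "B", "A", "C", "B", "C", "A", "B", "C"],
    "time": [1, 1, 1, 1, 2, 2, 3, 3, 4, 4, 5],
    "val": [10, 12, 12, 14, 11, 12, 10, 9, 14, 12, 10],
})

# Step 1: mean of val per (name, time, tag); the result is a Series
# indexed by a three-level MultiIndex.
by_group = df.groupby(["name", "time", "tag"])["val"].mean()

# Step 2: move the tag level from the index into the columns, then
# turn name/time back into ordinary columns.
wide = by_group.unstack("tag").reset_index()
print(wide)
```

The long-to-wide reshape is done entirely by unstack; groupby only collapses the duplicate rows.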

Option 3

Use set_index, mean, and unstack:

df.set_index(['name','time','tag']).mean(level=[0,1,2])['val'].unstack().reset_index()

Output:

tag name  time     A     B     C
0    ABC     1  11.0  13.0   NaN
1    ABC     2  11.0   NaN  12.0
2    DEF     3   NaN  10.0   9.0
3    GHI     4  14.0  12.0   NaN
4    GHI     5   NaN   NaN  10.0
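One caveat for readers on newer pandas: as far as I know, the level argument to mean was deprecated in pandas 1.3 and removed in 2.0, so the Option 3 line may fail there. A sketch of what should be an equivalent on current versions, grouping on the index levels instead, using the sample data from the question (the names indexed and wide are illustrative):

```python
import pandas as pd

# Sample data from the question (default index used here for brevity).
df = pd.DataFrame({
    "name": ["ABC"] * 6 + ["DEF"] * 2 + ["GHI"] * 3,
    "tag": ["A", "A", "B", "B", "A", "C", "B", "C", "A", "B", "C"],
    "time": [1, 1, 1, 1, 2, 2, 3, 3, 4, 4, 5],
    "val": [10, 12, 12, 14, 11, 12, 10, 9, 14, 12, 10],
})

indexed = df.set_index(["name", "time", "tag"])

# groupby(level=...) stands in for the removed mean(level=...):
# it averages rows that share the same (name, time, tag) index entry.
wide = (
    indexed.groupby(level=[0, 1, 2])["val"]
           .mean()
           .unstack()      # the last level (tag) becomes the columns
           .reset_index()
)
print(wide)
```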