Column missing after pandas groupby

use*_*329 8 python group-by dataframe pandas

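I have a pandas DataFrame df. I group it by 3 columns and count the rows in each group. When I do this, I lose some information, specifically the name column. This column maps 1:1 to the desk_id column. Is there any way to include both in my final DataFrame?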

Here is the DataFrame:

   shift_id    shift_start_time      shift_end_time        name                   end_time       desk_id  shift_hour
0  37423064 2014-01-17 08:00:00 2014-01-17 12:00:00  Adam Scott 2014-01-17 10:16:41.040000  15557987           2
1  37423064 2014-01-17 08:00:00 2014-01-17 12:00:00  Adam Scott 2014-01-17 10:16:41.096000  15557987           2
2  37423064 2014-01-17 08:00:00 2014-01-17 12:00:00  Adam Scott 2014-01-17 10:52:17.402000  15557987           2
3  37423064 2014-01-17 08:00:00 2014-01-17 12:00:00  Adam Scott 2014-01-17 11:06:59.083000  15557987           3
4  37423064 2014-01-17 08:00:00 2014-01-17 12:00:00  Adam Scott 2014-01-17 08:27:57.998000  15557987           0

I group it like this:

grouped = df.groupby(['desk_id', 'shift_id', 'shift_hour']).size()
grouped = grouped.reset_index()

Here is the result, which is missing the name column.

    desk_id  shift_id  shift_hour  0
0  14468690  37729081           0  7
1  14468690  37729081           1  3
2  14468690  37729081           2  6
3  14468690  37729081           3  5
4  14468690  37729082           0  5

Also, is there any way to rename the count column to 'count' instead of '0'?

CT *_*Zhu 5

You need to include 'name' in the groupby keys:

In [43]:

import numpy as np

grouped = df.groupby(['desk_id', 'shift_id', 'shift_hour', 'name']).size()
grouped = grouped.reset_index()
# Rename the default column label 0 to 'count'
grouped.columns = np.where(grouped.columns == 0, 'count', grouped.columns)
print(grouped)
    desk_id  shift_id  shift_hour        name  count
0  15557987  37423064           0  Adam Scott      1
1  15557987  37423064           2  Adam Scott      3
2  15557987  37423064           3  Adam Scott      1
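As a possible alternative to the np.where renaming above, `reset_index` on a Series accepts a `name` argument, so the count column can be labelled in the same step. This is a minimal sketch, not part of the original answer; the small `df` built here is a hypothetical stand-in that only mirrors the columns from the question.

```python
import pandas as pd

# Hypothetical sample data mirroring the relevant columns from the question.
df = pd.DataFrame({
    "desk_id": [15557987] * 5,
    "shift_id": [37423064] * 5,
    "shift_hour": [2, 2, 2, 3, 0],
    "name": ["Adam Scott"] * 5,
})

keys = ["desk_id", "shift_id", "shift_hour", "name"]

# size() returns a Series of group sizes; reset_index(name=...) turns it
# into a DataFrame and names the values column in one step.
grouped = df.groupby(keys).size().reset_index(name="count")
print(grouped)
```

This avoids touching `grouped.columns` by hand, which may be easier to read when the only goal is to name the count column.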

If the name-to-id relationship is many-to-one, say we also had a Pete Scott with the same set of data, the result would become:

    desk_id  shift_id  shift_hour        name  count
0  15557987  37423064           0  Adam Scott      1
1  15557987  37423064           0  Pete Scott      1
2  15557987  37423064           2  Adam Scott      3
3  15557987  37423064           2  Pete Scott      3
4  15557987  37423064           3  Adam Scott      1
5  15557987  37423064           3  Pete Scott      1
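Since adding 'name' to the keys splits groups whenever one desk_id carries several names, it may be worth checking the mapping first. The following is only a sketch of one way to do that; the data, including the second desk and the name "Jane Doe", is made up for illustration.

```python
import pandas as pd

# Hypothetical data: one desk_id with two names, another with a single name.
df = pd.DataFrame({
    "desk_id": [15557987, 15557987, 15557987, 14468690],
    "name": ["Adam Scott", "Adam Scott", "Pete Scott", "Jane Doe"],
})

# Count the distinct names seen for each desk_id.
names_per_desk = df.groupby("desk_id")["name"].nunique()

# desk_ids mapped to more than one name; grouping by 'name' as well
# would split each of these into several rows.
ambiguous = names_per_desk[names_per_desk > 1]
print(ambiguous)
```

If `ambiguous` is empty, name and desk_id are effectively 1:1 from the desk_id side, and adding 'name' to the groupby keys should not change the counts.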

  • So, if I want to include `shift_start_time` in the final result, is it OK to add it to the groupby list, even though I don't really want to group by that column? (5 upvotes)
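One way to carry such a column along without grouping by it is named aggregation (available in pandas 0.25 and later), taking the first value of each extra column per group. This is a sketch rather than the answerer's method, and it assumes `name` and `shift_start_time` are constant within each group, as they are in the question's sample; the `df` below is a hypothetical reduction of that sample.

```python
import pandas as pd

# Hypothetical reduction of the question's sample data.
df = pd.DataFrame({
    "shift_id": [37423064] * 5,
    "shift_start_time": pd.to_datetime(["2014-01-17 08:00:00"] * 5),
    "name": ["Adam Scott"] * 5,
    "desk_id": [15557987] * 5,
    "shift_hour": [2, 2, 2, 3, 0],
})

grouped = (
    df.groupby(["desk_id", "shift_id", "shift_hour"])
      .agg(
          count=("name", "size"),    # number of rows in each group
          name=("name", "first"),    # assumes one name per group
          shift_start_time=("shift_start_time", "first"),  # assumes one start time per group
      )
      .reset_index()
)
print(grouped)
```

If the extra columns can vary within a group, "first" silently picks one value, so adding them to the groupby keys (as in the answer) is the safer choice in that case.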