Find the maximum value in a column in Python

Ada*_*dam 5 python group-by pandas pandas-groupby

I have a data frame like this in pandas (combined_ranking_df):

                Id  Rank                         Activity
0              14035   8.0                         deployed
1              47728   8.0                         deployed
2              24259   1.0                         NaN
3              24259   6.0                         WIP
4              14251   8.0                         deployed
5              14250   1.0                         NaN
6              14250   6.0                         WIP
7              14250   5.0                         NaN
8              14250   5.0                         NaN
9              14250   1.0                         NaN

I want to get the row with the maximum Rank for each Id. For example, for 14250 it should be 6.0, and for 24259 it should be 6.0.

                Id  Rank                         Activity
0              14035   8.0                         deployed
1              47728   8.0                         deployed
3              24259   6.0                         WIP
4              14251   8.0                         deployed
6              14250   6.0                         WIP

I tried combined_ranking_df.groupby(['Id'], sort=False)['Rank'].max(), but the result I got was the same as the first dataframe (nothing changed).

What am I doing wrong?

piR*_*red 9

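Option 1
Same as @ayhan's answer here
This answers the question by sorting the dataframe so that the maximum value ends up in the last position within each 'Id' group. pd.DataFrame.drop_duplicates lets us keep either the first or the last row of each group. However, this is a convenient coincidence that happens to be very fast. It does not generalize to, say, the top two rows for each 'Id'.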

df.sort_values('Rank').drop_duplicates('Id', keep='last')

      Id  Rank  Activity
3  24259   6.0       WIP
6  14250   6.0       WIP
0  14035   8.0  deployed
1  47728   8.0  deployed
4  14251   8.0  deployed

You can sort by the index at the end:

df.sort_values('Rank').drop_duplicates('Id', keep='last').sort_index()

      Id  Rank  Activity
0  14035   8.0  deployed
1  47728   8.0  deployed
3  24259   6.0       WIP
4  14251   8.0  deployed
6  14250   6.0       WIP
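
For anyone who wants to run Option 1 end to end, here is a minimal sketch. It rebuilds the sample frame by hand from the table in the question (the name df and the literal values are copied from there). Two things are my own adjustments rather than part of the original answer: keep='last' is passed by keyword, since as far as I know recent pandas versions no longer accept it positionally, and kind='mergesort' is added so that ties are ordered deterministically.

```python
import numpy as np
import pandas as pd

# Sample data copied by hand from the question's table.
df = pd.DataFrame({
    'Id': [14035, 47728, 24259, 24259, 14251,
           14250, 14250, 14250, 14250, 14250],
    'Rank': [8.0, 8.0, 1.0, 6.0, 8.0, 1.0, 6.0, 5.0, 5.0, 1.0],
    'Activity': ['deployed', 'deployed', np.nan, 'WIP', 'deployed',
                 np.nan, 'WIP', np.nan, np.nan, np.nan],
})

# Sort ascending so the largest Rank is the last row of every Id,
# keep only that last row, then restore the original row order.
result = (
    df.sort_values('Rank', kind='mergesort')  # stable sort: an assumption, not in the original
      .drop_duplicates('Id', keep='last')
      .sort_index()
)
print(result)
```

The original index labels survive, so the kept rows can still be traced back to the input frame.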

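Option 2
groupby and idxmax
This is what I consider the most idiomatic way to solve this problem. @MaxU's answer is the best way to generalize to the n largest rows per 'Id'.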

df.loc[df.groupby('Id', sort=False).Rank.idxmax()]

      Id  Rank  Activity
0  14035   8.0  deployed
1  47728   8.0  deployed
3  24259   6.0       WIP
4  14251   8.0  deployed
6  14250   6.0       WIP
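
To make the two steps in Option 2 visible, here is a small sketch that splits them apart. The sample frame is rebuilt by hand from the question, and the intermediate name idx is only for illustration. idxmax returns, for each 'Id', the index label of the row holding the largest Rank, and df.loc then pulls those rows out whole.

```python
import numpy as np
import pandas as pd

# Sample data copied by hand from the question's table.
df = pd.DataFrame({
    'Id': [14035, 47728, 24259, 24259, 14251,
           14250, 14250, 14250, 14250, 14250],
    'Rank': [8.0, 8.0, 1.0, 6.0, 8.0, 1.0, 6.0, 5.0, 5.0, 1.0],
    'Activity': ['deployed', 'deployed', np.nan, 'WIP', 'deployed',
                 np.nan, 'WIP', np.nan, np.nan, np.nan],
})

# Step 1: one index label per Id, pointing at the row with the largest Rank.
# sort=False keeps the groups in order of first appearance.
idx = df.groupby('Id', sort=False)['Rank'].idxmax()

# Step 2: select those rows, keeping every column (including Activity).
result = df.loc[idx]
print(result)
```

Note that this relies on the index labels of df being unique; with duplicate labels, df.loc would return more rows than intended.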


Max*_*axU 6

IIUC:

In [40]: df.groupby('Id', as_index=False, sort=False) \
    ...:   .apply(lambda x: x.nlargest(1, ['Rank'])) \
    ...:   .reset_index(level=1, drop=True)
Out[40]:
      Id  Rank  Activity
0  14035   8.0  deployed
1  47728   8.0  deployed
2  24259   6.0       WIP
3  14251   8.0  deployed
4  14250   6.0       WIP

A better version from @piRSquared:

In [41]: df.groupby('Id', group_keys=False, sort=False) \
    ...:   .apply(pd.DataFrame.nlargest, n=1, columns='Rank')
Out[41]:
      Id  Rank  Activity
0  14035   8.0  deployed
1  47728   8.0  deployed
3  24259   6.0       WIP
4  14251   8.0  deployed
6  14250   6.0       WIP
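
Since the point of the nlargest approach is that it extends beyond a single row per group, here is a hedged sketch of the top-2 case. To be clear, this is not the apply/nlargest code above: it swaps in a sort followed by GroupBy.head, which I would expect to give the same rows while avoiding groupby.apply, whose handling of the grouping column has changed between pandas versions. The frame is rebuilt by hand from the question, and the names n and top_n are made up for this example.

```python
import numpy as np
import pandas as pd

# Sample data copied by hand from the question's table.
df = pd.DataFrame({
    'Id': [14035, 47728, 24259, 24259, 14251,
           14250, 14250, 14250, 14250, 14250],
    'Rank': [8.0, 8.0, 1.0, 6.0, 8.0, 1.0, 6.0, 5.0, 5.0, 1.0],
    'Activity': ['deployed', 'deployed', np.nan, 'WIP', 'deployed',
                 np.nan, 'WIP', np.nan, np.nan, np.nan],
})

n = 2  # hypothetical parameter: how many top rows to keep per Id

# Sort by Rank descending, then take the first n rows of every Id.
# Groups with fewer than n rows simply contribute all their rows.
top_n = (
    df.sort_values('Rank', ascending=False, kind='mergesort')
      .groupby('Id', sort=False)
      .head(n)
)
print(top_n.sort_index())
```

With n = 1 this reduces to the same one-row-per-Id selection as the other answers.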


Die*_*ado 4

Try storing the result, then inspect the stored grouped frame:

groups = combined_ranking_df.groupby(['Id'], as_index=False, sort=False).max()[['Id','Rank']]

      Id  Rank
0  14035   8.0
1  47728   8.0
2  24259   6.0
3  14251   8.0
4  14250   6.0