rig*_*ere 5 python dataframe pandas
I have a DataFrame like this:
import pandas as pd
df = pd.DataFrame({'asset_id': [10, 10, 10, 20, 20, 20], 'method_id': ['p2', 'p3', 'p4', 'p3', 'p1', 'p2'], 'method_rank': [5, 2, 2, 2, 5, 1], 'conf_score': [0.8, 0.6, 0.8, 0.9, 0.7, 0.5]}, columns=['asset_id', 'method_id', 'method_rank', 'conf_score'])
It looks like this:
   asset_id method_id  method_rank  conf_score
0        10        p2            5         0.8
1        10        p3            2         0.6
2        10        p4            2         0.8
3        20        p3            2         0.9
4        20        p1            5         0.7
5        20        p2            1         0.5
I want to group the rows by asset_id and then give each row an overall rank within its group, based on method_rank ascending and conf_score descending.
That is, I would like the result to look like this:
   asset_id method_id  method_rank  conf_score  overall_rank
5        20        p2            1         0.5           1.0
3        20        p3            2         0.9           2.0
2        10        p4            2         0.8           1.0
1        10        p3            2         0.6           2.0
0        10        p2            5         0.8           3.0
4        20        p1            5         0.7           3.0
How can I do this with groupby and rank in pandas? It looks like pandas only lets you rank on a single column, for example
df["overall_rank"] = df.groupby('asset_id')['method_rank'].rank("first")
But I would like to achieve something like
df["overall_rank"] = df.groupby('asset_id')[['method_rank', 'conf_score']].rank("first", ascending = [True, False])
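How can I do this? I know one hacky way is to call sort_values on the entire DataFrame first and then do the groupby, but sorting every row of the DataFrame seems too expensive when I only want to order a few rows within each group.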
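For reference, here is a minimal sketch of the single-column behaviour described above, rebuilt on the example frame. The name `single_col_rank` is only illustrative. With `method="first"`, ties in `method_rank` are broken by order of appearance, so `conf_score` plays no part:

```python
import pandas as pd

df = pd.DataFrame({
    "asset_id": [10, 10, 10, 20, 20, 20],
    "method_id": ["p2", "p3", "p4", "p3", "p1", "p2"],
    "method_rank": [5, 2, 2, 2, 5, 1],
    "conf_score": [0.8, 0.6, 0.8, 0.9, 0.7, 0.5],
})

# Rank within each asset_id using method_rank only.
# Ties are resolved by the order in which rows appear in the frame.
single_col_rank = df.groupby("asset_id")["method_rank"].rank(method="first")

print(df.assign(overall_rank=single_col_rank))
```

Rows 1 and 2 both have `method_rank` 2, and row 1 is ranked ahead of row 2 only because it comes first, even though row 2 has the higher `conf_score`.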
l m*_*zhi 10
Method 1:
df.sort_values(['asset_id', 'method_rank', 'conf_score'], ascending=[True, True, False], inplace=True)
df['overall_rank'] = 1
df['overall_rank'] = df.groupby(['asset_id'])['overall_rank'].cumsum()
Resulting df:
   asset_id method_id  method_rank  conf_score  overall_rank
2        10        p4            2         0.8             1
1        10        p3            2         0.6             2
0        10        p2            5         0.8             3
5        20        p2            1         0.5             1
3        20        p3            2         0.9             2
4        20        p1            5         0.7             3
Method 2:
Define a function to sort each group:
import numpy as np

def handle_group(group):
    # sort the rows of one asset_id group, then number them 1..n
    group.sort_values(['method_rank', 'conf_score'], ascending=[True, False], inplace=True)
    group['overall_rank'] = np.arange(1, len(group)+1)
    return group

df = df.groupby('asset_id', as_index=False).apply(handle_group)
Performance test:
import numpy as np

def run1(df):
    df = df.sort_values(['asset_id', 'method_rank', 'conf_score'], ascending=[True, True, False])
    df['overall_rank'] = 1
    df['overall_rank'] = df.groupby(['asset_id'])['overall_rank'].cumsum()
    return df

def handle_group(group):
    group.sort_values(['method_rank', 'conf_score'], ascending=[True, False], inplace=True)
    group['overall_rank'] = np.arange(1, len(group)+1)
    return group

def run2(df):
    df = df.groupby('asset_id', as_index=False).apply(handle_group)
    return df

dfn = pd.concat([df]*10000, ignore_index=True)

%%timeit
df1 = run1(dfn)
# 8.61 ms ± 317 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
df2 = run2(dfn).droplevel(0)
# 31.6 ms ± 404 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
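If the original row order should be kept, one possible variation on Method 1 is to sort a copy, number the rows within each group with `cumcount`, and let pandas align the result back on the index. This is only a sketch and was not part of the timings above. `add_overall_rank` is a hypothetical helper name, and the approach assumes the index labels are unique:

```python
import pandas as pd

df = pd.DataFrame({
    "asset_id": [10, 10, 10, 20, 20, 20],
    "method_id": ["p2", "p3", "p4", "p3", "p1", "p2"],
    "method_rank": [5, 2, 2, 2, 5, 1],
    "conf_score": [0.8, 0.6, 0.8, 0.9, 0.7, 0.5],
})


def add_overall_rank(frame):
    """Return a copy of frame with a per-asset overall_rank column.

    Hypothetical helper: assumes frame has a unique index.
    """
    # Sort a copy: method_rank ascending, conf_score descending within each asset.
    ordered = frame.sort_values(
        ["asset_id", "method_rank", "conf_score"],
        ascending=[True, True, False],
    )
    # cumcount numbers the rows of each group 0..n-1 in their sorted order.
    ranks = ordered.groupby("asset_id").cumcount() + 1
    out = frame.copy()
    # Assignment aligns on the index, so the original row order is untouched.
    out["overall_rank"] = ranks
    return out


ranked = add_overall_rank(df)
print(ranked)
```

Because the ranks come from `cumcount`, the column is integer-valued, and the input frame itself is not modified.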