I have a pandas DataFrame that looks like this:
Dominant_Topic word appearance
Topic 0 aaaawww 50
Topic 0 aacn 100
Topic 0 aaren 20
Topic 0 aarongoodwin 200
Topic 1 aaronjfentress 10
Topic 1 aaronrodger 20
Topic 1 aasmiitkap 30
Topic 2 aavqbketmh 10
Topic 2 ab 10
Topic 2 abandon 1
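I want to get a dense rank within each partition, where the partition column is the column named Dominant_Topic. Within each partition, words should be ranked by their appearance count in descending order. So the output would look like this: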
Dominant_Topic word appearance dense_rank
Topic 0 aaaawww 50 3
Topic 0 aacn 100 2
Topic 0 aaren 20 4
Topic 0 aarongoodwin 200 1
Topic 1 aaronjfentress 10 3
Topic 1 aaronrodger 20 2
Topic 1 aasmiitkap 30 1
Topic 2 aavqbketmh 10 1
Topic 2 ab 10 1
Topic 2 abandon 1 2
How can I achieve this in pandas?
The SQL equivalent would look something like this:
select *, dense_rank() over( partition by dominant_topic order by appearance desc)
from table
This is built into groupby:
df['dense_rank'] = (df.groupby('Dominant_Topic')['appearance']
                      .rank(method='dense', ascending=False)
                      .astype(int)
                   )
Output:
Dominant_Topic word appearance dense_rank
0 Topic 0 aaaawww 50 3
1 Topic 0 aacn 100 2
2 Topic 0 aaren 20 4
3 Topic 0 aarongoodwin 200 1
4 Topic 1 aaronjfentress 10 3
5 Topic 1 aaronrodger 20 2
6 Topic 1 aasmiitkap 30 1
7 Topic 2 aavqbketmh 10 1
8 Topic 2 ab 10 1
9 Topic 2 abandon 1 2
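For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the sample frame from the question by hand, so the construction code is my assumption rather than the asker's actual loading code. The `min_rank` column is not part of the question. It is added only to show how `method='dense'` differs from `method='min'` when there are ties, as in Topic 2.

```python
import pandas as pd

# Sample data typed in from the question (assumed construction, not the asker's code)
df = pd.DataFrame({
    "Dominant_Topic": ["Topic 0"] * 4 + ["Topic 1"] * 3 + ["Topic 2"] * 3,
    "word": ["aaaawww", "aacn", "aaren", "aarongoodwin",
             "aaronjfentress", "aaronrodger", "aasmiitkap",
             "aavqbketmh", "ab", "abandon"],
    "appearance": [50, 100, 20, 200, 10, 20, 30, 10, 10, 1],
})

# Rank within each Dominant_Topic group, highest appearance first
grouped = df.groupby("Dominant_Topic")["appearance"]

# Dense rank: tied values share a rank and the next rank is not skipped
df["dense_rank"] = grouped.rank(method="dense", ascending=False).astype(int)

# For comparison only: 'min' also shares a rank on ties but leaves a gap after them
df["min_rank"] = grouped.rank(method="min", ascending=False).astype(int)

print(df)
```

In Topic 2 the two words tied at 10 both get rank 1 under either method, but `abandon` gets 2 with `dense` and 3 with `min`. The `dense` behaviour is what matches SQL's `dense_rank()`.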