How to get a dense rank within each partition window in pandas

Reg*_*sor 4 python pandas

I have a pandas DataFrame that looks like this:

Dominant_Topic  word    appearance
Topic 0         aaaawww         50
Topic 0         aacn            100
Topic 0         aaren           20
Topic 0         aarongoodwin    200
Topic 1         aaronjfentress  10
Topic 1         aaronrodger     20
Topic 1         aasmiitkap      30
Topic 2         aavqbketmh      10
Topic 2         ab              10
Topic 2         abandon         1

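I want a dense rank within each partition, where the partition column is Dominant_Topic. The rank should be based on each word's appearance count within its partition, in descending order. So the output would look like this: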

Dominant_Topic  word    appearance    dense_rank
Topic 0         aaaawww         50     3
Topic 0         aacn            100    2
Topic 0         aaren           20     4
Topic 0         aarongoodwin    200    1
Topic 1         aaronjfentress  10     3
Topic 1         aaronrodger     20     2
Topic 1         aasmiitkap      30     1
Topic 2         aavqbketmh      10     1
Topic 2         ab              10     1
Topic 2         abandon         1      2

How can I achieve this in pandas?

The SQL equivalent would look something like this:

select *, dense_rank() over( partition by dominant_topic order by appearance desc)
from table

Qua*_*ang 6

This is built into groupby:

df['dense_rank'] = (df.groupby('Dominant_Topic')['appearance']
                      .rank(method='dense', ascending=False)
                      .astype(int)
                   )

Output:

  Dominant_Topic            word  appearance  dense_rank
0        Topic 0         aaaawww          50           3
1        Topic 0            aacn         100           2
2        Topic 0           aaren          20           4
3        Topic 0    aarongoodwin         200           1
4        Topic 1  aaronjfentress          10           3
5        Topic 1     aaronrodger          20           2
6        Topic 1      aasmiitkap          30           1
7        Topic 2      aavqbketmh          10           1
8        Topic 2              ab          10           1
9        Topic 2         abandon           1           2
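
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question by hand and applies the same groupby rank call. The min_rank column is my own addition for comparison (it is not part of the question or the answer above) and shows how dense ranking differs from SQL's plain RANK() when there are ties.

```python
import pandas as pd

# Rebuild the sample data from the question.
df = pd.DataFrame({
    "Dominant_Topic": ["Topic 0"] * 4 + ["Topic 1"] * 3 + ["Topic 2"] * 3,
    "word": ["aaaawww", "aacn", "aaren", "aarongoodwin",
             "aaronjfentress", "aaronrodger", "aasmiitkap",
             "aavqbketmh", "ab", "abandon"],
    "appearance": [50, 100, 20, 200, 10, 20, 30, 10, 10, 1],
})

# Partition by Dominant_Topic and rank on appearance.
grouped = df.groupby("Dominant_Topic")["appearance"]

# Dense rank: ties share a rank and the next rank is not skipped.
df["dense_rank"] = grouped.rank(method="dense", ascending=False).astype(int)

# Comparison column (added for illustration only): method="min" behaves
# like SQL RANK(), so the rank after a tie is skipped.
df["min_rank"] = grouped.rank(method="min", ascending=False).astype(int)

print(df)
```

In Topic 2 the two words tied at 10 both get rank 1; abandon then gets 2 under dense ranking but 3 under method="min". One caveat: rank leaves NaN for missing appearance values by default, so the astype(int) step would raise if the column contained any NaN.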