Pandas DataFrame: group by A, take nlargest of B, output C

use*_*586 4 python dataframe pandas

What are the top two C values for each A, ranked by the values in B?

    import pandas as pd

    df = pd.DataFrame({
        'A': ["first", "second", "second", "first",
              "second", "first", "third", "fourth",
              "fifth", "second", "fifth", "first",
              "first", "second", "third", "fourth", "fifth"],
        'B': [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6, 6, 7, 7],
        'C': ["a", "b", "c", "d",
              "e", "f", "g", "h",
              "i", "j", "k", "l",
              "m", "n", "o", "p", "q"]})

I am trying:

    x = df.groupby(['A'])['B'].nlargest(2)

    A
    fifth   16    7
            10    4
    first   12    6
            11    5
    fourth  15    7
            7     3
    second  13    6
            9     4
    third   14    6
            6     3

But this drops column C, which holds the actual values I need.

I want C in the result, not the row index of the original df. Do I have to join? I would even settle for just a list of the C values...

I need to act on the top 2 C values (based on B) for each A.

Max*_*axU 5

IIUC:

In [42]: df.groupby(['A'])[['B','C']].apply(lambda x: x.nlargest(2, columns=['B']))
Out[42]:
           B  C
A
fifth  16  7  q
       10  4  k
first  12  6  m
       11  5  l
fourth 15  7  p
       7   3  h
second 13  6  n
       9   4  j
third  14  6  o
       6   3  g
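On the "do I have to join?" part of the question: probably not. The Series returned by `groupby(...)['B'].nlargest(2)` carries the original row labels in its last index level, so they can be fed straight back into `df.loc`. A minimal sketch, assuming the default integer index from the question (the names `rows` and `top2` are just illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'A': ["first", "second", "second", "first",
          "second", "first", "third", "fourth",
          "fifth", "second", "fifth", "first",
          "first", "second", "third", "fourth", "fifth"],
    'B': [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6, 6, 7, 7],
    'C': list("abcdefghijklmnopq"),
})

# Series indexed by (A, original row label), holding the two largest B per group
x = df.groupby('A')['B'].nlargest(2)

# The last index level holds the row labels of the original df
rows = x.index.get_level_values(-1)

# Look the rows up directly; no join or merge is involved
top2 = df.loc[rows, ['A', 'B', 'C']]
print(top2)
```

This relies on the row labels of `df` being unique; with duplicate labels, `df.loc` would return extra rows.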

  • Alternatively: `df.groupby(['A']).apply(lambda x: x.nlargest(2, 'B')[['B', 'C']])`. (2 upvotes)
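If only the C values are needed, one alternative that avoids `apply` is to sort by B first and then keep the first two rows of each group. This is a sketch of a different technique from the one in the answer above (the names `top2` and `c_lists` are illustrative), and it breaks ties in B by whatever order the sort produces:

```python
import pandas as pd

df = pd.DataFrame({
    'A': ["first", "second", "second", "first",
          "second", "first", "third", "fourth",
          "fifth", "second", "fifth", "first",
          "first", "second", "third", "fourth", "fifth"],
    'B': [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6, 6, 7, 7],
    'C': list("abcdefghijklmnopq"),
})

# Sort so the largest B comes first, then keep the first 2 rows of each A group
top2 = df.sort_values('B', ascending=False).groupby('A').head(2)

# Collapse to one list of C values per A, ordered from largest B to smallest
c_lists = top2.groupby('A')['C'].agg(list)
print(c_lists)
```

The resulting `c_lists` is a Series keyed by A, so something like `c_lists['first']` gives the two C values to act on for that group.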