Python Pandas: sort by MultiIndex and column

Question

rau*_*sch 8 python sorting indexing pandas

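In Pandas 0.17, I am trying to sort by a specific column while keeping the hierarchical index (A and B). B is a running number created when the dataframe was set up through concatenation. My data looks like this: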

          C      D
A   B
bar one   shiny  10
    two   dull   5
    three glossy 8
foo one   dull   3
    two   shiny  9
    three matt   12
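
For anyone who wants to reproduce the frame, here is a minimal sketch of how it could be built through concatenation. The per-group frames `bar` and `foo` are hypothetical names chosen for illustration; the question does not show the original construction code.

```python
import pandas as pd

# Hypothetical per-group frames; the inner labels become index level B.
bar = pd.DataFrame(
    {"C": ["shiny", "dull", "glossy"], "D": [10, 5, 8]},
    index=["one", "two", "three"],
)
foo = pd.DataFrame(
    {"C": ["dull", "shiny", "matt"], "D": [3, 9, 12]},
    index=["one", "two", "three"],
)

# keys= adds the outer level (A); names= labels both index levels.
df = pd.concat([bar, foo], keys=["bar", "foo"], names=["A", "B"])
print(df)
```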


This is what I need:

          C      D
A   B
bar two   dull   5
    three glossy 8
    one   shiny  10
foo one   dull   3
    three matt   12
    two   shiny  9


Below is the code I am using and its result. Note: Pandas 0.17 warns that dataframe.sort will be deprecated.

df.sort_values(by="C", ascending=True)
          C      D
A   B
bar two   dull   5
foo one   dull   3
bar three glossy 8
foo three matt   12
bar one   shiny  10
foo two   shiny  9


Adding .groupby produces the same result:

df.sort_values(by="C", ascending=True).groupby(axis=0, level=0, as_index=True)


Likewise, sorting the index first and then grouping by the column does not help:

df.sort_index(axis=0, level=0, as_index=True).groupby(C, as_index=True)


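I am not sure about reindexing. I need to keep the first index A; the second index B can be reassigned, but does not have to be. I would be surprised if there were no simple solution; I suppose I just cannot find it. Any suggestions are appreciated.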

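Edit: In the meantime I dropped the second index B, turned the first index A into a column instead of an index, sorted by multiple columns, and then re-indexed: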

df.index = df.index.droplevel(1)
df.reset_index(level=0, inplace=True)
df_sorted = df.sort_values(["A", "C"], ascending=[1,1]) #A is a column here, not an index.
df_reindexed = df_sorted.set_index("A")


Still rather long-winded.
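
The workaround above discards level B. A minimal sketch of a variant that keeps both levels, assuming the index levels are named A and B as in the sample data. The frame construction here is an illustrative reconstruction, not the original code.

```python
import pandas as pd

# Illustrative reconstruction of the sample frame from the question.
index = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "two"), ("bar", "three"),
     ("foo", "one"), ("foo", "two"), ("foo", "three")],
    names=["A", "B"],
)
df = pd.DataFrame(
    {"C": ["shiny", "dull", "glossy", "dull", "shiny", "matt"],
     "D": [10, 5, 8, 3, 9, 12]},
    index=index,
)

# Move both index levels into columns, sort by A and then C,
# and rebuild the two-level index from the sorted columns.
df_sorted = df.reset_index().sort_values(["A", "C"]).set_index(["A", "B"])
print(df_sorted)
```

This is one chained expression rather than four statements, and B survives the round trip.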

Answer 1

chr*_*isb 7

It feels like there may be a better way, but here is one approach:

In [163]: def sorter(sub_df):
     ...:     sub_df = sub_df.sort_values('C')
     ...:     sub_df.index = sub_df.index.droplevel(0)
     ...:     return sub_df

In [164]: df.groupby(level='A').apply(sorter)
Out[164]: 
                C   D
A   B                
bar two      dull   5
    three  glossy   8
    one     shiny  10
foo one      dull   3
    three    matt  12
    two     shiny   9
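
On later pandas releases the helper function may not be needed. A sketch, assuming pandas 0.23 or newer, where `sort_values` accepts index level names alongside column labels in `by`. This does not work on the 0.17 release named in the question, and the frame construction is an illustrative reconstruction of the sample data.

```python
import pandas as pd

# Illustrative reconstruction of the sample frame from the question.
index = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "two"), ("bar", "three"),
     ("foo", "one"), ("foo", "two"), ("foo", "three")],
    names=["A", "B"],
)
df = pd.DataFrame(
    {"C": ["shiny", "dull", "glossy", "dull", "shiny", "matt"],
     "D": [10, 5, 8, 3, 9, 12]},
    index=index,
)

# "A" is an index level and "C" is a column; both can be named in `by`.
# The MultiIndex is kept and rows are ordered by A, then by C within each A.
df_sorted = df.sort_values(["A", "C"])
print(df_sorted)
```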

