How do I correctly use pandas sort_index with the level and axis parameters?

Question

Alo*_*lon 1 python multi-index dataframe pandas

Given this df:

               Amount                          type                 
Month_year 2019-06-01     2019-07-01     2019-06-01    2019-07-01   
TYPE_ID             1   2          1   2          1  2          1  2
ID                                                                  
100                20  10         40  20          1  1          2  1
200                80  60         30  10          2  2          1  1

the following code:

df = df.sort_index(axis=1, level=[1,2])

produces this:

               Amount       type     Amount  ...       type     Amount       type
Month_year 2019-06-01 2019-06-01 2019-06-01  ... 2019-07-01 2019-07-01 2019-07-01
TYPE_ID             1          1          2  ...          1          2          2
ID                                           ...                                 
100                20          1         10  ...          2         20          1
200                80          2         60  ...          1         10          1

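I really don't understand what is going on here. I have read the documentation, but it has no examples and the description is very vague.

Can someone explain to me how this method works and how I ended up with this result?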

Answer 1

cs9*_*s95 6

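Essentially,

sort_index with axis=1 sorts the column headers, then uses that ordering to set the order of the columns.

And, as a corollary,

sort_index with axis=0 sorts the index, then uses that ordering to set the order of the rows.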
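
As a minimal sketch of the two cases, here is a toy frame whose row and column labels are deliberately out of order. The names toy, by_rows and by_cols are made up for illustration and are not part of the question.

```python
import pandas as pd

# Hypothetical toy frame with unsorted row labels and unsorted column labels.
toy = pd.DataFrame([[1, 2], [3, 4]], index=["b", "a"], columns=["y", "x"])

by_rows = toy.sort_index(axis=0)  # sorts the index, rows follow that order
by_cols = toy.sort_index(axis=1)  # sorts the column labels, columns follow

print(by_rows)
print(by_cols)
```

In both cases the data is not sorted by value. Only the labels are sorted, and the cells move along with their labels.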

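This is what your input df looks like (see the printout in the question).

The first three "rows" of that display correspond to the pandas MultiIndex columns of df, as shown here: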

df.columns
MultiIndex([('Amount', '2019-06-01', 1),
            ('Amount', '2019-06-01', 2),
            ('Amount', '2019-07-01', 1),
            ('Amount', '2019-07-01', 2),
            (  'type', '2019-06-01', 1),
            (  'type', '2019-06-01', 2),
            (  'type', '2019-07-01', 1),
            (  'type', '2019-07-01', 2)])
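
To follow along, the frame can be rebuilt roughly as below. This is a reconstruction from the printout in the question, assuming the Month_year labels are plain strings (the original may well hold timestamps) and that the top level has no name.

```python
import pandas as pd

# Three column levels: measure, Month_year, TYPE_ID (reconstructed from the question).
columns = pd.MultiIndex.from_product(
    [["Amount", "type"], ["2019-06-01", "2019-07-01"], [1, 2]],
    names=[None, "Month_year", "TYPE_ID"],
)
df = pd.DataFrame(
    [[20, 10, 40, 20, 1, 1, 2, 1],
     [80, 60, 30, 10, 2, 2, 1, 1]],
    index=pd.Index([100, 200], name="ID"),
    columns=columns,
)

print(df.columns.nlevels)
print(df.columns.tolist())
```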

Let's suppose your 3-level MultiIndex columns are magically converted into a DataFrame called cdf, in which each level gets its own column:

cdf
    level_0     level_1  level_2
(1)  Amount  2019-06-01        1
(2)  Amount  2019-06-01        2
(3)  Amount  2019-07-01        1
(4)  Amount  2019-07-01        2
(5)    type  2019-06-01        1
(6)    type  2019-06-01        2
(7)    type  2019-07-01        1
(8)    type  2019-07-01        2
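
The "magic" does not have to be imagined: one way to get such a frame is MultiIndex.to_frame. The sketch below assumes string dates as in the printout, and the column names level_0, level_1, level_2 and the 1-based row labels are chosen here only to match the listing.

```python
import pandas as pd

# Same column labels as in the question, rebuilt so the snippet runs on its own.
columns = pd.MultiIndex.from_product(
    [["Amount", "type"], ["2019-06-01", "2019-07-01"], [1, 2]]
)

# One column per level; index=False gives a fresh integer index instead of the MultiIndex.
cdf = columns.to_frame(index=False, name=["level_0", "level_1", "level_2"])
cdf.index = range(1, len(cdf) + 1)  # label rows 1..8 like the listing

print(cdf)
```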

The row numbers here correspond to the column identifiers in the original DataFrame. Let's see what happens when we sort cdf by its last two columns:

cdf.sort_values(['level_1', 'level_2'])

    level_0     level_1  level_2
(1)  Amount  2019-06-01        1
(5)    type  2019-06-01        1
(2)  Amount  2019-06-01        2
(6)    type  2019-06-01        2
(3)  Amount  2019-07-01        1
(7)    type  2019-07-01        1
(4)  Amount  2019-07-01        2
(8)    type  2019-07-01        2

Note the index of the sorted cdf:

(1) (5) (2) (6) (3) (7) (4) (8)
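
That sequence can be reproduced with a short sketch. It rebuilds cdf from scratch under the same assumptions as before (string dates, illustrative column names) and reads off the row labels after sorting.

```python
import pandas as pd

columns = pd.MultiIndex.from_product(
    [["Amount", "type"], ["2019-06-01", "2019-07-01"], [1, 2]]
)
cdf = columns.to_frame(index=False, name=["level_0", "level_1", "level_2"])
cdf.index = range(1, len(cdf) + 1)  # 1-based labels, as in the listing

# Sort by the last two "levels" only; rows that tie keep their original relative order.
order = cdf.sort_values(["level_1", "level_2"]).index.tolist()
print(order)
```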

Now let's see what happens when we apply the sort_index operation to df:

df.sort_index(level=[1, 2], axis=1)

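The ellipsis in the middle of the output (the same output shown in the question) means that not all columns could be displayed because of the terminal width. In fact, columns (6) and (3) are not shown, but they are very much there. That is not the interesting part, though. Compare the column order here with the row order of the sorted cdf, and you will find that they are identical.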
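
The comparison can also be done in code rather than by eye. The sketch below rebuilds df from the question's printout (string dates assumed), derives the expected column order from the sorted cdf, and prints the result with the column limit lifted so nothing is hidden behind an ellipsis. The names result, positions and expected are illustrative.

```python
import pandas as pd

columns = pd.MultiIndex.from_product(
    [["Amount", "type"], ["2019-06-01", "2019-07-01"], [1, 2]],
    names=[None, "Month_year", "TYPE_ID"],
)
df = pd.DataFrame(
    [[20, 10, 40, 20, 1, 1, 2, 1],
     [80, 60, 30, 10, 2, 2, 1, 1]],
    index=pd.Index([100, 200], name="ID"),
    columns=columns,
)

result = df.sort_index(axis=1, level=[1, 2])

# Expected order: sort the "column frame" by levels 1 and 2, keep 0-based positions.
cdf = df.columns.to_frame(index=False, name=["level_0", "level_1", "level_2"])
positions = cdf.sort_values(["level_1", "level_2"]).index.to_numpy()
expected = df.columns[positions]

print(result.columns.tolist() == expected.tolist())

# Show every column instead of truncating with "...".
with pd.option_context("display.max_columns", None, "display.width", 200):
    print(result)
```

Columns that tie on levels 1 and 2 end up with Amount before type. With the default sort_remaining=True, sort_index also sorts by the remaining level after the requested ones, which gives the same order here.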
