When there is a tie, how does pandas decide the sort order?

Question

pandas 0.12.0

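In the DataFrame below, why is the index mixed up within equal values? Look at 4, for example: the index goes 1, 15, 6, 7. What does pandas use to decide the order? I would have expected the index to keep its order for equal values.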

import numpy as np
import pandas as pd
mydf = pd.DataFrame(np.random.randint(1, 6, 20), columns=["stars"])
mydf.sort(['stars'], ascending=False)


     stars
19   5
14   5
1    4
15   4
6    4
7    4
4    3
12   3
18   3
8    2
2    2
9    2
10   2
11   2
13   2
16   2
5    1
3    1
17   1
0    1

Answer 1

Rom*_*kar 7

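Actually, if you look at the source code of the pandas DataFrame, you'll see that sort() is just a wrapper around sort_index() with different parameters, and, as @Jeff said in this question, sort_index() is the preferred method to use.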

When you sort by a single column, sort_index() uses numpy.argsort() with its default kind='quicksort'. Quicksort is not stable, which is why your index looks shuffled.
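
To see what stability means for argsort, here is a small NumPy sketch. The values are the question's stars column re-entered by hand from its output (position = original index), and kind="stable" assumes NumPy 1.15 or later. A stable argsort keeps equal values in their original positional order, which is what the question expected:

```python
import numpy as np

# The question's "stars" values, re-entered from its output; position == original index.
stars = np.array([1, 4, 2, 1, 3, 1, 4, 4, 2, 2, 2, 2, 3, 2, 5, 4, 2, 1, 3, 5])

# kind="stable" (NumPy >= 1.15) guarantees that equal values keep
# their original relative order in the result.
stable = np.argsort(stars, kind="stable")

# Within each group of equal values, the original positions come out increasing.
for value in np.unique(stars):
    positions = stable[stars[stable] == value]
    print(value, positions.tolist())
```

With the default kind="quicksort", NumPy makes no such guarantee, so the order within each group of ties may differ from the original positions.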

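But you can pass the kind parameter to sort_index() (one of 'mergesort', 'quicksort', 'heapsort'), so you can use a stable sort ('mergesort') for your task. (The output below comes from a different random DataFrame than the one in the question.)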

>>> mydf.sort_index(by=['stars'], ascending=False, kind='mergesort')
    stars
17      5
11      5
6       5
1       5
19      4
18      4
15      4
14      4
7       4
5       4
2       4
10      3
8       3
4       3
16      2
12      2
9       2
3       2
13      1
0       1
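
In later pandas versions, sort() and sort_index(by=...) were replaced by sort_values(), which accepts the same kind argument. A minimal sketch on the question's data (re-entered from its output), assuming a recent pandas release:

```python
import pandas as pd

# The question's "stars" values, re-entered from its output (index 0..19).
stars = [1, 4, 2, 1, 3, 1, 4, 4, 2, 2, 2, 2, 3, 2, 5, 4, 2, 1, 3, 5]
mydf = pd.DataFrame({"stars": stars})

# Assumption: a recent pandas where sort_values() replaces sort()/sort_index(by=...).
# kind="mergesort" is stable, so rows with equal "stars" are expected to keep
# their original (ascending) index order, even with ascending=False.
result = mydf.sort_values("stars", ascending=False, kind="mergesort")
print(result)
```

According to the pandas documentation, kind is only applied when sorting on a single column or label.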

sort_index() also uses mergesort (or a counting sort) if there is more than one column in the by argument. Interestingly, this means you can do the following:

>>> mydf.sort_index(by=['stars', 'stars'], ascending=False)
    stars
1       5
6       5
11      5
17      5
2       4
5       4
7       4
14      4
15      4
18      4
19      4
4       3
8       3
10      3
3       2
9       2
12      2
16      2
0       1
13      1

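Now the sort is stable, but tied rows are ordered by ascending index (in the single-column mergesort output above they came out in descending index order).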
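
If you don't want to rely on the algorithm's stability at all, you can make the tie-breaker explicit. Here is a NumPy sketch of that idea using np.lexsort, which treats the last key as the primary key. It illustrates the technique and is not what pandas does internally; the values are the question's data re-entered from its output:

```python
import numpy as np

# The question's "stars" values; position == original index.
stars = np.array([1, 4, 2, 1, 3, 1, 4, 4, 2, 2, 2, 2, 3, 2, 5, 4, 2, 1, 3, 5])
index = np.arange(len(stars))

# np.lexsort uses the LAST key as the primary key: -stars sorts stars
# descending, and ties are broken by the original index (ascending).
order = np.lexsort((index, -stars))
print(list(zip(order.tolist(), stars[order].tolist())))
```

Because the index is part of the sort key, the result does not depend on whether the underlying algorithm is stable.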

Answer 2

foo*_*cue 5

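Pandas is using NumPy's quicksort. Quicksort works by swapping the positions of items, and it stops once they are in the requested order; in this case that does not involve checking the index, because you didn't ask for that column to be checked. Quicksort is much more efficient than a naive sorting algorithm such as bubble sort, which is probably what you have in mind: bubble sort keeps equal values in their original relative order, but it needs many more steps to do so.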
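
To make the difference concrete, here is a toy comparison on (key, label) records where only the key is compared. The quicksort is a textbook Lomuto-partition version, not NumPy's actual implementation (NumPy's "quicksort" is a more elaborate introsort), and the records are made up for illustration:

```python
def bubble_sort(items):
    """Stable: the strict '>' means equal keys are never swapped."""
    items = list(items)
    for end in range(len(items) - 1, 0, -1):
        for j in range(end):
            if items[j][0] > items[j + 1][0]:
                items[j], items[j + 1] = items[j + 1], items[j]
    return items


def quicksort(items, lo=0, hi=None):
    """In-place Lomuto quicksort on the key only; long-range swaps can reorder ties."""
    if hi is None:
        hi = len(items) - 1
    if lo < hi:
        pivot = items[hi][0]
        i = lo
        for j in range(lo, hi):
            if items[j][0] < pivot:
                items[i], items[j] = items[j], items[i]
                i += 1
        # Move the pivot into place; this swap can jump over equal keys.
        items[i], items[hi] = items[hi], items[i]
        quicksort(items, lo, i - 1)
        quicksort(items, i + 1, hi)
    return items


records = [(2, "a"), (2, "b"), (0, "c"), (1, "d")]
print("bubble:", bubble_sort(records))       # [(0, 'c'), (1, 'd'), (2, 'a'), (2, 'b')]
print("quick: ", quicksort(list(records)))   # [(0, 'c'), (1, 'd'), (2, 'b'), (2, 'a')]
```

Both results are correctly sorted by key, but only bubble sort keeps the tied records "a" and "b" in their original order.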

Archived: 12 years, 4 months ago
Views: 2,099
Last recorded: 7 years, 10 months ago