列表上的Python sort()方法vs builtin sorted()函数

Question

列表上的Python sort()方法vs builtin sorted()函数

我知道__builtin__sorted()函数适用于任何可迭代的.但有人可以解释anylist.sort()与sorted(anylist)之间的巨大(10x)性能差异吗？另外,请指出我测量的方式是否有任何问题.

"""
Example Output:
$ python list_sort_timeit.py 
Using sort method: 20.0662879944
Using sorted builin method: 259.009809017
"""

import random
import timeit

print 'Using sort method:',
x = min(timeit.Timer("test_list1.sort()","import random;test_list1=random.sample(xrange(1000),1000)").repeat())
print x

print 'Using sorted builin method:',
x =  min(timeit.Timer("sorted(test_list2)","import random;test_list2=random.sample(xrange(1000),1000)").repeat())
print x

正如标题所说,我有兴趣比较list.sort()和sorted(list).上面的代码片段展示了一些有趣的东西,python的sort函数对于已排序的数据表现得非常好.正如Anurag所指出的那样,在第一种情况下,sort方法正在对已经排序的数据进行处理,而在第二种情况下,它正在处理新的工作,一次又一次地工作.

所以我写了这个测试,是的,他们非常接近.

"""
Example Output:
$ python list_sort_timeit.py 
Using sort method: 19.0166599751
Using sorted builin method: 23.203567028
"""

import random
import timeit

print 'Using sort method:',
x = min(timeit.Timer("test_list1.sort()","import random;test_list1=random.sample(xrange(1000),1000);test_list1.sort()").repeat())
print x

print 'Using sorted builin method:',
x =  min(timeit.Timer("sorted(test_list2)","import random;test_list2=random.sample(xrange(1000),1000);test_list2.sort()").repeat())
print x

哦,我看到Alex Martelli的回复,因为我正在输入这个...(我将离开编辑,因为它可能有用).

Answer 1

Ale*_*lli 51

您在测量误差如下:您的第一个电话后test_list1.sort(),该列表对象IS分类-和Python的排序,又名timsort,是不怀好意快上已排序列表!这是使用中最常见的错误timeit- 无意中得到副作用而不考虑它们.

这是一组很好的测量,使用timeit命令行,因为它最好用:

$ python -mtimeit -s'import random; x=range(1000); random.shuffle(x)' '
y=list(x); y.sort()'
1000 loops, best of 3: 452 usec per loop
$ python -mtimeit -s'import random; x=range(1000); random.shuffle(x)' '
x.sort()'
10000 loops, best of 3: 37.4 usec per loop
$ python -mtimeit -s'import random; x=range(1000); random.shuffle(x)' '
sorted(x)'
1000 loops, best of 3: 462 usec per loop

Run Code Online (Sandbox Code Playgroud)

正如你看到的,y.sort()和sorted(x)并驾齐驱,但x.sort()由于副作用涨幅超过幅度的优势下订单-只是因为你的测量误差,但:这个不能告诉你sortVS sorted本身!- )

@Senthil：不应复制列表中的对象，但列表本身是（例如，在y = sorted（x）之后，您具有列表的两个副本x和y）。因此，至少sorted（）调用需要一个malloc（n * sizeof（PyObject *））和n个指针从第一个列表复制到新列表。我的猜测是，复制那些n指针可以弥补Alex基准测试所显示的大约2％的时差。 (2认同)

Answer 2

Anu*_*yal 11

因为list.sort进行了排序,所以第一次排序,但下次排序时会对排序列表进行排序.

例如尝试这个,你会在timeit情况下获得相同的结果,大部分时间花在复制和排序上也会再复制一次

import time
import random
test_list1=random.sample(xrange(1000),1000)
test_list2=random.sample(xrange(1000),1000)

s=time.time()
for i in range(100):
    test_list1.sort()
print time.time()-s

s=time.time()
for i in range(100):
    test_list2=sorted(test_list2)
print time.time()-s

Run Code Online (Sandbox Code Playgroud)

Answer 3

Dan*_*den 7

好吧,.sort()列表方法对列表进行了排序,同时sorted()创建了一个新列表.因此,如果您有一个大型列表,部分性能差异将归因于复制.

不过,一个数量级的差异似乎比我预期的要大.也许list.sort()有一些sorted()无法利用的特殊优化.例如,由于list该类已经具有Py_Object*[]正确大小的内部数组,因此它可以更有效地执行交换.

编辑:Alex和Anurag是对的,数量级差异是由于您不小心在测试用例中对已经排序的列表进行排序.然而,正如亚历克斯的基准测试所显示的那样,list.sort()大约快2%sorted(),由于复制开销,这将是有意义的.

归档时间：	16 年，5 月前
查看次数：	54368 次
最近记录：	8 年，5 月前