Python 中集合、列表和元组的速度测试给出了令人惊讶的结果

Question

Python 中集合、列表和元组的速度测试给出了令人惊讶的结果

jiz*_*AMA 0 python list set python-3.x

set我尝试测试和list之间的速度tuple得到了令人惊讶的结果。

\n\n

在此之前，我知道这比基于此答案set更快list。

\n\n

这是我的测试代码：

\n\n

import timeit,time\nfrom sys import getsizeof as Size\n\nList_Test = [range(1000)]\nprint("The Size of List is : {}".format(Size(List_Test)))\nSet_Test = set(range(1000))\nprint("The Size of Set is : {}".format(Size(Set_Test)))\nTuple_Test = tuple(range(1000))\nprint("The Size of Tuple is : {}".format(Size(Tuple_Test)))\n\n\nprint("\\nNow is to test speed\\n")\ntime.sleep(3)\n\ndef Create_List():\n    List = [i for i in range(1000)]\n\ndef Test_List():\n    for i in List_Test:\n        if i == 6:\n            break\n\ndef Create_Set():\n    Set = set(i for i in range(1000))\n\ndef Test_Set():\n    for i in Set_Test:\n        if i == 6:\n            break\n\ndef Create_Tuple():\n    Tuple = tuple(i for i in range(1000))\n\ndef Test_Tuple():\n    for i in Tuple_Test:\n        if i == 6:\n            break\n\nt = timeit.repeat(stmt="Create_List()",number=1000,setup="from __main__ import Create_List", repeat=30)\nprint("The Time of Create_List : {}".format(sum(t)/len(t)))\nt = timeit.repeat(stmt="Create_Tuple()",number=1000,setup="from __main__ import Create_Tuple", repeat=30)\nprint("The Time of Create_Tuple : {}".format(sum(t)/len(t)))\nt = timeit.repeat(stmt="Create_Set()",number=1000,setup="from __main__ import Create_Set", repeat=30)\nprint("The Time of Create_Set : {}".format(sum(t)/len(t)))\n\nprint("\\n")\n\nt = timeit.repeat(stmt="Test_List()",number=1000,setup="from __main__ import Test_List", repeat=30)\nprint("The Time of Test_List : {}".format(sum(t)/len(t)))\nt = timeit.repeat(stmt="Test_Tuple()",number=1000,setup="from __main__ import Test_Tuple", repeat=30)\nprint("The Time of Test_Tuple : {}".format(sum(t)/len(t)))\nt = timeit.repeat(stmt="Test_Set()",number=1000,setup="from __main__ import Test_Set", repeat=30)\nprint("The Time of Test_Set : {}".format(sum(t)/len(t)))\n\nprint("\\nThe end")\n

Run Code Online (Sandbox Code Playgroud)\n\n

最后，我发现：

\n\n

Size: Set > Tuple > List\nSpeed: List > Tuple > Set\n

Run Code Online (Sandbox Code Playgroud)\n\n

我认为我的测试代码是错误的。它出什么问题了？

\n\n

编辑：

\n\n

我更改了测试代码：

\n\n

List_Test = list(range(500000))\n......\n\ndef Test_List():\n    randomNumber = random.randint(0,500000)\n    for i in List_Test:\n        if i == randomNumber:\n            break\n\n# Other test code is the same\n

Run Code Online (Sandbox Code Playgroud)\n\n

测试的结果总是list \xe2\x89\x88 tuple > set。

\n\n

当将数字500000（x20）更改为10000000时，

\n\n

有时是这样list \xe2\x89\x88 tuple \xe2\x89\x88 set，但经常是这样list \xe2\x89\x88 tuple > set。

\n\n

我是否可以推断，只有当它们都具有相同的长度（并且长度的数量很大）时，我们才能使用set（尽管它的大小远大于tuple和list）？

\n

Answer 1

ggo*_*len 5

这个测试套件有很多问题。

[range(1000)] isn't equivalent to the other two declarations. This makes a list with one element, and that single element points to the range. getsizeof is not recursive, so it only gives the size of the outer object. This can be illustrated by creating lists of a single element of different range sizes and noticing that the getsizeof call always gives the same result.

If you use list(range(1000)), you'll get a reasonable result that's about the same size as a tuple. I'm not sure what information is gained in making these lists 1000, though--why not compare sizes of the three empty elements? Even here, this is basically an implementation-dependent test that doesn't really have much bearing on how you might write a Python program, as far as I can tell. Sure, set is a bit larger as you'd expect (hash-based structures generally have more overhead than lists) and that varies from version to version.

至于“创建”测试，请考虑set(i for i in range(1000)). 此方法绝不会测试创建 a 所需的时间，set因为大部分时间都花在创建和运行作为参数传递的生成器上。与上一个测试一样，即使测试是公平的（即您使用了初始化list()器而不是列表理解），我也不确定这证明了什么。与“大小”测试一样，您可以仅使用空列表调用初始化程序，但这一切表明创建时间实际上是相同的：存在一些特定于实现的函数调用和对象分配开销。

最后，查找操作的速度测试：

for i in Set_Test:
    if i == 6:
        break

Run Code Online (Sandbox Code Playgroud)

这硬编码了最好的情况。列表和元组查找执行线性搜索，因此始终在 7 次比较中找到目标。这与集合查找几乎相同，时间复杂度为 O(1)，但需要一些复杂的哈希操作，从而增加开销。此测试应使用随机索引，其中分布意味着列表和元组将定期遇到最坏情况。该集合应该优于正确完成的列表和元组。

此外，像“集合比列表更快”这样的说法几乎没有任何意义——数据结构不能像这样进行比较，而像“更快”这样的词表示的是高度可变和大小写的特定运行时条件-具体的。比较数据结构需要比较其操作的时间复杂度，这有助于描述它们对某些目的的适用性。

总结一下：即使编写正确，前两个测试也不值得执行，最后一个测试将显示集合查找是 O(1)，而列表/元组查找是 O(n)（如果使用随机目标项进行公平）。

归档时间：	5 年，7 月前
查看次数：	2296 次
最近记录：	5 年，7 月前