Tags: python, performance, list, set
I have two lists, both containing integers represented as strings. large_list is big, up to tens of thousands of elements. small_list is smaller, up to a few hundred elements. I'm using the smaller one to filter the larger one. I'd like to know which is more efficient:
(A)
for element in small_list:
    if element in large_list:
        (do something)
(B)
for element in large_list:
    if element in small_list:
        (do something)
With option (a), the iteration count is bounded by the length of small_list, but for each element a linear search must be performed over the full large_list. With option (b), the loop walks the entire large_list, but each search is performed only on small_list.
I know that sets are hash-based, so the best option here should be to build large_set = set(large_list), iterate over small_list, and perform each lookup against large_set. But if we take sets out of the equation, is option (a) or option (b) more efficient?
You can speculate, or you can measure:
import timeit
small_list = list(range(10,20))
large_list = list(range(15,115))
def small_in_large():
    c = 0
    for element in small_list:
        if element in large_list:
            c += 1
    return c

def large_in_small():
    c = 0
    for element in large_list:
        if element in small_list:
            c += 1
    return c
t1 = timeit.timeit(small_in_large, globals=globals(), number=10000)
t2 = timeit.timeit(large_in_small, globals=globals(), number=10000)
print("small_in_large() {:.3f} seconds".format(t1))
print("large_in_small() {:.3f} seconds".format(t2))
On my machine:
small_in_large() 0.164 seconds
large_in_small() 0.730 seconds
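To see how much the set conversion helps, the benchmark can be extended with a third variant (this addition is my own sketch, not part of the original measurement; it reuses the same data and number=10000):

```python
import timeit

small_list = list(range(10, 20))
large_list = list(range(15, 115))
large_set = set(large_list)  # built once, outside the timed function

def small_in_large_set():
    c = 0
    for element in small_list:
        if element in large_set:  # O(1) average-case hash lookup
            c += 1
    return c

t3 = timeit.timeit(small_in_large_set, number=10000)
print("small_in_large_set() {:.3f} seconds".format(t3))
```

On typical hardware this runs noticeably faster than either list-based variant, because every `in` test is a hash lookup rather than a linear scan.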