演示集合比数组更快？

Question

演示集合比数组更快？

Kat*_*tsu 1 python arrays set set-intersection

目标是找到两个数组的交集。我知道你应该使用集合来获得更快的运行时间。然而，在我的笔记本中，我惊讶地发现使用 set 方法我的运行时间并没有更快。（89 毫秒（阵列）与 220 毫秒（组））。我的代码有问题还是我误解了？

A = range(50,1000000)
B = range(-1000000,60)

%%time
# O(m*n)?
def intersection(A, B):
  array = []
  for i in A:
    if i in B:
      array.append(i)
  return array

intersection(A,B)

%%time
# O(1)?
def intersection(A, B):
  set_a = set(A)
  set_b = set(B)
  return [i for i in set_a if i in set_b]
  # return set_a.intersection(set_b)

intersection(A,B)

Run Code Online (Sandbox Code Playgroud)

Answer 1

And*_*kin 5

您不是在比较数组和集合。

您正在将范围与集合进行比较。

检查范围内的成员资格几乎是即时的（两次比较：一次与下限，一次与上限）。

在这种情况下，检查数组中的成员资格将需要数百万次迭代，这比集合的成员资格检查要长得多。

如果您先将ranges 转换为lists，您会看到巨大的差异：

import time

A = range(50,100000)
B = range(-100000,60)

def intersectionArrays(A, B):
  array = []
  arr_a = list(A)
  arr_b = list(B)
  for i in arr_a:
    if i in arr_b:
      array.append(i)
  return array


def intersectionSets(A, B):
  set_a = set(A)
  set_b = set(B)
  return [i for i in set_a if i in set_b]
  # return set_a.intersection(set_b)



startArr = time.process_time()
intersectionArrays(A,B)
endArr = time.process_time()

startSet = time.process_time()
intersectionSets(A,B)
endSet = time.process_time()

print(endArr - startArr)
print(endSet - startSet)

Run Code Online (Sandbox Code Playgroud)

它打印出类似的东西

Arrays: 39.871636528
Sets:   0.008161553000000765

Run Code Online (Sandbox Code Playgroud)

尝试它是没有意义的1000000——这需要几个小时。

此外，为了测量模糊现实的东西，您应该打乱数字，否则分支预测可能会扭曲结果。

归档时间：	3 年，1 月前
查看次数：	42 次
最近记录：	3 年，1 月前