dir*_*ito 77 python caching list
In the code below I create two lists with the same values: one list is unsorted (s_not), the other is sorted (s_yes). The values are produced by randint(). I run some loops over each list and time them.
import random
import time

for x in range(1,9):
    r = 10**x # do different val for the bound in randint()
    m = int(r/2)
    print("For rand", r)
    # s_not is non sorted list
    s_not = [random.randint(1,r) for i in range(10**7)]
    # s_yes is sorted
    s_yes = sorted(s_not)
    # do some loop over the sorted list
    start = time.time()
    for i in s_yes:
        if i > m:
            _ = 1
        else:
            _ = 1
    end = time.time()
    print("yes", end-start)
    # do the same to the unsorted list
    start = time.time()
    for i in s_not:
        if i > m:
            _ = 1
        else:
            _ = 1
    end = time.time()
    print("not", end-start)
    print()
With this output:
For rand 10
yes 1.0437555313110352
not 1.1074268817901611
For rand 100
yes 1.0802974700927734
not 1.1524150371551514
For rand 1000
yes 2.5082249641418457
not 1.129960298538208
For rand 10000
yes 3.145440101623535
not 1.1366300582885742
For rand 100000
yes 3.313387393951416
not 1.1393756866455078
For rand 1000000
yes 3.3180911540985107
not 1.1336982250213623
For rand 10000000
yes 3.3231537342071533
not 1.13503098487854
For rand 100000000
yes 3.311596393585205
not 1.1345293521881104
So the loop over the sorted list gets slower as the bound passed to randint() increases. Why?
Tim*_*ers 92
Cache misses. When N int objects are allocated back to back, the memory reserved to hold them tends to form one contiguous chunk. So crawling over the list in allocation order also tends to access the memory holding the int values in sequential, contiguous, increasing order.
Shuffle the list, and the access pattern while crawling over it is randomized as well. Cache misses abound, provided there are enough distinct int objects that they don't all fit in cache.
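As a quick sanity check of that claim, shuffling the same int objects in place (rather than sorting them) randomizes the access pattern the same way and should reproduce the slowdown. A minimal sketch, with made-up names s_alloc and s_shuf:
import random
import time

r = 10**6
s_alloc = [random.randint(1, r) for _ in range(10**7)]  # list order == allocation order
s_shuf = s_alloc[:]
random.shuffle(s_shuf)  # same int objects, but list order no longer matches memory order

for name, lst in (("alloc order", s_alloc), ("shuffled", s_shuf)):
    start = time.time()
    for i in lst:
        _ = i > r // 2  # touch each int's value, like the if/else in the question
    end = time.time()
    print(name, end - start)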
For the first two sizes (r==10 and r==100), CPython happens to treat such small ints as singletons, so, for example, even though the list has 10 million elements, at r==100 it contains at most 100 distinct int objects. All the data for those fits in cache simultaneously.
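That small-int caching is a CPython implementation detail (roughly the values -5 through 256 are preallocated singletons). A quick check, using throwaway names a through d and int() calls to avoid constant folding:
a = int("200")
b = int("200")
print(a is b)   # True on CPython: both names refer to the cached 200 object

c = int("100000")
d = int("100000")
print(c is d)   # False: each large int is a separately allocated heap object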
Beyond that, though, you tend to get more and more distinct int objects, and hardware caches become increasingly useless when the access pattern is random.
To demonstrate:
>>> from random import randint, seed
>>> seed(987987987)
>>> for x in range(1, 9):
...     r = 10 ** x
...     js = [randint(1, r) for _ in range(10_000_000)]
...     unique = set(map(id, js))
...     print(f"{r:12,} {len(unique):12,}")
...
          10           10
         100          100
       1,000    7,440,909
      10,000    9,744,400
     100,000    9,974,838
   1,000,000    9,997,739
  10,000,000    9,999,908
 100,000,000    9,999,998
don*_*ode 35
Cache misses, as others have said. It's not the values or the sortedness: the same sorted values, but with the objects freshly created in the new order, are fast again (in fact even a bit faster than the not case):
s_new = [--x for x in s_yes]  # --x == x, but the double negation creates new int objects (outside the small-int cache), allocated in list order
Picking just one size:
For rand 1000000
yes 3.6270992755889893
not 1.198620080947876
new 1.02010178565979
Looking at the address differences from one element to the next (for just 10**6 elements) shows that, especially for s_new, the elements are laid out nicely one after another in memory (99.2% of the time the next element sits 32 bytes later), whereas for s_yes they are not at all (only 0.01% are 32 bytes later):
s_yes:
  741022 different address differences occurred. Top 5:
    Address difference 32 occurred 102 times.
    Address difference 0 occurred 90 times.
    Address difference 64 occurred 37 times.
    Address difference 96 occurred 17 times.
    Address difference 128 occurred 9 times.
s_not:
  1048 different address differences occurred. Top 5:
    Address difference 32 occurred 906649 times.
    Address difference 96 occurred 8931 times.
    Address difference 64 occurred 1845 times.
    Address difference -32 occurred 1816 times.
    Address difference -64 occurred 1812 times.
s_new:
  19 different address differences occurred. Top 5:
    Address difference 32 occurred 991911 times.
    Address difference 96 occurred 7825 times.
    Address difference -524192 occurred 117 times.
    Address difference 0 occurred 90 times.
    Address difference 64 occurred 37 times.
The code:
from collections import Counter

for s in 's_yes', 's_not', 's_new':
    print(s + ':')
    ids = list(map(id, eval(s)))
    ctr = Counter(j - i for i, j in zip(ids, ids[1:]))
    print(' ', len(ctr), 'different address differences occurred. Top 5:')
    for delta, count in ctr.most_common(5):
        print(f'    Address difference {delta} occurred {count} times.')
    print()
The answer is probably data locality. Integers above a certain size limit are allocated dynamically. When you create the list, the int objects are allocated from (mostly) nearby memory. So when you loop over the list, things tend to be in cache, and the hardware prefetcher can bring them there.
In the sorted case, the objects are shuffled around in memory relative to the list order, resulting in more cache misses.
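A minimal sketch along those lines, assuming CPython (where id() returns the object's memory address) and using a made-up helper near_fraction: it measures how often the next list element lives within a small byte distance of the previous one.
import random

r = 10**6
s_not = [random.randint(1, r) for _ in range(10**6)]
s_yes = sorted(s_not)

def near_fraction(lst, limit=4096):
    # fraction of consecutive elements whose int objects lie within `limit` bytes of each other
    ids = list(map(id, lst))
    near = sum(abs(b - a) <= limit for a, b in zip(ids, ids[1:]))
    return near / (len(ids) - 1)

print("not:", near_fraction(s_not))  # typically close to 1.0: neighbours share a page / nearby cache lines
print("yes:", near_fraction(s_yes))  # much smaller: neighbours are scattered across memory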