And*_* C. 1 python optimization performance list
我有一个很大的列表,其中有 1000 万个整数(已排序)“列表”。我需要的是获得一些整数(来自“blist”)和列表中的邻居之间的最小距离。我通过找到我要查找的整数的位置,获取前后的项目并测量差异来做到这一点:
alist=[1, 4, 30, 1000, 2000] #~10 million integers
blist=[4, 30, 1000] #~8 million integers
for b in blist:
position=alist.index(b)
distance=min([b-alist[position-1],alist[position+1]-b])
Run Code Online (Sandbox Code Playgroud)
这个操作必须重复数百万次,不幸的是,它在我的机器上需要很长时间。有没有办法提高这段代码的性能?我使用 python 2.6 而 python 3 不是一个选项。
我建议使用二分搜索。使它更快,不花费额外的内存,只需要一点点改变。而不是alist.index(b),只需使用bisect_left(alist, b).
如果您也blist已排序,您还可以使用非常简单的增量搜索,b不是alist从前一个b.
Python 2.7.11 的基准测试和包含 1000 万和 800 万个整数的列表:
389700.01 seconds Andy_original (time estimated)
377100.01 seconds Andy_no_lists (time estimated)
6.30 seconds Stefan_binary_search
2.15 seconds Stefan_incremental_search
3.57 seconds Stefan_incremental_search2
1.21 seconds Jacquot_NumPy
(0.74 seconds Stefan_only_search_no_distance)
Run Code Online (Sandbox Code Playgroud)
Andy 的原件大约需要 4.5 天,所以我只使用了每 100000 个条目blist并按比例放大。二分搜索要快得多,增量搜索还要快,而且 NumPy 击败了所有这些,尽管它们都只需要几秒钟。
最后一个需要 0.74 秒的条目是我没有distance = min(...)行的增量搜索,所以它没有可比性。但它表明搜索只需要总 2.15 秒的 34% 左右。所以我无能为力,因为现在distance = min(...)大部分时间都由计算负责。
与Python 3.5.1的结果相似:
509819.56 seconds Andy_original (time estimated)
505257.32 seconds Andy_no_lists (time estimated)
8.35 seconds Stefan_binary_search
4.61 seconds Stefan_incremental_search
4.53 seconds Stefan_incremental_search2
1.39 seconds Jacquot_NumPy
(1.45 seconds Stefan_only_search_no_distance)
Run Code Online (Sandbox Code Playgroud)
包含所有版本和测试的完整代码:
def Andy_original(alist, blist):
for b in blist:
position = alist.index(b)
distance = min([b-alist[position-1], alist[position+1]-b])
def Andy_no_lists(alist, blist):
for b in blist:
position = alist.index(b)
distance = min(b-alist[position-1], alist[position+1]-b)
from bisect import bisect_left
def Stefan_binary_search(alist, blist):
for b in blist:
position = bisect_left(alist, b)
distance = min(b-alist[position-1], alist[position+1]-b)
def Stefan_incremental_search(alist, blist):
position = 0
for b in blist:
while alist[position] < b:
position += 1
distance = min(b-alist[position-1], alist[position+1]-b)
def Stefan_incremental_search2(alist, blist):
position = 0
for b in blist:
position = alist.index(b, position)
distance = min(b-alist[position-1], alist[position+1]-b)
import numpy as np
def Jacquot_NumPy(alist, blist):
a_array = np.asarray(alist)
b_array = np.asarray(blist)
a_index = np.searchsorted(a_array, b_array) # gives the indexes of the elements of b_array in a_array
a_array_left = a_array[a_index - 1]
a_array_right = a_array[a_index + 1]
distance_left = np.abs(b_array - a_array_left)
distance_right = np.abs(a_array_right - b_array)
min_distance = np.min([distance_left, distance_right], axis=0)
def Stefan_only_search_no_distance(alist, blist):
position = 0
for b in blist:
while alist[position] < b:
position += 1
from time import time
alist = list(range(10000000))
blist = [i for i in alist[1:-1] if i % 5]
blist_small = blist[::100000]
for func in Andy_original, Andy_no_lists:
t0 = time()
func(alist, blist_small)
t = time() - t0
print('%9.2f seconds %s (time estimated)' % (t * 100000, func.__name__))
for func in Stefan_binary_search, Stefan_incremental_search, Stefan_incremental_search2, Jacquot_NumPy, Stefan_only_search_no_distance:
t0 = time()
func(alist, blist)
t = time() - t0
print('%9.2f seconds %s' % (t, func.__name__))
Run Code Online (Sandbox Code Playgroud)
| 归档时间: |
|
| 查看次数: |
4394 次 |
| 最近记录: |