use*_*386 2 python optimization
我有一个函数,它有一个字典作为输入和一个值n.字典中的每个项目都是具有一个或多个值的集合.该函数应对字典键进行排序,它应提取并返回"n"值.此功能将经常执行,因此我正在尝试优化它.有什么建议?
def select_items(temp_dict, n):
"""Select n items from the dictionary"""
res = []
sort_keys = sorted(temp_dict.keys())
count = 0
for key in sort_keys:
for pair in temp_dict[key]:
if count < n:
res.append(pair)
count += 1
else:
return res
return res
Run Code Online (Sandbox Code Playgroud)
在这段代码中,我有一个count和"if语句"来控制所选值的数量.有没有办法通过在itertools或其他东西中使用某些函数来优化此代码?
这是我的第一次尝试(参见参考资料select_items_faster
),它的速度几乎翻了一番:
In [12]: print _11
import itertools
def select_items_original(temp_dict, n):
"""Select n items from the dictionary"""
res = []
sort_keys = sorted(temp_dict.keys())
count = 0
for key in sort_keys:
for pair in temp_dict[key]:
if count < n:
res.append(pair)
count += 1
else:
return res
return res
def select_items_faster(temp_dict, n):
"""Select n items from the dictionary"""
items = temp_dict.items()
items.sort()
return list(itertools.chain.from_iterable(val for (_, val) in itertools.islice(items, n)))
test_dict = dict((x, ["a"] * int(x / 500)) for x in range(1000))
test_n = 300
In [13]: %timeit select_items_original(test_dict, test_n)
1000 loops, best of 3: 293 us per loop
In [14]: %timeit select_items_faster(test_dict, test_n)
1000 loops, best of 3: 203 us per loop
Run Code Online (Sandbox Code Playgroud)
itertools.islice
用a 取代[:n]
并不能真正帮助你:
def select_items_faster_slice(temp_dict, n):
"""Select n items from the dictionary"""
items = temp_dict.items()
items.sort()
return list(itertools.chain.from_iterable(val for (_, val) in items[:n]))
In [16]: %timeit select_items_faster_slice(test_dict, test_n)
1000 loops, best of 3: 210 us per loop
Run Code Online (Sandbox Code Playgroud)
而且sorted
:
In [18]: %timeit select_items_faster_sorted(test_dict, test_n)
1000 loops, best of 3: 213 us per loop
In [19]: print _17
def select_items_faster_sorted(temp_dict, n):
"""Select n items from the dictionary"""
return list(itertools.chain.from_iterable(val for (_, val) in itertools.islice(sorted(temp_dict.items()), n)))
Run Code Online (Sandbox Code Playgroud)
但是,组合map
和__getitem__
更快:
In [22]: %timeit select_items_faster_map_getitem(test_dict, test_n)
10000 loops, best of 3: 90.7 us per loop
In [23]: print _20
def select_items_faster_map_getitem(temp_dict, n):
"""Select n items from the dictionary"""
keys = temp_dict.keys()
keys.sort()
return list(itertools.chain.from_iterable(map(temp_dict.__getitem__, keys[:n])))
Run Code Online (Sandbox Code Playgroud)
更换list(itertools.chain.from_iterable)
用一些神奇速度东西了不少:
In [28]: %timeit select_items_faster_map_getitem_list_extend(test_dict, test_n)
10000 loops, best of 3: 74.9 us per loop
In 29: print _27
def select_items_faster_map_getitem_list_extend(temp_dict, n):
"""Select n items from the dictionary"""
keys = temp_dict.keys()
keys.sort()
result = []
filter(result.extend, map(temp_dict.__getitem__, keys[:n]))
return result
Run Code Online (Sandbox Code Playgroud)
用itertools函数替换map和slice会挤出更快的速度:
In [31]: %timeit select_items_faster_map_getitem_list_extend_iterables(test_dict, test_n)
10000 loops, best of 3: 72.8 us per loop
In [32]: print _30
def select_items_faster_map_getitem_list_extend_iterables(temp_dict, n):
"""Select n items from the dictionary"""
keys = temp_dict.keys()
keys.sort()
result = []
filter(result.extend, itertools.imap(temp_dict.__getitem__, itertools.islice(keys, n)))
return result
Run Code Online (Sandbox Code Playgroud)
这和我认为的速度一样快,因为在CPython中Python函数调用相当慢,这最大限度地减少了内循环中Python函数调用的次数.
注意:
编辑:使用相同的方法来分析JF Sebastian的代码:
In [2]: %timeit select_items_heapq(test_dict, test_n)
1000 loops, best of 3: 572 us per loop
In [3]: print _1
from itertools import *
import heapq
def select_items_heapq(temp_dict, n):
return list(islice(chain.from_iterable(imap(temp_dict.get, heapq.nsmallest(n, temp_dict))),n))
Run Code Online (Sandbox Code Playgroud)
和TokenMacGuy的代码:
In [5]: %timeit select_items_tokenmacguy_first(test_dict, test_n)
1000 loops, best of 3: 201 us per loop
In [6]: %timeit select_items_tokenmacguy_second(test_dict, test_n)
1000 loops, best of 3: 730 us per loop
In [7]: print _4
def select_items_tokenmacguy_first(m, n):
k, v, r = m.keys(), m.values(), range(len(m))
r.sort(key=k.__getitem__)
return [v[i] for i in r[:n]]
import heapq
def select_items_tokenmacguy_second(m, n):
k, v, r = m.keys(), m.values(), range(len(m))
smallest = heapq.nsmallest(n, r, k.__getitem__)
for i, ind in enumerate(smallest):
smallest[i] = v[ind]
return smallest
Run Code Online (Sandbox Code Playgroud)