Claudio · Tags: performance, iterator, yield, generator, python-3.x
UPDATE (reflecting the current state of knowledge): 2017-05-12
The reason for this update is that at the time I asked this question I was not aware that I had discovered something about how Python3 works "under the hood".
The conclusion from everything that follows is:
If you write your own Python3 code for an iterator and care about execution speed, write it as a generator function and not as an iterator class.
Below a minimalistic code example demonstrating that the same algorithm (here: a self-made version of Python's range()) expressed as a generator function runs much faster than when expressed as an iterator class:
def gnrtYieldRange(startWith, endAt, step=1):
    while startWith <= endAt:
        yield startWith
        startWith += step

class iterClassRange:
    def __init__(self, startWith, endAt, step=1):
        self.startWith = startWith - 1
        self.endAt = endAt
        self.step = step
    def __iter__(self):
        return self
    def __next__(self):
        self.startWith += self.step
        if self.startWith <= self.endAt:
            return self.startWith
        else:
            raise StopIteration

N = 10000000
print(" Size of created list N = {} elements (ints 1 to N)".format(N))

from time import time as t
from customRange import gnrtYieldRange as cthnYieldRange
from customRange import cintYieldRange
from customRange import iterClassRange as cthnClassRange
from customRange import cdefClassRange

iterPythnRangeObj = range(1, N+1)
gnrtYieldRangeObj = gnrtYieldRange(1, N)
cthnYieldRangeObj = cthnYieldRange(1, N)
cintYieldRangeObj = cintYieldRange(1, N)
iterClassRangeObj = iterClassRange(1, N)
cthnClassRangeObj = cthnClassRange(1, N)
cdefClassRangeObj = cdefClassRange(1, N)

sEXECs = [
    "liPR = list(iterPythnRangeObj)",
    "lgYR = list(gnrtYieldRangeObj)",
    "lcYR = list(cthnYieldRangeObj)",
    "liGR = list(cintYieldRangeObj)",
    "liCR = list(iterClassRangeObj)",
    "lcCR = list(cthnClassRangeObj)",
    "ldCR = list(cdefClassRangeObj)"
]

sCOMMENTs = [
    "Python3 own range(1, N+1) used here as reference for timings ",
    "self-made range generator function using yield (run as it is) ",
    "self-made range (with yield) run from module created by Cython",
    "Cython-optimized self-made range (using yield) run from module",
    "self-made range as iterator class using __next__() and return ",
    "self-made range (using __next__) from module created by Cython",
    "Cython-optimized self-made range (using __next__) from module "
]

for idx, sEXEC in enumerate(sEXECs):
    s = t(); exec(sEXEC); e = t()
    print("{} takes: {:3.1f} sec.".format(sCOMMENTs[idx], e-s))

print("All created lists are equal:", all([liPR == lgYR, lgYR == lcYR, lcYR == liGR, liGR == liCR, liCR == lcCR, lcCR == ldCR]))
print("Run on Linux Mint 18.1, used Cython.__version__ == '0.25.2'")
Put into a file and run, the code above prints the following to stdout:
>python3.6 -u "gnrtFunction-fasterThan-iterClass_runMe.py"
Size of created list N = 10000000 elements (ints 1 to N)
Python3 own range(1, N+1) used here as reference for timings takes: 0.2 sec.
self-made range generator function using yield (run as it is) takes: 1.1 sec.
self-made range (with yield) run from module created by Cython takes: 0.5 sec.
Cython-optimized self-made range (using yield) run from module takes: 0.3 sec.
self-made range as iterator class using __next__() and return takes: 3.9 sec.
self-made range (using __next__) from module created by Cython takes: 3.3 sec.
Cython-optimized self-made range (using __next__) from module takes: 0.2 sec.
All created lists are equal: True
Run on Linux Mint 18.1, used Cython.__version__ == '0.25.2'
>Exit code: 0
From the timings above you can see that the generator-function variant of the self-made range() iterator runs faster than the iterator-class variant, and that, when no optimization of the code is involved, this behavior also propagates into the C code created by Cython.
If you are curious why exactly that is, you can read through the provided answer(s) or play around a bit with the provided code yourself.
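For quick experiments, the two pure-Python variants can also be timed with the timeit module instead of the exec()-based harness above. This is only a minimal self-contained sketch (absolute numbers will differ from the timings below, but the ranking should not):

```python
# Minimal timeit-based comparison of the two pure-Python self-made range()
# variants from above (no Cython involved).
from timeit import timeit

def gnrtYieldRange(startWith, endAt, step=1):
    while startWith <= endAt:
        yield startWith
        startWith += step

class iterClassRange:
    def __init__(self, startWith, endAt, step=1):
        self.startWith = startWith - 1
        self.endAt = endAt
        self.step = step
    def __iter__(self):
        return self
    def __next__(self):
        self.startWith += self.step
        if self.startWith <= self.endAt:
            return self.startWith
        raise StopIteration

N = 100000
tGen = timeit(lambda: list(gnrtYieldRange(1, N)), number=10)
tCls = timeit(lambda: list(iterClassRange(1, N)), number=10)
print("generator: {:.3f} s   iterator class: {:.3f} s".format(tGen, tCls))

# both variants must of course produce the same list as the built-in range()
assert list(gnrtYieldRange(1, N)) == list(iterClassRange(1, N)) == list(range(1, N + 1))
```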
Below are the missing pieces of code necessary to run the example above:
customRange.pyx - the file Cython creates the customRange module from:
def gnrtYieldRange(startWith, endAt, step=1):
    while startWith <= endAt:
        yield startWith
        startWith += step

class iterClassRange:
    def __init__(self, startWith, endAt, step=1):
        self.startWith = startWith - 1
        self.endAt = endAt
        self.step = step
    def __iter__(self):
        return self
    def __next__(self):
        self.startWith += self.step
        if self.startWith <= self.endAt:
            return self.startWith
        else:
            raise StopIteration

def cintYieldRange(int startWith, int endAt, int step=1):
    while startWith <= endAt:
        yield startWith
        startWith += step

cdef class cdefClassRange:
    cdef int startWith
    cdef int endAt
    cdef int step
    def __init__(self, int startWith, int endAt, int step=1):
        self.startWith = startWith - 1
        self.endAt = endAt
        self.step = step
    def __iter__(self):
        return self
    def __next__(self):
        self.startWith += self.step
        if self.startWith <= self.endAt:
            return self.startWith
        else:
            raise StopIteration
and the setup file customRange-setup.py used to create the Python customRange module:
import sys
sys.argv += ['build_ext', '--inplace']

from distutils.core import setup
from Cython.Build import cythonize

setup(
    name = 'customRange',
    ext_modules = cythonize("customRange.pyx"),
)
At the time I asked this question I was busy with a quite complex algorithm for generating unique combinations from a non-unique list, which was available in the form of a generator function using yield. My goal was to create a Python module written in C out of this algorithm to make it run faster, and for this purpose I rewrote the generator function, which used yield, as an iterator class using __next__() and return. When I compared the speed of the two variants of the algorithm, I was surprised that the iterator class was two times slower than the generator function, and I had (wrongly) assumed it had something to do with the way I rewrote the algorithm (you need to know this if you want to better understand what the answers here are about). That is why I had
originally asked how to make the iterator class version run at the same speed as the generator function and where the speed difference comes from.
Below is some more about the HISTORY of the question:
In the Python script code provided below, exactly the same algorithm for creating unique combinations from a non-unique list of elements is implemented both as a Python function using yield and as a class using __next__. The code is ready to run after copy/paste, so you can see for yourself what I am speaking about.
The same phenomenon observed for pure Python code propagates into the C code of a Python extension module created out of the script code by Cython, so it is not limited to Python-level code, because it doesn't vanish at the C-code level.
The question is:
Where does the huge difference in execution speed come from? Is there anything that can be done to make both code variants run at comparable speed? What goes wrong in the class/__next__ implementation compared to the function/yield variant? To my knowledge both are exactly the same code ...
Here is the code (tweaking the numbers in the highlighted lines changes the level of uniqueness of the elements in the list the combinations are generated from, which has a huge impact on the running time):
def uniqCmboYieldIter(lstItems, lenCmbo):
    dctCounter = {}
    lenLstItems = len(lstItems)
    for idx in range(lenLstItems):
        item = lstItems[idx]
        if item in dctCounter.keys():
            dctCounter[item] += 1
        else:
            dctCounter[item] = 1
        #:if
    #:for
    lstUniqs = sorted(dctCounter.keys())
    lstCntRpts = [dctCounter[item] for item in lstUniqs]
    lenUniqs = len(lstUniqs)
    cmboAsIdxUniqs = [None] * lenCmbo
    multiplicities = [0] * lenUniqs
    idxIntoCmbo, idxIntoUniqs = 0, 0

    while idxIntoCmbo != lenCmbo and idxIntoUniqs != lenUniqs:
        count = min(lstCntRpts[idxIntoUniqs], lenCmbo-idxIntoCmbo)
        cmboAsIdxUniqs[idxIntoCmbo : idxIntoCmbo + count] = [idxIntoUniqs] * count
        multiplicities[idxIntoUniqs] = count
        idxIntoCmbo += count
        idxIntoUniqs += 1

    if idxIntoCmbo != lenCmbo:
        return

    while True:
        yield tuple(lstUniqs[idxUniqs] for idxUniqs in cmboAsIdxUniqs)

        for idxIntoCmbo in reversed(range(lenCmbo)):
            x = cmboAsIdxUniqs[idxIntoCmbo]
            y = x + 1
            if y < lenUniqs and multiplicities[y] < lstCntRpts[y]:
                break
        else:
            return

        for idxIntoCmbo in range(idxIntoCmbo, lenCmbo):
            x = cmboAsIdxUniqs[idxIntoCmbo]
            cmboAsIdxUniqs[idxIntoCmbo] = y
            multiplicities[x] -= 1
            multiplicities[y] += 1
            # print("# multiplicities:", multiplicities)
            while y != lenUniqs and multiplicities[y] == lstCntRpts[y]:
                y += 1
            if y == lenUniqs:
                break

class uniqCmboClassIter:
    # ----------------------------------------------------------------------------------------------
    def __iter__(self):
        return self

    # ----------------------------------------------------------------------------------------------
    def __init__(self, lstItems, lenCmbo):
        dctCounter = {}
        lenLstItems = len(lstItems)
        for idx in range(lenLstItems):
            item = lstItems[idx]
            if item in dctCounter.keys():
                dctCounter[item] += 1
            else:
                dctCounter[item] = 1
            #:if
        #:for
        self.lstUniqs = sorted(dctCounter.keys())
        self.lenUniqs = len(self.lstUniqs)
        self.lstCntRpts = [dctCounter[item] for item in self.lstUniqs]
        self.lenCmbo = lenCmbo
        self.cmboAsIdxUniqs = [None] * lenCmbo
        self.multiplicities = [0] * self.lenUniqs
        self.idxIntoCmbo, self.idxIntoUniqs = 0, 0

        while self.idxIntoCmbo != self.lenCmbo and self.idxIntoUniqs != self.lenUniqs:
            count = min(self.lstCntRpts[self.idxIntoUniqs], self.lenCmbo-self.idxIntoCmbo)
            self.cmboAsIdxUniqs[self.idxIntoCmbo : self.idxIntoCmbo + count] = [self.idxIntoUniqs] * count
            self.multiplicities[self.idxIntoUniqs] = count
            self.idxIntoCmbo += count
            self.idxIntoUniqs += 1
        # print("self.multiplicities:", self.multiplicities)
        # print("self.cmboAsIdxUniqs:", self.cmboAsIdxUniqs)

        if self.idxIntoCmbo != self.lenCmbo:
            return

        self.stopIteration = False
        self.x = None
        self.y = None
        return

    # ----------------------------------------------------------------------------------------------
    def __next__(self):
        if self.stopIteration is True:
            raise StopIteration
            return

        nextCmbo = tuple(self.lstUniqs[idxUniqs] for idxUniqs in self.cmboAsIdxUniqs)

        for self.idxIntoCmbo in reversed(range(self.lenCmbo)):
            self.x = self.cmboAsIdxUniqs[self.idxIntoCmbo]
            self.y = self.x + 1
            if self.y < self.lenUniqs and self.multiplicities[self.y] < self.lstCntRpts[self.y]:
                break
        else:
            self.stopIteration = True
            return nextCmbo

        for self.idxIntoCmbo in range(self.idxIntoCmbo, self.lenCmbo):
            self.x = self.cmboAsIdxUniqs[self.idxIntoCmbo]
            self.cmboAsIdxUniqs[self.idxIntoCmbo] = self.y
            self.multiplicities[self.x] -= 1
            self.multiplicities[self.y] += 1
            # print("# multiplicities:", multiplicities)
            while self.y != self.lenUniqs and self.multiplicities[self.y] == self.lstCntRpts[self.y]:
                self.y += 1
            if self.y == self.lenUniqs:
                break

        return nextCmbo
# ============================================================================================================================================
lstSize   = 48  # 48
uniqLevel = 12  # (7 ~60% unique) higher level => more unique items in the generated list

aList = []
from random import randint
for _ in range(lstSize):
    aList.append( ( randint(1,uniqLevel), randint(1,uniqLevel) ) )

lenCmbo = 6
percUnique = 100.0 - 100.0*(lstSize-len(set(aList)))/lstSize
print("======================== lenCmbo:", lenCmbo,
      "   sizeOfList:", len(aList),
      "   noOfUniqueInList", len(set(aList)),
      "   percUnique", int(percUnique) )
import time
from itertools import combinations
# itertools.combinations
# ---
# def uniqCmboYieldIter(lstItems, lenCmbo):
# class uniqCmboClassIter: def __init__(self, lstItems, lenCmbo):
# ---
start_time = time.time()
print("Combos:%9i"%len(list(combinations(aList, lenCmbo))), " ", end='')
duration = time.time() - start_time
print("print(len(list( combinations(aList, lenCmbo)))):", "{:9.5f}".format(duration), "seconds.")
start_time = time.time()
print("Combos:%9i"%len(list(uniqCmboYieldIter(aList, lenCmbo))), " ", end='')
duration = time.time() - start_time
print("print(len(list(uniqCmboYieldIter(aList, lenCmbo)))):", "{:9.5f}".format(duration), "seconds.")
start_time = time.time()
print("Combos:%9i"%len(list(uniqCmboClassIter(aList, lenCmbo))), " ", end='')
duration = time.time() - start_time
print("print(len(list(uniqCmboClassIter(aList, lenCmbo)))):", "{:9.5f}".format(duration), "seconds.")
And the timings on my box:
>python3.6 -u "nonRecursiveUniqueCombos_Cg.py"
======================== lenCmbo: 6 sizeOfList: 48 noOfUniqueInList 32 percUnique 66
Combos: 12271512 print(len(list( combinations(aList, lenCmbo)))): 2.04635 seconds.
Combos: 1296058 print(len(list(uniqCmboYieldIter(aList, lenCmbo)))): 3.25447 seconds.
Combos: 1296058 print(len(list(uniqCmboClassIter(aList, lenCmbo)))): 5.97371 seconds.
>Exit code: 0
>python3.6 -u "nonRecursiveUniqueCombos_Cg.py"
======================== lenCmbo: 6 sizeOfList: 48 noOfUniqueInList 22 percUnique 45
Combos: 12271512 print(len(list( combinations(aList, lenCmbo)))): 2.05199 seconds.
Combos: 191072 print(len(list(uniqCmboYieldIter(aList, lenCmbo)))): 0.47343 seconds.
Combos: 191072 print(len(list(uniqCmboClassIter(aList, lenCmbo)))): 0.89860 seconds.
>Exit code: 0
>python3.6 -u "nonRecursiveUniqueCombos_Cg.py"
======================== lenCmbo: 6 sizeOfList: 48 noOfUniqueInList 43 percUnique 89
Combos: 12271512 print(len(list( combinations(aList, lenCmbo)))): 2.17285 seconds.
Combos: 6560701 print(len(list(uniqCmboYieldIter(aList, lenCmbo)))): 16.72573 seconds.
Combos: 6560701 print(len(list(uniqCmboClassIter(aList, lenCmbo)))): 31.17714 seconds.
>Exit code: 0
UPDATE (status 2017-05-07):
At the time of asking the question and offering a bounty, it was not known to me that there is a way to easily create the C code of an extension module for an iterator object out of Python script code using Cython, and that such C code can also be created from an iterator function using yield.
Considering that the generated faster version of the C extension module is still not fast enough to compete with itertools.combinations, it doesn't make much sense to dive deeply into finding out what exactly causes the slowdown when using an iterator class compared to an iterator function and how to overcome it. It makes much more sense to find a way to speed up the faster version using Cython, especially because I am a total novice at writing Python extension modules, having failed to create working code after hours and hours of intense, focused work spent on tweaking the existing C code of itertools.combinations with my own modifications, because of Segmentation Fault errors whose cause I was not able to grasp.
Currently I think that there is still room to speed up the Cython code I used, so there is no need to go the harder way of writing the C code myself.
Below is Cython code that runs OK, followed by speed-optimized Cython code which somehow changes (I currently can't see the reason for it) the way the algorithm works and therefore produces wrong results. The idea behind the Cython optimization was to use Python/Cython arrays in the Cython code instead of Python lists. Any hints on how to get a faster-running Python extension module out of the used algorithm in a way that is "safe" for a novice are welcome.
def subbags_by_loops_with_dict_counter(lstItems, int lenCmbo):
    dctCounter = {}
    cdef int lenLstItems = len(lstItems)
    cdef int idx = 0
    for idx in range(lenLstItems):
        item = lstItems[idx]
        if item in dctCounter.keys():
            dctCounter[item] += 1
        else:
            dctCounter[item] = 1
        #:if
    #:for
    lstUniqs = sorted(dctCounter.keys())
    lstCntRpts = [dctCounter[item] for item in lstUniqs]
    cdef int lenUniqs = len(lstUniqs)
    cmboAsIdxUniqs = [None] * lenCmbo
    multiplicities = [0] * lenUniqs
    cdef int idxIntoCmbo
    cdef int idxIntoUniqs
    cdef int count

    while idxIntoCmbo != lenCmbo and idxIntoUniqs != lenUniqs:
        count = min(lstCntRpts[idxIntoUniqs], lenCmbo-idxIntoCmbo)
        cmboAsIdxUniqs[idxIntoCmbo : idxIntoCmbo + count] = [idxIntoUniqs] * count
        multiplicities[idxIntoUniqs] = count
        idxIntoCmbo += count
        idxIntoUniqs += 1

    if idxIntoCmbo != lenCmbo:
        return

    cdef int x
    cdef int y

    while True:
        yield tuple(lstUniqs[idxUniqs] for idxUniqs in cmboAsIdxUniqs)

        for idxIntoCmbo in reversed(range(lenCmbo)):
            x = cmboAsIdxUniqs[idxIntoCmbo]
            y = x + 1
            if y < lenUniqs and multiplicities[y] < lstCntRpts[y]:
                break
        else:
            return

        for idxIntoCmbo in range(idxIntoCmbo, lenCmbo):
            x = cmboAsIdxUniqs[idxIntoCmbo]
            cmboAsIdxUniqs[idxIntoCmbo] = y
            multiplicities[x] -= 1
            multiplicities[y] += 1
            while y != lenUniqs and multiplicities[y] == lstCntRpts[y]:
                y += 1
            if y == lenUniqs:
                break
Below OPTIMIZED CYTHON CODE which produces wrong results:
# These two imports are needed for array.array / array.resize below
# (they were missing from the original listing):
from cpython cimport array
import array

def subbags_loops_dict_cython_optimized(lstItems, int lenCmbo):
    dctCounter = {}
    cdef int lenLstItems = len(lstItems)
    cdef int idx = 0
    for idx in range(lenLstItems):
        item = lstItems[idx]
        if item in dctCounter.keys():
            dctCounter[item] += 1
        else:
            dctCounter[item] = 1
        #:if
    #:for
    lstUniqs = sorted(dctCounter.keys())
    lstCntRpts = [dctCounter[item] for item in lstUniqs]
    cdef int lenUniqs = len(lstUniqs)
    cdef array.array cmboAsIdxUniqs = array.array('i', [])
    array.resize(cmboAsIdxUniqs, lenCmbo)
    # cmboAsIdxUniqs = [None] * lenCmbo
    cdef array.array multiplicities = array.array('i', [])
    array.resize(multiplicities, lenUniqs)
    # multiplicities = [0] * lenUniqs
    cdef int idxIntoCmbo
    cdef int maxIdxCmbo
    cdef int curIdxCmbo
    cdef int idxIntoUniqs
    cdef int count

    # NB: idxIntoCmbo and idxIntoUniqs are read by the while condition below
    # without ever being assigned first - a possible source of the wrong results
    while idxIntoCmbo != lenCmbo and idxIntoUniqs != lenUniqs:
        count = min(lstCntRpts[idxIntoUniqs], lenCmbo-idxIntoCmbo)
        maxIdxCmbo = idxIntoCmbo + count
        curIdxCmbo = idxIntoCmbo
        while curIdxCmbo < maxIdxCmbo:
            cmboAsIdxUniqs[curIdxCmbo] = idxIntoUniqs
            curIdxCmbo += 1
        multiplicities[idxIntoUniqs] = count
        idxIntoCmbo += count
        idxIntoUniqs += 1
    # print("multiplicities:", multiplicities)
    # print("cmboAsIdxUniqs:", cmboAsIdxUniqs)

    if idxIntoCmbo != lenCmbo:
        return

    cdef int x
    cdef int y

    while True:
        yield tuple(lstUniqs[idxUniqs] for idxUniqs in cmboAsIdxUniqs)

        for idxIntoCmbo in reversed(range(lenCmbo)):
            x = cmboAsIdxUniqs[idxIntoCmbo]
            y = x + 1
            if y < lenUniqs and multiplicities[y] < lstCntRpts[y]:
                break
        else:
            return

        for idxIntoCmbo in range(idxIntoCmbo, lenCmbo):
            x = cmboAsIdxUniqs[idxIntoCmbo]
            cmboAsIdxUniqs[idxIntoCmbo] = y
            multiplicities[x] -= 1
            multiplicities[y] += 1
            # print("# multiplicities:", multiplicities)
            while y != lenUniqs and multiplicities[y] == lstCntRpts[y]:
                y += 1
            if y == lenUniqs:
                break
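Independent of any speed work, a cheap way to catch "wrong results" regressions like the one in the optimized variant above is to compare a candidate against a brute-force oracle built from itertools.combinations. This is only a sketch; the helper names are my own and not part of the original code:

```python
# Brute-force oracle for unique combinations: take all k-combinations of the
# sorted input and deduplicate them. Slow, but obviously correct, so any
# optimized variant can be checked against it on small inputs.
from itertools import combinations

def unique_combinations_reference(lstItems, lenCmbo):
    # each unique multiset of size lenCmbo appears exactly once, as a sorted tuple
    return sorted(set(combinations(sorted(lstItems), lenCmbo)))

def check_candidate(candidate, lstItems, lenCmbo):
    # candidate is any callable returning an iterable of combination tuples
    got = sorted(set(candidate(lstItems, lenCmbo)))
    return got == unique_combinations_reference(lstItems, lenCmbo)

# e.g. check_candidate(uniqCmboYieldIter, aList, 6) should return True for a
# correct implementation and False for the broken optimized Cython variant
```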
I have some experience from the time I rewrote several recipes from the itertools documentation as C extensions, so I think I may have some insights that could help you.
When you write pure Python code, it is a trade-off between speed (generator) and features (iterator class).
The yield functions (known as generators) are for speed, and usually they can be written without worrying about internal state. So they are less work to write, and they are fast, because Python just manages all the "state" for you.
The reason generators are faster (or at least not slower) is mostly that:
They implement the __next__-slot (typically tp_iternext) directly, in addition to the __next__ method. In that case Python doesn't have to look up the __next__ method, and that is essentially what makes them faster in the following example:
from itertools import islice

def test():
    while True:
        yield 1

class Test(object):
    def __iter__(self):
        return self
    def __next__(self):
        return 1

%timeit list(islice(test(), 1000))
# 173 µs ± 2.15 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit list(islice(Test(), 1000))
# 499 µs ± 14.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
So it is almost 3 times faster, because the generator fills the __next__-slot directly.
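Part of that cost can even be demonstrated in pure Python: next(obj) on a plain class has to find __next__ through the type on every call, while binding the method once skips the repeated lookup. A small sketch (Ones, take and take_bound are illustrative names of mine; whether the gap is measurable depends on the Python version):

```python
# Compare driving a plain iterator class via next() (slot dispatch + method
# lookup on every call) against a pre-bound __next__ callable.
from timeit import timeit

class Ones:
    def __iter__(self):
        return self
    def __next__(self):
        return 1

def take(n, it):
    # drive the iterator via the next() builtin on every step
    return [next(it) for _ in range(n)]

def take_bound(n, it):
    # look __next__ up once and reuse the bound callable
    nxt = it.__next__
    return [nxt() for _ in range(n)]

print("next():       {:.4f} s".format(timeit(lambda: take(1000, Ones()), number=1000)))
print("bound method: {:.4f} s".format(timeit(lambda: take_bound(1000, Ones()), number=1000)))
```

Both variants produce the same values; only the per-call dispatch differs, so any remaining gap to the generator comes from the state handling discussed next.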
Both a yield function and a class have state, but the yield function saves and loads its state much faster than you can with a class and attribute access:
def test():
    i = 0
    while True:
        yield i
        i += 1

class Test(object):
    def __init__(self):
        self.val = 0
    def __iter__(self):
        return self
    def __next__(self):
        current = self.val
        self.val += 1
        return current
%timeit list(islice(test(), 1000))
# 296 µs ± 1.73 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit list(islice(Test(), 1000))
# 1.22 ms ± 3.12 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
This time the class is already 4 times slower (compared to almost 3 times when almost no state was involved). The effect is cumulative: the more "state" you have, the slower the class variant becomes.
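If the class form must be kept, one modest pure-Python mitigation worth trying (an assumption on my part, not something from the original answer) is __slots__: it replaces the per-instance dict with fixed attribute storage, which usually makes self.val loads and stores somewhat cheaper. The generator still keeps its state in fast local variables and stays ahead:

```python
# Same counting iterator twice: once with a normal instance dict, once with
# __slots__. Both produce identical sequences; __slots__ only changes how
# the attribute state is stored and accessed.
from itertools import islice

class CountUp:
    def __init__(self):
        self.val = 0
    def __iter__(self):
        return self
    def __next__(self):
        current = self.val
        self.val += 1
        return current

class CountUpSlots:
    __slots__ = ('val',)  # no per-instance __dict__, fixed attribute slot
    def __init__(self):
        self.val = 0
    def __iter__(self):
        return self
    def __next__(self):
        current = self.val
        self.val += 1
        return current

assert list(islice(CountUp(), 5)) == list(islice(CountUpSlots(), 5)) == [0, 1, 2, 3, 4]
```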
So the yield variant simply carries less per-item overhead than the equivalent class methods. Note, however, that the actual timings depend on the kind of operations performed: if the actual code that runs on each next call is slow (e.g. time.sleep(1)), there is almost no difference between a generator and a class!
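That last point can be sketched directly: give each step some real work to do and the generator/class gap shrinks to noise (work() here is just a cheap stand-in of mine for an expensive per-item computation):

```python
# When per-item work dominates, iteration overhead becomes irrelevant.
from timeit import timeit
from itertools import islice

def work():
    # stand-in for a genuinely expensive per-item computation
    return sum(range(2000))

def gen():
    while True:
        yield work()

class Cls:
    def __iter__(self):
        return self
    def __next__(self):
        return work()

tG = timeit(lambda: list(islice(gen(), 1000)), number=10)
tC = timeit(lambda: list(islice(Cls(), 1000)), number=10)
print("generator {:.3f} s vs class {:.3f} s (ratio {:.2f})".format(tG, tC, tC / tG))
```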
If you want a fast Cython iterator class, it has to be a cdef class! Otherwise you don't get a really fast class. The reason is that only a cdef class creates an extension type that implements the tp_iternext field directly. I will use IPython's %%cython to compile the code (so I don't have to include the setup):
%%cython

def test():
    while True:
        yield 1

class Test(object):
    def __iter__(self):
        return self
    def __next__(self):
        return 1

cdef class Test_cdef(object):
    def __iter__(self):
        return self
    def __next__(self):
        return 1

%timeit list(islice(test(), 1000))
# 113 µs ± 4.5 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit list(islice(Test(), 1000))
# 407 µs ± 16.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit list(islice(Test_cdef(), 1000))
# 62.8 µs ± 2.46 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
The timings already showed that the generator and the plain class are faster than their pure-Python equivalents, but their relative performance roughly held up. The cdef class variant, however, beats both of them, and that is mainly because the tp_iternext slot is used instead of just implementing the __next__ method. (Inspect the Cython-generated C code if you don't trust me. :))
However, it is only 2 times faster than the Python generator, which is not bad, but not exactly overwhelming either. To get really amazing speedups, you need to find a way to express your program without Python objects (the fewer Python objects, the bigger the speedup). For example, if you use a dictionary to store each item and its multiplicity, you still store Python objects, and every lookup has to be done with Python dictionary methods - even though Cython can call them via C API functions without having to look up the actual methods first:
%%cython

cpdef cython_count(items):
    cdef dict res = dict()
    for item in items:
        if item in res:
            res[item] += 1
        else:
            res[item] = 1
    return res

import random

def count(items):
    res = {}
    for item in items:
        if item in res:
            res[item] += 1
        else:
            res[item] = 1
    return res

l = [random.randint(0, 100) for _ in range(10000)]

%timeit cython_count(l)
# 2.06 ms ± 13 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit count(l)
# 3.63 ms ± 21.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
There is one problem here: you are not using collections.Counter, which has optimized C code (at least in Python 3) for exactly this kind of operation:
from collections import Counter
%timeit Counter(l)
# 1.17 ms ± 41.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
A quick note here as well: don't use something in some_dict.keys(), because in Python 2 keys() is list-like and only implements O(n) containment, while something in some_dict is typically O(1) (both Pythons)! Dropping the .keys() makes both versions faster, but especially so on Python 2:
def count2(items):
    res = {}
    for item in items:
        if item in res.keys():  # with "keys()"
            res[item] += 1
        else:
            res[item] = 1
    return res
# Python3
l = [random.randint(0, 100) for _ in range(10000)]
%timeit count(l)
# 3.63 ms ± 29 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit count2(l)
# 5.9 ms ± 20 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# Python2
l = [random.randint(0, 10000) for _ in range(10000)]
%timeit count(l)
# 100 loops, best of 3: 4.59 ms per loop
%timeit count2(l)
# 1 loop, best of 3: 2.65 s per loop <--- WHOOPS!!!
This shows that when you work with Python structures you can only hope for a speedup of roughly 3-4x with Cython (and C extensions), but even small mistakes like using ".keys()" can cost you far more than that in performance when used incorrectly.
So what can you do if you want it faster? The answer is relatively simple: build your own data structures based on C types instead of Python types.
That means you have to think about the design:
- What types do you want to support in your uniqComb**? Do you want only integers (the examples suggest so, but I suppose you want arbitrary Python objects)?
- Do the objects passed to your uniqComb** function need to be orderable? You used sorted, but you could also use an OrderedDict and keep the keys in order of appearance instead of by numerical value.
The answers to these questions (these are only the questions I immediately asked myself, there are probably more!) can help you decide which structures you can use internally. For example, with Cython you can interface with C++, so you could use a map with integer keys and integer values instead of a dictionary. It is sorted by default, so you don't need to sort it manually, and you would operate on native integers instead of Python objects. But you would lose the ability to process arbitrary Python objects in your uniqComb, and you would need to know how to work with C++ types in Cython. It could be amazingly fast, though!
I don't go down that path, because I assume you want to support arbitrary orderable Python types, so I stick with the Counter as a starting point, but I save the multiplicities as integer array.arrays instead of as lists. Let's call it the "least invasive" optimization. It actually doesn't matter much for performance whether you use a list or an array for lstCntRpts and multiplicities, because they are not the bottleneck - but it is a bit faster, saves a bit of memory and, more importantly, it shows how to include homogeneous arrays with Cython:
%%cython

from cpython.list cimport PyList_Size  # (most) C API functions can be used with cython!
from array import array
from collections import Counter

cdef class uniqCmboClassIter:
    cdef list lstUniqs
    cdef Py_ssize_t lenUniqs
    cdef int[:] lstCntRpts  # memoryview
    cdef Py_ssize_t lenCmbo
    cdef list cmboAsIdxUniqs
    cdef int[:] multiplicities  # memoryview
    cdef Py_ssize_t idxIntoCmbo
    cdef Py_ssize_t idxIntoUniqs
    cdef bint stopIteration
    cdef Py_ssize_t x
    cdef Py_ssize_t y

    def __init__(self, lstItems, lenCmbo):
        dctCounter = Counter(lstItems)

        self.lstUniqs = sorted(dctCounter)
        self.lenUniqs = PyList_Size(self.lstUniqs)
        self.lstCntRpts = array('i', [dctCounter[item] for item in self.lstUniqs])

        self.lenCmbo = lenCmbo
        self.cmboAsIdxUniqs = [None] * lenCmbo
        self.multiplicities = array('i', [0] * self.lenUniqs)

        self.idxIntoCmbo, self.idxIntoUniqs = 0, 0

        while self.idxIntoCmbo != self.lenCmbo and self.idxIntoUniqs != self.lenUniqs:
            count = min(self.lstCntRpts[self.idxIntoUniqs], self.lenCmbo-self.idxIntoCmbo)
            self.cmboAsIdxUniqs[self.idxIntoCmbo : self.idxIntoCmbo + count] = [self.idxIntoUniqs] * count
            self.multiplicities[self.idxIntoUniqs] = count
            self.idxIntoCmbo += count
            self.idxIntoUniqs += 1
        # print("self.multiplicities:", self.multiplicities)
        # print("self.cmboAsIdxUniqs:", self.cmboAsIdxUniqs)

        if self.idxIntoCmbo != self.lenCmbo:
            return

        self.stopIteration = False
        self.x = 0
        self.y = 0
        return

    def __iter__(self):
        return self

    def __next__(self):
        if self.stopIteration is True:
            raise StopIteration

        nextCmbo = tuple(self.lstUniqs[idxUniqs] for idxUniqs in self.cmboAsIdxUniqs)

        for self.idxIntoCmbo in reversed(range(self.lenCmbo)):
            self.x = self.cmboAsIdxUniqs[self.idxIntoCmbo]
            self.y = self.x + 1
            if self.y < self.lenUniqs and self.multiplicities[self.y] < self.lstCntRpts[self.y]:
                break
        else:
            self.stopIteration = True
            return nextCmbo

        for self.idxIntoCmbo in range(self.idxIntoCmbo, self.lenCmbo):
            self.x = self.cmboAsIdxUniqs[self.idxIntoCmbo]
            self.cmboAsIdxUniqs[self.idxIntoCmbo] = self.y
            self.multiplicities[self.x] -= 1
            self.multiplicities[self.y] += 1
            # print("# multiplicities:", multiplicities)
            while self.y != self.lenUniqs and self.multiplicities[self.y] == self.lstCntRpts[self.y]:
                self.y += 1
            if self.y == self.lenUniqs:
                break

        return nextCmbo
You didn't actually share the parameters you used for your timings, but I tried it with some of mine:
from itertools import combinations
import random
import time

def create_values(maximum):
    vals = [random.randint(0, maximum) for _ in range(48)]
    print('length: ', len(vals))
    print('sorted values: ', sorted(vals))
    print('uniques: ', len(set(vals)))
    print('uniques in percent: {:%}'.format(len(set(vals)) / len(vals)))
    return vals

class Timer(object):
    def __init__(self):
        pass
    def __enter__(self):
        self._time = time.time()
    def __exit__(self, *args, **kwargs):
        print(time.time() - self._time)

vals = create_values(maximum=50)  # and 22 and 75 and 120

n = 6

with Timer():
    list(combinations(vals, n))
with Timer():
    list(uniqCmboClassIter(vals, n))
with Timer():
    list(uniqCmboClassIterOriginal(vals, n))
with Timer():
    list(uniqCmboYieldIterOriginal(vals, n))
length:  48
sorted values:  [0, 0, 0, 1, 2, 2, 4, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 8, 9, 9, 10, 11, 11, 12, 12, 12, 13, 13, 14, 14, 14, 15, 15, 15, 17, 18, 19, 19, 19, 19, 20, 20, 20, 21, 21, 22, 22]
uniques:  21
uniques in percent: 43.750000%
6.250450611114502      # combinations
0.4217393398284912     # uniqCmboClassIter (cdef class)
4.250436305999756      # uniqCmboClassIterOriginal
2.7186365127563477     # uniqCmboYieldIterOriginal

length:  48
sorted values:  [1, 1, 2, 5, 6, 7, 7, 8, 8, 9, 11, 13, 13, 15, 16, 16, 16, 16, 17, 19, 19, 21, 21, 23, 24, 26, 27, 28, 28, 29, 31, 31, 34, 34, 36, 36, 38, 39, 39, 40, 41, 42, 44, 46, 47, 47, 49, 50]
uniques:  33
uniques in percent: 68.750000%
6.2034173011779785     # combinations
4.343803882598877      # uniqCmboClassIter (cdef class)
42.39261245727539      # uniqCmboClassIterOriginal
26.65750527381897      # uniqCmboYieldIterOriginal

length:  48
sorted values:  [4, 4, 7, 9, 10, 14, 14, 17, 19, 21, 23, 24, 24, 26, 34, 36, 40, 42, 43, 43, 45, 46, 46, 52, 53, 58, 59, 59, 61, 63, 66, 68, 71, 72, 72, 75, 76, 80, 82, 82, 83, 84, 86, 86, 89, 92, 97, 99]
uniques:  39
uniques in percent: 81.250000%
6.859697341918945      # combinations
10.437987327575684     # uniqCmboClassIter (cdef class)
104.12988543510437     # uniqCmboClassIterOriginal
65.25306582450867      # uniqCmboYieldIterOriginal

length:  48
sorted values:  [4, 7, 11, 19, 24, 29, 32, 36, 49, 49, 54, 57, 58, 60, 62, 65, 67, 70, 70, 72, 72, 79, 82, 83, 86, 89, 89, 90, 91, 94, 96, 99, 102, 111, 112, 118, 120, 120, 128, 129, 129, 134, 138, 141, 141, 144, 146, 147]
uniques:  41
uniques in percent: 85.416667%
6.484673023223877      # combinations
13.610010623931885     # uniqCmboClassIter (cdef class)
136.28764533996582     # uniqCmboClassIterOriginal
84.73834943771362      # uniqCmboYieldIterOriginal
It definitely performs much better than the original approaches - actually several times faster, with nothing more than the type declarations. There is probably much more that could be optimized (disable bounds checking, use Python C API function calls, use unsigned or smaller integer types if you know the "maximum" and "minimum" of your multiplicities, ...), but given that even for 80% unique items it is not far behind itertools.combinations, and it clearly beats every original implementation, that is good enough for me. :-)