Claudio · Tags: performance, iterator, yield, generator, python-3.x
UPDATE (reflecting the current state of knowledge): 2017-05-12
The reason for this update is that at the time I asked this question I was not aware that I had discovered something about how Python3 works "under the hood".
The conclusion from everything that follows is:
If you write your own Python3 code for an iterator and care about execution speed, write it as a generator function and not as an iterator class.
Below a minimalistic code example demonstrating that the same algorithm (here: a self-made version of Python's range()) expressed as a generator function runs much faster than when expressed as an iterator class:
def gnrtYieldRange(startWith, endAt, step=1):
    while startWith <= endAt:
        yield startWith
        startWith += step

class iterClassRange:
    def __init__(self, startWith, endAt, step=1):
        self.startWith = startWith - 1
        self.endAt = endAt
        self.step = step
    def __iter__(self):
        return self
    def __next__(self):
        self.startWith += self.step
        if self.startWith <= self.endAt:
            return self.startWith
        else:
            raise StopIteration

N = 10000000
print(" Size of created list N = {} elements (ints 1 to N)".format(N))

from time import time as t
from customRange import gnrtYieldRange as cthnYieldRange
from customRange import cintYieldRange
from customRange import iterClassRange as cthnClassRange
from customRange import cdefClassRange

iterPythnRangeObj = range(1, N+1)
gnrtYieldRangeObj = gnrtYieldRange(1, N)
cthnYieldRangeObj = cthnYieldRange(1, N)
cintYieldRangeObj = cintYieldRange(1, N)
iterClassRangeObj = iterClassRange(1, N)
cthnClassRangeObj = cthnClassRange(1, N)
cdefClassRangeObj = cdefClassRange(1, N)

sEXECs = [
    "liPR = list(iterPythnRangeObj)",
    "lgYR = list(gnrtYieldRangeObj)",
    "lcYR = list(cthnYieldRangeObj)",
    "liGR = list(cintYieldRangeObj)",
    "liCR = list(iterClassRangeObj)",
    "lcCR = list(cthnClassRangeObj)",
    "ldCR = list(cdefClassRangeObj)"
]

sCOMMENTs = [
    "Python3 own range(1, N+1) used here as reference for timings ",
    "self-made range generator function using yield (run as it is) ",
    "self-made range (with yield) run from module created by Cython",
    "Cython-optimized self-made range (using yield) run from module",
    "self-made range as iterator class using __next__() and return ",
    "self-made range (using __next__) from module created by Cython",
    "Cython-optimized self-made range (using __next__) from module "
]

for idx, sEXEC in enumerate(sEXECs):
    s = t(); exec(sEXEC); e = t()
    print("{} takes: {:3.1f} sec.".format(sCOMMENTs[idx], e-s))

print("All created lists are equal:", all([liPR == lgYR, lgYR == lcYR, lcYR == liGR, liGR == liCR, liCR == lcCR, lcCR == ldCR]))
print("Run on Linux Mint 18.1, used Cython.__version__ == '0.25.2'")
Put into a file and run, the code above prints the following to stdout:
>python3.6 -u "gnrtFunction-fasterThan-iterClass_runMe.py"
Size of created list N = 10000000 elements (ints 1 to N)
Python3 own range(1, N+1) used here as reference for timings takes: 0.2 sec.
self-made range generator function using yield (run as it is) takes: 1.1 sec.
self-made range (with yield) run from module created by Cython takes: 0.5 sec.
Cython-optimized self-made range (using yield) run from module takes: 0.3 sec.
self-made range as iterator class using __next__() and return takes: 3.9 sec.
self-made range (using __next__) from module created by Cython takes: 3.3 sec.
Cython-optimized self-made range (using __next__) from module takes: 0.2 sec.
All created lists are equal: True
Run on Linux Mint 18.1, used Cython.__version__ == '0.25.2'
>Exit code: 0
From the timings above you can see that the generator-function variant of the self-made range() iterator runs faster than the iterator-class variant, and that, when no optimization of the code is involved, this behavior also propagates into the C code created by Cython.
If you are curious why exactly that is, you can read through the provided answer(s) or play around a bit with the provided code yourself.
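For quick experiments, the two pure-Python variants can also be timed with the timeit module instead of the exec()-based harness above. This is only a minimal self-contained sketch (absolute numbers will differ from the timings below, but the ranking should not):

```python
# Minimal timeit-based comparison of the two pure-Python self-made range()
# variants from above (no Cython involved).
from timeit import timeit

def gnrtYieldRange(startWith, endAt, step=1):
    while startWith <= endAt:
        yield startWith
        startWith += step

class iterClassRange:
    def __init__(self, startWith, endAt, step=1):
        self.startWith = startWith - 1
        self.endAt = endAt
        self.step = step
    def __iter__(self):
        return self
    def __next__(self):
        self.startWith += self.step
        if self.startWith <= self.endAt:
            return self.startWith
        raise StopIteration

N = 100000
tGen = timeit(lambda: list(gnrtYieldRange(1, N)), number=10)
tCls = timeit(lambda: list(iterClassRange(1, N)), number=10)
print("generator: {:.3f} s   iterator class: {:.3f} s".format(tGen, tCls))

# both variants must of course produce the same list as the built-in range()
assert list(gnrtYieldRange(1, N)) == list(iterClassRange(1, N)) == list(range(1, N + 1))
```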
Below are the missing pieces of code necessary to run the example above:
customRange.pyx - the file Cython creates the customRange module from:
def gnrtYieldRange(startWith, endAt, step=1):
    while startWith <= endAt:
        yield startWith
        startWith += step

class iterClassRange:
    def __init__(self, startWith, endAt, step=1):
        self.startWith = startWith - 1
        self.endAt = endAt
        self.step = step
    def __iter__(self):
        return self
    def __next__(self):
        self.startWith += self.step
        if self.startWith <= self.endAt:
            return self.startWith
        else:
            raise StopIteration

def cintYieldRange(int startWith, int endAt, int step=1):
    while startWith <= endAt:
        yield startWith
        startWith += step

cdef class cdefClassRange:
    cdef int startWith
    cdef int endAt
    cdef int step
    def __init__(self, int startWith, int endAt, int step=1):
        self.startWith = startWith - 1
        self.endAt = endAt
        self.step = step
    def __iter__(self):
        return self
    def __next__(self):
        self.startWith += self.step
        if self.startWith <= self.endAt:
            return self.startWith
        else:
            raise StopIteration
and the setup file customRange-setup.py used to create the Python customRange module:
import sys
sys.argv += ['build_ext', '--inplace']

from distutils.core import setup
from Cython.Build import cythonize

setup(
    name = 'customRange',
    ext_modules = cythonize("customRange.pyx"),
)
At the time I asked this question I was busy with a quite complex algorithm for generating unique combinations from a non-unique list, which was available in the form of a generator function using yield. My goal was to create a Python module written in C out of this algorithm to make it run faster, and for this purpose I rewrote the generator function, which used yield, as an iterator class using __next__() and return. When I compared the speed of the two variants of the algorithm, I was surprised that the iterator class was two times slower than the generator function, and I had (wrongly) assumed it had something to do with the way I rewrote the algorithm (you need to know this if you want to better understand what the answers here are about). That is why I had
originally asked how to make the iterator class version run at the same speed as the generator function and where the speed difference comes from.
Below is some more about the HISTORY of the question:
In the Python script code provided below, exactly the same algorithm for creating unique combinations from a non-unique list of elements is implemented both as a Python function using yield and as a class using __next__. The code is ready to run after copy/paste, so you can see for yourself what I am speaking about.
The same phenomenon observed for pure Python code propagates into the C code of a Python extension module created out of the script code by Cython, so it is not limited to Python-level code, because it doesn't vanish at the C-code level.
The question is:
Where does the huge difference in execution speed come from? Is there anything that can be done to make both code variants run at comparable speed? What goes wrong in the class/__next__ implementation compared to the function/yield variant? To my knowledge both are exactly the same code ...
Here is the code (tweaking the numbers in the highlighted lines changes the level of uniqueness of the elements in the list the combinations are generated from, which has a huge impact on the running time):
def uniqCmboYieldIter(lstItems, lenCmbo):
    dctCounter = {}
    lenLstItems = len(lstItems)
    for idx in range(lenLstItems):
        item = lstItems[idx]
        if item in dctCounter.keys():
            dctCounter[item] += 1
        else:
            dctCounter[item] = 1
        #:if
    #:for
    lstUniqs = sorted(dctCounter.keys())
    lstCntRpts = [dctCounter[item] for item in lstUniqs]
    lenUniqs = len(lstUniqs)
    cmboAsIdxUniqs = [None] * lenCmbo
    multiplicities = [0] * lenUniqs
    idxIntoCmbo, idxIntoUniqs = 0, 0

    while idxIntoCmbo != lenCmbo and idxIntoUniqs != lenUniqs:
        count = min(lstCntRpts[idxIntoUniqs], lenCmbo-idxIntoCmbo)
        cmboAsIdxUniqs[idxIntoCmbo : idxIntoCmbo + count] = [idxIntoUniqs] * count
        multiplicities[idxIntoUniqs] = count
        idxIntoCmbo += count
        idxIntoUniqs += 1

    if idxIntoCmbo != lenCmbo:
        return

    while True:
        yield tuple(lstUniqs[idxUniqs] for idxUniqs in cmboAsIdxUniqs)

        for idxIntoCmbo in reversed(range(lenCmbo)):
            x = cmboAsIdxUniqs[idxIntoCmbo]
            y = x + 1
            if y < lenUniqs and multiplicities[y] < lstCntRpts[y]:
                break
        else:
            return

        for idxIntoCmbo in range(idxIntoCmbo, lenCmbo):
            x = cmboAsIdxUniqs[idxIntoCmbo]
            cmboAsIdxUniqs[idxIntoCmbo] = y
            multiplicities[x] -= 1
            multiplicities[y] += 1
            # print("# multiplicities:", multiplicities)
            while y != lenUniqs and multiplicities[y] == lstCntRpts[y]:
                y += 1
            if y == lenUniqs:
                break

class uniqCmboClassIter:
    # ----------------------------------------------------------------------------------------------
    def __iter__(self):
        return self

    # ----------------------------------------------------------------------------------------------
    def __init__(self, lstItems, lenCmbo):
        dctCounter = {}
        lenLstItems = len(lstItems)
        for idx in range(lenLstItems):
            item = lstItems[idx]
            if item in dctCounter.keys():
                dctCounter[item] += 1
            else:
                dctCounter[item] = 1
            #:if
        #:for
        self.lstUniqs = sorted(dctCounter.keys())
        self.lenUniqs = len(self.lstUniqs)
        self.lstCntRpts = [dctCounter[item] for item in self.lstUniqs]
        self.lenCmbo = lenCmbo
        self.cmboAsIdxUniqs = [None] * lenCmbo
        self.multiplicities = [0] * self.lenUniqs
        self.idxIntoCmbo, self.idxIntoUniqs = 0, 0

        while self.idxIntoCmbo != self.lenCmbo and self.idxIntoUniqs != self.lenUniqs:
            count = min(self.lstCntRpts[self.idxIntoUniqs], self.lenCmbo-self.idxIntoCmbo)
            self.cmboAsIdxUniqs[self.idxIntoCmbo : self.idxIntoCmbo + count] = [self.idxIntoUniqs] * count
            self.multiplicities[self.idxIntoUniqs] = count
            self.idxIntoCmbo += count
            self.idxIntoUniqs += 1
        # print("self.multiplicities:", self.multiplicities)
        # print("self.cmboAsIdxUniqs:", self.cmboAsIdxUniqs)

        if self.idxIntoCmbo != self.lenCmbo:
            return

        self.stopIteration = False
        self.x = None
        self.y = None
        return

    # ----------------------------------------------------------------------------------------------
    def __next__(self):
        if self.stopIteration is True:
            raise StopIteration
            return

        nextCmbo = tuple(self.lstUniqs[idxUniqs] for idxUniqs in self.cmboAsIdxUniqs)

        for self.idxIntoCmbo in reversed(range(self.lenCmbo)):
            self.x = self.cmboAsIdxUniqs[self.idxIntoCmbo]
            self.y = self.x + 1
            if self.y < self.lenUniqs and self.multiplicities[self.y] < self.lstCntRpts[self.y]:
                break
        else:
            self.stopIteration = True
            return nextCmbo

        for self.idxIntoCmbo in range(self.idxIntoCmbo, self.lenCmbo):
            self.x = self.cmboAsIdxUniqs[self.idxIntoCmbo]
            self.cmboAsIdxUniqs[self.idxIntoCmbo] = self.y
            self.multiplicities[self.x] -= 1
            self.multiplicities[self.y] += 1
            # print("# multiplicities:", multiplicities)
            while self.y != self.lenUniqs and self.multiplicities[self.y] == self.lstCntRpts[self.y]:
                self.y += 1
            if self.y == self.lenUniqs:
                break

        return nextCmbo
# ============================================================================================================================================
lstSize   = 48  # 48
uniqLevel = 12  # (7 ~60% unique) higher level => more unique items in the generated list

aList = []
from random import randint
for _ in range(lstSize):
    aList.append( ( randint(1,uniqLevel), randint(1,uniqLevel) ) )

lenCmbo = 6
percUnique = 100.0 - 100.0*(lstSize-len(set(aList)))/lstSize
print("======================== lenCmbo:", lenCmbo,
      "   sizeOfList:", len(aList),
      "   noOfUniqueInList", len(set(aList)),
      "   percUnique", int(percUnique) )
import time
from itertools import combinations
# itertools.combinations
# ---
# def uniqCmboYieldIter(lstItems, lenCmbo):
# class uniqCmboClassIter: def __init__(self, lstItems, lenCmbo):
# ---
start_time = time.time()
print("Combos:%9i"%len(list(combinations(aList, lenCmbo))), " ", end='')
duration = time.time() - start_time
print("print(len(list( combinations(aList, lenCmbo)))):", "{:9.5f}".format(duration), "seconds.")
start_time = time.time()
print("Combos:%9i"%len(list(uniqCmboYieldIter(aList, lenCmbo))), " ", end='')
duration = time.time() - start_time
print("print(len(list(uniqCmboYieldIter(aList, lenCmbo)))):", "{:9.5f}".format(duration), "seconds.")
start_time = time.time()
print("Combos:%9i"%len(list(uniqCmboClassIter(aList, lenCmbo))), " ", end='')
duration = time.time() - start_time
print("print(len(list(uniqCmboClassIter(aList, lenCmbo)))):", "{:9.5f}".format(duration), "seconds.")
And the timings on my box:
>python3.6 -u "nonRecursiveUniqueCombos_Cg.py"
======================== lenCmbo: 6 sizeOfList: 48 noOfUniqueInList 32 percUnique 66
Combos: 12271512 print(len(list( combinations(aList, lenCmbo)))): 2.04635 seconds.
Combos: 1296058 print(len(list(uniqCmboYieldIter(aList, lenCmbo)))): 3.25447 seconds.
Combos: 1296058 print(len(list(uniqCmboClassIter(aList, lenCmbo)))): 5.97371 seconds.
>Exit code: 0
>python3.6 -u "nonRecursiveUniqueCombos_Cg.py"
======================== lenCmbo: 6 sizeOfList: 48 noOfUniqueInList 22 percUnique 45
Combos: 12271512 print(len(list( combinations(aList, lenCmbo)))): 2.05199 seconds.
Combos: 191072 print(len(list(uniqCmboYieldIter(aList, lenCmbo)))): 0.47343 seconds.
Combos: 191072 print(len(list(uniqCmboClassIter(aList, lenCmbo)))): 0.89860 seconds.
>Exit code: 0
>python3.6 -u "nonRecursiveUniqueCombos_Cg.py"
======================== lenCmbo: 6 sizeOfList: 48 noOfUniqueInList 43 percUnique 89
Combos: 12271512 print(len(list( combinations(aList, lenCmbo)))): 2.17285 seconds.
Combos: 6560701 print(len(list(uniqCmboYieldIter(aList, lenCmbo)))): 16.72573 seconds.
Combos: 6560701 print(len(list(uniqCmboClassIter(aList, lenCmbo)))): 31.17714 seconds.
>Exit code: 0
UPDATE (status 2017-05-07):
At the time of asking the question and offering a bounty, it was not known to me that there is a way to easily create the C code of an extension module for an iterator object out of Python script code using Cython, and that such C code can also be created from an iterator function using yield.
Considering that the generated faster version of the C extension module is still not fast enough to compete with itertools.combinations, it doesn't make much sense to dive deeply into finding out what exactly causes the slowdown when using an iterator class compared to an iterator function and how to overcome it. It makes much more sense to find a way to speed up the faster version using Cython, especially because I am a total novice at writing Python extension modules, having failed to create working code after hours and hours of intense, focused work spent on tweaking the existing C code of itertools.combinations with my own modifications, because of Segmentation Fault errors whose cause I was not able to grasp.
Currently I think that there is still room to speed up the Cython code I used, so there is no need to go the harder way of writing the C code myself.
Below is Cython code that runs OK, followed by speed-optimized Cython code which somehow changes (I currently can't see the reason for it) the way the algorithm works and therefore produces wrong results. The idea behind the Cython optimization was to use Python/Cython arrays in the Cython code instead of Python lists. Any hints on how to get a faster-running Python extension module out of the used algorithm in a way that is "safe" for a novice are welcome.
def subbags_by_loops_with_dict_counter(lstItems, int lenCmbo):
    dctCounter = {}
    cdef int lenLstItems = len(lstItems)
    cdef int idx = 0
    for idx in range(lenLstItems):
        item = lstItems[idx]
        if item in dctCounter.keys():
            dctCounter[item] += 1
        else:
            dctCounter[item] = 1
        #:if
    #:for
    lstUniqs = sorted(dctCounter.keys())
    lstCntRpts = [dctCounter[item] for item in lstUniqs]
    cdef int lenUniqs = len(lstUniqs)
    cmboAsIdxUniqs = [None] * lenCmbo
    multiplicities = [0] * lenUniqs
    cdef int idxIntoCmbo
    cdef int idxIntoUniqs
    cdef int count

    while idxIntoCmbo != lenCmbo and idxIntoUniqs != lenUniqs:
        count = min(lstCntRpts[idxIntoUniqs], lenCmbo-idxIntoCmbo)
        cmboAsIdxUniqs[idxIntoCmbo : idxIntoCmbo + count] = [idxIntoUniqs] * count
        multiplicities[idxIntoUniqs] = count
        idxIntoCmbo += count
        idxIntoUniqs += 1

    if idxIntoCmbo != lenCmbo:
        return

    cdef int x
    cdef int y

    while True:
        yield tuple(lstUniqs[idxUniqs] for idxUniqs in cmboAsIdxUniqs)

        for idxIntoCmbo in reversed(range(lenCmbo)):
            x = cmboAsIdxUniqs[idxIntoCmbo]
            y = x + 1
            if y < lenUniqs and multiplicities[y] < lstCntRpts[y]:
                break
        else:
            return

        for idxIntoCmbo in range(idxIntoCmbo, lenCmbo):
            x = cmboAsIdxUniqs[idxIntoCmbo]
            cmboAsIdxUniqs[idxIntoCmbo] = y
            multiplicities[x] -= 1
            multiplicities[y] += 1
            while y != lenUniqs and multiplicities[y] == lstCntRpts[y]:
                y += 1
            if y == lenUniqs:
                break
Below OPTIMIZED CYTHON CODE which produces wrong results:
# These two imports are needed for array.array / array.resize below
# (they were missing from the original listing):
from cpython cimport array
import array

def subbags_loops_dict_cython_optimized(lstItems, int lenCmbo):
    dctCounter = {}
    cdef int lenLstItems = len(lstItems)
    cdef int idx = 0
    for idx in range(lenLstItems):
        item = lstItems[idx]
        if item in dctCounter.keys():
            dctCounter[item] += 1
        else:
            dctCounter[item] = 1
        #:if
    #:for
    lstUniqs = sorted(dctCounter.keys())
    lstCntRpts = [dctCounter[item] for item in lstUniqs]
    cdef int lenUniqs = len(lstUniqs)
    cdef array.array cmboAsIdxUniqs = array.array('i', [])
    array.resize(cmboAsIdxUniqs, lenCmbo)
    # cmboAsIdxUniqs = [None] * lenCmbo
    cdef array.array multiplicities = array.array('i', [])
    array.resize(multiplicities, lenUniqs)
    # multiplicities = [0] * lenUniqs
    cdef int idxIntoCmbo
    cdef int maxIdxCmbo
    cdef int curIdxCmbo
    cdef int idxIntoUniqs
    cdef int count

    # NB: idxIntoCmbo and idxIntoUniqs are read by the while condition below
    # without ever being assigned first - a possible source of the wrong results
    while idxIntoCmbo != lenCmbo and idxIntoUniqs != lenUniqs:
        count = min(lstCntRpts[idxIntoUniqs], lenCmbo-idxIntoCmbo)
        maxIdxCmbo = idxIntoCmbo + count
        curIdxCmbo = idxIntoCmbo
        while curIdxCmbo < maxIdxCmbo:
            cmboAsIdxUniqs[curIdxCmbo] = idxIntoUniqs
            curIdxCmbo += 1
        multiplicities[idxIntoUniqs] = count
        idxIntoCmbo += count
        idxIntoUniqs += 1
    # print("multiplicities:", multiplicities)
    # print("cmboAsIdxUniqs:", cmboAsIdxUniqs)

    if idxIntoCmbo != lenCmbo:
        return

    cdef int x
    cdef int y

    while True:
        yield tuple(lstUniqs[idxUniqs] for idxUniqs in cmboAsIdxUniqs)

        for idxIntoCmbo in reversed(range(lenCmbo)):
            x = cmboAsIdxUniqs[idxIntoCmbo]
            y = x + 1
            if y < lenUniqs and multiplicities[y] < lstCntRpts[y]:
                break
        else:
            return

        for idxIntoCmbo in range(idxIntoCmbo, lenCmbo):
            x = cmboAsIdxUniqs[idxIntoCmbo]
            cmboAsIdxUniqs[idxIntoCmbo] = y
            multiplicities[x] -= 1
            multiplicities[y] += 1
            # print("# multiplicities:", multiplicities)
            while y != lenUniqs and multiplicities[y] == lstCntRpts[y]:
                y += 1
            if y == lenUniqs:
                break
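Independent of any speed work, a cheap way to catch "wrong results" regressions like the one in the optimized variant above is to compare a candidate against a brute-force oracle built from itertools.combinations. This is only a sketch; the helper names are my own and not part of the original code:

```python
# Brute-force oracle for unique combinations: take all k-combinations of the
# sorted input and deduplicate them. Slow, but obviously correct, so any
# optimized variant can be checked against it on small inputs.
from itertools import combinations

def unique_combinations_reference(lstItems, lenCmbo):
    # each unique multiset of size lenCmbo appears exactly once, as a sorted tuple
    return sorted(set(combinations(sorted(lstItems), lenCmbo)))

def check_candidate(candidate, lstItems, lenCmbo):
    # candidate is any callable returning an iterable of combination tuples
    got = sorted(set(candidate(lstItems, lenCmbo)))
    return got == unique_combinations_reference(lstItems, lenCmbo)

# e.g. check_candidate(uniqCmboYieldIter, aList, 6) should return True for a
# correct implementation and False for the broken optimized Cython variant
```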
I have some experience from the time I rewrote several recipes from the itertools documentation as C extensions, so I think I may have some insights that could help you.
When you write pure Python code, it is a trade-off between speed (generator) and features (iterator class).
The yield functions (known as generators) are for speed, and usually they can be written without worrying about internal state. So they are less work to write, and they are fast, because Python just manages all the "state" for you.
The reason generators are faster (or at least not slower) is mostly that:
They implement the __next__-slot (typically tp_iternext) directly, in addition to the __next__ method. In that case Python doesn't have to look up the __next__ method, and that is essentially what makes them faster in the following example:
from itertools import islice

def test():
    while True:
        yield 1

class Test(object):
    def __iter__(self):
        return self
    def __next__(self):
        return 1

%timeit list(islice(test(), 1000))
# 173 µs ± 2.15 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit list(islice(Test(), 1000))
# 499 µs ± 14.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
So it is almost 3 times faster, because the generator fills the __next__-slot directly.
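Part of that cost can even be demonstrated in pure Python: next(obj) on a plain class has to find __next__ through the type on every call, while binding the method once skips the repeated lookup. A small sketch (Ones, take and take_bound are illustrative names of mine; whether the gap is measurable depends on the Python version):

```python
# Compare driving a plain iterator class via next() (slot dispatch + method
# lookup on every call) against a pre-bound __next__ callable.
from timeit import timeit

class Ones:
    def __iter__(self):
        return self
    def __next__(self):
        return 1

def take(n, it):
    # drive the iterator via the next() builtin on every step
    return [next(it) for _ in range(n)]

def take_bound(n, it):
    # look __next__ up once and reuse the bound callable
    nxt = it.__next__
    return [nxt() for _ in range(n)]

print("next():       {:.4f} s".format(timeit(lambda: take(1000, Ones()), number=1000)))
print("bound method: {:.4f} s".format(timeit(lambda: take_bound(1000, Ones()), number=1000)))
```

Both variants produce the same values; only the per-call dispatch differs, so any remaining gap to the generator comes from the state handling discussed next.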
Both a yield function and a class have state, but the yield function saves and loads its state much faster than you can with a class and attribute access:
def test():
    i = 0
    while True:
        yield i
        i += 1

class Test(object):
    def __init__(self):
        self.val = 0
    def __iter__(self):
        return self
    def __next__(self):
        current = self.val
        self.val += 1
        return current
%timeit list(islice(test(), 1000))
# 296 µs ± 1.73 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit list(islice(Test(), 1000))
# 1.22 ms ± 3.12 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
This time the class is already 4 times slower (compared to almost 3 times when almost no state was involved). The effect is cumulative: the more "state" you have, the slower the class variant becomes.
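If the class form must be kept, one modest pure-Python mitigation worth trying (an assumption on my part, not something from the original answer) is __slots__: it replaces the per-instance dict with fixed attribute storage, which usually makes self.val loads and stores somewhat cheaper. The generator still keeps its state in fast local variables and stays ahead:

```python
# Same counting iterator twice: once with a normal instance dict, once with
# __slots__. Both produce identical sequences; __slots__ only changes how
# the attribute state is stored and accessed.
from itertools import islice

class CountUp:
    def __init__(self):
        self.val = 0
    def __iter__(self):
        return self
    def __next__(self):
        current = self.val
        self.val += 1
        return current

class CountUpSlots:
    __slots__ = ('val',)  # no per-instance __dict__, fixed attribute slot
    def __init__(self):
        self.val = 0
    def __iter__(self):
        return self
    def __next__(self):
        current = self.val
        self.val += 1
        return current

assert list(islice(CountUp(), 5)) == list(islice(CountUpSlots(), 5)) == [0, 1, 2, 3, 4]
```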
So the yield variant simply carries less per-item overhead than the equivalent class methods. Note, however, that the actual timings depend on the kind of operations performed: if the actual code that runs on each next call is slow (e.g. time.sleep(1)), there is almost no difference between a generator and a class!
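That last point can be sketched directly: give each step some real work to do and the generator/class gap shrinks to noise (work() here is just a cheap stand-in of mine for an expensive per-item computation):

```python
# When per-item work dominates, iteration overhead becomes irrelevant.
from timeit import timeit
from itertools import islice

def work():
    # stand-in for a genuinely expensive per-item computation
    return sum(range(2000))

def gen():
    while True:
        yield work()

class Cls:
    def __iter__(self):
        return self
    def __next__(self):
        return work()

tG = timeit(lambda: list(islice(gen(), 1000)), number=10)
tC = timeit(lambda: list(islice(Cls(), 1000)), number=10)
print("generator {:.3f} s vs class {:.3f} s (ratio {:.2f})".format(tG, tC, tC / tG))
```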
If you want a fast Cython iterator class, it has to be a cdef class! Otherwise you don't get a really fast class. The reason is that only a cdef class creates an extension type that implements the tp_iternext field directly. I will use IPython's %%cython to compile the code (so I don't have to include the setup):
%%cython

def test():
    while True:
        yield 1

class Test(object):
    def __iter__(self):
        return self
    def __next__(self):
        return 1

cdef class Test_cdef(object):
    def __iter__(self):
        return self
    def __next__(self):
        return 1

%timeit list(islice(test(), 1000))
# 113 µs ± 4.5 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit list(islice(Test(), 1000))
# 407 µs ± 16.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit list(islice(Test_cdef(), 1000))
# 62.8 µs ± 2.46 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
The timings already showed that the generator and the plain class are faster than their pure-Python equivalents, but their relative performance roughly held up. The cdef class variant, however, beats both of them, and that is mainly because the tp_iternext slot is used instead of just implementing the __next__ method. (Inspect the Cython-generated C code if you don't trust me. :))
However, it is only 2 times faster than the Python generator, which is not bad, but not exactly overwhelming either. To get really amazing speedups, you need to find a way to express your program without Python objects (the fewer Python objects, the bigger the speedup). For example, if you use a dictionary to store each item and its multiplicity, you still store Python objects, and every lookup has to be done with Python dictionary methods - even though Cython can call them via C API functions without having to look up the actual methods first:
%%cython

cpdef cython_count(items):
    cdef dict res = dict()
    for item in items:
        if item in res:
            res[item] += 1
        else:
            res[item] = 1
    return res

import random

def count(items):
    res = {}
    for item in items:
        if item in res:
            res[item] += 1
        else:
            res[item] = 1
    return res

l = [random.randint(0, 100) for _ in range(10000)]

%timeit cython_count(l)
# 2.06 ms ± 13 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit count(l)
# 3.63 ms ± 21.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
There is one problem here: you are not using collections.Counter, which has optimized C code (at least in Python 3) for exactly this kind of operation:
from collections import Counter
%timeit Counter(l)
# 1.17 ms ± 41.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
A quick note here as well: don't use something in some_dict.keys(), because in Python 2 keys() is list-like and only implements O(n) containment, while something in some_dict is typically O(1) (both Pythons)! Dropping the .keys() makes both versions faster, but especially so on Python 2:
def count2(items):
    res = {}
    for item in items:
        if item in res.keys():  # with "keys()"
            res[item] += 1
        else:
            res[item] = 1
    return res
# Python3
l = [random.randint(0, 100) for _ in range(10000)]
%timeit count(l)
# 3.63 ms ± 29 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit count2(l)
# 5.9 ms ± 20 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# Python2
l = [random.randint(0, 10000) for _ in range(10000)]
%timeit count(l)
# 100 loops, best of 3: 4.59 ms per loop
%timeit count2(l)
# 1 loop, best of 3: 2.65 s per loop <--- WHOOPS!!!
This shows that when you work with Python structures you can only hope for a speedup of roughly 3-4x with Cython (and C extensions), but even small mistakes like using ".keys()" can cost you far more than that in performance when used incorrectly.
So what can you do if you want it faster? The answer is relatively simple: build your own data structures based on C types instead of Python types.
That means you have to think about the design:
- What types do you want to support in your uniqComb**? Do you want only integers (the examples suggest so, but I suppose you want arbitrary Python objects)?
- Do the objects passed to your uniqComb** function need to be orderable? You used sorted, but you could also use an OrderedDict and keep the keys in order of appearance instead of by numerical value.
The answers to these questions (these are only the questions I immediately asked myself, there are probably more!) can help you decide which structures you can use internally. For example, with Cython you can interface with C++, so you could use a map with integer keys and integer values instead of a dictionary. It is sorted by default, so you don't need to sort it manually, and you would operate on native integers instead of Python objects. But you would lose the ability to process arbitrary Python objects in your uniqComb, and you would need to know how to work with C++ types in Cython. It could be amazingly fast, though!
I don't go down that path, because I assume you want to support arbitrary orderable Python types, so I stick with the Counter as a starting point, but I save the multiplicities as integer array.arrays instead of as lists. Let's call it the "least invasive" optimization. It actually doesn't matter much for performance whether you use a list or an array for lstCntRpts and multiplicities, because they are not the bottleneck - but it is a bit faster, saves a bit of memory and, more importantly, it shows how to include homogeneous arrays with Cython:
%%cython

from cpython.list cimport PyList_Size  # (most) C API functions can be used with cython!
from array import array
from collections import Counter

cdef class uniqCmboClassIter:
    cdef list lstUniqs
    cdef Py_ssize_t lenUniqs
    cdef int[:] lstCntRpts  # memoryview
    cdef Py_ssize_t lenCmbo
    cdef list cmboAsIdxUniqs
    cdef int[:] multiplicities  # memoryview
    cdef Py_ssize_t idxIntoCmbo
    cdef Py_ssize_t idxIntoUniqs
    cdef bint stopIteration
    cdef Py_ssize_t x
    cdef Py_ssize_t y

    def __init__(self, lstItems, lenCmbo):
        dctCounter = Counter(lstItems)

        self.lstUniqs = sorted(dctCounter)
        self.lenUniqs = PyList_Size(self.lstUniqs)
        self.lstCntRpts = array('i', [dctCounter[item] for item in self.lstUniqs])

        self.lenCmbo = lenCmbo
        self.cmboAsIdxUniqs = [None] * lenCmbo
        self.multiplicities = array('i', [0] * self.lenUniqs)

        self.idxIntoCmbo, self.idxIntoUniqs = 0, 0

        while self.idxIntoCmbo != self.lenCmbo and self.idxIntoUniqs != self.lenUniqs:
            count = min(self.lstCntRpts[self.idxIntoUniqs], self.lenCmbo-self.idxIntoCmbo)
            self.cmboAsIdxUniqs[self.idxIntoCmbo : self.idxIntoCmbo + count] = [self.idxIntoUniqs] * count
            self.multiplicities[self.idxIntoUniqs] = count
            self.idxIntoCmbo += count
            self.idxIntoUniqs += 1
        # print("self.multiplicities:", self.multiplicities)
        # print("self.cmboAsIdxUniqs:", self.cmboAsIdxUniqs)

        if self.idxIntoCmbo != self.lenCmbo:
            return

        self.stopIteration = False
        self.x = 0
        self.y = 0
        return

    def __iter__(self):
        return self

    def __next__(self):
        if self.stopIteration is True:
            raise StopIteration

        nextCmbo = tuple(self.lstUniqs[idxUniqs] for idxUniqs in self.cmboAsIdxUniqs)

        for self.idxIntoCmbo in reversed(range(self.lenCmbo)):
            self.x = self.cmboAsIdxUniqs[self.idxIntoCmbo]
            self.y = self.x + 1
            if self.y < self.lenUniqs and self.multiplicities[self.y] < self.lstCntRpts[self.y]:
                break
        else:
            self.stopIteration = True
            return nextCmbo

        for self.idxIntoCmbo in range(self.idxIntoCmbo, self.lenCmbo):
            self.x = self.cmboAsIdxUniqs[self.idxIntoCmbo]
            self.cmboAsIdxUniqs[self.idxIntoCmbo] = self.y
            self.multiplicities[self.x] -= 1
            self.multiplicities[self.y] += 1
            # print("# multiplicities:", multiplicities)
            while self.y != self.lenUniqs and self.multiplicities[self.y] == self.lstCntRpts[self.y]:
                self.y += 1
            if self.y == self.lenUniqs:
                break

        return nextCmbo
You didn't actually share the parameters you used for your timings, but I tried it with some of mine:
from itertools import combinations
import random
import time

def create_values(maximum):
    vals = [random.randint(0, maximum) for _ in range(48)]
    print('length: ', len(vals))
    print('sorted values: ', sorted(vals))
    print('uniques: ', len(set(vals)))
    print('uniques in percent: {:%}'.format(len(set(vals)) / len(vals)))
    return vals

class Timer(object):
    def __init__(self):
        pass
    def __enter__(self):
        self._time = time.time()
    def __exit__(self, *args, **kwargs):
        print(time.time() - self._time)

vals = create_values(maximum=50)  # and 22 and 75 and 120

n = 6

with Timer():
    list(combinations(vals, n))
with Timer():
    list(uniqCmboClassIter(vals, n))
with Timer():
    list(uniqCmboClassIterOriginal(vals, n))
with Timer():
    list(uniqCmboYieldIterOriginal(vals, n))
length:  48
sorted values:  [0, 0, 0, 1, 2, 2, 4, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 8, 9, 9, 10, 11, 11, 12, 12, 12, 13, 13, 14, 14, 14, 15, 15, 15, 17, 18, 19, 19, 19, 19, 20, 20, 20, 21, 21, 22, 22]
uniques:  21
uniques in percent: 43.750000%
6.250450611114502      # combinations
0.4217393398284912     # uniqCmboClassIter (cdef class)
4.250436305999756      # uniqCmboClassIterOriginal
2.7186365127563477     # uniqCmboYieldIterOriginal

length:  48
sorted values:  [1, 1, 2, 5, 6, 7, 7, 8, 8, 9, 11, 13, 13, 15, 16, 16, 16, 16, 17, 19, 19, 21, 21, 23, 24, 26, 27, 28, 28, 29, 31, 31, 34, 34, 36, 36, 38, 39, 39, 40, 41, 42, 44, 46, 47, 47, 49, 50]
uniques:  33
uniques in percent: 68.750000%
6.2034173011779785     # combinations
4.343803882598877      # uniqCmboClassIter (cdef class)
42.39261245727539      # uniqCmboClassIterOriginal
26.65750527381897      # uniqCmboYieldIterOriginal

length:  48
sorted values:  [4, 4, 7, 9, 10, 14, 14, 17, 19, 21, 23, 24, 24, 26, 34, 36, 40, 42, 43, 43, 45, 46, 46, 52, 53, 58, 59, 59, 61, 63, 66, 68, 71, 72, 72, 75, 76, 80, 82, 82, 83, 84, 86, 86, 89, 92, 97, 99]
uniques:  39
uniques in percent: 81.250000%
6.859697341918945      # combinations
10.437987327575684     # uniqCmboClassIter (cdef class)
104.12988543510437     # uniqCmboClassIterOriginal
65.25306582450867      # uniqCmboYieldIterOriginal

length:  48
sorted values:  [4, 7, 11, 19, 24, 29, 32, 36, 49, 49, 54, 57, 58, 60, 62, 65, 67, 70, 70, 72, 72, 79, 82, 83, 86, 89, 89, 90, 91, 94, 96, 99, 102, 111, 112, 118, 120, 120, 128, 129, 129, 134, 138, 141, 141, 144, 146, 147]
uniques:  41
uniques in percent: 85.416667%
6.484673023223877      # combinations
13.610010623931885     # uniqCmboClassIter (cdef class)
136.28764533996582     # uniqCmboClassIterOriginal
84.73834943771362      # uniqCmboYieldIterOriginal
It definitely performs much better than the original approaches - actually several times faster, with nothing more than the type declarations. There is probably much more that could be optimized (disable bounds checking, use Python C API function calls, use unsigned or smaller integer types if you know the "maximum" and "minimum" of your multiplicities, ...), but given that even for 80% unique items it is not far behind itertools.combinations, and it clearly beats every original implementation, that is good enough for me. :-)