Zipped Python generators with the second one being shorter: how to retrieve the element that is silently consumed

Jea*_* T. 66 python zip generator python-itertools python-3.x

I want to parse 2 generators of (potentially) different lengths with zip:

for el1, el2 in zip(gen1, gen2):
    print(el1, el2)

However, if gen2 has fewer elements, one extra element of gen1 is "consumed".

For example,

def my_gen(n:int):
    for i in range(n):
        yield i

gen1 = my_gen(10)
gen2 = my_gen(8)

list(zip(gen1, gen2))  # Last tuple is (7, 7)
print(next(gen1))  # printed value is "9" => 8 is missing

gen1 = my_gen(8)
gen2 = my_gen(10)

list(zip(gen1, gen2))  # Last tuple is (7, 7)
print(next(gen2))  # printed value is "8" => OK

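Apparently, a value is missing (8 in my previous example) because gen1 is read (thus producing the value 8) before zip realizes that gen2 has no more elements. That value then vanishes into the universe. When gen2 is the "longer" one, there is no such "problem".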

QUESTION: Is there a way to retrieve this missing value (i.e. 8 in my previous example)? Ideally with a variable number of arguments (like zip does).

NOTE: I have currently implemented this another way, using itertools.zip_longest, but I really wonder how to get this missing value using zip or an equivalent.

NOTE 2: I have created some tests of the different implementations in this REPL, in case you want to submit and try a new implementation :) https://repl.it/@jfthuong/MadPhysicistChester

Ray*_*ger 40

Right out of the box, zip() is hardwired to dispose of the unmatched item. So you need a way to remember values before they get consumed.

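The itertool called tee() was designed for this purpose. You can use it to create a "shadow" of the first input iterator. If the second iterator terminates, you can fetch the first iterator's value from the shadow iterator.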

Here is one way to do it that uses existing tooling, runs at C speed, and is memory efficient:

>>> from itertools import tee
>>> from operator import itemgetter

>>> iterable1, iterable2 = 'abcde', 'xyz' 

>>> it1, shadow1 = tee(iterable1)
>>> it2 = iter(iterable2)
>>> combined = map(itemgetter(0, 1), zip(it1, it2, shadow1))
 
>>> list(combined)
[('a', 'x'), ('b', 'y'), ('c', 'z')]
>>> next(shadow1)
'd'

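  • @Jean-François T. Both the *it1* iterator and the *shadow1* iterator are inside *zip()*, so they are consumed at the same rate. The *tee* object is expected to hold no more than one data element in memory. (3 upvotes)
  • One more question: why does "shadow1" need to be included in the "zip"? (2 upvotes)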
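
On the question of why shadow1 has to sit inside zip(): zip() advances its arguments left to right, so placing shadow1 last means it is only advanced after it2 has produced a value. The sketch below assumes the my_gen helper from the question and suggests how that leaves shadow1 pointing at the element that it1 swallowed.

```python
from itertools import tee
from operator import itemgetter

def my_gen(n):
    # Same helper as in the question.
    yield from range(n)

it1, shadow1 = tee(my_gen(10))
it2 = my_gen(8)

# Per round, zip() pulls from it1, then it2, then shadow1.
# When it2 raises StopIteration, shadow1 has not been advanced for that
# round, so it still points at the value it1 already consumed.
pairs = list(map(itemgetter(0, 1), zip(it1, it2, shadow1)))

remaining = list(shadow1)
print(pairs[-1])   # (7, 7)
print(remaining)   # [8, 9]
```

Had shadow1 been kept outside zip(), it would still start at the first element, and tee() would have to buffer everything it1 had consumed in the meantime.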

Mad*_*ist 30

One way would be to implement a generator that lets you cache the last value:

import collections.abc

class cache_last(collections.abc.Iterator):
    """
    Wraps an iterable in an iterator that can retrieve the last value.

    .. attribute:: obj

       A reference to the wrapped iterable. Provided for convenience
       of one-line initializations.
    """
    def __init__(self, iterable):
        self.obj = iterable
        self._iter = iter(iterable)
        self._sentinel = object()

    @property
    def last(self):
        """
        The last object yielded by the wrapped iterator.

        Uninitialized iterators raise a `ValueError`. Exhausted
        iterators raise a `StopIteration`.
        """
        if self.exhausted:
            raise StopIteration
        return self._last

    @property
    def exhausted(self):
        """
        `True` if there are no more elements in the iterator.
        Violates EAFP, but convenient way to check if `last` is valid.
        Raise a `ValueError` if the iterator is not yet started.
        """
        if not hasattr(self, '_last'):
            raise ValueError('Not started!')
        return self._last is self._sentinel

    def __next__(self):
        """
        Retrieve, record, and return the next value of the iteration.
        """
        try:
            self._last = next(self._iter)
        except StopIteration:
            self._last = self._sentinel
            raise
        # An alternative that has fewer lines of code, but checks
        # for the return value one extra time, and loses the underlying
        # StopIteration:
        #self._last = next(self._iter, self._sentinel)
        #if self._last is self._sentinel:
        #    raise StopIteration
        return self._last

    def __iter__(self):
        """
        This object is already an iterator.
        """
        return self

To use this, wrap the input(s) to zip:

gen1 = cache_last(range(10))
gen2 = iter(range(8))
list(zip(gen1, gen2))
print(gen1.last)   # 8, the value zip() consumed and dropped
print(next(gen1))  # 9

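It is important to make gen2 an iterator rather than an iterable, so that you can tell which one was exhausted. gen1.last only matters when gen2 is the one that ran out; if gen1 itself is exhausted, nothing was lost and there is no need to check gen1.last.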
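
To make the "which one was exhausted" check concrete, here is a minimal sketch for any number of inputs. LastSeen and zip_with_leftovers are hypothetical names, not part of the answer above; LastSeen is a pared-down stand-in for cache_last that exposes plain attributes instead of properties.

```python
class LastSeen:
    """Hypothetical, pared-down stand-in for cache_last."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self.last = None        # last value handed out (None until started)
        self.exhausted = False  # True once the wrapped iterator ran out

    def __iter__(self):
        return self

    def __next__(self):
        try:
            self.last = next(self._it)
        except StopIteration:
            self.exhausted = True
            raise
        return self.last


def zip_with_leftovers(*iterables):
    """Return (pairs, swallowed, wrapped) for a plain zip() over wrappers."""
    wrapped = [LastSeen(it) for it in iterables]
    pairs = list(zip(*wrapped))
    # zip() stops at the first exhausted input; every wrapper before it
    # was advanced once more in that final round, and .last holds the value.
    swallowed = []
    for w in wrapped:
        if w.exhausted:
            break
        swallowed.append(w.last)
    return pairs, swallowed, wrapped


pairs, swallowed, wrapped = zip_with_leftovers(range(10), range(9), range(8))
print(pairs[-1])         # (7, 7, 7)
print(swallowed)         # [8, 8]
print(next(wrapped[0]))  # 9
```

Wrapping every input, rather than only the first, is what lets the helper work out where zip() stopped without probing the iterators any further.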

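Another method would be to override zip to accept a mutable sequence of iterables instead of separate iterables. That would allow you to replace the iterables with a chained version that includes your "peeked" item: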

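import itertools

def myzip(iterables):
    # `iterables` must be a mutable sequence: it is updated in place.
    iterators = [iter(it) for it in iterables]
    while True:
        items = []
        for it in iterators:
            try:
                items.append(next(it))
            except StopIteration:
                # Put each already-consumed item back in front of its iterator.
                for i, peeked in enumerate(items):
                    iterables[i] = itertools.chain([peeked], iterators[i])
                return
        else:
            # Runs only when every iterator produced a value this round.
            yield tuple(items)

gens = [range(10), range(8)]
list(myzip(gens))
print(next(gens[0]))  # 8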

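This approach is problematic for many reasons. Not only does it lose the original iterable, it also loses any useful properties the original object may have had, because the object is replaced with a chain object.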

  • There is no need to do all this work. *itertools.tee()* is already designed to efficiently split an iterator stream and remember unconsumed values. (2 upvotes)

Ch3*_*teR 18

This is the equivalent implementation of zip given in the docs:

def zip(*iterables):
    # zip('ABCD', 'xy') --> Ax By
    sentinel = object()
    iterators = [iter(it) for it in iterables]
    while iterators:
        result = []
        for it in iterators:
            elem = next(it, sentinel)
            if elem is sentinel:
                return
            result.append(elem)
        yield tuple(result)

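In your first example, gen1 = my_gen(10) and gen2 = my_gen(8). Both generators are consumed in step through iteration 7 (counting from 0). On iteration 8, the call elem = next(it, sentinel) for gen1 returns 8, but the same call for gen2 returns sentinel (because gen2 is exhausted at that point), so the check if elem is sentinel succeeds and the function returns. The 8 taken from gen1 is discarded, and next(gen1) now returns 9.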

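In your second example, gen1 = my_gen(8) and gen2 = my_gen(10). Both generators are consumed in step through iteration 7 (counting from 0). On iteration 8, the call elem = next(it, sentinel) for gen1 returns sentinel (because gen1 is exhausted at that point), so the check if elem is sentinel succeeds and the function returns before gen2 is touched. next(gen2) therefore returns 8.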
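
The walk-through can be checked empirically with the built-in zip. The sketch below uses a hypothetical traced helper (not from the answer) that records each value at the moment a generator produces it.

```python
def traced(name, n, log):
    # Hypothetical helper: like my_gen, but records every value it yields.
    for i in range(n):
        log.append((name, i))
        yield i

log = []
gen1 = traced("gen1", 10, log)
gen2 = traced("gen2", 8, log)

pairs = list(zip(gen1, gen2))
tail = log[-3:]

print(tail)        # [('gen1', 7), ('gen2', 7), ('gen1', 8)]
print(next(gen1))  # 9
```

The last recorded event is gen1 producing 8, with no matching event from gen2, which is the value that zip drops.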

Inspired by Mad Physicist's answer, you could use this Gen wrapper to counter it:

Edit: Updated to handle the cases pointed out by Jean-Francois T.

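Once a value is consumed from an iterator, it is gone from the iterator forever, and iterators have no in-place mutating method to add it back. One workaround is to store the last consumed value.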

class Gen:
    def __init__(self, iterable):
        self.d = iter(iterable)
        self.sentinel = object()   # marks "nothing consumed yet"
        self.exhausted = object()  # marks "underlying iterator ran out"
        self.prev = self.sentinel

    def __iter__(self):
        return self

    @property
    def last_val_consumed(self):
        if self.prev is self.exhausted:
            raise StopIteration
        if self.prev is self.sentinel:
            raise ValueError('Nothing has been consumed')
        return self.prev

    def __next__(self):
        # A dedicated marker (rather than None) keeps None usable as a value.
        self.prev = next(self.d, self.exhausted)
        if self.prev is self.exhausted:
            raise StopIteration
        return self.prev

Examples:

# 1. When `gen1` is longer than `gen2`
gen1 = Gen(range(10))
gen2 = Gen(range(8))
list(zip(gen1,gen2))
# [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (7, 7)]
gen1.last_val_consumed
# 8, as it was the last value consumed
next(gen1)
# 9
gen1.last_val_consumed
# 9

# 2. When `gen1` or `gen2` is empty
gen1 = Gen(range(0))
gen2 = Gen(range(5))
list(zip(gen1,gen2))
gen1.last_val_consumed
# StopIteration error is raised
gen2.last_val_consumed
# ValueError is raised saying `ValueError: Nothing has been consumed`

  • Nice work, it passes all the tests I wrote here: https://repl.it/@jfthuong/MadPhysicistChester You can run them online, which is pretty handy :) (2 upvotes)

Ter*_*ryA 10

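I can see you have already found this answer and that it was brought up in the comments, but I figured I would make an answer out of it. You want to use itertools.zip_longest(), which fills in the missing values of the shorter generator with None: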

import itertools

def my_gen(n:int):
    for i in range(n):
        yield i

gen1 = my_gen(10)
gen2 = my_gen(8)

for i, j in itertools.zip_longest(gen1, gen2):
    print(i, j)

Prints:

0 0
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 None
9 None

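You can also supply a fillvalue argument when calling zip_longest to replace None with a default value, but basically for your solution, once you hit a None (in either i or j) in the for loop, the other variable will hold your 8.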
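
If the generators can legitimately yield None, a unique sentinel as fillvalue avoids the ambiguity. The following is a minimal sketch of that idea; MISSING, pairs and leftover are hypothetical names, and the loop stops at the first padded row so that the longer generator is not drained.

```python
from itertools import zip_longest

def my_gen(n):
    yield from range(n)

MISSING = object()  # hypothetical sentinel; can never collide with real data

gen1, gen2 = my_gen(10), my_gen(8)

pairs = []
leftover = None
for row in zip_longest(gen1, gen2, fillvalue=MISSING):
    if any(value is MISSING for value in row):
        # At least one input ran out: keep the row so nothing is lost.
        leftover = row
        break
    pairs.append(row)

print(pairs[-1])               # (7, 7)
print(leftover[0])             # 8
print(leftover[1] is MISSING)  # True
print(next(gen1))              # 9
```

Unlike zip(), zip_longest() pulls from every input in each round, so when gen1 is the shorter one the padded row is (MISSING, 8) and the 8 taken from gen2 also has to be read from that row.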


J.G*_*.G. 6

Inspired by @GrandPhuba's elucidation of zip, let's create a "safe" variant (unit-tested here):

from itertools import chain

def safe_zip(*args):
    """
    Safe zip that restores the last consumed element in each generator
    if not able to consume an element in all of them

    Returns:
        * generators in tuple
        * generator for zipped generators
    """
    continue_ = True
    n = len(args)
    result = (_ for _ in [])
    while continue_:
        addend = []
        for i, gen in enumerate(args):
            try:
                value = next(gen)
                addend.append(value)
            except StopIteration:
                # Re-attach the values already taken in this round.
                genlist = list(args)
                args = tuple([chain([v], g) for v, g in zip(addend, genlist[:i])] + genlist[i:])
                continue_ = False
                break
        if len(addend) == n:
            result = chain(result, [tuple(addend)])
    return args, result

Here is a basic test:

g1, g2 = (i for i in range(10)), (i for i in range(4))
# Create (g1, g2), g3 first, then loop over g3 as one would with zip
(g1, g2), g3 = safe_zip(g1, g2)
for a, b in g3:
    print(a, b)  # 0 0 through 3 3
for x in g1:
    print(x)  # 4 through 9


rus*_*ro1 5

You could use itertools.tee and itertools.islice:

from itertools import islice, tee

def zipped(gen1, gen2, pred=list):
    g11, g12 = tee(gen1)
    z = pred(zip(g11, gen2))

    return (islice(g12, len(z), None), gen2), z

gen1 = iter(range(10))
gen2 = iter(range(5))

(gen1, gen2), output = zipped(gen1, gen2)

print(output)
print(next(gen1))
# [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
# 5


Nei*_*l G 2

If you want to reuse code, the simplest solution is:

from more_itertools import peekable

a = peekable(a)
b = peekable(b)

while True:
    try:
        a.peek()
        b.peek()
    except StopIteration:
        break
    x = next(a)
    y = next(b)
    print(x, y)


print(list(a), list(b))  # Misses nothing.

You can test this code with your setup:

def my_gen(n: int):
    yield from range(n)

a = my_gen(10)
b = my_gen(8)

It will print:

0 0
1 1
2 2
3 3
4 4
5 5
6 6
7 7
[8, 9] []
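
The same peek-before-consume idea extends to a variable number of inputs, as the question asks. Below is a standard-library-only sketch; Peekable and lossless_zip are hypothetical names, and Peekable is a minimal stand-in written for this example, not the more_itertools implementation.

```python
class Peekable:
    """Hypothetical minimal stand-in for more_itertools.peekable."""
    _EMPTY = object()  # marks "no value cached"

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._cache = self._EMPTY

    def __iter__(self):
        return self

    def peek(self):
        # Look at the next value without consuming it.
        if self._cache is self._EMPTY:
            self._cache = next(self._it)  # StopIteration propagates
        return self._cache

    def __next__(self):
        if self._cache is not self._EMPTY:
            value, self._cache = self._cache, self._EMPTY
            return value
        return next(self._it)


def lossless_zip(*peekables):
    """Like zip(), but only consumes a round once every input has a value."""
    if not peekables:
        return
    while True:
        try:
            for p in peekables:
                p.peek()
        except StopIteration:
            return
        yield tuple(next(p) for p in peekables)


def my_gen(n):
    yield from range(n)

a, b, c = Peekable(my_gen(10)), Peekable(my_gen(8)), Peekable(my_gen(9))
pairs = list(lossless_zip(a, b, c))
rest = (list(a), list(b), list(c))

print(pairs[-1])  # (7, 7, 7)
print(rest)       # ([8, 9], [], [8])
```

Because a peeked value stays cached inside its wrapper, the inputs must be read through the Peekable objects afterwards, not through the original generators.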