在python中合并连接两个生成器

yaw*_*iek 3 python generator

我想通过密钥合并两个京都内阁 B 树数据库。(京都内阁python api)。结果列表应包含两个输入数据库中任何一个的每个唯一键(及其值)。

以下代码有效,但我认为它丑陋。
left_generator/right_generator 是两个游标对象。如果生成器耗尽, get() 返回 None 尤其奇怪。

def merge_join_kv(left_generator, right_generator):
stop = False
while left_generator.get() or right_generator.get():
    try:
        comparison = cmp(right_generator.get_key(), left_generator.get_key())
        if comparison == 0:
            yield left_generator.get_key(), left_generator.get_value()
            left_generator.next()
            right_generator.next()
        elif (comparison < 0) or (not left_generator.get() or not right_generator.get()):
            yield right_generator.get_key(), right_generator.get_value()
            right_generator.next()   
        else:
            yield left_generator.get_key(), left_generator.get_value()
            left_generator.next()    
    except StopIteration:
        if stop:
            raise
        stop = True
Run Code Online (Sandbox Code Playgroud)

通常:是否有一个函数/库将生成器与 cmp() 合并?

Hug*_*ell 5

我认为这就是你所需要的;orderedMerge 基于 Gnibbler 的代码,但添加了自定义键函数和唯一参数,

import kyotocabinet
import collections
import heapq

class IterableCursor(kyotocabinet.Cursor, collections.Iterator):
    def __init__(self, *args, **kwargs):
        kyotocabinet.Cursor.__init__(self, *args, **kwargs)
        collections.Iterator.__init__(self)

    def next():
        "Return (key,value) pair"
        res = self.get(True)
        if res is None:
            raise StopIteration
        else:
            return res

def orderedMerge(*iterables, **kwargs):
    """Take a list of ordered iterables; return as a single ordered generator.

    @param key:     function, for each item return key value
                    (Hint: to sort descending, return negated key value)

    @param unique:  boolean, return only first occurrence for each key value?
    """
    key     = kwargs.get('key', (lambda x: x))
    unique  = kwargs.get('unique', False)

    _heapify       = heapq.heapify
    _heapreplace   = heapq.heapreplace
    _heappop       = heapq.heappop
    _StopIteration = StopIteration

    # preprocess iterators as heapqueue
    h = []
    for itnum, it in enumerate(map(iter, iterables)):
        try:
            next  = it.next
            data   = next()
            keyval = key(data)
            h.append([keyval, itnum, data, next])
        except _StopIteration:
            pass
    _heapify(h)

    # process iterators in ascending key order
    oldkeyval = None
    while True:
        try:
            while True:
                keyval, itnum, data, next = s = h[0]  # get smallest-key value
                                                      # raises IndexError when h is empty
                # if unique, skip duplicate keys
                if unique and keyval==oldkeyval:
                    pass
                else:
                    yield data
                    oldkeyval = keyval

                # load replacement value from same iterator
                s[2] = data = next()        # raises StopIteration when exhausted
                s[0] = key(data)
                _heapreplace(h, s)          # restore heap condition
        except _StopIteration:
            _heappop(h)                     # remove empty iterator
        except IndexError:
            return    
Run Code Online (Sandbox Code Playgroud)

那么你的功能可以做为

from operator import itemgetter

def merge_join_kv(leftGen, rightGen):
    # assuming that kyotocabinet.Cursor has a copy initializer
    leftIter = IterableCursor(leftGen)
    rightIter = IterableCursor(rightGen)

    return orderedMerge(leftIter, rightIter, key=itemgetter(0), unique=True)
Run Code Online (Sandbox Code Playgroud)