Python链表O(1)插入/删除

ecl*_*rat 8 python algorithm linked-list list

我正在寻找Python的链表和相关算法实现.我要求的每个人都建议使用内置的Python列表,但性能测量表明列表插入和删除是我们应用程序的瓶颈.实现一个简单的链表是微不足道的,但我想知道是否有一个成熟的库,其中包括一些操作,如排序,合并,拼接,搜索,下限/上限等...

我知道这是一个骗局,但在任何搜索引擎上搜索python列表都会给出可预测的糟糕结果,大多数人只是说python中不需要链接列表(pfft!).

PS:我需要从列表中的任何位置插入和删除,而不仅仅是结尾.

好的,你要求它:我需要维护一个有数十万个条目的有序列表.我将逐步遍历列表(逐个),使用每个条目上的访问者,从头开始或二进制搜索找到的位置.当找到与谓词匹配的条目时,将其从列表中移除,然后,对从列表的子集开始的另一个二进制搜索从所移除的条目的先前位置开始,直到预先在统计上确定的位置.忽略错误条件,修改的条目可以用于创建另一个链接列表,该链接列表被拼接到通过第二二进制搜索找到的新位置.从删除条目的位置继续迭代.有时,可以在列表中的任何位置添加/移除数千个连续的有序条目.有时必须逐步搜索和删除数千个不连续的条目.

python的列表是不可接受的,因为插入/删除的成本过高,而二进制搜索的速度微小增益与总成本完全无关.我们的内部测试证实了这一点.

如果我忽略了任何细节,或许我可以通过电子邮件向您发送我公司的保密协议副本,我可以就此事私下与您通信.sarcasm.end().

Eli*_*sky 10

这是一篇分享你痛苦的博客文章.它包括链表的实现和性能比较.


或许blist会更好(从这里开始)?

BList比Python列表稍慢的用例如下(O(log n)vs O(1)):

  1. 一个永远不会改变长度的大型列表.
  2. 插入和删除仅在列表末尾(LIFO)的大型列表.

有了这个免责声明,这里有一些使用案例,其中BLists比内置列表快得多:

  1. 插入或删除大型列表(O(log n)与O(n))
  2. 取大片大片(O(log n)vs O(n))
  3. 制作大型列表的浅表副本(O(1)与O(n))
  4. 更改大片大片(O(log n + log k)与O(n + k))
  5. 将列表相乘以生成大型稀疏列表(O(log k)与O(kn))

请注意,它实际上是作为B +树实现的,从而为所有这些操作提供了出色的性能.

  • @unknown:但是在Python中实现链表很容易.真.你看过那篇博文了吗?那家伙实现了它.您需要纯Python解决方案,还是C扩展? (2认同)

C. *_*ann 7

Python列表是O(1),用于列表末尾的操作.如果您将以半连续的方式进行所有插入 - 通过类比于C,只将单个指针保持在列表中间作为各种"光标" - 您可以节省很多精力只使用两个Python列表.一个列表是光标前的内容,一个是后面的内容; 移动光标涉及从一个列表中拉出下一个项目并将其附加到另一个列表.这使得您可以在光标位置插入任意O(1),而不是创建一个全新的数据结构,而且可以重复使用很多现有的列表函数.

但是,对于允许多个引用进入列表的完全一般情况,您可能会陷入某种类型的链接列表.

编辑:你并没有认真考虑在链表上实际进行"二元搜索",是吗?对于本质上顺序的数据结构,二进制搜索甚至没有意义......

无论如何,如果您对线性时间搜索没问题,并且您的插入将始终保留列表顺序而不重新排序,那么您可能只需要一个简单的链接列表.如果你像迭代一样进行搜索,你应该考虑使用快速索引的东西,如果需要求助,那么像树一样会更好.


Gle*_*ard 7

令人费解的是,每个人都需要证明需要链表.链接列表是最基本的数据结构之一,原因如下:它们具有其他主要数据结构缺少的属性,如果需要这些属性,则需要链表或其近亲.如果您不理解为什么链表是一个不能总是用双端队列或二叉树替换的重要数据结构,那么您绝不应该通过"数据结构简介"类.

这里有一个快速的实现,支持平常的东西:常量时间插入在给予节点的引用的任何一点,分裂列表分为两个列表,并插入一个列表到另一个列表(拼接)的中间.支持通用Python接口:push,pop,pushleft,popleft,extend,普通迭代,切片迭代(getiter).

我刚刚写了这个,所以它是经过证明的,但没有经过生产测试; 可能还有bug.

def _ref(obj):
    """
    weakref.ref has a bit of braindamage: you can't take a weakref to None.
    This is a major hassle and a needless limitation; work around it.
    """
    from weakref import ref
    if obj is None:
        class NullRef(object):
            def __call__(self): return None
        return NullRef()
    else:
        return ref(obj)

class _node(object):
    def __init__(self, obj):
        self.obj = obj
        self._next = None
        self._prev = _ref(None)
    def __repr__(self):
        return "node(%s)" % repr(self.obj)

    def __call__(self):
        return self.obj

    @property
    def next(self):
        return self._next

    @property
    def prev(self):
        return self._prev()

# Implementation note: all "_last" and "prev" links are weakrefs, to prevent circular references.
# This is important; if we don't do this, every list will be a big circular reference.  This would
# affect collection of the actual objects in the list, not just our node objects.
#
# This means that _node objects can't exist on their own; they must be part of a list, or nodes
# in the list will be collected.  We also have to pay attention to references when we move nodes
# from one list to another.
class llist(object):
    """
    Implements a doubly-linked list.
    """
    def __init__(self, init=None):
        self._first = None
        self._last = _ref(None)
        if init is not None:
            self.extend(init)

    def insert(self, item, node=None):
        """
        Insert item before node.  If node is None, insert at the end of the list.
        Return the node created for item.

        >>> l = llist()
        >>> a = l.insert(1)
        >>> b = l.insert(2)
        >>> d = l.insert(4)
        >>> l._check()
        [1, 2, 4]
        >>> c = l.insert(3, d)
        >>> l._check()
        [1, 2, 3, 4]
        """
        item = _node(item)

        if node is None:
            if self._last() is not None:
                self._last()._next = item
                item._prev = _ref(self._last())
            self._last = _ref(item)
            if self._first is None:
                self._first = item
        else:
            assert self._first is not None, "insertion node must be None when the list is empty"
            if node._prev() is not None:
                node._prev()._next = item
            item._prev = node._prev
            item._next = node
            node._prev = _ref(item)
            if node is self._first:
                self._first = item
        return item

    def remove(self, node):
        """
        >>> l = llist()
        >>> a = l.append(1)
        >>> b = l.append(2)
        >>> c = l.append(3)
        >>> d = l.append(4)
        >>> e = l.append(5)
        >>> l.remove(c) # Check removing from the middle
        3
        >>> l._check()
        [1, 2, 4, 5]
        >>> l.remove(a) # Check removing from the start
        1
        >>> l._check()
        [2, 4, 5]
        >>> l.remove(e) # Check removing from the end
        5
        >>> l._check()
        [2, 4]
        """
        if self._first is node:
            self._first = node._next
        if self._last() is node:
            self._last = node._prev
        if node._next is not None:
            node._next._prev = node._prev
        if node._prev() is not None:
            node._prev()._next = node._next
        node._next = None
        node._prev = _ref(None)
        return node.obj

    def __nonzero__(self):
        """
        A list is true if it has any elements.

        >>> l = llist()
        >>> bool(l)
        False
        >>> l = llist([1])
        >>> bool(l)
        True
        """
        return self._first is not None


    def __iter__(self):
        """
        >>> l = llist([1,2,3])
        >>> [i() for i in l]
        [1, 2, 3]
        """
        return self.getiter(self._first, self._last())

    def _check(self):
        if self._last() is None:
            assert self._last() is None
            return []
        node = self._first
        ret = []
        while node is not None:
            if node._next is None:
                assert node == self._last()
            if node._prev() is None:
                assert node == self._first
            if node._next is not None:
                assert node._next._prev() == node
            if node._prev() is not None:
                assert node._prev()._next == node
            ret.append(node.obj)
            node = node._next
        return ret

    def getiter(self, first, last):
        """
        Return an iterator over [first,last].

        >>> l = llist()
        >>> l.append(1)
        node(1)
        >>> start = l.append(2)
        >>> l.extend([3,4,5,6])
        >>> end = l.append(7)
        >>> l.extend([8,9])
        >>> [i() for i in l.getiter(start, end)]
        [2, 3, 4, 5, 6, 7]
        """
        class listiter(object):
            def __init__(self, first, last):
                self.node = first
                self.final_node = last

            def __iter__(self): return self
            def next(self):
                ret = self.node
                if ret is None:
                    raise StopIteration
                if ret is self.final_node:
                    self.node = None
                else:
                    self.node = self.node._next
                return ret
        return listiter(first, last)

    def append(self, item):
        """
        Add an item to the end of the list.

        >>> l = llist()
        >>> l.append(1)
        node(1)
        >>> l.append(2)
        node(2)
        >>> l._check()
        [1, 2]
        """
        return self.insert(item, None)

    def appendleft(self, item):
        """
        Add an item to the beginning of the list.

        >>> l = llist()
        >>> l.appendleft(1)
        node(1)
        >>> l.appendleft(2)
        node(2)
        >>> l._check()
        [2, 1]
        """
        return self.insert(item, self._first)

    def pop(self):
        """
        Remove an item from the end of the list and return it.

        >>> l = llist([1,2,3])
        >>> l.pop()
        3
        >>> l.pop()
        2
        >>> l.pop()
        1
        >>> l.pop()
        Traceback (most recent call last):
        ...
        IndexError: pop from empty llist
        """
        if self._last() is None:
            raise IndexError, "pop from empty llist"
        return self.remove(self._last())

    def popleft(self):
        """
        Remove an item from the beginning of the list and return it.

        >>> l = llist([1,2,3])
        >>> l.popleft()
        1
        >>> l.popleft()
        2
        >>> l.popleft()
        3
        >>> l.popleft()
        Traceback (most recent call last):
        ...
        IndexError: popleft from empty llist
        """
        if self._first is None:
            raise IndexError, "popleft from empty llist"
        return self.remove(self._first)

    def splice(self, source, node=None):
        """
        Splice the contents of source into this list before node; if node is None, insert at
        the end.  Empty source_list.  Return the first and last nodes that were moved.

        # Test inserting at the beginning.
        >>> l = llist()
        >>> a = l.append(1)
        >>> b = l.append(2)
        >>> c = l.append(3)
        >>> l2 = llist([4,5,6])
        >>> l.splice(l2, a)
        (node(4), node(6))
        >>> l._check()
        [4, 5, 6, 1, 2, 3]
        >>> l2._check()
        []

        # Test inserting in the middle.
        >>> l = llist()
        >>> a = l.append(1)
        >>> b = l.append(2)
        >>> c = l.append(3)
        >>> l2 = llist([4,5,6])
        >>> l.splice(l2, b)
        (node(4), node(6))
        >>> l._check()
        [1, 4, 5, 6, 2, 3]
        >>> l2._check()
        []

        # Test inserting at the end.
        >>> l = llist()
        >>> a = l.append(1)
        >>> b = l.append(2)
        >>> c = l.append(3)
        >>> l2 = llist([4,5,6])
        >>> l.splice(l2, None)
        (node(4), node(6))
        >>> l._check()
        [1, 2, 3, 4, 5, 6]
        >>> l2._check()
        []

        # Test inserting a list with a single item.
        >>> l = llist()
        >>> a = l.append(1)
        >>> b = l.append(2)
        >>> c = l.append(3)
        >>> l2 = llist([4])
        >>> l.splice(l2, b)
        (node(4), node(4))
        >>> l._check()
        [1, 4, 2, 3]
        >>> l2._check()
        []
        """
        if source._first is None:
            return
        first = source._first
        last = source._last()

        if node is None:
            if self._last() is not None:
                self._last()._next = source._first
            source._first._prev = self._last
            self._last = source._last
            if self._first is None:
                self._first = source._first
        else:
            source._first._prev = node._prev
            source._last()._next = node
            if node._prev() is not None:
                node._prev()._next = source._first
            node._prev = source._last
            if node is self._first:
                self._first = source._first
        source._first = None
        source._last = _ref(None)
        return first, last

    def split(self, start, end=None):
        """
        Remove all items between [node, end] and return them in a new list.  If end is None,
        remove until the end of the list.

        >>> l = llist()
        >>> a = l.append(1)
        >>> b = l.append(2)
        >>> c = l.append(3)
        >>> d = l.append(4)
        >>> e = l.append(5)
        >>> l._check()
        [1, 2, 3, 4, 5]
        >>> l2 = l.split(c, e)
        >>> l._check()
        [1, 2]
        >>> l2._check()
        [3, 4, 5]

        >>> l = llist()
        >>> a = l.append(1)
        >>> b = l.append(2)
        >>> c = l.append(3)
        >>> d = l.append(4)
        >>> e = l.append(5)
        >>> l2 = l.split(a, c)
        >>> l._check()
        [4, 5]
        >>> l2._check()
        [1, 2, 3]

        >>> l = llist()
        >>> a = l.append(1)
        >>> b = l.append(2)
        >>> c = l.append(3)
        >>> d = l.append(4)
        >>> e = l.append(5)
        >>> l2 = l.split(b, d)
        >>> l._check()
        [1, 5]
        >>> l2._check()
        [2, 3, 4]
        """
        if end is None:
            end = self._last()

        ret = llist()

        # First, move the region into the new list.  It's important to do this first, or
        # once we remove the nodes from the old list, they'll be held only by weakrefs and
        # nodes could end up being collected before we put it into the new one.
        ret._first = start
        ret._last = _ref(end)

        # Hook our own nodes back together.
        if start is self._first:
            self._first = end._next
        if end is self._last():
            self._last = start._prev

        if start._prev() is not None:
            start._prev()._next = end._next
        if end._next is not None:
            end._next._prev = start._prev
        start._prev = _ref(None)
        end._next = None

        return ret

    def extend(self, items):
        """
        >>> l = llist()
        >>> l.extend([1,2,3,4,5])
        >>> l._check()
        [1, 2, 3, 4, 5]
        """
        for item in items:
            self.append(item)

if __name__ == "__main__":
    import doctest
    doctest.testmod()
Run Code Online (Sandbox Code Playgroud)

  • 人们可能会质疑动机,因为在通过"数据结构简介"之后,实际上*需要一个专门针对其性能特征的链表是很不寻常的.绝大多数情况下,如果数据足够大以至于担心时间复杂性,则需要进行排序,索引或搜索,并且链接列表在所有这些方面都非常糟糕*.如果没有上下文,坚持链接列表听起来就像是一个设计错误,尤其是当澄清请求遭到无端的敌意时. (4认同)

Ale*_*lli 6

有一个单链表这里(配方17.14在Python食谱第1版),但它几乎没有"成熟"或富-它只是做一个FIFO队列所以这是相当小.

这个配方是一个非常简洁的C实现(只读)类似Lisp的缺点 - 只是汽车,cdr和缺点; 再次,不是丰富的类型,而是最小的类型(并且将其用于可变数据,而不是纯函数方法,您至少需要添加setcar和setcdr).这可能是一个更好的起点,因为cons-cells非常灵活和熟悉.

您需要的某些操作可能最好由现有的Python原语完成.例如,对于排序,很难看出你自己的排序如何能够超越Python的性能sorted(linkedlist)(当然,你可以将linkedlist类型设置为Python可迭代的,因此它可以很好地与其他语言和库一起使用;-),考虑timsort在Python运行时中实现的算法的强大功能.

更一般地,我建议你仔细timeit考虑每一步,以考虑C编码方法真正为你买多少(当与我在开始时给出的URL中的配方所例证的普通C编码的那个相比时)这个答案) - 这将在很大程度上取决于您的应用程序列表的大小和性质,因此您最适合组织这些基准测试.