如何在python OrderedDict上使用字符串键而不是整数进行切片?

Nic*_*ing 21 python dictionary ordereddictionary python-2.7 python-internals

由于OrderedDict具有列表(具有有序元素)和字典(使用键而不是索引)的特征,因此使用键切片似乎很自然.

>>> from collections import OrderedDict
>>> cities = OrderedDict((('san francisco', 650), ('new york', 212), ('shanghai', 8621), ('barcelona', 42423)))
>>> test['shanghai':]  # I want all the cities from shanghai to the end of the list
TypeError: unhashable type
Run Code Online (Sandbox Code Playgroud)

有趣的是,由于OrderedDictionary.__getslice__没有实施,这不是你看到的错误.我尝试添加自己的__getslice__方法OrderedDict,但我一直遇到这个TypeError问题.似乎Python正在进行某种类型检查以强制切片键只是整数,在它们传递给__getslice__函数之前,是多少unpythonic!

>>> class BetterOrderedDict(OrderedDict):
        def __getslice__(self, start=None, end=None, step=1):
            return 'potato'

>>> test = BetterOrderedDict((('one', 1), ('two', 2), ('three', 3), ('four', 4)))
>>> print test[1:4]
'potato'                           # ok this makes sense so far

>>> test['one':'four']
TypeError: unhashable type         # WTF, strings are hashable!
Run Code Online (Sandbox Code Playgroud)

所以我的问题是,为什么我不能实现非int切片,什么样的类型检查阻止切片键甚至到达我的__getslice__函数,我可以通过BetterOrderedDict在C中使用绑定来实现我的覆盖吗?

zch*_*zch 22

__getslice__是不推荐的实现切片的方式.相反,你应该处理slice对象__getitem__:

from collections import OrderedDict

class SlicableDict(OrderedDict):
    def __getitem__(self, key):
        if isinstance(key, slice):
            return 'potato({},{},{})'.format(key.start, key.stop, key.step)
        return super(SlicableDict, self).__getitem__(key)

>>> s = SlicableDict(a=1, b=2, c=3)
>>> s
SlicableDict([('a', 1), ('c', 3), ('b', 2)])
>>> s['a']
1
>>> s['a':'c']
'potato(a,c,None)'
Run Code Online (Sandbox Code Playgroud)

如果您需要的不仅仅是土豆,那么您可以按照以下方式实施所有三种切片操作:

def _key_slice_to_index_slice(items, key_slice):
    try:
        if key_slice.start is None:
            start = None
        else:
            start = next(idx for idx, (key, value) in enumerate(items)
                         if key == key_slice.start)
        if key_slice.stop is None:
            stop = None
        else:
            stop = next(idx for idx, (key, value) in enumerate(items)
                        if key == key_slice.stop)
    except StopIteration:
        raise KeyError
    return slice(start, stop, key_slice.step)

class SlicableDict(OrderedDict):
    def __getitem__(self, key):
        if isinstance(key, slice):
            items = self.items()
            index_slice = _key_slice_to_index_slice(items, key)
            return SlicableDict(items[index_slice])
        return super(SlicableDict, self).__getitem__(key)

    def __setitem__(self, key, value):
        if isinstance(key, slice):
            items = self.items()
            index_slice = _key_slice_to_index_slice(items, key)
            items[index_slice] = value.items()
            self.clear()
            self.update(items)
            return
        return super(SlicableDict, self).__setitem__(key, value)

    def __delitem__(self, key):
        if isinstance(key, slice):
            items = self.items()
            index_slice = _key_slice_to_index_slice(items, key)
            del items[index_slice]
            self.clear()
            self.update(items)
            return
        return super(SlicableDict, self).__delitem__(key)
Run Code Online (Sandbox Code Playgroud)


the*_*eye 12

这是您期望的切片功能的实际实现.

OrderedDict内部以双向链表的形式维护密钥的顺序.引用Python 2.7.9中的实际注释,

# The internal self.__map dict maps keys to links in a doubly linked list.
# The circular doubly linked list starts and ends with a sentinel element.
# The sentinel element never gets deleted (this simplifies the algorithm).
# Each link is stored as a list of length three:  [PREV, NEXT, KEY].
Run Code Online (Sandbox Code Playgroud)

现在,为了对字典进行切片,我们需要迭代双向链表__root,这实际上是一个私有变量,受名称修改机制的保护.

注意:这涉及使用OrderedDict内部数据结构的hacky名称unmangling .

from collections import OrderedDict

class SlicableDict(OrderedDict):
    def __getitem__(self, key):
        if isinstance(key, slice):
            # Unmangle `__root` to access the doubly linked list
            root = getattr(self, "_OrderedDict__root")
            # By default, make `start` as the first element, `end` as the last
            start, end = root[1][2], root[0][2]
            start = key.start or start
            end = key.stop or end
            step = key.step or 1
            curr, result, begun, counter = root[1], [], False, 0

            # Begin iterating
            curr, result, begun = root[1], [], False
            while curr is not root:
                # If the end value is reached, `break` and `return`
                if curr[2] == end:
                    break
                # If starting value is matched, start appending to `result`
                if curr[2] == start:
                    begun = True
                if begun:
                    if counter % step == 0:
                        result.append((curr[2], self[curr[2]]))
                    counter += 1

                # Make the `curr` point to the next element
                curr = curr[1]

            return result

        return super(SlicableDict, self).__getitem__(key)
Run Code Online (Sandbox Code Playgroud)

几个样本运行:

>>> s = SlicableDict(a=1, b=2, c=3, d=4)
>>> s
SlicableDict([('a', 1), ('c', 3), ('b', 2), ('e', 5), ('d', 4), ('f', 6)])
>>> s['a':'c']
[('a', 1)]
>>> s['a':]
[('a', 1), ('c', 3), ('b', 2), ('e', 5), ('d', 4)]
>>> s[:'a']
[]
>>> s['a':'f':2]
[('a', 1), ('b', 2), ('d', 4)]
Run Code Online (Sandbox Code Playgroud)

  • 很好的回答暴露了'OrderedDict`s的"管道".我希望你的答案比我的快得多,但是根据'OrderedDict'中的实现变化(由于名称unmangling等)而受到破坏,而我使用的"瓷器"实现可能是安全的版本之间的破坏,但不是这么快. (5认同)
  • @thefoureye什么是`getattr`而不仅仅是'self._OrderedDict__root`? (2认同)