I'm interested in keeping a reference to the order of the field names in a Scrapy item. Where is this stored?
>>> dir(item)
Out[7]:
['_MutableMapping__marker',
'__abstractmethods__',
'__class__',
'__contains__',
'__delattr__',
'__delitem__',
'__dict__',
'__doc__',
'__eq__',
'__format__',
'__getattr__',
'__getattribute__',
'__getitem__',
'__hash__',
'__init__',
'__iter__',
'__len__',
'__metaclass__',
'__module__',
'__ne__',
'__new__',
'__reduce__',
'__reduce_ex__',
'__repr__',
'__setattr__',
'__setitem__',
'__sizeof__',
'__slots__',
'__str__',
'__subclasshook__',
'__weakref__',
'_abc_cache',
'_abc_negative_cache',
'_abc_negative_cache_version',
'_abc_registry',
'_class',
'_values',
'clear',
'copy',
'fields',
'get',
'items',
'iteritems',
'iterkeys',
'itervalues',
'keys',
'pop',
'popitem',
'setdefault',
'update',
'values']
I tried item.keys(), but that returns the keys of an unordered dict.
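The Item class has a dict interface and stores its values in the _values dict, which does not keep track of key order (https://github.com/scrapy/scrapy/blob/1.5/scrapy/item.py#L53). I believe you could subclass Item and override the __init__ method to make that container an OrderedDict: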
import six
from collections import OrderedDict

from scrapy import Item


class OrderedItem(Item):
    def __init__(self, *args, **kwargs):
        # Replace the plain dict container with an OrderedDict
        self._values = OrderedDict()
        if args or kwargs:  # avoid creating dict for most common case
            for k, v in six.iteritems(dict(*args, **kwargs)):
                self[k] = v
The item then preserves the order in which the values were assigned:
In [28]: class SomeItem(OrderedItem):
    ...:     a = Field()
    ...:     b = Field()
    ...:     c = Field()
    ...:     d = Field()
    ...:
    ...: i = SomeItem()
    ...: i['b'] = 'bbb'
    ...: i['a'] = 'aaa'
    ...: i['d'] = 'ddd'
    ...: i['c'] = 'ccc'
    ...: i.items()
    ...:
Out[28]: [('b', 'bbb'), ('a', 'aaa'), ('d', 'ddd'), ('c', 'ccc')]
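
If you want to try the idea without Scrapy installed, here is a minimal sketch of the same pattern. MiniItem and OrderedMiniItem are hypothetical stand-ins I made up for illustration; they are not part of Scrapy and only imitate the "values live in self._values" layout described above. Note that on Python 3.7+ a plain dict already preserves insertion order, so the OrderedDict swap mainly matters on older interpreters.

```python
from collections import OrderedDict
from collections.abc import MutableMapping


class MiniItem(MutableMapping):
    """Hypothetical stand-in for scrapy.Item: values live in self._values."""

    def __init__(self, *args, **kwargs):
        self._values = {}
        for k, v in dict(*args, **kwargs).items():
            self[k] = v

    def __getitem__(self, key):
        return self._values[key]

    def __setitem__(self, key, value):
        self._values[key] = value

    def __delitem__(self, key):
        del self._values[key]

    def __iter__(self):
        return iter(self._values)

    def __len__(self):
        return len(self._values)


class OrderedMiniItem(MiniItem):
    """Same idea as the override above: swap the container for an OrderedDict."""

    def __init__(self, *args, **kwargs):
        self._values = OrderedDict()
        for k, v in dict(*args, **kwargs).items():
            self[k] = v


item = OrderedMiniItem()
item['b'] = 'bbb'
item['a'] = 'aaa'
item['d'] = 'ddd'
item['c'] = 'ccc'

# Pairs come back in assignment order, not alphabetical order
ordered_pairs = list(item.items())
print(ordered_pairs)
```

The only thing the subclass changes is the type of the backing container; every read and write still goes through the same mapping methods, which is why the override can stay this small.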