Mat*_*ain 68 python recursion dictionary traversal
I have a dictionary like this:
{ "id" : "abcde",
"key1" : "blah",
"key2" : "blah blah",
"nestedlist" : [
{ "id" : "qwerty",
"nestednestedlist" : [
{ "id" : "xyz",
"keyA" : "blah blah blah" },
{ "id" : "fghi",
"keyZ" : "blah blah blah" }],
"anothernestednestedlist" : [
{ "id" : "asdf",
"keyQ" : "blah blah" },
{ "id" : "yuiop",
"keyW" : "blah" }] } ] }
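Basically, it is a dictionary with nested lists, dictionaries, and strings, of arbitrary depth.
What is the best way of traversing this to extract the value of every "id" key? I want the equivalent of an XPath query like "//id". The value of "id" is always a string.
So from my example, the output I need is basically: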
["abcde", "qwerty", "xyz", "fghi", "asdf", "yuiop"]
Order is not important.
hex*_*are 54
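I found this Q/A very interesting, since it provides several different solutions to the same problem. I took all of these functions and tested them with a complex dictionary object. I had to drop two functions from the test because they produced too many failures and did not support returning lists or dicts as values, which I consider essential, since a function should be prepared for almost any data it may receive.
So I ran the remaining functions through the timeit module for 100,000 iterations, with the following results: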
0.11 usec/pass on gen_dict_extract(k,o)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - -
6.03 usec/pass on find_all_items(k,o)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - -
0.15 usec/pass on findkeys(k,o)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - -
1.79 usec/pass on get_recursively(k,o)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - -
0.14 usec/pass on find(k,o)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - -
0.36 usec/pass on dict_extract(k,o)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - -
All functions were given the same needle to search for ('logging') and the same dictionary object, which is constructed like this:
o = { 'temparature': '50',
'logging': {
'handlers': {
'console': {
'formatter': 'simple',
'class': 'logging.StreamHandler',
'stream': 'ext://sys.stdout',
'level': 'DEBUG'
}
},
'loggers': {
'simpleExample': {
'handlers': ['console'],
'propagate': 'no',
'level': 'INFO'
},
'root': {
'handlers': ['console'],
'level': 'DEBUG'
}
},
'version': '1',
'formatters': {
'simple': {
'datefmt': "'%Y-%m-%d %H:%M:%S'",
'format': '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
}
}
},
'treatment': {'second': 5, 'last': 4, 'first': 4},
'treatment_plan': [[4, 5, 4], [4, 5, 4], [5, 5, 5]]
}
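All functions delivered the same result, but the time differences are dramatic! The function gen_dict_extract(k,o) is my own, adapted from the functions here. It is actually quite similar to the find function from Alfe; the main difference is that I check whether the given object has an iteritems method, in case strings are passed in during recursion: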
def gen_dict_extract(key, var):
    # Python 2 code: on Python 3, dicts have items() instead of iteritems().
    if hasattr(var,'iteritems'):
        for k, v in var.iteritems():
            if k == key:
                yield v
            if isinstance(v, dict):
                for result in gen_dict_extract(key, v):
                    yield result
            elif isinstance(v, list):
                for d in v:
                    for result in gen_dict_extract(key, d):
                        yield result
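To illustrate why that check matters, here is a minimal Python 3 sketch (it uses items() in place of iteritems(), which is an adaptation rather than the original code). The names find_unguarded and find_guarded are hypothetical and exist only for this illustration: the first assumes every element it reaches is a dict, the second adds the check. When the data contains a list of plain strings, the unguarded version raises AttributeError, while the guarded one skips the strings.

```python
def find_unguarded(key, var):
    # No type check: assumes every element reached during recursion is a dict.
    for k, v in var.items():
        if k == key:
            yield v
        if isinstance(v, dict):
            yield from find_unguarded(key, v)
        elif isinstance(v, list):
            for d in v:
                yield from find_unguarded(key, d)


def find_guarded(key, var):
    # Same traversal, but anything without an items() method is skipped.
    if hasattr(var, 'items'):
        for k, v in var.items():
            if k == key:
                yield v
            if isinstance(v, dict):
                yield from find_guarded(key, v)
            elif isinstance(v, list):
                for d in v:
                    yield from find_guarded(key, d)


# Small made-up sample: 'handlers' holds a list of strings, not dicts.
data = {'id': 'a', 'handlers': ['console'], 'child': {'id': 'b'}}

print(list(find_guarded('id', data)))  # ['a', 'b']

try:
    list(find_unguarded('id', data))
except AttributeError as exc:
    # Strings have no items() method, so the recursion fails here.
    print('unguarded version failed:', exc)
```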
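So this variant is the fastest and safest of the functions here. find_all_items is incredibly slow, far behind the second slowest, get_recursively, while the rest, except dict_extract, are close to each other. The functions fun and keyHole only work if you are looking for strings.
An interesting learning exercise :)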
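A timing run like the one described could be reproduced roughly as follows. This is a minimal Python 3 sketch, not the original benchmark script: the two candidate functions are ports written for this illustration, the object o is a shortened stand-in for the full test dictionary, and the iteration count is reduced to keep the run quick. Calling a generator function only creates the generator, and the traversal happens when it is consumed, so each call is wrapped in list(...).

```python
import timeit


def gen_dict_extract(key, var):
    # Python 3 port: items() instead of iteritems(), yield from instead of loops.
    if hasattr(var, 'items'):
        for k, v in var.items():
            if k == key:
                yield v
            if isinstance(v, dict):
                yield from gen_dict_extract(key, v)
            elif isinstance(v, list):
                for d in v:
                    yield from gen_dict_extract(key, d)


def findkeys(node, kv):
    # Port of the findkeys answer, again using yield from.
    if isinstance(node, list):
        for i in node:
            yield from findkeys(i, kv)
    elif isinstance(node, dict):
        if kv in node:
            yield node[kv]
        for j in node.values():
            yield from findkeys(j, kv)


# Shortened stand-in for the test object (an assumption, not the full dict).
o = {
    'temparature': '50',
    'logging': {'version': '1', 'loggers': {'root': {'level': 'DEBUG'}}},
    'treatment_plan': [[4, 5, 4], [4, 5, 4], [5, 5, 5]],
}

number = 10000  # assumed iteration count; raise it for steadier numbers
candidates = {
    'gen_dict_extract(k,o)': lambda: list(gen_dict_extract('logging', o)),
    'findkeys(k,o)': lambda: list(findkeys(o, 'logging')),
}
for label, call in candidates.items():
    total = timeit.timeit(call, number=number)
    print('%.2f usec/pass on %s' % (total / number * 1e6, label))
```

Absolute numbers depend on the machine and Python version, so only the relative ordering is worth comparing.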
kev*_*kev 39
d = { "id" : "abcde",
"key1" : "blah",
"key2" : "blah blah",
"nestedlist" : [
{ "id" : "qwerty",
"nestednestedlist" : [
{ "id" : "xyz", "keyA" : "blah blah blah" },
{ "id" : "fghi", "keyZ" : "blah blah blah" }],
"anothernestednestedlist" : [
{ "id" : "asdf", "keyQ" : "blah blah" },
{ "id" : "yuiop", "keyW" : "blah" }] } ] }
def fun(d):
    if 'id' in d:
        yield d['id']
    for k in d:
        if isinstance(d[k], list):
            for i in d[k]:
                for j in fun(i):
                    yield j
>>> list(fun(d))
['abcde', 'qwerty', 'xyz', 'fghi', 'asdf', 'yuiop']
Ven*_*nga 17
pip install nested-lookup
does exactly what you are looking for:
from nested_lookup import nested_lookup
document = [ { 'taco' : 42 } , { 'salsa' : [ { 'burrito' : { 'taco' : 69 } } ] } ]
>>> print(nested_lookup('taco', document))
[42, 69]
Alf*_*lfe 15
def find(key, value):
    # Python 2 code: on Python 3, use value.items() instead of iteritems().
    for k, v in value.iteritems():
        if k == key:
            yield v
        elif isinstance(v, dict):
            for result in find(key, v):
                yield result
        elif isinstance(v, list):
            for d in v:
                for result in find(key, d):
                    yield result
ara*_*chi 11
d = { "id" : "abcde",
"key1" : "blah",
"key2" : "blah blah",
"nestedlist" : [
{ "id" : "qwerty",
"nestednestedlist" : [
{ "id" : "xyz", "keyA" : "blah blah blah" },
{ "id" : "fghi", "keyZ" : "blah blah blah" }],
"anothernestednestedlist" : [
{ "id" : "asdf", "keyQ" : "blah blah" },
{ "id" : "yuiop", "keyW" : "blah" }] } ] }
def findkeys(node, kv):
    if isinstance(node, list):
        for i in node:
            for x in findkeys(i, kv):
                yield x
    elif isinstance(node, dict):
        if kv in node:
            yield node[kv]
        for j in node.values():
            for x in findkeys(j, kv):
                yield x

print(list(findkeys(d, 'id')))
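This function recursively searches a dictionary containing nested dictionaries and lists. It builds a list called fields_found, which holds the value from each place the field is found. The "field" is the key I am looking for in the dictionary and its nested lists and dictionaries.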
def get_recursively(search_dict, field):
    """Takes a dict with nested lists and dicts,
    and searches all dicts for a key of the field
    provided.
    """
    fields_found = []
    # Python 2 code: on Python 3, use search_dict.items() instead.
    for key, value in search_dict.iteritems():
        if key == field:
            fields_found.append(value)
        elif isinstance(value, dict):
            results = get_recursively(value, field)
            for result in results:
                fields_found.append(result)
        elif isinstance(value, list):
            for item in value:
                if isinstance(item, dict):
                    more_results = get_recursively(item, field)
                    for another_result in more_results:
                        fields_found.append(another_result)
    return fields_found
Another variation, which includes the nested path to each result found (note: this version does not consider lists):
def find_all_items(obj, key, keys=None):
    """
    Example of use:
    d = {'a': 1, 'b': 2, 'c': {'a': 3, 'd': 4, 'e': {'a': 9, 'b': 3}, 'j': {'c': 4}}}
    for k, v in find_all_items(d, 'a'):
        print("* {} = {} *".format('->'.join(k), v))
    """
    ret = []
    if not keys:
        keys = []
    if key in obj:
        out_keys = keys + [key]
        ret.append((out_keys, obj[key]))
    for k, v in obj.items():
        if isinstance(v, dict):
            found_items = find_all_items(v, key, keys=(keys+[k]))
            ret += found_items
    return ret
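Since the version above skips lists, here is a hedged sketch of how it might be extended to walk them as well, recording list indices in the path. The name find_all_items_with_lists is hypothetical, and representing a list position as an integer path element is an assumption of this sketch, not part of the original answer. It is written for Python 3 and as a generator rather than building a list.

```python
def find_all_items_with_lists(obj, key, path=()):
    """Yield (path, value) for every occurrence of key in nested dicts and lists."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield path + (k,), v
            # Descend into the value either way, so nested matches are found too.
            yield from find_all_items_with_lists(v, key, path + (k,))
    elif isinstance(obj, list):
        for index, item in enumerate(obj):
            # A list position becomes an integer element of the path.
            yield from find_all_items_with_lists(item, key, path + (index,))


d = {'a': 1, 'c': [{'a': 3}, {'e': {'a': 9}}]}
for path, value in find_all_items_with_lists(d, 'a'):
    print('* {} = {} *'.format('->'.join(map(str, path)), value))
# * a = 1 *
# * c->0->a = 3 *
# * c->1->e->a = 9 *
```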
I just wanted to iterate on @hexerei-software's excellent answer by using yield from and accepting top-level lists.
def gen_dict_extract(var, key):
    if isinstance(var, dict):
        for k, v in var.items():
            if k == key:
                yield v
            if isinstance(v, (dict, list)):
                yield from gen_dict_extract(v, key)
    elif isinstance(var, list):
        for d in var:
            yield from gen_dict_extract(d, key)
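For very deeply nested data, a recursive traversal can run into Python's recursion limit. One possible way around that is an iterative variant with an explicit stack. The sketch below is an illustration under that assumption, not part of the answer above; iter_dict_extract is a hypothetical name. It visits elements in a different order than the recursive version, which is acceptable here because the question states that order is not important.

```python
def iter_dict_extract(var, key):
    # Explicit stack of containers still to visit, instead of recursive calls.
    stack = [var]
    while stack:
        current = stack.pop()
        if isinstance(current, dict):
            for k, v in current.items():
                if k == key:
                    yield v
                if isinstance(v, (dict, list)):
                    stack.append(v)
        elif isinstance(current, list):
            # Non-container items are popped later and ignored.
            stack.extend(current)


d = [{'id': 'abcde', 'nestedlist': [{'id': 'qwerty'}, {'id': 'xyz'}]}]
print(sorted(iter_dict_extract(d, 'id')))  # ['abcde', 'qwerty', 'xyz']
```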