I have several dictionaries (possibly more than ten) with the following structure:
{'stdout': [{'foo': 'A', 'bar': 'B', 'host': None, 'count': 135},
{'foo': 'C', 'bar': 'B', 'host': 'egg', 'count': 28},
{'foo': 'D', 'bar': 'E', 'host': 'apple', 'count': 1},
{'foo': 'A', 'bar': 'E', 'host': 'chicken breast', 'count': 1},
{'foo': 'C', 'bar': 'F', 'host': 'carrot', 'count': 1}],
'stderr': ''}
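I want to combine all of these dictionaries, summing the integer 'count' values of entries that share the same 'foo', 'bar', and 'host' values (note that 'host' can be None, the NoneType value).
For example, given two dictionaries: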
dictA = {'stdout': [{'foo': 'A', 'bar': 'B', 'host': None, 'count': 135},
{'foo': 'C', 'bar': 'B', 'host': 'egg', 'count': 28},
{'foo': 'D', 'bar': 'E', 'host': 'apple', 'count': 2},
{'foo': 'A', 'bar': 'E', 'host': 'chicken breast', 'count': 1},
{'foo': 'C', 'bar': 'F', 'host': 'carrot', 'count': 1}],
'stderr': ''}
dictB = {'stdout': [{'foo': 'A', 'bar': 'B', 'host': None, 'count': 280},
{'foo': 'A', 'bar': 'B', 'host': 'orange', 'count': 46},
{'foo': 'A', 'bar': 'E', 'host': 'pineapple', 'count': 3},
{'foo': 'D', 'bar': 'E', 'host': 'apple', 'count': 2},
{'foo': 'C', 'bar': 'F', 'host': 'carrot', 'count': 1}],
'stderr': ''}
the merged version should be:
dictMerged = {'stdout': [{'foo': 'A', 'bar': 'B', 'host': None, 'count': 415},
{'foo': 'A', 'bar': 'B', 'host': 'orange', 'count': 46},
{'foo': 'C', 'bar': 'B', 'host': 'egg', 'count': 28},
{'foo': 'D', 'bar': 'E', 'host': 'apple', 'count': 4},
{'foo': 'A', 'bar': 'E', 'host': 'pineapple', 'count': 3},
{'foo': 'C', 'bar': 'F', 'host': 'carrot', 'count': 2},
{'foo': 'A', 'bar': 'E', 'host': 'chicken breast', 'count': 1}],
'stderr': ''}
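Note that the order of the dictionaries in the list changes after the 'count' values are summed.
As a first step I tried combining entries with the same 'host', as shown below, but the result is not what I want: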
hostname1 = {i["host"]: i for i in dictA['stdout']}
hostname2 = {i["host"]: i for i in dictB['stdout']}
all_host = hostname1|hostname2
{key: value + b[key] for key, value in a.items()}
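A small sketch of why the attempt above falls short, using made-up rows (the names `rows_a`, `rows_b`, `by_host_a`, and `by_host_b` are illustrative and not from the question). Keying on 'host' alone ignores 'foo' and 'bar', and the `|` operator (Python 3.9+) keeps the right-hand value for a shared key instead of adding the counts:

```python
# Hypothetical miniature inputs, for illustration only.
rows_a = [{'foo': 'D', 'bar': 'E', 'host': 'apple', 'count': 2}]
rows_b = [{'foo': 'D', 'bar': 'E', 'host': 'apple', 'count': 2},
          {'foo': 'X', 'bar': 'Y', 'host': 'apple', 'count': 5}]

by_host_a = {row['host']: row for row in rows_a}
# Both rows in rows_b share the host 'apple', so the second overwrites the first.
by_host_b = {row['host']: row for row in rows_b}

# Dict union (Python 3.9+): for duplicate keys the right-hand value wins.
merged = by_host_a | by_host_b
print(merged['apple']['count'])  # 5: nothing was summed, and two rows were lost
```

This suggests grouping on the full ('foo', 'bar', 'host') combination and adding the counts explicitly.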
from collections import defaultdict
from operator import itemgetter

# create a dictionary (defaultdict) to put the dictionaries with matching foo, bar, host in the same list
groups = defaultdict(list, {(d['foo'], d['bar'], d['host']): [d] for d in dictB['stdout']})
for d in dictA["stdout"]:
    key = (d['foo'], d['bar'], d['host'])
    groups[key].append(d)

# use itemgetter for better readability
count = itemgetter("count")

# create new list of dictionaries, sum the count values
ds = [{'foo': f, 'bar': b, 'host': h, 'count': sum(count(d) for d in v)} for (f, b, h), v in groups.items()]

# sort the list of dictionaries in decreasing order of count
res = {"stdout": sorted(ds, key=count, reverse=True), "stderr": ""}
print(res)
Output
{'stderr': '',
 'stdout': [{'bar': 'B', 'count': 415, 'foo': 'A', 'host': None},
            {'bar': 'B', 'count': 46, 'foo': 'A', 'host': 'orange'},
            {'bar': 'B', 'count': 28, 'foo': 'C', 'host': 'egg'},
            {'bar': 'E', 'count': 4, 'foo': 'D', 'host': 'apple'},
            {'bar': 'E', 'count': 3, 'foo': 'A', 'host': 'pineapple'},
            {'bar': 'F', 'count': 2, 'foo': 'C', 'host': 'carrot'},
            {'bar': 'E', 'count': 1, 'foo': 'A', 'host': 'chicken breast'}]}
For more information on each function and data structure used in the code above, see: sorted, defaultdict, and itemgetter.
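As an alternative sketch of the same grouping-and-summing idea (this is not the code from the answer), `collections.Counter` can accumulate the totals directly for any number of input dictionaries. The helper name `merge_counts` and the sample variables `a` and `b` are hypothetical, and the sketch assumes the merged 'stderr' should simply be an empty string:

```python
from collections import Counter


def merge_counts(*dicts):
    """Sum 'count' over rows that share the same (foo, bar, host) triple."""
    totals = Counter()
    for d in dicts:
        for row in d['stdout']:
            # None is hashable, so it can be part of the tuple key as-is.
            totals[(row['foo'], row['bar'], row['host'])] += row['count']
    # most_common() yields (key, total) pairs from highest to lowest total.
    rows = [{'foo': f, 'bar': b, 'host': h, 'count': c}
            for (f, b, h), c in totals.most_common()]
    # Assumption: 'stderr' of the merged result is left empty.
    return {'stdout': rows, 'stderr': ''}


# Hypothetical sample inputs, cut down from the question's data.
a = {'stdout': [{'foo': 'A', 'bar': 'B', 'host': None, 'count': 135},
                {'foo': 'D', 'bar': 'E', 'host': 'apple', 'count': 2}],
     'stderr': ''}
b = {'stdout': [{'foo': 'A', 'bar': 'B', 'host': None, 'count': 280},
                {'foo': 'D', 'bar': 'E', 'host': 'apple', 'count': 2}],
     'stderr': ''}

merged = merge_counts(a, b)
print(merged['stdout'][0])  # {'foo': 'A', 'bar': 'B', 'host': None, 'count': 415}
```

Because the totals are accumulated row by row, repeated keys inside a single input dictionary are also summed rather than overwritten.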
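Using groupby:
import pprint
from operator import itemgetter
from itertools import groupby


def key(d):
    # None cannot be compared with str while sorting, so use "" in its place
    return d["foo"], d["bar"], d["host"] or ""


count = itemgetter("count")
# groupby only groups adjacent items, so sort by the same key first
lst = sorted(dictA["stdout"] + dictB["stdout"], key=key)
groups = groupby(lst, key=key)
# turn "" back into None when rebuilding each dictionary
ds = [{'foo': f, 'bar': b, 'host': h or None, 'count': sum(count(d) for d in vs)} for (f, b, h), vs in groups]
res = {"stdout": sorted(ds, key=count, reverse=True), "stderr": ""}
print(res)
The second approach has two caveats:
1. It is O(n log n), whereas the first approach is O(n).
2. To make sorting possible, None is replaced with the empty string "".
If you have multiple dictionaries, you could change the first approach to: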
from collections import defaultdict
from operator import itemgetter

# create a dictionary (defaultdict) to put the dictionaries with matching foo, bar, host in the same list
groups = defaultdict(list, {(d['foo'], d['bar'], d['host']): [d] for d in dictB['stdout']})

# create a list with all the dictionaries from multiple dict
data = []
lst = [dictA]  # change this line to contain all the dictionaries except B
for d in lst:
    data.extend(d["stdout"])

for d in data:
    key = (d['foo'], d['bar'], d['host'])
    groups[key].append(d)

# use itemgetter for better readability
count = itemgetter("count")

# create new list of dictionaries, sum the count values
ds = [{'foo': f, 'bar': b, 'host': h, 'count': sum(count(d) for d in v)} for (f, b, h), v in groups.items()]

# sort the list of dictionaries in decreasing order of count
res = {"stdout": sorted(ds, key=count, reverse=True), "stderr": ""}
What is itemgetter? From the documentation:
"Return a callable object that fetches item from its operand using the operand's __getitem__() method. If multiple items are specified, returns a tuple of lookup values."
It is equivalent to:
def itemgetter(*items):
    if len(items) == 1:
        item = items[0]
        def g(obj):
            return obj[item]
    else:
        def g(obj):
            return tuple(obj[item] for item in items)
    return g
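A brief usage sketch of both behaviours described in the documentation, applied to one row shaped like those in the question. The names `row` and `group_key` are illustrative only:

```python
from operator import itemgetter

# Illustrative row with the same keys as the question's data.
row = {'foo': 'A', 'bar': 'B', 'host': None, 'count': 415}

count = itemgetter('count')                   # one item: returns the value itself
group_key = itemgetter('foo', 'bar', 'host')  # several items: returns a tuple

print(count(row))      # 415
print(group_key(row))  # ('A', 'B', None)
```

The multi-item form could also serve as the grouping key in the first approach, since it builds the same (foo, bar, host) tuple.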