Merge multiple dictionaries that contain a list of dictionaries

Lud*_*cia 2 python dictionary

I have several dictionaries (possibly more than ten) with the following structure:

{'stdout': [{'foo': 'A', 'bar': 'B', 'host': None, 'count': 135},
            {'foo': 'C', 'bar': 'B', 'host': 'egg', 'count': 28},
            {'foo': 'D', 'bar': 'E', 'host': 'apple', 'count': 1},
            {'foo': 'A', 'bar': 'E', 'host': 'chicken breast', 'count': 1},
            {'foo': 'C', 'bar': 'F', 'host': 'carrot', 'count': 1}],
 'stderr': ''}

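I would like to combine all of these dictionaries, summing the integer "count" values of the entries that share the same "foo", "bar" and "host" values (the None in "host" is the NoneType value, not a string).

For example, for 2 dictionaries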

dictA = {'stdout': [{'foo': 'A', 'bar': 'B', 'host': None, 'count': 135},
            {'foo': 'C', 'bar': 'B', 'host': 'egg', 'count': 28},
            {'foo': 'D', 'bar': 'E', 'host': 'apple', 'count': 2},
            {'foo': 'A', 'bar': 'E', 'host': 'chicken breast', 'count': 1},
            {'foo': 'C', 'bar': 'F', 'host': 'carrot', 'count': 1}],
 'stderr': ''}

dictB = {'stdout': [{'foo': 'A', 'bar': 'B', 'host': None, 'count': 280},
            {'foo': 'A', 'bar': 'B', 'host': 'orange', 'count': 46},
            {'foo': 'A', 'bar': 'E', 'host': 'pineapple', 'count': 3},
            {'foo': 'D', 'bar': 'E', 'host': 'apple', 'count': 2},
            {'foo': 'C', 'bar': 'F', 'host': 'carrot', 'count': 1}],
 'stderr': ''}

the merged version should be

dictMerged = {'stdout': [{'foo': 'A', 'bar': 'B', 'host': None, 'count': 415},
            {'foo': 'A', 'bar': 'B', 'host': 'orange', 'count': 46},
            {'foo': 'C', 'bar': 'B', 'host': 'egg', 'count': 28},
            {'foo': 'D', 'bar': 'E', 'host': 'apple', 'count': 4},
            {'foo': 'A', 'bar': 'E', 'host': 'pineapple', 'count': 3},
            {'foo': 'C', 'bar': 'F', 'host': 'carrot', 'count': 2},
            {'foo': 'A', 'bar': 'E', 'host': 'chicken breast', 'count': 1}],
 'stderr': ''}

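Note that the order of the dictionaries in the list changes once the "count" values have been summed.

As a first step I tried to combine them by matching "host", as shown below, but the result is not what I want: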

hostname1 = {i["host"]: i for i in dictA['stdout']}
hostname2 = {i["host"]: i for i in dictB['stdout']}
all_host = hostname1|hostname2
{key: value + b[key] for key, value in a.items()}

Dan*_*ejo 5

One approach

from collections import defaultdict
from operator import itemgetter

# create a dictionary (defaultdict) to put the dictionaries with matching foo, bar, host in the same list
groups = defaultdict(list, {(d['foo'], d['bar'], d['host']): [d] for d in dictB['stdout']})
for d in dictA["stdout"]:
    key = (d['foo'], d['bar'], d['host'])
    groups[key].append(d)

# use item getter for better readability
count = itemgetter("count")

# create new list of dictionaries, sum the count values
ds = [{'foo': f, 'bar': b, 'host': h, 'count': sum(count(d) for d in v)} for (f, b, h), v in groups.items()]

# sort the list of dictionaries in decreasing order
res = {"stdout": sorted(ds, key=count, reverse=True), "stderr": ""}
print(res)

Output

{'stderr': '',
 'stdout': [{'bar': 'B', 'count': 415, 'foo': 'A', 'host': None},
            {'bar': 'B', 'count': 46, 'foo': 'A', 'host': 'orange'},
            {'bar': 'B', 'count': 28, 'foo': 'C', 'host': 'egg'},
            {'bar': 'E', 'count': 4, 'foo': 'D', 'host': 'apple'},
            {'bar': 'E', 'count': 3, 'foo': 'A', 'host': 'pineapple'},
            {'bar': 'F', 'count': 2, 'foo': 'C', 'host': 'carrot'},
            {'bar': 'E', 'count': 1, 'foo': 'A', 'host': 'chicken breast'}]}

For more information on the functions and data structures used in the code above, see: sorted, defaultdict, itemgetter.
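One detail worth keeping in mind about the snippet above: seeding `groups` through a dict comprehension keeps only the last entry for any key that is repeated inside `dictB` itself, so it assumes each (foo, bar, host) combination appears at most once per input. Below is a minimal sketch of that behavior next to a loop-based alternative. The `rows` data and the `key` helper are made up for illustration and are not part of the question.

```python
from collections import defaultdict

# Hypothetical input in which the same (foo, bar, host) key appears twice.
rows = [
    {"foo": "A", "bar": "B", "host": None, "count": 1},
    {"foo": "A", "bar": "B", "host": None, "count": 2},
]


def key(d):
    return d["foo"], d["bar"], d["host"]


# Seeding through a dict comprehension: the later row overwrites the earlier one.
seeded = defaultdict(list, {key(d): [d] for d in rows})

# Appending in a loop: every row is kept.
looped = defaultdict(list)
for d in rows:
    looped[key(d)].append(d)

seeded_total = sum(d["count"] for d in seeded[("A", "B", None)])
looped_total = sum(d["count"] for d in looped[("A", "B", None)])
print(seeded_total, looped_total)  # 2 3
```

If duplicates within a single input are possible, starting from an empty `defaultdict(list)` and appending every row in a loop avoids the overwrite.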

An alternative

Using groupby

import pprint
from operator import itemgetter
from itertools import groupby


def key(d):
    return d["foo"], d["bar"], d["host"] or ""


count = itemgetter("count")
lst = sorted(dictA["stdout"] + dictB["stdout"], key=key)
groups = groupby(lst, key=key)
ds = [{'foo': f, 'bar': b, 'host': h or None, 'count': sum(count(d) for d in vs)} for (f, b, h), vs in groups]
res = {"stdout": sorted(ds, key=count, reverse=True), "stderr": ""}
print(res)

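The second approach has two caveats:

  1. Grouping takes O(n log n) time because the input has to be sorted before groupby can be applied, whereas the first approach groups in O(n).
  2. To be able to sort the list of dictionaries, None has to be replaced with the empty string "".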
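The second caveat can be seen directly: Python 3 does not define an ordering between `None` and a string, so sorting keys that contain `None` raises `TypeError`. A minimal sketch with made-up host values, showing the failure and the `or ""` workaround used above:

```python
hosts = ["egg", None, "apple"]

try:
    sorted(hosts)
    failed = False
except TypeError:
    # '<' is not supported between NoneType and str
    failed = True

# Map None to "" for sorting only; this assumes "" is never a real host name.
normalized = sorted(h or "" for h in hosts)
print(failed, normalized)  # True ['', 'apple', 'egg']
```

Because `h or None` later turns "" back into None, a genuine empty-string host would also come out as None with this workaround.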

Multiple dictionaries

If you have more than two dictionaries, you can change the first approach to:

from collections import defaultdict
from operator import itemgetter

# create a dictionary (defaultdict) to put the dictionaries with matching foo, bar, host in the same list
groups = defaultdict(list, {(d['foo'], d['bar'], d['host']): [d] for d in dictB['stdout']})

# create a list with all the dictionaries from multiple dict
data = []
lst = [dictA]  # change this line to contain all the dictionaries except B
for d in lst:
    data.extend(d["stdout"])

for d in data:
    key = (d['foo'], d['bar'], d['host'])
    groups[key].append(d)

# use item getter for better readability
count = itemgetter("count")

# create new list of dictionaries, sum the count values
ds = [{'foo': f, 'bar': b, 'host': h, 'count': sum(count(d) for d in v)} for (f, b, h), v in groups.items()]

# sort the list of dictionaries in decreasing order
res = {"stdout": sorted(ds, key=count, reverse=True), "stderr": ""}
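For an arbitrary number of inputs, the same idea can also be wrapped in a function. The sketch below is a variation rather than the code above: it sums the counts directly with `defaultdict(int)` instead of collecting lists first, and the name `merge_stdout` as well as the sample inputs `a` and `b` are made up for illustration.

```python
from collections import defaultdict
from itertools import chain
from operator import itemgetter


def merge_stdout(*dicts):
    """Sum 'count' over entries sharing the same (foo, bar, host)."""
    totals = defaultdict(int)
    # Walk every 'stdout' list of every input in a single pass.
    for d in chain.from_iterable(x["stdout"] for x in dicts):
        totals[d["foo"], d["bar"], d["host"]] += d["count"]
    rows = [{"foo": f, "bar": b, "host": h, "count": c}
            for (f, b, h), c in totals.items()]
    # Largest counts first; ties keep first-seen order because the sort is stable.
    rows.sort(key=itemgetter("count"), reverse=True)
    return {"stdout": rows, "stderr": ""}


# Small made-up inputs, not the data from the question.
a = {"stdout": [{"foo": "A", "bar": "B", "host": None, "count": 1}], "stderr": ""}
b = {"stdout": [{"foo": "A", "bar": "B", "host": None, "count": 2},
                {"foo": "C", "bar": "F", "host": "carrot", "count": 1}], "stderr": ""}
merged = merge_stdout(a, b)
print(merged["stdout"])
```

With the question's data this would be called as `merge_stdout(dictA, dictB)`, with any further dictionaries passed as extra arguments.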

What is itemgetter?

From the documentation:

Return a callable object that fetches item from its operand using the operand's __getitem__() method. If multiple items are specified, returns a tuple of lookup values.

Equivalent to:

def itemgetter(*items):
    if len(items) == 1:
        item = items[0]
        def g(obj):
            return obj[item]
    else:
        def g(obj):
            return tuple(obj[item] for item in items)
    return g
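A short usage sketch of `itemgetter` with made-up records, covering the single-key form used above, the multi-key form, and its use as a sort key. The name `group_key` is only illustrative.

```python
from operator import itemgetter

row = {"foo": "A", "bar": "B", "host": None, "count": 135}

# Single key: returns the value itself.
count = itemgetter("count")
print(count(row))  # 135

# Several keys: returns a tuple of the looked-up values.
group_key = itemgetter("foo", "bar", "host")
print(group_key(row))  # ('A', 'B', None)

# As a sort key, largest count first.
rows = [{"count": 1}, {"count": 28}, {"count": 4}]
ordered = sorted(rows, key=count, reverse=True)
print([r["count"] for r in ordered])  # [28, 4, 1]
```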