Sum values in a list of dictionaries by a common key-value pair

c.g*_*rey 10 python dictionary nested-lists

How can I sum the values of repeated elements in a list of dictionaries?

Sample list:

data = [
        [
            {'user': 1, 'rating': 0},
            {'user': 2, 'rating': 10},
            {'user': 1, 'rating': 20},
            {'user': 3, 'rating': 10}
        ],
        [
            {'user': 4, 'rating': 4},
            {'user': 2, 'rating': 80},
            {'user': 1, 'rating': 20},
            {'user': 1, 'rating': 10}
        ],
    ]

Expected output:

op = [
        [
            {'user': 1, 'rating': 20},
            {'user': 2, 'rating': 10},
            {'user': 3, 'rating': 10}
        ],
        [
            {'user': 4, 'rating': 4},
            {'user': 2, 'rating': 80},
            {'user': 1, 'rating': 30},
        ],
    ]

tim*_*geb 5

pandas

>>> import pandas as pd
>>> [pd.DataFrame(dicts).groupby('user', as_index=False, sort=False).sum().to_dict(orient='records') for dicts in data]
[[{'user': 1, 'rating': 20},
  {'user': 2, 'rating': 10},
  {'user': 3, 'rating': 10}],
 [{'user': 4, 'rating': 4},
  {'user': 2, 'rating': 80},
  {'user': 1, 'rating': 30}]]


Shu*_*rma 4

You can try:

from itertools import groupby

result = []
for lst in data:
    # groupby only merges adjacent items, so sort by user first
    sublist = sorted(lst, key=lambda d: d['user'])
    grouped = groupby(sublist, key=lambda d: d['user'])
    result.append([
        {'user': name, 'rating': sum([d['rating'] for d in group])}
        for name, group in grouped])

# Sort each sublist of `result` by rating:
result = [sorted(sub, key=lambda d: d['rating']) for sub in result]

# %%timeit
# 7.54 µs ± 220 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
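The sort before `groupby` matters because `itertools.groupby` only groups consecutive items with equal keys. A minimal sketch of the difference, using a small hypothetical sample (`rows` and `user_key` are illustrative names, not part of the question):

```python
from itertools import groupby

# Hypothetical sample for illustration only.
rows = [
    {'user': 1, 'rating': 0},
    {'user': 2, 'rating': 10},
    {'user': 1, 'rating': 20},
]

def user_key(d):
    return d['user']

# Without sorting, user 1 is split into two separate groups.
unsorted_keys = [key for key, _ in groupby(rows, key=user_key)]

# After sorting by the same key, each user forms exactly one group.
sorted_keys = [key for key, _ in groupby(sorted(rows, key=user_key), key=user_key)]

print(unsorted_keys)  # [1, 2, 1]
print(sorted_keys)    # [1, 2]
```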

Update (a more efficient solution):

result = []
for lst in data:
    visited = {}
    for d in lst:
        if d['user'] in visited:
            visited[d['user']]['rating'] += d['rating']
        else:
            # Note: this stores the original dict, so the dicts in `data` are updated in place.
            visited[d['user']] = d

    result.append(sorted(visited.values(), key=lambda d: d['rating']))

# %%timeit
# 2.5 µs ± 54 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Result:

# print(result)
[
    [
        {'user': 2, 'rating': 10},
        {'user': 3, 'rating': 10},
        {'user': 1, 'rating': 20}
    ],
    [
        {'user': 4, 'rating': 4},
        {'user': 1, 'rating': 30},
        {'user': 2, 'rating': 80}
    ]
]
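Both versions above sort each sublist by rating, whereas the expected output in the question keeps users in order of first appearance. If that order matters, a possible variant is sketched below. It assumes Python 3.7+ (where plain dicts preserve insertion order), and `sum_by_user` is a hypothetical helper name. It also builds new dicts instead of reusing the ones in `data`, so the input is left unchanged.

```python
def sum_by_user(groups):
    """Sum 'rating' per 'user' within each sublist, keeping first-seen order."""
    result = []
    for lst in groups:
        totals = {}  # user -> running total; insertion order is preserved
        for d in lst:
            totals[d['user']] = totals.get(d['user'], 0) + d['rating']
        # Build fresh dicts so the input dicts are not mutated.
        result.append([{'user': u, 'rating': r} for u, r in totals.items()])
    return result


data = [
    [
        {'user': 1, 'rating': 0},
        {'user': 2, 'rating': 10},
        {'user': 1, 'rating': 20},
        {'user': 3, 'rating': 10},
    ],
    [
        {'user': 4, 'rating': 4},
        {'user': 2, 'rating': 80},
        {'user': 1, 'rating': 20},
        {'user': 1, 'rating': 10},
    ],
]

op = sum_by_user(data)
print(op)
```

With the sample list from the question, this yields the users in the order 1, 2, 3 for the first sublist and 4, 2, 1 for the second, matching the expected output.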