I have a list of dictionaries in Python, as follows:
d = [{'feature_a':1, 'feature_b':'Jul', 'feature_c':100}, {'feature_a':2, 'feature_b':'Jul', 'feature_c':150}, {'feature_a':1, 'feature_b':'Mar', 'feature_c':110}, ...]
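What I want to achieve is to keep only the entries whose combination of feature_a, _b and _c is unique.
For example, if we have 3 items with the same feature_a and _b, but with the feature_c values 100, 100 and 150, then after the operation only the items with 100 and 150 should remain.
How can I do this?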
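================================================================
Update:
OK, thanks to Anand for the excellent answer, it works perfectly. However, I have one more question.
Suppose we have a new feature_d, so the dictionaries look like: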
d = [{'feature_a':1, 'feature_b':'Jul', 'feature_c':100, 'feature_d':'A'}, {'feature_a':2, 'feature_b':'Jul', 'feature_c':150, 'feature_d': 'B'}, {'feature_a':1, 'feature_b':'Mar', 'feature_c':110, 'feature_d':'F'}, ...]
I only want to de-duplicate on feature_a, _b and _c, and leave feature_d alone. How can I do that?
Thanks a lot.
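If the order of the initial d list is not important, you can take the .items() of each dictionary and convert them into a frozenset(), which is hashable. You can then put all of those into a set() or frozenset(), and finally convert each inner frozenset() back into a dictionary. Example -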
uniq_d = list(map(dict, frozenset(frozenset(i.items()) for i in d)))
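A set does not allow duplicate elements, which is what removes the duplicates here, although you do end up losing the order of the list. For Python 2.x, the list(...) call is not needed, because map() already returns a list.
Example/Demo -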
>>> import pprint
>>> pprint.pprint(d)
[{'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 100},
 {'feature_a': 2, 'feature_b': 'Jul', 'feature_c': 150},
 {'feature_a': 1, 'feature_b': 'Mar', 'feature_c': 110},
 {'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 100},
 {'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 150}]
>>> uniq_d = list(map(dict, frozenset(frozenset(i.items()) for i in d)))
>>> pprint.pprint(uniq_d)
[{'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 100},
 {'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 150},
 {'feature_a': 1, 'feature_b': 'Mar', 'feature_c': 110},
 {'feature_a': 2, 'feature_b': 'Jul', 'feature_c': 150}]
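If the original order does matter, one possible variant (not part of the answer above, just a minimal sketch) is to keep the frozenset of items only as a lookup key and build the result list by hand. The function name dedupe_preserving_order is made up for illustration, and the sketch assumes every value in the dictionaries is hashable.

```python
def dedupe_preserving_order(records):
    """Drop exact duplicate dicts, keeping the first occurrence of each in order.

    Hypothetical helper for illustration; assumes all dict values are hashable.
    """
    seen = set()
    unique = []
    for record in records:
        # A frozenset of (key, value) pairs is hashable and ignores key order.
        key = frozenset(record.items())
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


d = [
    {'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 100},
    {'feature_a': 2, 'feature_b': 'Jul', 'feature_c': 150},
    {'feature_a': 1, 'feature_b': 'Mar', 'feature_c': 110},
    {'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 100},  # duplicate of the first entry
    {'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 150},
]
uniq_d = dedupe_preserving_order(d)
```

Here uniq_d keeps the first, second, third and fifth entries in their original order. If any value were unhashable (a list, for instance), frozenset(record.items()) would raise a TypeError, so this sketch only fits flat dictionaries like the ones in the question.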
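For the new requirement -
"But what if I have another feature_d, but I only want to de-duplicate on feature_a, _b and _c"
"If two entries have the same feature_a, _b and _c, they are considered the same and duplicated, no matter what is in feature_d"
A simple way to do this is to use a set and a new list: add only the features you care about to the set, and check new entries against the set using only those features. Example -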
seen_set = set()   # (feature_a, feature_b, feature_c) combinations already kept
new_d = []
for i in d:
    # Build the key once from the features that define a duplicate
    key = (i['feature_a'], i['feature_b'], i['feature_c'])
    if key not in seen_set:
        new_d.append(i)
        seen_set.add(key)
Example/Demo -
>>> d = [{'feature_a':1, 'feature_b':'Jul', 'feature_c':100, 'feature_d':'A'},
...      {'feature_a':2, 'feature_b':'Jul', 'feature_c':150, 'feature_d': 'B'},
...      {'feature_a':1, 'feature_b':'Mar', 'feature_c':110, 'feature_d':'F'},
...      {'feature_a':1, 'feature_b':'Mar', 'feature_c':110, 'feature_d':'G'}]
>>> seen_set = set()
>>> new_d = []
>>> for i in d:
...     if tuple([i['feature_a'],i['feature_b'],i['feature_c']]) not in seen_set:
...         new_d.append(i)
...         seen_set.add(tuple([i['feature_a'],i['feature_b'],i['feature_c']]))
...
>>> pprint.pprint(new_d)
[{'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 100, 'feature_d': 'A'},
 {'feature_a': 2, 'feature_b': 'Jul', 'feature_c': 150, 'feature_d': 'B'},
 {'feature_a': 1, 'feature_b': 'Mar', 'feature_c': 110, 'feature_d': 'F'}]
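The same seen-set loop can be wrapped in a small reusable function that takes the list of features to compare. This is only a sketch: the name dedupe_by_keys and its keys parameter are assumptions made for illustration, and it assumes every dictionary contains all of the listed keys with hashable values.

```python
def dedupe_by_keys(records, keys):
    """Keep the first record for each distinct combination of the given keys.

    Hypothetical helper for illustration; raises KeyError if a record lacks a key.
    """
    seen = set()
    result = []
    for record in records:
        # Only the selected features take part in the duplicate check.
        key = tuple(record[k] for k in keys)
        if key not in seen:
            seen.add(key)
            result.append(record)
    return result


d = [
    {'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 100, 'feature_d': 'A'},
    {'feature_a': 2, 'feature_b': 'Jul', 'feature_c': 150, 'feature_d': 'B'},
    {'feature_a': 1, 'feature_b': 'Mar', 'feature_c': 110, 'feature_d': 'F'},
    {'feature_a': 1, 'feature_b': 'Mar', 'feature_c': 110, 'feature_d': 'G'},
]
new_d = dedupe_by_keys(d, ['feature_a', 'feature_b', 'feature_c'])
```

As in the demo above, the entry with feature_d 'G' is dropped because its feature_a, _b and _c match the earlier 'F' entry. The first occurrence wins and the original order is preserved.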