How do I remove duplicate dictionaries from a list in Python?

Jac*_*ges 3 python dictionary list duplicates python-2.7

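I have a list of dictionaries sorted by a specific key. Each dictionary contains 32 elements, and the list holds more than 4000 dictionaries. I need code that processes the list and returns a new list with all duplicates removed.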

The methods in these links:

don't help me, because dictionaries are unhashable.

Any ideas? If you need more information, leave a comment and I will add it.

Edit:

A duplicate is any two dictionaries that have the same value for list[dictionary][key].


OK, here is a detailed explanation for those who need it.

I have a list of dictionaries like this:

[ {
    "ID" : "0001",
    "Organization" : "SolarUSA",
    "Matchcode" : "SolarUSA, Something Street, Somewhere State, Whatev Zip",
    "Owner" : "Timothy Black",
   }, {
    "ID" : "0002",
    "Organization" : "SolarUSA",
    "Matchcode" : "SolarUSA, Something Street, Somewhere State, Whatev Zip",
    "Owner" : "Johen Wilheim",
   }, {
    "ID" : "0003",
    "Organization" : "Zapotec",
    "Matchcode" : "Zapotec, Something Street, Somewhere State, Whatev Zip",
    "Owner" : "Simeon Yurrigan",
   } ]

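In this list, the first and second dictionaries are duplicates because their Matchcodes are identical.

The list is currently sorted by the following code: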

# sort_by is "Matchcode"
def sort( list_to_be_sorted, sort_by ):
    return sorted(list_to_be_sorted, key=lambda k: k[sort_by])

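So I have a neat list of dictionaries sorted by Matchcode. Now I just need to iterate over the list, accessing list[dictionary][key], and remove a dictionary as a duplicate whenever two key values match.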

aba*_*ert 10

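Just as you can use a tuple to get a hashable equivalent of a list, you can use a frozenset to get a hashable equivalent of a dict. The only trick is that you need to pass d.items() rather than d to the constructor.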

>>> d = {'a': 1, 'b': 2}
>>> s = frozenset(d.items())
>>> hash(s)
-7588994739874264648
>>> dict(s) == d
True

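Then you can use your favorite of the solutions you have already seen. Dump them into a set, or use an OrderedSet or the unique_everseen recipe if you need to preserve order, etc. For example: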

>>> unique_sets = set(frozenset(d.items()) for d in list_of_dicts)
>>> unique_dicts = [dict(s) for s in unique_sets]

Or, preserving order and comparing on a single key's value:

>>> sets = (frozenset(d.items()) for d in list_of_dicts)
>>> # a frozenset cannot be indexed by key, so look the value up through dict(s)
>>> unique_sets = unique_everseen(sets, key=lambda s: dict(s)[key])
>>> unique_dicts = [dict(s) for s in unique_sets]
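unique_everseen is not a built-in; it is one of the recipes in the itertools documentation. A minimal sketch of it follows, simplified from the documented recipe and assuming the key values are hashable. The sample data and the choice of "Matchcode" as the key are assumptions taken from the question, trimmed to two fields for brevity.

```python
import operator

def unique_everseen(iterable, key=None):
    """Yield unique elements in first-seen order (simplified itertools recipe)."""
    seen = set()
    for element in iterable:
        k = element if key is None else key(element)
        if k not in seen:
            seen.add(k)
            yield element

# Hypothetical sample data, shaped like the question's list.
list_of_dicts = [
    {"ID": "0001", "Matchcode": "SolarUSA"},
    {"ID": "0002", "Matchcode": "SolarUSA"},
    {"ID": "0003", "Matchcode": "Zapotec"},
]

# When deduplicating on one key, only the key values are hashed,
# so the dicts themselves never need to be converted.
unique_dicts = list(
    unique_everseen(list_of_dicts, key=operator.itemgetter("Matchcode"))
)
print([d["ID"] for d in unique_dicts])  # ['0001', '0003']
```

Because only the extracted key is stored in the set, this variant also works when the dictionaries contain unhashable values, and it does not require the list to be sorted.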

Of course, if you have nested lists or dicts, you will have to convert recursively, just as you would for a list of lists.
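One way the recursive conversion might look is sketched below. The helper name freeze and the nested sample records are hypothetical, and the sketch assumes every leaf value (strings, numbers, and so on) is already hashable.

```python
def freeze(obj):
    """Recursively convert dicts, lists, and sets into hashable equivalents."""
    if isinstance(obj, dict):
        return frozenset((k, freeze(v)) for k, v in obj.items())
    if isinstance(obj, (list, tuple)):
        return tuple(freeze(v) for v in obj)
    if isinstance(obj, set):
        return frozenset(freeze(v) for v in obj)
    return obj  # assumed to be hashable already

# Hypothetical nested records; the first two are identical.
nested = [
    {"ID": "0001", "Tags": ["solar", "usa"], "Address": {"State": "Somewhere"}},
    {"ID": "0001", "Tags": ["solar", "usa"], "Address": {"State": "Somewhere"}},
    {"ID": "0002", "Tags": ["solar"], "Address": {"State": "Elsewhere"}},
]

seen = set()
unique = []
for d in nested:
    frozen = freeze(d)
    if frozen not in seen:
        seen.add(frozen)
        unique.append(d)  # keep the original dict, so nothing has to be converted back

print(len(unique))  # 2
```

Keeping the original dictionaries in the result avoids writing an inverse of freeze; the frozen form is used only as a membership key.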


Mar*_*lin 6

Use itertools.groupby() to group the dictionaries by key value, then take the first item from each group.

import itertools

data =[ {
    "ID" : "0001",
    "Organization" : "SolarUSA",
    "Matchcode" : "SolarUSA, Something Street, Somewhere State, Whatev Zip",
    "Owner" : "Timothy Black",
   }, {
    "ID" : "0002",
    "Organization" : "SolarUSA",
    "Matchcode" : "SolarUSA, Something Street, Somewhere State, Whatev Zip",
    "Owner" : "Johen Wilheim",
   }, {
    "ID" : "0003",
    "Organization" : "Zapotec",
    "Matchcode" : "Zapotec, Something Street, Somewhere State, Whatev Zip",
    "Owner" : "Simeon Yurrigan",
   } ]


# groupby only merges consecutive items, so data must already be sorted by 'Matchcode'
print([next(g) for k, g in itertools.groupby(data, key=lambda x: x['Matchcode'])])

which gives the result

[{'Owner': 'Timothy Black',  
  'Organization': 'SolarUSA', 
  'ID': '0001',  
  'Matchcode': 'SolarUSA, Something Street, Somewhere State, Whatev Zip'},

 {'Owner': 'Simeon Yurrigan', 
  'Organization': 'Zapotec', 
  'ID': '0003', 
  'Matchcode':'Zapotec, Something Street, Somewhere State, Whatev Zip'}]

I believe this is what you are looking for.

Edit: I prefer the unique_justseen solution. It is shorter and more descriptive.
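For reference, unique_justseen is another recipe from the itertools documentation rather than a built-in. A minimal sketch, written here as a generator instead of the documented one-liner, with sample data abbreviated from the question (the shortened Matchcode values are assumptions):

```python
import itertools
import operator

def unique_justseen(iterable, key=None):
    """Yield elements, skipping any whose key equals the previous element's key."""
    # groupby collects runs of consecutive equal keys; keep the first of each run.
    for _, group in itertools.groupby(iterable, key):
        yield next(group)

# Abbreviated sample data, already sorted by "Matchcode" as in the question.
data = [
    {"ID": "0001", "Matchcode": "SolarUSA"},
    {"ID": "0002", "Matchcode": "SolarUSA"},
    {"ID": "0003", "Matchcode": "Zapotec"},
]

unique = list(unique_justseen(data, key=operator.itemgetter("Matchcode")))
print([d["ID"] for d in unique])  # ['0001', '0003']
```

Like the groupby one-liner, this only removes adjacent duplicates, so it depends on the list being sorted by the same key; in exchange it needs no extra memory for a set of seen values.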