Jac*_*ges 3 python dictionary list duplicates python-2.7
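I have a list of dictionaries sorted by a specific key. Each dictionary contains 32 elements, and there are over 4,000 dictionaries in the list. I need code that works through the list and returns a new list with all duplicates removed.
The methods from these links:
don't help me, because dictionaries are unhashable.
Any ideas? If you need more information, comment and I'll add it.
Edit:
A duplicate is any two dictionaries that have the same value for list[dictionary][key].
OK, here is a detailed explanation for those who need it.
I have a list of dictionaries like this: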
[ {
"ID" : "0001",
"Organization" : "SolarUSA",
"Matchcode" : "SolarUSA, Something Street, Somewhere State, Whatev Zip",
"Owner" : "Timothy Black",
}, {
"ID" : "0002",
"Organization" : "SolarUSA",
"Matchcode" : "SolarUSA, Something Street, Somewhere State, Whatev Zip",
"Owner" : "Johen Wilheim",
}, {
"ID" : "0003",
"Organization" : "Zapotec",
"Matchcode" : "Zapotec, Something Street, Somewhere State, Whatev Zip",
"Owner" : "Simeon Yurrigan",
} ]
In this list, the first and second dictionaries are duplicates because their Matchcodes are the same.
Now, this list is sorted by the following code:
# sort_by is "Matchcode"
def sort(list_to_be_sorted, sort_by):
    return sorted(list_to_be_sorted, key=lambda k: k[sort_by])
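So I have a list of dictionaries neatly sorted by Matchcode. Now I just need to iterate over the list, access list[dictionary][key], and remove duplicates when two key values match.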
aba*_*ert 10
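Just as you can use a tuple to get a hashable equivalent of a list, you can use a frozenset to get a hashable equivalent of a dict. The only trick is that you need to pass d.items() rather than d to the constructor.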
>>> d = {'a': 1, 'b': 2}
>>> s = frozenset(d.items())
>>> hash(s)
-7588994739874264648
>>> dict(s) == d
True
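Then you can use your favorite of the solutions you've already seen. Dump them into a set, or, if you need to preserve order, use an OrderedSet or the unique_everseen recipe. For example: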
>>> unique_sets = set(frozenset(d.items()) for d in list_of_dicts)
>>> unique_dicts = [dict(s) for s in unique_sets]
Or, preserving order and using a key value:
>>> sets = (frozenset(d.items()) for d in list_of_dicts)
>>> # a frozenset is not subscriptable, so rebuild the dict to read the key
>>> unique_sets = unique_everseen(sets, key=lambda s: dict(s)[key])
>>> unique_dicts = [dict(s) for s in unique_sets]
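Note that unique_everseen is not a builtin; it is a recipe from the itertools documentation (the more_itertools package also ships one). Here is a simplified sketch of it, not the exact recipe, with shortened sample records that are made up for illustration. When you deduplicate on a single hashable value such as Matchcode, the dicts themselves don't need to be frozen at all:

```python
import operator


def unique_everseen(iterable, key=None):
    """Yield elements in order, skipping any whose key was already seen."""
    seen = set()
    for element in iterable:
        k = element if key is None else key(element)
        if k not in seen:
            seen.add(k)
            yield element


# Shortened, hypothetical records standing in for the 32-element dicts.
list_of_dicts = [
    {"ID": "0001", "Matchcode": "SolarUSA"},
    {"ID": "0002", "Matchcode": "SolarUSA"},
    {"ID": "0003", "Matchcode": "Zapotec"},
]

# Keep the first dict seen for each Matchcode, in the original order.
unique_dicts = list(
    unique_everseen(list_of_dicts, key=operator.itemgetter("Matchcode"))
)
```

Unlike a groupby-based approach, this does not require the list to be sorted, at the cost of keeping every key seen so far in memory.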
Of course, if you have nested lists or dicts, you'll have to convert recursively, just as you would for a list of lists.
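A rough sketch of what that recursive conversion could look like. The helper name freeze and the sample records are my own, for illustration only. The frozen form is used purely as a lookup key, so the original dicts are kept rather than rebuilt:

```python
def freeze(obj):
    """Recursively convert dicts, lists and sets into hashable equivalents."""
    if isinstance(obj, dict):
        return frozenset((k, freeze(v)) for k, v in obj.items())
    if isinstance(obj, (list, tuple)):
        return tuple(freeze(v) for v in obj)
    if isinstance(obj, set):
        return frozenset(freeze(v) for v in obj)
    return obj  # assumed to be hashable already (str, int, ...)


# Hypothetical nested records; a and b are equal, only the key order differs.
a = {"ID": "0001", "Tags": ["solar", "usa"], "Address": {"State": "Somewhere"}}
b = {"Address": {"State": "Somewhere"}, "Tags": ["solar", "usa"], "ID": "0001"}
c = {"ID": "0002", "Tags": [], "Address": {}}

seen = set()
unique_dicts = []
for d in [a, b, c]:
    frozen = freeze(d)
    if frozen not in seen:  # first time we meet this content
        seen.add(frozen)
        unique_dicts.append(d)
```

The conversion is lossy in one direction (a list and a tuple with the same items freeze to the same value), which is fine for detecting duplicates but means dict(freeze(d)) won't give you the nested structure back.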
Use itertools.groupby() to group the dictionaries by key value, then take the first item from each group.
import itertools
data =[ {
"ID" : "0001",
"Organization" : "SolarUSA",
"Matchcode" : "SolarUSA, Something Street, Somewhere State, Whatev Zip",
"Owner" : "Timothy Black",
}, {
"ID" : "0002",
"Organization" : "SolarUSA",
"Matchcode" : "SolarUSA, Something Street, Somewhere State, Whatev Zip",
"Owner" : "Johen Wilheim",
}, {
"ID" : "0003",
"Organization" : "Zapotec",
"Matchcode" : "Zapotec, Something Street, Somewhere State, Whatev Zip",
"Owner" : "Simeon Yurrigan",
} ]
# next(g) and print(...) behave the same on Python 2.7 and Python 3
print([next(g) for k, g in itertools.groupby(data, lambda x: x['Matchcode'])])
which gives the result
[{'Owner': 'Timothy Black',
'Organization': 'SolarUSA',
'ID': '0001',
'Matchcode': 'SolarUSA, Something Street, Somewhere State, Whatev Zip'},
{'Owner': 'Simeon Yurrigan',
'Organization': 'Zapotec',
'ID': '0003',
'Matchcode':'Zapotec, Something Street, Somewhere State, Whatev Zip'}]
I believe this is what you're looking for.
Edit: I prefer the unique_justseen solution. It's shorter and more descriptive.
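For reference, unique_justseen is another recipe from the itertools documentation, not a builtin. A minimal sketch in the spirit of that recipe (written as a generator here, which is not the recipe's exact wording), applied to trimmed-down versions of the records above:

```python
import itertools
import operator


def unique_justseen(iterable, key=None):
    """Yield elements in order, dropping consecutive ones with equal keys."""
    for _, group in itertools.groupby(iterable, key):
        yield next(group)  # first element of each run


# Trimmed-down records; the Matchcode values are shortened for illustration.
data = [
    {"ID": "0001", "Matchcode": "SolarUSA", "Owner": "Timothy Black"},
    {"ID": "0002", "Matchcode": "SolarUSA", "Owner": "Johen Wilheim"},
    {"ID": "0003", "Matchcode": "Zapotec", "Owner": "Simeon Yurrigan"},
]

result = list(unique_justseen(data, key=operator.itemgetter("Matchcode")))
```

Like the groupby one-liner, this only collapses adjacent duplicates, so it relies on the list already being sorted by Matchcode, which it is in the question.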