I have a list of tuples that unfortunately contains duplicates, like this:
[(67, u'top-coldestcitiesinamerica'), (66, u'ecofriendlyideastocelebrateindependenceday-phpapp'), (65, u'a-b-c-ca-d-ab-ea-d-c-c'), (64, u'a-b-c-ca-d-ab-ea-d-c-c'), (63, u'alexandre-meybeck-faowhatisclimate-smartagriculture-backgroundopportunitiesandchallenges'), (62, u'ghgemissions'), (61, u'top-coldestcitiesinamerica'), (58, u'infographicthe-stateofdigitaltransformationaltimetergroup'), (57, u'culture'), (55, u'cas-k-ihaveanidea'), (54, u'trendsfor'), (53, u'batteryimpedance'), (52, u'evs-howey-full'), (51, u'bericht'), (49, u'classiccarinsurance'), (47, u'uploaded_file'), (46, u'x_file'), (45, u's-s-main'), (44, u'vehicle-propulsion'), (43, u'x_file')]
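The entry I want to check for duplicates is the 1st element of each tuple, counting from 0 (that is, the string at index 1). So I can see that: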
(67, u'top-coldestcitiesinamerica')
(61, u'top-coldestcitiesinamerica')
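...are duplicates, and I want to remove one of them (similar to a set). So, in the end, I want a clean list of tuples without duplicates, like this (i.e. no value at index 1 appears more than once):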
[(67, u'top-coldestcitiesinamerica'), (66, u'ecofriendlyideastocelebrateindependenceday-phpapp'), (65, u'a-b-c-ca-d-ab-ea-d-c-c'), (63, u'alexandre-meybeck-faowhatisclimate-smartagriculture-backgroundopportunitiesandchallenges'), (62, u'ghgemissions'), (58, u'infographicthe-stateofdigitaltransformationaltimetergroup'), (57, u'culture'), (55, u'cas-k-ihaveanidea'), (54, u'trendsfor'), (53, u'batteryimpedance'), (52, u'evs-howey-full'), (51, u'bericht'), (49, u'classiccarinsurance'), (47, u'uploaded_file'), (46, u'x_file'), (45, u's-s-main'), (44, u'vehicle-propulsion')]
How can I achieve this in a pythonic way? Thanks!
You can use the set approach from "How do you remove duplicates from a list whilst preserving order?", using x[1] as the unique identifier:
def unique_second_element(seq):
    seen = set()
    seen_add = seen.add
    return [x for x in seq if not (x[1] in seen or seen_add(x[1]))]
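The comprehension above works because set.add always returns None: when x[1] is new, `x[1] in seen` is false, so `seen_add(x[1])` runs, records the value, and the whole `or` expression is falsy, which keeps the item. A minimal sketch of the same logic as an explicit loop might look like the following; the function name and the shortened sample list are assumptions made for illustration.

```python
def unique_second_element_loop(seq):
    """Keep the first tuple seen for each distinct value at index 1."""
    seen = set()
    result = []
    for x in seq:
        if x[1] in seen:
            continue  # duplicate value at index 1: skip this tuple
        seen.add(x[1])  # remember the value for later tuples
        result.append(x)
    return result


# Shortened sample taken from the list in the question.
sample = [
    (67, u'top-coldestcitiesinamerica'),
    (65, u'a-b-c-ca-d-ab-ea-d-c-c'),
    (64, u'a-b-c-ca-d-ab-ea-d-c-c'),
    (61, u'top-coldestcitiesinamerica'),
]
deduped = unique_second_element_loop(sample)
```

Here deduped keeps (67, ...) and (65, ...) and drops the later tuples that repeat their strings.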
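Note that the OrderedDict method also shown there would work if you want to keep the last occurrence of each value; for first occurrences you'd have to reverse the input and then reverse the output again.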
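As a rough sketch of that OrderedDict idea, assuming the string at index 1 is used as the dictionary key, the two variants could be written as below. The function names and the three-item sample are hypothetical, chosen only to make the behaviour visible.

```python
from collections import OrderedDict


def unique_keep_last(seq):
    """Keep the last tuple for each value at index 1.

    The result is ordered by where each value first appeared.
    """
    return list(OrderedDict((x[1], x) for x in seq).values())


def unique_keep_first_reversed(seq):
    """Keep the first tuple for each value at index 1 by reversing twice.

    The result is ordered by where each value last appeared.
    """
    kept = OrderedDict((x[1], x) for x in reversed(seq)).values()
    return list(kept)[::-1]


# Shortened sample taken from the list in the question.
sample = [
    (67, u'top-coldestcitiesinamerica'),
    (66, u'ecofriendlyideastocelebrateindependenceday-phpapp'),
    (61, u'top-coldestcitiesinamerica'),
]
last_kept = unique_keep_last(sample)
first_kept = unique_keep_first_reversed(sample)
```

One thing to watch for: the double-reverse variant keeps the first tuple for each value but places it where that value last appeared, so its ordering can differ from the set-based function (in this sample, the (66, ...) tuple comes before the (67, ...) tuple).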
You can make the set-based function more generic by supporting a key function:
def unique_preserve_order(seq, key=None):
    if key is None:
        key = lambda elem: elem
    seen = set()
    seen_add = seen.add
    augmented = ((key(x), x) for x in seq)
    return [x for k, x in augmented if not (k in seen or seen_add(k))]
Then use it like this:
import operator
unique_preserve_order(yourlist, key=operator.itemgetter(1))
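To illustrate what the key parameter buys you, here is a small sketch that runs the same function with three different keys. The items list and the case-insensitive lambda are made up for this example and are not part of the question.

```python
import operator


def unique_preserve_order(seq, key=None):
    """Drop items whose key was already seen, keeping the first occurrence."""
    if key is None:
        key = lambda elem: elem
    seen = set()
    seen_add = seen.add
    augmented = ((key(x), x) for x in seq)
    return [x for k, x in augmented if not (k in seen or seen_add(k))]


# Hypothetical data, not from the question.
items = [(3, u'Culture'), (2, u'culture'), (1, u'trendsfor'), (3, u'x_file')]

# Unique by the string at index 1 (case-sensitive, so nothing is dropped).
by_second = unique_preserve_order(items, key=operator.itemgetter(1))
# Unique by the number at index 0 (the second tuple starting with 3 is dropped).
by_first = unique_preserve_order(items, key=operator.itemgetter(0))
# Unique by the lower-cased string (u'culture' duplicates u'Culture').
ignoring_case = unique_preserve_order(items, key=lambda t: t[1].lower())
```

Any callable that returns a hashable value can serve as the key, since the keys are stored in a set.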
Demo:
>>> def unique_preserve_order(seq, key=None):
...     if key is None:
...         key = lambda elem: elem
...     seen = set()
...     seen_add = seen.add
...     augmented = ((key(x), x) for x in seq)
...     return [x for k, x in augmented if not (k in seen or seen_add(k))]
...
>>> from pprint import pprint
>>> import operator
>>> yourlist = [(67, u'top-coldestcitiesinamerica'), (66, u'ecofriendlyideastocelebrateindependenceday-phpapp'), (65, u'a-b-c-ca-d-ab-ea-d-c-c'), (64, u'a-b-c-ca-d-ab-ea-d-c-c'), (63, u'alexandre-meybeck-faowhatisclimate-smartagriculture-backgroundopportunitiesandchallenges'), (62, u'ghgemissions'), (61, u'top-coldestcitiesinamerica'), (58, u'infographicthe-stateofdigitaltransformationaltimetergroup'), (57, u'culture'), (55, u'cas-k-ihaveanidea'), (54, u'trendsfor'), (53, u'batteryimpedance'), (52, u'evs-howey-full'), (51, u'bericht'), (49, u'classiccarinsurance'), (47, u'uploaded_file'), (46, u'x_file'), (45, u's-s-main'), (44, u'vehicle-propulsion'), (43, u'x_file')]
>>> pprint(unique_preserve_order(yourlist, operator.itemgetter(1)))
[(67, u'top-coldestcitiesinamerica'),
 (66, u'ecofriendlyideastocelebrateindependenceday-phpapp'),
 (65, u'a-b-c-ca-d-ab-ea-d-c-c'),
 (63,
  u'alexandre-meybeck-faowhatisclimate-smartagriculture-backgroundopportunitiesandchallenges'),
 (62, u'ghgemissions'),
 (58, u'infographicthe-stateofdigitaltransformationaltimetergroup'),
 (57, u'culture'),
 (55, u'cas-k-ihaveanidea'),
 (54, u'trendsfor'),
 (53, u'batteryimpedance'),
 (52, u'evs-howey-full'),
 (51, u'bericht'),
 (49, u'classiccarinsurance'),
 (47, u'uploaded_file'),
 (46, u'x_file'),
 (45, u's-s-main'),
 (44, u'vehicle-propulsion')]