How do I remove duplicate entries from a Python list while staying case-sensitive?

kst*_*tis 6 python list case-sensitive duplicates duplicate-removal

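I'm looking for a way to remove duplicate entries from a Python list, but with a twist: the final list must be case-sensitive, and capitalized words are preferred.

For example, between cup and Cup I need to keep only Cup, not cup. Unlike the common solutions that suggest calling lower() first, I want to preserve the case of the strings, and in particular I prefer keeping the capitalized form over the lowercase one.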

Again, I'm trying to turn this list: [Hello, hello, world, world, poland, Poland]

into this:

[Hello, world, Poland]

How can I do this?

Thanks in advance.

unu*_*tbu 7

This does not preserve the order of words, but it does produce a list of "unique" words in which capitalized words are preferred.

In [34]: words = ['Hello', 'hello', 'world', 'world', 'poland', 'Poland', ]

In [35]: wordset = set(words)

In [36]: [item for item in wordset if item.istitle() or item.title() not in wordset]
Out[36]: ['world', 'Poland', 'Hello']

If you want to preserve the order in which the items appear in words, you can use collections.OrderedDict:

In [43]: import collections

In [44]: wordset = collections.OrderedDict.fromkeys(words)

In [46]: [item for item in wordset if item.istitle() or item.title() not in wordset]
Out[46]: ['Hello', 'world', 'Poland']
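For reference, the order-preserving variant can be wrapped in a small function. This is only a sketch: the name `prefer_title_case` is made up for illustration, and it assumes Python 3.7+, where a plain `dict` keeps insertion order and can stand in for `collections.OrderedDict`.

```python
def prefer_title_case(words):
    """Drop exact duplicates, then drop any word whose title-cased form is also present."""
    # dict.fromkeys removes exact duplicates and keeps first-seen order (Python 3.7+).
    unique = dict.fromkeys(words)
    # Keep a word if it is already title-cased, or if no title-cased twin exists.
    return [w for w in unique if w.istitle() or w.title() not in unique]


words = ['Hello', 'hello', 'world', 'world', 'poland', 'Poland']
print(prefer_title_case(words))  # ['Hello', 'world', 'Poland']
```

Note that the filter only looks for a title-cased twin, so variants such as 'HELLO' and 'hello' would both survive if 'Hello' itself is not in the list.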


fal*_*tru 6

Use a set to keep track of the words already seen:

def uniq(words):
    seen = set()
    for word in words:
        l = word.lower()  # Use `word.casefold()` if possible. (3.3+)
        if l in seen:
            continue
        seen.add(l)
        yield word

Usage:

>>> list(uniq(['Hello', 'hello', 'world', 'world', 'Poland', 'poland']))
['Hello', 'world', 'Poland']

UPDATE

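The previous version does not take into account that uppercase should take precedence over lowercase: it simply keeps whichever spelling appears first. In the updated version I used min, as @TheSoundDefense suggested.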

import collections

def uniq(words):
    seen = collections.OrderedDict()  # Use {} if the order is not important.
    for word in words:
        l = word.lower()  # Use `word.casefold()` if possible (3.3+)
        seen[l] = min(word, seen.get(l, word))
    return seen.values()
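The min trick works because Python compares strings by code point, and ASCII uppercase letters sort before lowercase ones, so min('Hello', 'hello') is 'Hello'. Below is a self-contained sketch of the same idea. The name `uniq_prefer_upper` is hypothetical, and it assumes Python 3.7+ so that a plain `dict` preserves insertion order; it also returns a list rather than a dict view.

```python
def uniq_prefer_upper(words):
    """Keep one spelling per case-insensitive word, preferring the one that sorts first."""
    seen = {}  # casefolded key -> preferred spelling, in first-seen order
    for word in words:
        key = word.casefold()
        # min() picks the spelling with the smaller code points, so for ASCII
        # text an uppercase letter wins over the same letter in lowercase.
        seen[key] = min(word, seen.get(key, word))
    return list(seen.values())


# The lowercase spelling comes first here, yet the capitalized one is kept.
print(uniq_prefer_upper(['hello', 'Hello', 'world', 'world', 'poland', 'Poland']))
# ['Hello', 'world', 'Poland']
```

Each word keeps the position of its first occurrence, regardless of which spelling wins.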