kst*_*tis 6 python list case-sensitive duplicates duplicate-removal
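I'm looking for a way to remove duplicate entries from a Python list, but with a twist: duplicates should be detected case-insensitively, and the capitalized spelling should be the one that is kept.
For example, between cup and Cup I only need to keep Cup, not cup. Unlike the common solutions that suggest calling lower() first, I want to preserve the original case of the strings, and in particular I want to keep the word with the uppercase letter over the lowercase one.
To restate, I am trying to turn this list: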
[Hello, hello, world, world, poland, Poland]
into this:
[Hello, world, Poland]
How should I do this?
Thanks in advance.
This does not preserve the order of words, but it does produce a list of "unique" words in which the capitalized spelling is preferred.
In [34]: words = ['Hello', 'hello', 'world', 'world', 'poland', 'Poland', ]
In [35]: wordset = set(words)
In [36]: [item for item in wordset if item.istitle() or item.title() not in wordset]
Out[36]: ['world', 'Poland', 'Hello']
If you want to preserve the order in which the items appear in words, you can use a collections.OrderedDict:
In [43]: wordset = collections.OrderedDict()
In [44]: wordset = collections.OrderedDict.fromkeys(words)
In [46]: [item for item in wordset if item.istitle() or item.title() not in wordset]
Out[46]: ['Hello', 'world', 'Poland']
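The same idea can be written as a plain function instead of an interactive session. This is only a minimal sketch: the name prefer_capitalized is made up for illustration, and it assumes Python 3.7+, where a regular dict keeps insertion order and can stand in for OrderedDict.

```python
def prefer_capitalized(words):
    """Drop duplicates, keeping the title-cased spelling when both exist.

    Hypothetical helper name; assumes Python 3.7+ (ordered dicts).
    """
    # dict.fromkeys removes exact duplicates and keeps first-seen order.
    unique = dict.fromkeys(words)
    return [
        word for word in unique
        # Keep a word if it is already title-cased, or if no title-cased
        # variant of it is present in the input.
        if word.istitle() or word.title() not in unique
    ]


words = ['Hello', 'hello', 'world', 'world', 'poland', 'Poland']
print(prefer_capitalized(words))  # ['Hello', 'world', 'Poland']
```

Note that this only recognizes the title-cased form (as returned by str.title()) as the preferred variant; other mixed-case spellings are treated as distinct words.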
Use a set to keep track of the words already seen:
def uniq(words):
    seen = set()
    for word in words:
        l = word.lower()  # Use `word.casefold()` if possible. (3.3+)
        if l in seen:
            continue
        seen.add(l)
        yield word
Usage:
>>> list(uniq(['Hello', 'hello', 'world', 'world', 'Poland', 'poland']))
['Hello', 'world', 'Poland']
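As a rough illustration of why the comment suggests casefold() over lower(): some characters, such as the German ß, are left unchanged by lower() but are folded to "ss" by casefold(). The sketch below uses a made-up name, uniq_casefold, and assumes Python 3.3+.

```python
def uniq_casefold(words):
    """Yield the first spelling seen for each case-folded key.

    Hypothetical variant of the generator above; requires Python 3.3+.
    """
    seen = set()
    for word in words:
        key = word.casefold()  # more aggressive than lower(): 'ß' -> 'ss'
        if key in seen:
            continue
        seen.add(key)
        yield word


# lower() gives 'straße' and 'strasse', which are different keys;
# casefold() maps all three spellings to 'strasse'.
print(list(uniq_casefold(['Straße', 'STRASSE', 'strasse'])))  # ['Straße']
```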
UPDATE
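The previous version does not take the preference for uppercase over lowercase into account; it simply keeps whichever spelling appears first. In the updated version I use min, as @TheSoundDefense did.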
import collections

def uniq(words):
    seen = collections.OrderedDict()  # Use {} if the order is not important.
    for word in words:
        l = word.lower()  # Use `word.casefold()` if possible (3.3+)
        seen[l] = min(word, seen.get(l, word))
    return seen.values()
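To see what the update changes, here is a small side-by-side sketch with the lowercase spelling listed first. The names uniq_first_seen and uniq_prefer_upper are invented for this comparison, and a plain dict is used on the assumption of Python 3.7+, where insertion order is preserved. min prefers the capitalized spelling because strings are compared by code point and uppercase ASCII letters sort before lowercase ones.

```python
def uniq_first_seen(words):
    """Set-based version: keeps whichever spelling shows up first."""
    seen = set()
    for word in words:
        key = word.lower()
        if key not in seen:
            seen.add(key)
            yield word


def uniq_prefer_upper(words):
    """min-based version: keeps the spelling that sorts first."""
    seen = {}  # plain dict; ordered on Python 3.7+
    for word in words:
        key = word.lower()
        # 'H' (72) sorts before 'h' (104), so min picks 'Hello' over 'hello'.
        seen[key] = min(word, seen.get(key, word))
    return list(seen.values())


words = ['hello', 'Hello', 'world', 'world', 'poland', 'Poland']
print(list(uniq_first_seen(words)))  # ['hello', 'world', 'poland']
print(uniq_prefer_upper(words))      # ['Hello', 'world', 'Poland']
```

Each word keeps the position of its first occurrence, regardless of which spelling ends up being retained.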