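I'm trying to learn Python. Here is the relevant part of the exercise:
For each word, check whether the word is already in the list. If the word is not in the list, add it to the list.
Here is what I have.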
fhand = open('romeo.txt')
output = []
for line in fhand:
    words = line.split()
    for word in words:
        if word is not output:
            output.append(word)
print sorted(output)
Here is what I get.
['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'and', 'and', 'breaks', 'east', 'envious', 'fair', 'grief', 'is', 'is', 'is', 'kill', 'light', 'moon', 'pale', 'sick', 'soft', 'sun', 'sun', 'the', 'the', 'the', 'through', 'what', 'window', 'with', 'yonder']
Note the duplicates (and, is, sun, etc.).
How do I get only the unique values?
Ton*_*ous 46
To eliminate duplicates from a list, you can maintain an auxiliary list and check each item against it.
myList = ['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'and', 'and',
'breaks', 'east', 'envious', 'fair', 'grief', 'is', 'is', 'is', 'kill', 'light',
'moon', 'pale', 'sick', 'soft', 'sun', 'sun', 'the', 'the', 'the',
'through', 'what', 'window', 'with', 'yonder']
auxiliaryList = []
for word in myList:
    if word not in auxiliaryList:
        auxiliaryList.append(word)
Output:
['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'breaks', 'east',
'envious', 'fair', 'grief', 'is', 'kill', 'light', 'moon', 'pale', 'sick',
'soft', 'sun', 'the', 'through', 'what', 'window', 'with', 'yonder']
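This is easy to understand, and the code is self-explanatory. However, the simplicity comes at the cost of efficiency: the linear scan over a growing list degrades a linear algorithm to a quadratic one.
Use `set()`! A set is an unordered collection with no duplicate elements.
Basic uses include membership testing and eliminating duplicate entries.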
auxiliaryList = list(set(myList))
Output:
['and', 'envious', 'already', 'fair', 'is', 'through', 'pale', 'yonder',
'what', 'sun', 'Who', 'But', 'moon', 'window', 'sick', 'east', 'breaks',
'grief', 'with', 'light', 'It', 'Arise', 'kill', 'the', 'soft', 'Juliet']
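To make the efficiency point above concrete, here is a rough timing sketch. The function names `dedupe_with_list` and `dedupe_with_set` and the synthetic word list are made up for illustration; the actual timings depend on the machine, so none are shown.

```python
import timeit


def dedupe_with_list(words):
    """Keep first occurrences, checking membership against a list (linear scan per word)."""
    unique_words = []
    for word in words:
        if word not in unique_words:
            unique_words.append(word)
    return unique_words


def dedupe_with_set(words):
    """Collapse duplicates with a set (hash lookup); the original order is not kept."""
    return list(set(words))


# Synthetic input (an assumption for this sketch): 2000 distinct made-up words, each twice.
words = ['w%d' % i for i in range(2000)] * 2

list_seconds = timeit.timeit(lambda: dedupe_with_list(words), number=3)
set_seconds = timeit.timeit(lambda: dedupe_with_set(words), number=3)
print('auxiliary list: %.4f s' % list_seconds)
print('set:            %.4f s' % set_seconds)
```

Both functions return the same unique words; only the order and the running time differ.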
fal*_*tru 11
Instead of the `is not` operator, you should use the `not in` operator to check whether an item is in the list:
if word not in output:
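A small sketch of why the two operators behave differently. The sample values are made up; the variable names `identity_result` and `membership_result` are only for illustration.

```python
output = ['and', 'is']
word = 'and'

# Identity test: asks whether `word` and `output` are the very same object.
# A string is never the list object itself, so this is True for every word.
identity_result = word is not output

# Membership test: asks whether an item equal to `word` is stored in the list.
membership_result = word not in output

print(identity_result)    # True, so the loop in the question appends every word
print(membership_result)  # False, so 'and' would be skipped as a duplicate
```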
Alternatively, collect the words in a set, which discards duplicates automatically:
with open('romeo.txt') as fhand:
    output = set()
    for line in fhand:
        words = line.split()
        output.update(words)
UPDATE: A `set` does not preserve the original order. To preserve the order, use the set as an auxiliary data structure:
output = []
seen = set()
with open('romeo.txt') as fhand:
    for line in fhand:
        words = line.split()
        for word in words:
            if word not in seen:  # faster than `word not in output`
                seen.add(word)
                output.append(word)
Here is a "one-liner" that uses this implementation of removing duplicates while preserving order:
def unique(seq):
    seen = set()
    seen_add = seen.add
    return [x for x in seq if not (x in seen or seen_add(x))]

output = unique([word for line in fhand for word in line.split()])
The last line flattens `fhand` into a list of words, then calls `unique()` on the resulting list.
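As an alternative sketch, not taken from the answer above: on Python 3.7 and later, dictionaries keep insertion order, so `dict.fromkeys()` can do the same order-preserving de-duplication. The helper name `unique_in_order` and the sample list are made up for illustration.

```python
def unique_in_order(seq):
    """Return the distinct items of seq in first-seen order (relies on Python 3.7+ dict ordering)."""
    return list(dict.fromkeys(seq))


words = ['the', 'sun', 'and', 'the', 'moon', 'and', 'sun']
print(unique_in_order(words))  # ['the', 'sun', 'and', 'moon']
```

Like the set-based versions, this requires the items to be hashable.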
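One approach is to check whether the word is already in the list before adding it, which is what Tony's answer does. If you want to remove duplicate values after the list has been created, you can use `set()` to convert the existing list into a set of unique values, then use `list()` to convert it back into a list. All in one line: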
list(set(output))
If you want the result sorted alphabetically, just wrap the above in `sorted()`. The result is:
['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'breaks', 'east', 'envious', 'fair', 'grief', 'is', 'kill', 'light', 'moon', 'pale', 'sick', 'soft', 'sun', 'the', 'through', 'what', 'window', 'with', 'yonder']
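Putting the pieces together, here is a minimal Python 3 sketch of the whole exercise. Since romeo.txt itself is not shown in the thread, the four lines in `sample_text` are an assumption, chosen to be consistent with the word list in the question. `io.StringIO` stands in for the opened file.

```python
import io

# Assumed stand-in for the contents of romeo.txt (not given in the thread).
sample_text = (
    "But soft what light through yonder window breaks\n"
    "It is the east and Juliet is the sun\n"
    "Arise fair sun and kill the envious moon\n"
    "Who is already sick and pale with grief\n"
)

unique_words = set()
with io.StringIO(sample_text) as fhand:
    for line in fhand:
        # update() adds every word on the line; duplicates are ignored.
        unique_words.update(line.split())

# sorted() returns a new list; uppercase letters sort before lowercase ones.
result = sorted(unique_words)
print(result)
```

With a real file, replace the `io.StringIO(sample_text)` call with `open('romeo.txt')`.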