Mik*_*keT 1 python list duplicates
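I have the following Python code that almost works for me (I'm so close!). I'm opening a text file from one of Shakespeare's plays. The original text file:
"But soft what light through yonder window breaks
It is the east and Juliet is the sun
Arise fair sun and kill the envious moon
Who is already sick and pale with grief"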
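The result my code gives me is this:
['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'and', 'and', 'breaks', 'east', 'envious', 'fair', 'grief', 'is', 'is', 'is', 'kill', 'light', 'moon', 'pale', 'sick', 'soft', 'sun', 'sun', 'the', 'the', 'the', 'through', 'what', 'window', 'with', 'yonder']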
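So this is almost what I want: it's already in a list sorted the way I want it, but how do I remove the duplicate words? I'm trying to create a new ResultsList and append the words to it, but it gives me the result above without getting rid of the duplicates. If I "print ResultsList", it just dumps out a ton of words. The way I have it now is close, but I want to get rid of the extra "and"s, "is"s, "sun"s and "the"s. I want to keep it simple and use append(), but I'm not sure how to get it to work. I don't want to do anything crazy with the code. What simple thing am I missing in my code in order to remove the duplicate words?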
fname = raw_input("Enter file name: ")
fhand = open(fname)
NewList = list()      #create new list
ResultList = list()   #create new results list I want to append words to

for line in fhand:
    line.rstrip()            #strip white space
    words = line.split()     #split lines of words and make list
    NewList.extend(words)    #make the list from 4 lists to 1 list

    for word in line.split():            #for each word in line.split()
        if words not in line.split():    #if a word isn't in line.split
            NewList.sort()               #sort it
            ResultList.append(words)     #append it, but this doesn't work.

print NewList
#print ResultList (doesn't work the way I want it to)
mylist = ['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'and', 'and', 'breaks', 'east', 'envious', 'fair', 'grief', 'is', 'is', 'is', 'kill', 'light', 'moon', 'pale', 'sick', 'soft', 'sun', 'sun', 'the', 'the', 'the', 'through', 'what', 'window', 'with', 'yonder']
newlist = sorted(set(mylist), key=lambda x:mylist.index(x))
print(newlist)
['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'breaks', 'east', 'envious', 'fair', 'grief', 'is', 'kill', 'light', 'moon', 'pale', 'sick', 'soft', 'sun', 'the', 'through', 'what', 'window', 'with', 'yonder']
newlist contains the unique values of mylist, sorted by the index at which each item first appears in mylist.
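Since the question asks for something simple built on append(), here is a minimal sketch of that approach. It is one possible fix, not necessarily what the asker ended up using. The function name `unique_sorted_words` and the in-memory `sample` list (standing in for the opened file) are assumptions made for illustration. The key change from the question's code is that the membership test compares the single `word` against the result list, rather than comparing the whole `words` list against the current line.

```python
def unique_sorted_words(lines):
    """Return the distinct words found in `lines`, sorted, using append()."""
    result = []
    for line in lines:
        for word in line.split():
            # Test the single word against what has been collected so far,
            # not the whole list of words against the current line.
            if word not in result:
                result.append(word)
    result.sort()  # sort once, after every line has been processed
    return result


# Hypothetical stand-in for the text file from the question.
sample = [
    "But soft what light through yonder window breaks",
    "It is the east and Juliet is the sun",
    "Arise fair sun and kill the envious moon",
    "Who is already sick and pale with grief",
]

print(unique_sorted_words(sample))
```

An open file object can be passed in place of `sample`, since iterating over it also yields one line at a time. If the list is already built and only the duplicates need to go, `list(dict.fromkeys(mylist))` keeps the first occurrence of each item in order on Python 3.7+, and avoids the repeated `mylist.index(x)` lookups, which can get slow on long lists.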