Kau*_*hik 3 python string duplicates
I want to remove duplicate words from a text file.
I have a text file that contains the following:
None_None
ConfigHandler_56663624
ConfigHandler_56663624
ConfigHandler_56663624
ConfigHandler_56663624
None_None
ColumnConverter_56963312
ColumnConverter_56963312
PredicatesFactory_56963424
PredicatesFactory_56963424
PredicateConverter_56963648
PredicateConverter_56963648
ConfigHandler_80134888
ConfigHandler_80134888
ConfigHandler_80134888
ConfigHandler_80134888
The resulting output needs to be:
None_None
ConfigHandler_56663624
ColumnConverter_56963312
PredicatesFactory_56963424
PredicateConverter_56963648
ConfigHandler_80134888
I have only tried this command: en = set(open('file.txt') but it does not work.
Can anyone help me extract only the unique set of lines from the file?
Thanks
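Here is an option that preserves order (unlike a set) but otherwise behaves the same (note that EOL characters are deliberately stripped and blank lines are ignored)...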
from collections import OrderedDict

with open('/home/jon/testdata.txt') as fin:
    # Strip trailing whitespace (including the newline) from each line.
    lines = (line.rstrip() for line in fin)
    # Keys keep first-seen order; blank lines are skipped.
    unique_lines = OrderedDict.fromkeys(line for line in lines if line)

print(list(unique_lines))
# ['None_None', 'ConfigHandler_56663624', 'ColumnConverter_56963312', 'PredicatesFactory_56963424', 'PredicateConverter_56963648', 'ConfigHandler_80134888']
Then you just need to write the above out to your output file.
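A minimal sketch of that last step, assuming Python 3. The function name `write_unique_lines` and its two path parameters are hypothetical, chosen only for illustration:

```python
from collections import OrderedDict


def write_unique_lines(src_path, dst_path):
    """Copy src_path to dst_path, keeping the first occurrence of each non-blank line."""
    with open(src_path) as fin:
        # Strip trailing whitespace (including the newline) and skip blank lines.
        stripped = (line.rstrip() for line in fin)
        unique_lines = OrderedDict.fromkeys(line for line in stripped if line)
    with open(dst_path, "w") as fout:
        # Re-add the newline that rstrip() removed.
        fout.writelines(line + "\n" for line in unique_lines)
    return list(unique_lines)
```

Writing to a separate output path keeps the input file intact if something goes wrong partway through.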
Here is a simple solution that uses a set to remove duplicates from a text file.
with open('workfile.txt', 'r') as f:
    lines = f.readlines()
lines_set = set(lines)  # duplicates removed; original order is not preserved
with open('workfile.txt', 'w') as out:
    for line in lines_set:
        out.write(line)
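Because a set iterates in arbitrary order, the lines written by the loop above may not come out in their original order. If order matters, one possible variation is to use the set only to track what has been seen so far. This is a sketch, not part of the answer as posted; the helper name `iter_unique` and the `sample` list are made up for illustration:

```python
def iter_unique(lines):
    """Yield each line the first time it is seen, in original order."""
    seen = set()
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line


# Hypothetical sample data, mirroring the shape of the file in the question.
sample = ["None_None\n", "ConfigHandler_56663624\n",
          "ConfigHandler_56663624\n", "None_None\n"]
print(list(iter_unique(sample)))
```

Since `iter_unique` is a generator, it can also be fed an open file object directly, so the whole file never has to be held as a list.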