Removing duplicates from a text file

Kau*_*hik 3 python string duplicates

I want to remove duplicate words from a text file.

I have a text file that contains the following:

None_None

ConfigHandler_56663624
ConfigHandler_56663624
ConfigHandler_56663624
ConfigHandler_56663624

None_None

ColumnConverter_56963312
ColumnConverter_56963312

PredicatesFactory_56963424
PredicatesFactory_56963424

PredicateConverter_56963648
PredicateConverter_56963648

ConfigHandler_80134888
ConfigHandler_80134888
ConfigHandler_80134888
ConfigHandler_80134888

The resulting output needs to be:

None_None

ConfigHandler_56663624

ColumnConverter_56963312

PredicatesFactory_56963424

PredicateConverter_56963648

ConfigHandler_80134888

I just used this command: en = set(open('file.txt') but it does not work.

Can anyone help me extract only the unique set of lines from the file?

Thanks

Jon*_*nts 6

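Here is an option that preserves order (unlike a set) but otherwise behaves the same way. Note that the EOL character is deliberately stripped and blank lines are ignored.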

from collections import OrderedDict

with open('/home/jon/testdata.txt') as fin:
    # Strip trailing whitespace (including the EOL character) from each line.
    lines = (line.rstrip() for line in fin)
    # Keys keep first-seen order; blank lines are skipped.
    unique_lines = OrderedDict.fromkeys(line for line in lines if line)

print(list(unique_lines))
# ['None_None', 'ConfigHandler_56663624', 'ColumnConverter_56963312', 'PredicatesFactory_56963424', 'PredicateConverter_56963648', 'ConfigHandler_80134888']

Then you just need to write the above to your output file.
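A minimal sketch of that last step, assuming Python 3.7+ (where a plain dict preserves insertion order, so OrderedDict is optional). The function names and the file paths passed to them are hypothetical, not part of the original answer.

```python
def unique_nonblank_lines(lines):
    """Return non-blank lines in first-seen order, trailing whitespace stripped."""
    stripped = (line.rstrip() for line in lines)
    # dict.fromkeys keeps the first occurrence of each key, in order.
    return list(dict.fromkeys(line for line in stripped if line))


def dedupe_file(src_path, dst_path):
    """Write the unique non-blank lines of src_path to dst_path (hypothetical helper)."""
    with open(src_path) as fin:
        unique = unique_nonblank_lines(fin)
    with open(dst_path, "w") as fout:
        # The EOL was stripped on reading, so add it back when writing.
        fout.writelines(line + "\n" for line in unique)
    return unique
```

Writing to a separate output path rather than over the input keeps the original file intact if something goes wrong partway through.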


Stu*_*rey 6

Here is a simple solution that uses a set to remove the duplicates from the text file.

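# Read every line, then let a set discard the duplicates.
with open('workfile.txt', 'r') as src:
    lines_set = set(src.readlines())

# Overwrite the same file. Note that a set does not preserve line order.
with open('workfile.txt', 'w') as out:
    for line in lines_set:
        out.write(line)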
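Because a set is unordered, the approach above may write the lines back in a different order than they appeared. If order matters, one possible variation is to use the set only to remember what has been seen while a list keeps the order. This is a sketch under that assumption; the helper name is hypothetical. It also compares lines without their trailing newline, so a final line that lacks an EOL is still recognised as a duplicate.

```python
def dedupe_preserving_order(lines):
    """Return each distinct line once, in first-seen order, without its newline."""
    seen = set()      # fast membership test
    result = []       # keeps the original order
    for line in lines:
        key = line.rstrip("\n")
        if key in seen:
            continue
        seen.add(key)
        result.append(key)
    return result
```

Unlike the answer that strips and skips blank lines, this version keeps one blank line if the input contains any.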