13 python text-files
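I'm trying to write a simple program that removes duplicate lines from a file, but I'm stuck. Unlike the suggested duplicate question, my goal is to remove all but one copy of each duplicated line, so that I still keep that data. I'd also like it to read from and write to the same filename. When I set both filenames to the same value, it just outputs an empty file.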
input_file = "input.txt"
output_file = "input.txt"
seen_lines = set()
outfile = open(output_file, "w")
for line in open(input_file, "r"):
    if line not in seen_lines:
        outfile.write(line)
        seen_lines.add(line)
outfile.close()
Contents of input.txt:
I really love christmas
Keep the change ya filthy animal
Pizza is my fav food
Keep the change ya filthy animal
Did someone say peanut butter?
Did someone say peanut butter?
Keep the change ya filthy animal
Expected output:
I really love christmas
Keep the change ya filthy animal
Pizza is my fav food
Did someone say peanut butter?
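The line outfile = open(output_file, "w") truncates your file no matter what else you do, so the reads that follow find an empty file. To do this safely, I recommend using a temporary file: write the deduplicated output to the temporary file, close both files, then move the temporary file over the input filename.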
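This is much more robust than opening the same file twice, once for reading and once for writing. If anything goes wrong, you still have the original plus whatever work was done so far. With your current approach, a failure partway through can corrupt your file.
Here is an example using tempfile.NamedTemporaryFile and a with block to make sure everything is closed properly, even if an error occurs: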
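from tempfile import NamedTemporaryFile
from shutil import move

input_file = "input.txt"
output_file = "input.txt"
seen_lines = set()

# delete=False keeps the temporary file on disk after the with block closes it.
with NamedTemporaryFile('w', delete=False) as output, open(input_file) as input:
    for line in input:
        # Compare without the trailing newline; the last line may lack one.
        sline = line.rstrip('\n')
        if sline not in seen_lines:
            output.write(line)
            seen_lines.add(sline)

# Both files are closed here, so the temporary file can replace the original.
move(output.name, output_file)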
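The move at the end works correctly even when the input and output names are the same, because output.name is guaranteed to be different from both.
Note also that I strip the newline from each line before adding it to the set, since the last line of the file might not have one.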
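To illustrate why stripping the newline matters, here is a small sketch that runs the same de-duplication logic on an in-memory list of lines. The helper name unique_lines, its strip_newline flag, and the sample data are made up for this illustration and are not part of the answer's code.

```python
def unique_lines(lines, strip_newline=True):
    """Return the lines to keep, in first-seen order (illustrative helper)."""
    seen = set()
    kept = []
    for line in lines:
        # Optionally compare on the text without its trailing newline.
        key = line.rstrip('\n') if strip_newline else line
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept

# The final line has no trailing newline, as in a file that does not end with one.
sample = ["peanut butter\n", "pizza\n", "peanut butter"]

# Without stripping, "peanut butter" and "peanut butter\n" count as different lines.
print(unique_lines(sample, strip_newline=False))
# With stripping, the last line is recognised as a duplicate and dropped.
print(unique_lines(sample, strip_newline=True))
```

Without the rstrip call, a duplicate that happens to be the last line of the file would slip through whenever the file lacks a final newline.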
Alternative solution
If you don't care about the order of the lines, you can simplify the process by doing everything directly in memory:
input_file = "input.txt"
output_file = "input.txt"
with open(input_file) as input:
    unique = set(line.rstrip('\n') for line in input)
with open(output_file, 'w') as output:
    for line in unique:
        output.write(line)
        output.write('\n')
Compare that with:
with open(input_file) as input:
    unique = set(line.rstrip('\n') for line in input.readlines())
with open(output_file, 'w') as output:
    output.write('\n'.join(unique))
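The second version does the same thing, but reads and writes everything in one go. One small difference: because it joins the lines with '\n', it does not write a newline after the last line.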
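If the order of the lines does matter, as in the expected output in the question, the in-memory approach can still work. The sketch below is one possible variant, not part of the original answer: it assumes Python 3.7 or later, where dict preserves insertion order, and the function name dedupe_keep_order is made up for illustration.

```python
def dedupe_keep_order(path):
    """Rewrite the file at `path`, keeping only the first copy of each line."""
    with open(path) as infile:
        # dict keys are unique and remember insertion order (Python 3.7+),
        # so this keeps the first occurrence of each line in file order.
        unique = dict.fromkeys(line.rstrip('\n') for line in infile)
    # The input has been fully read and closed before the same name
    # is reopened for writing, so truncation no longer loses data.
    with open(path, 'w') as output:
        for line in unique:
            output.write(line + '\n')
```

Calling dedupe_keep_order("input.txt") would rewrite the file in place. Unlike the temporary-file approach, a failure during the write step can still leave the file incomplete, so this trades some safety for simplicity.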