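The best strategy probably depends on exactly how large the files are. If the first file fits in memory, you can easily build a set of its lines and then remove the lines of file2 from that set. This requires memory roughly proportional to the size of file1.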
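# Read every line of file1 into a set (one entry per distinct line).
with open('file1') as f1:
    lineset = set(f1)
# Drop any line that also occurs in file2; file2 is streamed, not loaded whole.
with open('file2') as f2:
    lineset.difference_update(f2)
# Write the remaining lines; set iteration order is arbitrary.
with open('file3', 'w') as out:
    for line in lineset:
        out.write(line)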
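Note that this solution also eliminates duplicate lines from file1 and, because a set is unordered, does not preserve the original line order.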
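
If duplicates or the original order of lines in file1 matter, one possible variation is to hold the lines of file2 in a set instead and stream file1 through it. This is only a sketch under the assumption that file2 (rather than file1) fits in memory; the function name `subtract_lines` and its parameter names are made up for illustration and are not part of the original answer.

```python
def subtract_lines(src_path, exclude_path, out_path):
    """Copy src_path to out_path, skipping every line that occurs in exclude_path.

    Hypothetical helper: keeps the order and the duplicates of src_path.
    Memory use is roughly proportional to the size of exclude_path.
    """
    # Load the lines to exclude into a set for fast membership tests.
    with open(exclude_path) as exclude:
        excluded = set(exclude)

    # Stream the source file line by line and keep only lines not excluded.
    # Lines are compared including their trailing newline, so a final line
    # without a newline will not match the same text followed by one.
    with open(src_path) as src, open(out_path, 'w') as out:
        for line in src:
            if line not in excluded:
                out.write(line)
```

Calling `subtract_lines('file1', 'file2', 'file3')` would correspond to the file names used above. The trade-off is that memory now scales with file2 instead of file1, so which version is preferable depends on which file is smaller.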