Nea*_*ers 4 python difflib delta
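I've always wanted to do something like what I believe change control systems do: they compare two files and save a small diff each time the file changes. I've been reading this page: http://docs.python.org/library/difflib.html, and it apparently isn't sinking into my head.
I tried to recreate this in the simple program shown below, but what I seem to be missing is that the delta contains at least as much as the original file, if not more.
Is it not possible to get just the pure changes? The reason I ask is hopefully obvious: to save disk space.
I could save the entire chunk of code each time, but it would be better to save the current code once and then only the small diffs of each change.
I'm also still trying to figure out why many difflib functions return a generator instead of a list. What is the advantage there?
Will difflib work for me, or do I need to find a more specialized package with more features?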
# Python Difflib demo
# Author: Neal Walters
# loosely based on http://ahlawat.net/wordpress/?p=371
# 01/17/2011

import difflib

# build the files here - later we will just read the files probably
file1Contents = """
for j = 1 to 10:
    print "ABC"
    print "DEF"
    print "HIJ"
    print "JKL"
    print "Hello World"
    print "j=" + j
    print "XYZ"
"""
file2Contents = """
for j = 1 to 10:
    print "ABC"
    print "DEF"
    print "HIJ"
    print "JKL"
    print "Hello World"
    print "XYZ"
    print "The end"
"""
filename1 = "diff_file1.txt"
filename2 = "diff_file2.txt"
file1 = open(filename1, "w")
file2 = open(filename2, "w")
file1.write(file1Contents)
file2.write(file2Contents)
file1.close()
file2.close()
# end of file build

lines1 = open(filename1, "r").readlines()
lines2 = open(filename2, "r").readlines()

print("\n FILE 1 \n")
for line in lines1:
    print(line)

print("\n FILE 2 \n")
for line in lines2:
    print(line)

diffSequence = difflib.ndiff(lines1, lines2)

print("\n ----- SHOW DIFF ----- \n")
for i, line in enumerate(diffSequence):
    print(line)

diffObj = difflib.Differ()
deltaSequence = diffObj.compare(lines1, lines2)
deltaList = list(deltaSequence)

print("\n ----- SHOW DELTALIST ----- \n")
for i, line in enumerate(deltaList):
    print(line)

# let's suppose we store just the diffSequence in the database
# then we want to take the current file (file2) and recreate the original (file1) from it
# by backward applying the diff
restoredFile1Lines = difflib.restore(diffSequence, 1)  # 1 indicates file1 of 2 used to create the diff
restoreFileList = list(restoredFile1Lines)

print("\n ----- SHOW REBUILD OF FILE1 ----- \n")
# this is not showing anything!
for i, line in enumerate(restoreFileList):
    print(line)
Thanks!
Update:
contextDiffSeq = difflib.context_diff(lines1, lines2)
contextDiffList = list(contextDiffSeq)

print("\n ----- SHOW CONTEXTDIFF ----- \n")
for i, line in enumerate(contextDiffList):
    print(line)
 ----- SHOW CONTEXTDIFF -----
*** 5,9 ****
      print "HIJ"
      print "JKL"
      print "Hello World"
-     print "j=" + j
      print "XYZ"
--- 5,9 ----
      print "HIJ"
      print "JKL"
      print "Hello World"
      print "XYZ"
+     print "The end"
Another update:
In the early days of Panvalet (a mainframe librarian and source management tool), you could create a changeset like this:
++ADD 9
print "j=" + j
This just means: add one line after line 9. Then there were other keywords such as ++REPLACE or ++UPDATE. http://www4.hawaii.gov/dags/icsd/ppmo/Stds_Web_Pages/pdf/it110401.pdf
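A changeset in the spirit of "++ADD 9" can be approximated with `difflib.SequenceMatcher.get_opcodes()`, which reports where two sequences differ. The following is only a sketch: `make_delta` and `apply_delta` are hypothetical helper names, not part of difflib, and the tuple layout of the stored delta is an assumption made for illustration. The delta keeps only the changed lines plus their positions, so unchanged lines are never stored.

```python
import difflib


def make_delta(old, new):
    """Keep only the non-equal opcodes, plus the lines of `new` they need.

    Hypothetical helper: each entry is (tag, i1, i2, replacement_lines),
    where old[i1:i2] is the span of `old` that gets replaced.
    """
    matcher = difflib.SequenceMatcher(None, old, new)
    return [(tag, i1, i2, new[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


def apply_delta(old, delta):
    """Rebuild `new` from `old` and a delta produced by make_delta."""
    result, pos = [], 0
    for tag, i1, i2, lines in delta:
        result.extend(old[pos:i1])  # unchanged run before this change
        result.extend(lines)        # inserted or replacement lines (empty for a delete)
        pos = i2                    # skip the old lines that were deleted or replaced
    result.extend(old[pos:])        # unchanged tail
    return result


# Store the current file in full, and only a delta that leads back to the original.
original = ['print "Hello World"\n', 'print "j=" + j\n', 'print "XYZ"\n']
current = ['print "Hello World"\n', 'print "XYZ"\n', 'print "The end"\n']

delta = make_delta(current, original)
rebuilt = apply_delta(current, delta)
print(rebuilt == original)
```

Because the delta is computed from the current file towards the original, the current file plus the delta is enough to recover the older version.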
"I'm also still trying to figure out why many difflib functions return a generator instead of a list. What is the advantage there?"
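Well, think about it for a second: if you compare files, those files can in theory (and will in practice) be quite large. Returning the delta as a list, for example, means holding the complete data in memory, which is not a smart thing to do.
As for returning only the difference, there is another advantage in using a generator: just iterate over the delta and keep whatever lines you are interested in.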
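One consequence of getting a generator back is that it can be iterated only once. A minimal sketch with made-up sample lines shows the effect: once the delta has been looped over, passing the same generator to `difflib.restore` yields nothing, which would explain an empty rebuild. Materializing the delta with `list()` first makes it reusable, at the cost of holding it in memory.

```python
import difflib

old = ["a\n", "b\n", "c\n"]
new = ["a\n", "c\n", "d\n"]

# ndiff returns a generator: it can be consumed exactly once.
delta_gen = difflib.ndiff(old, new)
first_pass = list(delta_gen)    # consumes the generator
second_pass = list(delta_gen)   # already exhausted, so this is empty

# restore() fed an exhausted generator produces no lines at all.
restored_from_exhausted = list(difflib.restore(delta_gen, 1))

# A materialized delta can be reused as often as needed.
restored_old = list(difflib.restore(first_pass, 1))
restored_new = list(difflib.restore(first_pass, 2))

print(second_pass)
print(restored_from_exhausted)
print(restored_old == old, restored_new == new)
```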
If you read the difflib documentation for Differ-style deltas, you will see a paragraph that reads:
Each line of a Differ delta begins with a two-letter code:

Code    Meaning
'- '    line unique to sequence 1
'+ '    line unique to sequence 2
'  '    line common to both sequences
'? '    line not present in either input sequence
So, if you only want the differences, you can easily filter them out using str.startswith.
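As a rough illustration of that filtering, with made-up sample lines, the generator from `difflib.ndiff` can be reduced to the lines carrying the `'- '` and `'+ '` codes:

```python
import difflib

old = ["alpha\n", "beta\n", "gamma\n"]
new = ["alpha\n", "gamma\n", "delta\n"]

# Keep only lines unique to one of the two sequences; common lines
# ('  ') and hint lines ('? ') are dropped while iterating.
changes = [line for line in difflib.ndiff(old, new)
           if line.startswith(("- ", "+ "))]

print("".join(changes), end="")
```

Note that a delta filtered this way no longer records where the changes belong, so it cannot be handed to `difflib.restore` to rebuild either file. It is useful for inspection rather than for storage.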
You can also use difflib.context_diff to obtain a compact delta which shows only the changes.
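A small sketch of the size difference, using invented 20-line inputs with a single edited line: `context_diff` accepts an `n` argument for the number of context lines, and `n=0` keeps the output down to the changed lines and their position markers. The file labels `"old"` and `"new"` are placeholders.

```python
import difflib

old = ["line %d\n" % i for i in range(1, 21)]
new = list(old)
new[9] = "line ten, edited\n"   # change a single line

# Differ-style delta: carries every line of both inputs.
full_delta = list(difflib.ndiff(old, new))

# Context diff with no surrounding context lines.
compact = list(difflib.context_diff(old, new, fromfile="old", tofile="new", n=0))

print(len(full_delta), len(compact))
print("".join(compact), end="")
```

As far as I know, difflib itself has no function that applies a context diff back to a file, so a delta stored in this form would need an external patch tool or custom code to restore an earlier version.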