Python difflib deltas and comparing with ndiff

Nea*_*ers 4 python difflib delta

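I have been wanting to do something like what I believe change control systems do: they compare two files and save a small diff each time the file changes. I have been reading this page: http://docs.python.org/library/difflib.html, and it apparently is not sinking into my head.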

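I tried to recreate this in the simple program shown below, but what I seem to be missing is that the deltas contain at least as much as the original file, if not more.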

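Is it not possible to get just the pure changes? The reason I ask is hopefully obvious: to save disk space.
I could save the entire chunk of code each time, but it would be better to save the current code once and then only the small diffs of each change.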

I am also still trying to figure out why many difflib functions return a generator instead of a list. What is the advantage there?

Will difflib work for me, or do I need to find a more specialized package with more features?

# Python difflib demo (Python 3 syntax)
# Author: Neal Walters
# loosely based on http://ahlawat.net/wordpress/?p=371
# 01/17/2011

import difflib

# build the files here - later we will just read the files probably
file1Contents = """
for j = 1 to 10:
   print "ABC"
   print "DEF"
   print "HIJ"
   print "JKL"
   print "Hello World"
   print "j=" + j
   print "XYZ"
"""

file2Contents = """
for j = 1 to 10:
   print "ABC"
   print "DEF"
   print "HIJ"
   print "JKL"
   print "Hello World"
   print "XYZ"
print "The end"
"""

filename1 = "diff_file1.txt"
filename2 = "diff_file2.txt"

with open(filename1, "w") as file1:
    file1.write(file1Contents)
with open(filename2, "w") as file2:
    file2.write(file2Contents)
# end of file build

# read both files back as lists of lines (newlines kept)
with open(filename1, "r") as file1:
    lines1 = file1.readlines()
with open(filename2, "r") as file2:
    lines2 = file2.readlines()

print("\n FILE 1 \n")
for line in lines1:
    print(line)

print("\n FILE 2 \n")
for line in lines2:
    print(line)

diffSequence = difflib.ndiff(lines1, lines2)

print("\n ----- SHOW DIFF ----- \n")
for line in diffSequence:
    print(line)

diffObj = difflib.Differ()
deltaSequence = diffObj.compare(lines1, lines2)
deltaList = list(deltaSequence)

print("\n ----- SHOW DELTALIST ----- \n")
for line in deltaList:
    print(line)



# let's suppose we store just the diffSequence in the database
# then we want to take the current file (file2) and recreate the original (file1) from it
# by backward applying the diff

restoredFile1Lines = difflib.restore(diffSequence, 1)  # 1 indicates file1 of 2 used to create the diff

restoreFileList = list(restoredFile1Lines)

print("\n ----- SHOW REBUILD OF FILE1 ----- \n")
# this is not showing anything!
for line in restoreFileList:
    print(line)

Thanks!

Update:

contextDiffSeq = difflib.context_diff(lines1, lines2)
contextDiffList = list(contextDiffSeq)

print("\n ----- SHOW CONTEXTDIFF ----- \n")
for line in contextDiffList:
    print(line)

 ----- SHOW CONTEXTDIFF -----

***

---

***************

*** 5,9 ****

     print "HIJ"

     print "JKL"

     print "Hello World"

-    print "j=" + j

     print "XYZ"

--- 5,9 ----

     print "HIJ"

     print "JKL"

     print "Hello World"

     print "XYZ"

+ print "The end"

Another update:

In the early days of the Panvalet (mainframe librarian) source management tool, you could create a change set like this:

++ADD 9
   print "j=" + j 

This simply meant: add one line after line 9. There were also other keywords such as ++REPLACE or ++UPDATE. http://www4.hawaii.gov/dags/icsd/ppmo/Stds_Web_Pages/pdf/it110401.pdf
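For comparison, a ++ADD-style change set can be approximated in Python with difflib.SequenceMatcher.get_opcodes(), which reports which slices of the old sequence were replaced, deleted, or inserted into. The sketch below is only an illustration: make_changeset and apply_changeset are hypothetical helper names, not part of difflib, and the tuple layout is an arbitrary choice.

```python
import difflib


def make_changeset(old, new):
    """Hypothetical helper: keep only the non-equal opcodes.

    Each entry is (tag, i1, i2, new_lines): replace old[i1:i2] with new_lines.
    """
    matcher = difflib.SequenceMatcher(None, old, new)
    return [
        (tag, i1, i2, new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def apply_changeset(old, changes):
    """Hypothetical helper: rebuild the other version from `old` plus `changes`."""
    result = []
    pos = 0
    for _tag, i1, i2, new_lines in changes:
        result.extend(old[pos:i1])  # copy the untouched lines before this change
        result.extend(new_lines)    # empty for a delete, the added lines otherwise
        pos = i2                    # skip the old lines that were replaced/deleted
    result.extend(old[pos:])        # copy whatever follows the last change
    return result


file1 = ['print "ABC"\n', 'print "j=" + j\n', 'print "XYZ"\n']
file2 = ['print "ABC"\n', 'print "XYZ"\n', 'print "The end"\n']

# Store the current file (file2) plus a backward change set to get file1 back.
backward = make_changeset(file2, file1)
restored_file1 = apply_changeset(file2, backward)
print(backward)
```

Only the changed lines and their positions are stored, so the change set grows with the size of the edit rather than the size of the file.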

Jim*_*som 5

"I am also still trying to figure out why many difflib functions return a generator instead of a list. What is the advantage there?"

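Well, think about it for a second: if you compare files, those files can in theory (and will in practice) be quite large. Returning the delta as a list, for example, means reading the complete data into memory, which is not a smart thing to do.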
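One practical consequence of the lazy generator is that it can be consumed only once. The short sketch below (with made-up sample lines) shows this; it is also the likely reason the difflib.restore call in the question's program prints nothing, since diffSequence had already been used up by the earlier print loop.

```python
import difflib

old = ["a\n", "b\n", "c\n"]
new = ["a\n", "B\n", "c\n"]

# ndiff returns a generator: delta lines are produced on demand.
delta_gen = difflib.ndiff(old, new)

first_pass = list(delta_gen)   # consumes the generator
second_pass = list(delta_gen)  # nothing left, the generator is exhausted

# If the delta is needed more than once, materialize it as a list first.
delta = list(difflib.ndiff(old, new))
restored_old = list(difflib.restore(delta, 1))  # 1 = first sequence
restored_new = list(difflib.restore(delta, 2))  # 2 = second sequence

print(len(first_pass), len(second_pass))
```

Calling list() gives up the memory benefit, so it is best kept for small inputs or for cases where the delta really has to be traversed twice.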

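As for returning only the differences, well, there is another advantage to using a generator: simply iterate over the delta and keep whichever lines you are interested in.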

If you read the difflib documentation for Differ-style deltas, you will see a paragraph that reads:

Each line of a Differ delta begins with a two-letter code:
Code    Meaning
'- '    line unique to sequence 1
'+ '    line unique to sequence 2
'  '    line common to both sequences
'? '    line not present in either input sequence

So, if you only want the differences, you can easily filter them out using str.startswith.
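A minimal sketch of that filtering, using made-up sample lines, might look like this:

```python
import difflib

old = ["ABC\n", "j\n", "XYZ\n"]
new = ["ABC\n", "XYZ\n", "The end\n"]

# Keep only lines unique to one side; drop common ('  ') and hint ('? ') lines.
changes_only = [
    line
    for line in difflib.ndiff(old, new)
    if line.startswith(("- ", "+ "))
]

print("".join(changes_only))
```

Note that the filtered list no longer records where each change belongs, so it is useful for reporting what changed but is not enough on its own to rebuild either file with difflib.restore.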

You can also use difflib.context_diff to obtain a compact delta that shows only the changes (plus a few lines of surrounding context).
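As a rough illustration of how much smaller such a delta can be, the sketch below compares the full ndiff output with difflib.unified_diff (a sibling of context_diff) using n=0, which drops the context lines entirely. The sample data mirrors the question's files, and the names file1 and file2 are just labels for the diff header.

```python
import difflib

old = """for j = 1 to 10:
   print "ABC"
   print "DEF"
   print "HIJ"
   print "JKL"
   print "Hello World"
   print "j=" + j
   print "XYZ"
""".splitlines(keepends=True)

new = """for j = 1 to 10:
   print "ABC"
   print "DEF"
   print "HIJ"
   print "JKL"
   print "Hello World"
   print "XYZ"
print "The end"
""".splitlines(keepends=True)

# ndiff repeats every common line, so it is at least as long as the input.
full_delta = list(difflib.ndiff(old, new))

# n=0 asks for zero lines of context around each change.
compact_delta = list(
    difflib.unified_diff(old, new, fromfile="file1", tofile="file2", n=0)
)

print(len(full_delta), "ndiff lines vs", len(compact_delta), "unified lines")
print("".join(compact_delta))
```

One trade-off to keep in mind: difflib.restore only understands ndiff/Differ output, and difflib does not include a function for applying a unified or context diff, so rebuilding a file from the compact form needs separate patching code or an external tool.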