Den*_*1.5 26
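Have you looked at Google's diff-match-patch? Apparently Google Docs uses this set of algorithms. It includes not only a diff module but also a patch module, so you can generate the newest file from an older file plus a diff.
A Python version is included.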
http://code.google.com/p/google-diff-match-patch/
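As a rough illustration of the "older file plus diff gives newest file" workflow, here is a minimal sketch. It assumes the Python port is installed from PyPI as `diff-match-patch` (an assumption, since the answer only links the project page); the helper name `roundtrip_with_dmp` is made up for this example.

```python
def roundtrip_with_dmp(old_text, new_text):
    """Make a patch from old_text to new_text, serialise it, and re-apply it."""
    # Third-party dependency (assumed install: pip install diff-match-patch)
    from diff_match_patch import diff_match_patch

    dmp = diff_match_patch()
    patches = dmp.patch_make(old_text, new_text)   # list of patch objects
    patch_text = dmp.patch_toText(patches)         # text form that can be stored
    # Rebuild the patches from text and apply them to the older string
    restored, applied = dmp.patch_apply(dmp.patch_fromText(patch_text), old_text)
    return restored, all(applied)
```

`patch_apply` returns the patched text together with a list of booleans, one per patch, saying whether each patch could be applied. The serialised patch text is the library's own format, so it is best read back with `patch_fromText` rather than with other tools.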
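I've implemented a pure Python function that applies a diff patch to recover either of the two input strings; I hope someone finds it useful. It parses the unified diff format.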
import re

# Hunk header, e.g. "@@ -3,2 +3,4 @@"; ",count" is omitted when the count is 1
_hdr_pat = re.compile(r"^@@ -(\d+),?(\d+)? \+(\d+),?(\d+)? @@$")

def apply_patch(s, patch, revert=False):
    """
    Apply unified diff patch to string s to recover newer string.
    If revert is True, treat s as the newer string, recover older string.
    """
    s = s.splitlines(True)
    p = patch.splitlines(True)
    t = ''
    i = sl = 0  # i: index into patch lines, sl: index into source lines
    (midx, sign) = (1, '+') if not revert else (3, '-')
    while i < len(p) and p[i].startswith(("---", "+++")):
        i += 1  # skip header lines
    while i < len(p):
        m = _hdr_pat.match(p[i])
        if not m:
            raise Exception("Cannot process diff")
        i += 1
        # 0-based start of the hunk in s; a zero count means "after this line"
        l = int(m.group(midx)) - 1 + (m.group(midx + 1) == '0')
        t += ''.join(s[sl:l])  # copy the unchanged lines before the hunk
        sl = l
        while i < len(p) and p[i][0] != '@':
            if i + 1 < len(p) and p[i + 1][0] == '\\':
                # next line is "\ No newline at end of file": drop the newline
                line = p[i][:-1]
                i += 2
            else:
                line = p[i]
                i += 1
            if len(line) > 0:
                if line[0] == sign or line[0] == ' ':
                    t += line[1:]  # this line belongs in the output
                sl += (line[0] != sign)  # this line consumes a line of s
    t += ''.join(s[sl:])  # copy whatever follows the last hunk
    return t
If there are header lines ("--- ...\n", "+++ ...\n"), it skips them. If we have a unified diff string diffstr representing the difference between oldstr and newstr:
# recreate `newstr` from `oldstr`+patch
newstr = apply_patch(oldstr, diffstr)
# recreate `oldstr` from `newstr`+patch
oldstr = apply_patch(newstr, diffstr, True)
In Python, you can generate a unified diff of two strings with difflib (part of the standard library):
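The part of the format that is easiest to get wrong is the hunk header. The following is a small sketch of how such a header can be read on its own; the names `parse_hunk_header` and `first_affected_index` are hypothetical and not part of the code above, and the regex is a slightly stricter variant written for this example.

```python
import re

# "@@ -start,count +start,count @@"; the ",count" part is omitted when count is 1
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")

def parse_hunk_header(header):
    """Return (old_start, old_count, new_start, new_count) as integers."""
    m = HUNK_RE.match(header)
    if m is None:
        raise ValueError("not a unified diff hunk header: %r" % header)
    old_start, old_count, new_start, new_count = m.groups()
    return (
        int(old_start),
        int(old_count) if old_count is not None else 1,  # missing count means 1
        int(new_start),
        int(new_count) if new_count is not None else 1,
    )

def first_affected_index(start, count):
    """0-based index of the first line the hunk touches on one side."""
    # With a zero count, `start` names the line *before* the change,
    # so the hunk begins one line later than start - 1.
    return start - 1 + (1 if count == 0 else 0)
```

For example, `@@ -1,0 +2 @@` describes a pure insertion after line 1 of the old text, so the first affected index on the old side is 1, not 0. This is the same adjustment that the `(m.group(midx+1) == '0')` term makes in `apply_patch`.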
import difflib

_no_eol = "\\ No newline at end of file"

def make_patch(a, b):
    """
    Get unified string diff between two strings. Trims top two lines.
    Returns empty string if strings are identical.
    """
    diffs = difflib.unified_diff(a.splitlines(True), b.splitlines(True), n=0)
    try:
        _, _ = next(diffs), next(diffs)  # drop the "---" and "+++" header lines
    except StopIteration:
        pass
    # A line without a trailing newline is followed by the "\ No newline" marker
    return ''.join([d if d[-1] == '\n' else d + '\n' + _no_eol + '\n' for d in diffs])
On Unix: diff -U0 a.txt b.txt
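To make the zero-context format concrete, here is a small standalone example of what difflib emits for a one-line insertion. The strings `old` and `new` are made up for illustration.

```python
import difflib

old = "one\nthree\n"
new = "one\ntwo\nthree\n"

# n=0 requests no context lines, like the -U0 option of diff
diff_lines = list(
    difflib.unified_diff(old.splitlines(True), new.splitlines(True), n=0)
)

for line in diff_lines:
    print(repr(line))
# '--- \n'
# '+++ \n'
# '@@ -1,0 +2 @@\n'
# '+two\n'
```

The first two lines are the file headers (empty here because no file names were passed), followed by a single hunk whose old-side count of 0 marks a pure insertion after line 1.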
The code is on GitHub, along with tests using ASCII and random unicode characters: https://gist.github.com/noporpoise/16e731849eb1231e86d78f9dfeca3abc