When comparing similar lines, I want to highlight the differences on the same line:
a) lorem ipsum dolor sit amet
b) lorem foo ipsum dolor amet
lorem <ins>foo</ins> ipsum dolor <del>sit</del> amet
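While difflib.HtmlDiff appears to do this sort of inline highlighting, it produces very verbose markup.
Unfortunately, I have not been able to find another class/method that does not operate on a line-by-line basis.
Am I missing anything? Any pointers would be appreciated!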
tzo*_*zot 43
Here is a simple example:
import difflib

def show_diff(seqm):
    """Unify operations between two compared strings.

    seqm is a difflib.SequenceMatcher instance whose a & b are strings.
    """
    output = []
    for opcode, a0, a1, b0, b1 in seqm.get_opcodes():
        if opcode == 'equal':
            # Unchanged text is copied through as-is.
            output.append(seqm.a[a0:a1])
        elif opcode == 'insert':
            # Text present only in b.
            output.append("<ins>" + seqm.b[b0:b1] + "</ins>")
        elif opcode == 'delete':
            # Text present only in a.
            output.append("<del>" + seqm.a[a0:a1] + "</del>")
        elif opcode == 'replace':
            raise NotImplementedError("what to do with 'replace' opcode?")
        else:
            raise RuntimeError("unexpected opcode")
    return ''.join(output)

>>> sm= difflib.SequenceMatcher(None, "lorem ipsum dolor sit amet", "lorem foo ipsum dolor amet")
>>> show_diff(sm)
'lorem<ins> foo</ins> ipsum dolor <del>sit </del>amet'
This works for strings. You should decide what to do with the "replace" opcode.
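
One possible way to settle the "replace" case left open above is to treat it as a deletion followed by an insertion. The following is only a sketch under that assumption; the function name show_diff_full is hypothetical, and the strings are not HTML-escaped, so escape them first if they may contain markup.

```python
import difflib

def show_diff_full(a, b):
    """Character-level inline diff; 'replace' becomes <del> followed by <ins>."""
    seqm = difflib.SequenceMatcher(None, a, b)
    output = []
    for opcode, a0, a1, b0, b1 in seqm.get_opcodes():
        if opcode == 'equal':
            output.append(a[a0:a1])
        if opcode in ('delete', 'replace'):
            # Old text that is gone from b.
            output.append("<del>" + a[a0:a1] + "</del>")
        if opcode in ('insert', 'replace'):
            # New text that appears only in b.
            output.append("<ins>" + b[b0:b1] + "</ins>")
    return ''.join(output)

print(show_diff_full("abc", "axc"))
```

Whether a replacement should render as a del/ins pair or as something more compact is a presentation choice; the pair has the advantage of reusing the two tags already in play.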
Here is an inline diff inspired by @tzot's answer above (also Python 3 compatible):
def inline_diff(a, b):
    import difflib
    matcher = difflib.SequenceMatcher(None, a, b)

    def process_tag(tag, i1, i2, j1, j2):
        if tag == 'replace':
            return '{' + matcher.a[i1:i2] + ' -> ' + matcher.b[j1:j2] + '}'
        if tag == 'delete':
            return '{- ' + matcher.a[i1:i2] + '}'
        if tag == 'equal':
            return matcher.a[i1:i2]
        if tag == 'insert':
            return '{+ ' + matcher.b[j1:j2] + '}'
        assert False, "Unknown tag %r" % tag

    return ''.join(process_tag(*t) for t in matcher.get_opcodes())
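It is not perfect. For example, it would be nice to extend the "replace" opcode so that it recognizes whole replaced words rather than just the few letters that differ, but it is a good starting point.
Example output: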
>>> a='Lorem ipsum dolor sit amet consectetur adipiscing'
>>> b='Lorem bananas ipsum cabbage sit amet adipiscing'
>>> print(inline_diff(a, b))
Lorem{+  bananas} ipsum {dolor -> cabbage} sit amet{-  consectetur} adipiscing
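
The whole-word behaviour mentioned above can be approximated by handing SequenceMatcher lists of words instead of raw strings, since it accepts any sequences of hashable items. This is a rough sketch, not a polished implementation: the name inline_word_diff is hypothetical, and splitting on whitespace means the original spacing is normalized to single spaces.

```python
import difflib

def inline_word_diff(a, b):
    """Word-level inline diff using <ins>/<del> markers."""
    a_words, b_words = a.split(), b.split()
    matcher = difflib.SequenceMatcher(None, a_words, b_words)
    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        old = ' '.join(a_words[i1:i2])
        new = ' '.join(b_words[j1:j2])
        if tag == 'equal':
            parts.append(old)
        elif tag == 'delete':
            parts.append('<del>' + old + '</del>')
        elif tag == 'insert':
            parts.append('<ins>' + new + '</ins>')
        else:
            # 'replace': show the old words, then the new ones.
            parts.append('<del>' + old + '</del> <ins>' + new + '</ins>')
    return ' '.join(parts)

print(inline_word_diff("lorem ipsum dolor sit amet",
                       "lorem foo ipsum dolor amet"))
```

For the two lines from the question this yields lorem <ins>foo</ins> ipsum dolor <del>sit</del> amet, which is the markup asked for there.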