What are some more powerful alternatives to difflib?

wnn*_*maw 5 python string-comparison difflib

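I'm writing a script that needs to be able to track revisions. The general idea is to give it a list of tuples, where the first entry is the name of a field (e.g. "Title" or "Description"), the second entry is the first version of that field, and the third entry is the revised version. So something like this: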

[("Title", "The first version of the title", "The second version of the title")]


Now, using python docx, I want my script to create a Word file that shows the original version and the new version, with the changes in bold. Example:

Original title:

This is the first version of the title

Revised title:

This is the second version of the title (with "second" in bold)

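The way to do this in python docx is to create a list of tuples, where the first entry is the text and the second entry is the formatting. So the revised title would be created like this: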

paratext = [("This is the ", ''),("second",'b'),(" version of the title",'')]
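To make the target format concrete, here is a minimal sketch that renders such a list as plain text, wrapping bold runs in asterisks so the result can be checked without opening Word. `render_paratext` is a hypothetical helper written for illustration and is not part of python docx; it assumes that any format string containing 'b' means bold.

```python
def render_paratext(paratext):
    """Render (text, fmt) tuples as plain text, marking bold runs with *...*."""
    parts = []
    for text, fmt in paratext:
        # Assumption: any format string containing 'b' means bold.
        parts.append("*" + text + "*" if "b" in fmt else text)
    return "".join(parts)


paratext = [("This is the ", ''), ("second", 'b'), (" version of the title", '')]
print(render_paratext(paratext))  # This is the *second* version of the title
```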


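Having recently discovered difflib, I thought this would be a pretty simple task. And indeed, for simple word replacements (such as the example above), it can be done with the following function: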

import difflib

def revFinder(str1, str2):
    s = difflib.SequenceMatcher(None, str1, str2)
    # Drop the zero-length sentinel block that get_matching_blocks() always appends.
    matches = s.get_matching_blocks()[:-1]

    paratext = []

    for i in range(len(matches)):
        a, b, size = matches[i]
        print("------")
        print(str1[a:a + size])
        print(str2[b:b + size])
        # Text common to both versions gets no formatting.
        paratext.append((str2[b:b + size], ''))

        if i != len(matches) - 1:
            # The text between this matching block and the next is the changed part.
            old_gap = str1[a + size:matches[i + 1][0]]
            new_gap = str2[b + size:matches[i + 1][1]]
            print("")
            print(old_gap)
            print(new_gap)
            # Whichever side of the gap is longer is emitted as bold + underline.
            if len(new_gap) > len(old_gap):
                paratext.append((new_gap, 'bu'))
            else:
                paratext.append((old_gap, 'bu'))

    return paratext


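The problem comes when I want to do anything else. For example, changing "teh" to "the" produces the h (without the space; I couldn't figure out how to show the formatting here). Another problem is that extra text appended to the end is not shown as a change (or shown at all).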
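For reference, the missing trailing text is consistent with how the function walks the matching blocks: it only looks at gaps between consecutive matches, so anything after the last match is never visited. A minimal sketch of one way to cover every part of the new string is to iterate over `SequenceMatcher.get_opcodes()` instead. The function name `rev_finder_opcodes` and the choice to ignore deletions are assumptions made for illustration, not a definitive fix.

```python
import difflib


def rev_finder_opcodes(old, new):
    """Build (text, fmt) tuples for `new`, marking inserted or replaced text as bold."""
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    paratext = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            paratext.append((new[j1:j2], ''))
        elif tag in ("replace", "insert"):
            paratext.append((new[j1:j2], 'b'))
        # 'delete' spans exist only in `old`, so nothing is emitted for them here.
    return paratext


print(rev_finder_opcodes("The title", "The title, revised"))
# [('The title', ''), (', revised', 'b')]
```

Because the opcodes cover the whole of both strings, joining the text parts of the result always reproduces the new version. Character-level matching can still split words in surprising places, so this only addresses the dropped-text problem.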

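So, my question to you all is: what alternatives to difflib are powerful enough to handle more complex text comparisons, or how can I use difflib better so that it does what I need? Thanks in advance.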
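On the second half of the question, one possible direction is to keep difflib but compare words rather than characters, since `SequenceMatcher` accepts any sequence of hashable items. The sketch below tokenizes both strings into alternating runs of whitespace and non-whitespace, so a typo fix like "teh" to "the" is reported as one replaced word. The name `word_level_revisions` and the tokenizing regex are assumptions for illustration, and whether whole-word highlighting is the desired output is also an assumption.

```python
import difflib
import re


def word_level_revisions(old, new):
    """Build (text, fmt) tuples for `new`, marking changed words as bold."""
    # Alternating whitespace / non-whitespace runs: joining the tokens
    # reproduces the original string exactly.
    old_tokens = re.findall(r"\s+|\S+", old)
    new_tokens = re.findall(r"\s+|\S+", new)
    matcher = difflib.SequenceMatcher(None, old_tokens, new_tokens, autojunk=False)
    paratext = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "delete":
            continue  # Deleted words have no counterpart in the new version.
        text = "".join(new_tokens[j1:j2])
        paratext.append((text, '' if tag == "equal" else 'b'))
    return paratext


print(word_level_revisions("The first version of the title",
                           "The second version of the title"))
# [('The ', ''), ('second', 'b'), (' version of the title', '')]
```

With this tokenization, `word_level_revisions("teh cat", "the cat")` gives `[('the', 'b'), (' cat', '')]`, and text appended at the end arrives as an insert opcode, so it is marked as well.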
