Compare source code files, ignoring formatting differences (such as spaces, line breaks, ...)

san*_*lio 10 diff text-processing

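I am looking for an application that can compare two C++ source files and find the differences that are meaningful for the code, so that I can compare versions that may have been reformatted in different ways. At a minimum, it should ignore changes in spaces, tabs and line breaks that do not affect what the source does (note that whether a line break counts as whitespace depends on the language; in C and C++ it does). Ideally, it would identify exactly all code-meaningful differences. I am on Ubuntu.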

Going by diff --help | grep ignore, I expected diff -bBwZ to do the job reasonably well (I expected some false negatives, to be dealt with later). However, it does not.

Suppose I have files with the following fragments:

test_diff1.txt

    else if (prop == "P1") { return 0; }

and test_diff2.txt

    else if (prop == "P1") {
        return 0;
    }

Then

$ diff -bBwZ test_diff1.txt test_diff2.txt
1c1,3
<     else if (prop == "P1") { return 0; }
---
>     else if (prop == "P1") {
>         return 0;
>     }

instead of an empty result.
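The reason is that GNU diff always compares files line by line: -b, -w and -Z only normalize whitespace inside a line, and -B only ignores changes consisting of blank lines, so a line break is never treated as ordinary whitespace. The Python sketch below is a rough model of that behavior (an approximation for illustration, not diff's actual algorithm; the function names are hypothetical). It contrasts a per-line comparison with a comparison of whitespace-separated tokens, which is what "newline counts as whitespace" means for C and C++.

```python
import re

# The two versions of the fragment from the question.
OLD = '    else if (prop == "P1") { return 0; }\n'
NEW = '    else if (prop == "P1") {\n        return 0;\n    }\n'


def per_line_view(text):
    """Rough model of diff -bBwZ: drop blank lines (-B) and all whitespace
    inside each line (-w), but keep the line boundaries."""
    return [re.sub(r"\s+", "", line) for line in text.split("\n") if line.strip()]


def token_view(text):
    """Treat line breaks like any other whitespace, as a C++ compiler does."""
    return text.split()


print(per_line_view(OLD) == per_line_view(NEW))  # False: 1 line vs 3 lines
print(token_view(OLD) == token_view(NEW))        # True: same 9 tokens
```

The per-line view still sees one line on the left and three on the right, while the token view finds the two fragments identical.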

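Using a code formatter as a "filter" on both inputs might remove these differences, but the filtered output would then have to be mapped back to the original inputs, so that the final report shows the actual text and line numbers. The goal should therefore be reachable without a full compiler... but I do not know of anything that does it.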
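One way to keep the link back to the original text, without a formatter or a compiler, is to compare token sequences while remembering which line each token came from. The following is a minimal Python sketch under stated assumptions: "tokens" are whitespace-separated words (not a real C++ lexer, so `{return` and `{ return` still differ), the alignment is done with the standard library's difflib.SequenceMatcher, and the script name tokdiff.py and its output format are hypothetical.

```python
#!/usr/bin/env python3
# tokdiff.py (hypothetical name): whitespace-insensitive diff that reports
# original line numbers. Sketch only: tokens are whitespace-separated words.
import difflib
import sys


def tokenize(text):
    """Return (line_number, token) pairs; line numbers are 1-based and
    counted on "\n" only, as diff counts them."""
    return [(lineno, tok)
            for lineno, line in enumerate(text.split("\n"), start=1)
            for tok in line.split()]


def token_diff(old_text, new_text):
    """Return (tag, old_tokens, new_tokens) for every non-equal region.

    The token lists keep (line_number, token) pairs from the original
    texts, so each change can be reported at its real location.
    """
    old, new = tokenize(old_text), tokenize(new_text)
    matcher = difflib.SequenceMatcher(
        None, [t for _, t in old], [t for _, t in new], autojunk=False)
    return [(tag, old[i1:i2], new[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


def main(old_path, new_path):
    with open(old_path) as f_old, open(new_path) as f_new:
        changes = token_diff(f_old.read(), f_new.read())
    for tag, old_toks, new_toks in changes:
        print(tag)
        for lineno, tok in old_toks:
            print(f"< {old_path}:{lineno}: {tok}")
        for lineno, tok in new_toks:
            print(f"> {new_path}:{lineno}: {tok}")
    return 1 if changes else 0  # same convention as diff: 1 = differences


if __name__ == "__main__":
    if len(sys.argv) == 3:
        sys.exit(main(sys.argv[1], sys.argv[2]))
    print(f"usage: {sys.argv[0]} OLD_FILE NEW_FILE", file=sys.stderr)
```

With the two test files from the question, python3 tokdiff.py test_diff1.txt test_diff2.txt prints nothing and exits with status 0. If the second file had return 1; instead, it would report a replace of 0; at test_diff1.txt:1 by 1; at test_diff2.txt:2.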

Can diff achieve this? If not, is there an alternative (preferably command-line)?

N0r*_*ert 8

You can use dwdiff. From man dwdiff:

dwdiff - a delimited word diff program

The program is quite smart; see dwdiff --help:

$ dwdiff --help
Usage: dwdiff [OPTIONS] <OLD FILE> <NEW FILE>
-h, --help                             Print this help message
-v, --version                          Print version and copyright information
-d <delim>, --delimiters=<delim>       Specify delimiters
-P, --punctuation                      Use punctuation characters as delimiters
-W <ws>, --white-space=<ws>            Specify whitespace characters
-u, --diff-input                       Read the input as the output from diff
-S[<marker>], --paragraph-separator[=<marker>]  Show inserted or deleted blocks
                               of empty lines, optionally overriding the marker
-1, --no-deleted                       Do not print deleted words
-2, --no-inserted                      Do not print inserted words
-3, --no-common                        Do not print common words
-L[<width>], --line-numbers[<width>]   Prepend line numbers
-C<num>, --context=<num>               Show <num> lines of context
-s, --statistics                       Print statistics when done
--wdiff-output                         Produce wdiff compatible output
-i, --ignore-case                      Ignore differences in case
-I, --ignore-formatting                Ignore formatting differences
-m <num>, --match-context=<num>        Use <num> words of context for matching
--aggregate-changes                    Allow close changes to aggregate
-A <alg>, --algorithm=<alg>            Choose algorithm: best, normal, fast
-c[<spec>], --color[=<spec>]           Color mode
-l, --less-mode                        As -p but also overstrike whitespace
-p, --printer                          Use overstriking and bold text
-w <string>, --start-delete=<string>   String to mark begin of deleted text
-x <string>, --stop-delete=<string>    String to mark end of deleted text
-y <string>, --start-insert=<string>   String to mark begin of inserted text
-z <string>, --stop-insert=<string>    String to mark end of inserted text
-R, --repeat-markers                   Repeat markers at newlines
--profile=<name>                       Use profile <name>
--no-profile                           Disable profile reading

To test it:

cat << EOF > test_diff1.txt
    else if (prop == "P1") { return 0; }
EOF

cat << EOF > test_diff2.txt
    else if (prop == "P1") {
        return 0;
    }
EOF

Then run the comparison:

$ dwdiff test_diff1.txt test_diff2.txt --statistics
    else if (prop == "P1") {
        return 0;
    }
old: 9 words  9 100% common  0 0% deleted  0 0% changed
new: 9 words  9 100% common  0 0% inserted  0 0% changed

Note the 100% common in the output above.