san*_*lio · 10 votes · tags: diff, text-processing
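I am looking for an application that can compare two C++ source files and find the differences that matter to the code, so that I can compare versions that may have been reformatted in different ways. At a minimum, it should ignore changes in spaces, tabs and newlines that do not affect what the source does (note that whether a newline counts as whitespace depends on the language; in C and C++ it does). Ideally, it would identify exactly all the differences that change the meaning of the code. I am on Ubuntu.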
Going by diff --help | grep ignore, I expected diff -bBwZ to do the job reasonably well (I expected some false negatives, to be dealt with later). However, it does not.
If I have the following files, each containing a snippet,
test_diff1.txt
else if (prop == "P1") { return 0; }
and test_diff2.txt
else if (prop == "P1") {
return 0;
}
then
$ diff -bBwZ test_diff1.txt test_diff2.txt
1c1,3
< else if (prop == "P1") { return 0; }
---
> else if (prop == "P1") {
> return 0;
> }
instead of an empty result.
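Running a code formatter over both inputs as a "filter" might remove these differences, but the resulting output would have to be mapped back to the original inputs, so that the final difference report keeps the actual text and line numbers. So the goal should be achievable without a full compiler... but I do not know whether any such tool exists.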
Can diff achieve this?
Otherwise, is there an alternative (preferably command-line)?
You can use dwdiff. From man dwdiff:
dwdiff - a delimited word diff program
The program is quite clever; see dwdiff --help:
$ dwdiff --help
Usage: dwdiff [OPTIONS] <OLD FILE> <NEW FILE>
-h, --help Print this help message
-v, --version Print version and copyright information
-d <delim>, --delimiters=<delim> Specify delimiters
-P, --punctuation Use punctuation characters as delimiters
-W <ws>, --white-space=<ws> Specify whitespace characters
-u, --diff-input Read the input as the output from diff
-S[<marker>], --paragraph-separator[=<marker>] Show inserted or deleted blocks
of empty lines, optionally overriding the marker
-1, --no-deleted Do not print deleted words
-2, --no-inserted Do not print inserted words
-3, --no-common Do not print common words
-L[<width>], --line-numbers[<width>] Prepend line numbers
-C<num>, --context=<num> Show <num> lines of context
-s, --statistics Print statistics when done
--wdiff-output Produce wdiff compatible output
-i, --ignore-case Ignore differences in case
-I, --ignore-formatting Ignore formatting differences
-m <num>, --match-context=<num> Use <num> words of context for matching
--aggregate-changes Allow close changes to aggregate
-A <alg>, --algorithm=<alg> Choose algorithm: best, normal, fast
-c[<spec>], --color[=<spec>] Color mode
-l, --less-mode As -p but also overstrike whitespace
-p, --printer Use overstriking and bold text
-w <string>, --start-delete=<string> String to mark begin of deleted text
-x <string>, --stop-delete=<string> String to mark end of deleted text
-y <string>, --start-insert=<string> String to mark begin of inserted text
-z <string>, --stop-insert=<string> String to mark end of inserted text
-R, --repeat-markers Repeat markers at newlines
--profile=<name> Use profile <name>
--no-profile Disable profile reading
To test it:
cat << EOF > test_diff1.txt
else if (prop == "P1") { return 0; }
EOF
cat << EOF > test_diff2.txt
else if (prop == "P1") {
return 0;
}
EOF
Then run the comparison:
$ dwdiff test_diff1.txt test_diff2.txt --statistics
else if (prop == "P1") {
return 0;
}
old: 9 words 9 100% common 0 0% deleted 0 0% changed
new: 9 words 9 100% common 0 0% inserted 0 0% changed
Note the 100% common in the statistics above.
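If dwdiff is not available, plain diff can get part of the way once each file is split into one token per line, so that diff compares token sequences instead of physical lines. The sketch below is a minimal illustration under loose assumptions, not a C++ lexer: the helper names tokens and tokdiff are hypothetical, every punctuation character becomes its own token (so == is two tokens, and _ also splits identifiers, which is harmless because both files are split the same way), and string literals and comments get no special treatment. As a result, whitespace changes inside a string literal go unnoticed, and the line numbers diff reports refer to the token lists, not to the original files.

```shell
#!/bin/sh
# Hypothetical helper: print the tokens of file "$1", one per line.
tokens() {
  # Surround every punctuation character with spaces, turn each run of
  # whitespace (including newlines) into one newline, drop empty lines.
  sed 's/[[:punct:]]/ & /g' "$1" | tr -s '[:space:]' '\n' | sed '/^$/d'
}

# Hypothetical helper: diff the token streams of two files.
# Returns 0 when the token sequences are identical, 1 when they differ.
tokdiff() {
  t1=$(mktemp) && t2=$(mktemp) || return 2
  tokens "$1" > "$t1"
  tokens "$2" > "$t2"
  diff "$t1" "$t2"
  rc=$?
  rm -f "$t1" "$t2"
  return "$rc"
}
```

With the two test files from the question, tokdiff test_diff1.txt test_diff2.txt prints nothing and returns 0, because both files reduce to the same 15-token sequence. Unlike dwdiff -L, this approach loses the original line numbers.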