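I have many files in a directory and I want to check that they are all unique. For simplicity, suppose I have three files: foo.txt, bar.txt and baz.txt. If I run this loop, every file is checked against every other file: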
$ for f in ./*; do for i in ./*; do diff -q "$f" "$i"; done; done
Files bar.txt and baz.txt differ
Files bar.txt and foo.txt differ
Files baz.txt and bar.txt differ
Files baz.txt and foo.txt differ
Files foo.txt and bar.txt differ
Files foo.txt and baz.txt differ
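For the hundreds of files I actually want to process, this becomes unreadable. It would be better to list the files that do match, so I could scan the list quickly and make sure each file only matches itself. From the man page, I would have thought the -s option does this: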
$ for f in ./*; do for i in ./*; do diff -s "$f" "$i"; done; done
Files bar.txt and bar.txt are identical
Files baz.txt and baz.txt are identical
Files foo.txt and foo.txt are identical
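...however, in practice it also prints the full diff for every pair of files that differ. Is there a way to suppress that, so that I only get the output shown above?
Alternatively, is there another tool that can do this?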
小智 17
This should do the trick:
diff -rs dir1 dir2 | grep -E '^Files .+ and .+ are identical$'
where dir1 and dir2 are your two directories.
If you want to print only the matching files from dir1:
diff -rs dir1 dir2 | grep -E '^Files .+ and .+ are identical$' | awk -F '(Files | and | are identical)' '{print $2}'
Similarly, if you want to print only the matching files from dir2:
diff -rs dir1 dir2 | grep -E '^Files .+ and .+ are identical$' | awk -F '(Files | and | are identical)' '{print $3}'
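To see why $2 and $3 pick out the two paths, it may help to run the awk step on a single line. The sketch below wraps it in a helper called identical_side (a hypothetical name, not part of diff or awk). Because the line starts with the separator "Files ", awk's first field is empty, so the first path lands in field 2 and the second in field 3.

```shell
# identical_side N  (hypothetical helper name, for illustration only)
# Reads "Files A and B are identical" lines on stdin and prints field N:
#   N=2 -> A (the first path), N=3 -> B (the second path).
# Field 1 is empty because each line begins with the separator "Files ".
identical_side() {
    awk -F '(Files | and | are identical)' -v n="$1" '{ print $n }'
}
```

For example, `printf 'Files dir1/a.txt and dir2/a.txt are identical\n' | identical_side 2` prints dir1/a.txt. Note that a path which itself contains " and " would be split in the wrong place, so this is only safe for file names without that substring.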
If you only want to check whether two files are identical, use cmp. To get output only for the identical files, you could use
for f in ./*; do for i in ./*; do cmp -s "$f" "$i" && echo "Files $f and $i are identical"; done; done
diff tries to produce a short, human-readable list of the differences, which can take a lot of time, so avoid that overhead if you don't need it.
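The loop above also compares every file with itself and visits each pair twice, so with hundreds of files the interesting lines are still buried. A minimal sketch of a variant that reports each unordered pair once is given below. The function name list_identical_pairs is made up for this example, and it assumes the files sit directly in one directory and that the shell expands the glob in the same sorted order both times.

```shell
# list_identical_pairs DIR  (hypothetical helper name)
# Print every unordered pair of regular files in DIR with identical contents.
# A file is never compared with itself, and each pair is compared only once.
list_identical_pairs() {
    dir=${1:-.}
    for f in "$dir"/*; do
        [ -f "$f" ] || continue          # skip subdirectories and unmatched globs
        past_f=0
        for g in "$dir"/*; do
            if [ "$g" = "$f" ]; then
                past_f=1                 # names after this one form new pairs
                continue
            fi
            [ "$past_f" -eq 1 ] && [ -f "$g" ] || continue
            # cmp -s prints nothing; the result is in its exit status.
            if cmp -s "$f" "$g"; then
                printf 'Files %s and %s are identical\n' "$f" "$g"
            fi
        done
    done
}
```

Calling `list_identical_pairs .` in a directory where all files are unique prints nothing, which is the quick check the question asks for; any line that does appear names a duplicate pair.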