How do I delete all duplicate hard links to a file?

Question

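I have a directory tree created by rsnapshot. It contains multiple snapshots of the same directory structure, and all identical files have been replaced by hard links.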

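I want to delete all of these hard-link duplicates and keep only one copy of each file (so that I can later move all the files into a sorted archive without having to touch the same file twice).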

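Is there a tool that does this?
So far I have only found tools that find duplicates and create hard links to replace them...
I suppose I could list all files with their inode numbers and implement the deduplication and deletion myself, but I would rather not reinvent the wheel here.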

Answer 1

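In the end I did this manually, based on the hints from Stéphane and xenoid and some earlier experience with find.
I had to adapt a few commands to work with FreeBSD's non-GNU tools: GNU find has a -printf option that could replace the -exec stat call, but FreeBSD's find does not.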

# create a list of "<inode number> <tab> <full file path>"
# (inode numbers are only unique within one filesystem, so the tree must not span several)
find rsnapshots -type f -links +1 -exec stat -f '%i%t%R' {} + > inodes.txt

# sort the list by inode number (to have consecutive blocks of duplicate files)
sort -n inodes.txt > inodes.sorted.txt

# skip the first file of each block (we want to keep one link per inode) and list the rest;
# printing everything after the first tab keeps paths that themselves contain tabs intact
awk -F'\t' 'BEGIN {lastinode = 0} {inode = 0+$1; if (inode == lastinode) {print substr($0, index($0, "\t") + 1)}; lastinode = inode}' inodes.sorted.txt > inodes.to-delete.txt

# delete duplicates (IFS= and -r preserve whitespace and backslashes in file names, and -- protects
# names starting with a dash; file names containing newlines are still not handled)
while IFS= read -r line; do rm -f -- "$line"; done < inodes.to-delete.txt

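On a system with GNU findutils, the same steps could probably be collapsed into a single pipeline using the -printf option mentioned above. The following is only a sketch under that assumption, not something taken from the original answer: the function name list_extra_links is made up for illustration, and the device number (%D) is added to the key so that equal inode numbers on different filesystems are not mixed up. It only prints the paths that would be deleted, so the list can be reviewed first.

```shell
#!/bin/sh
# Sketch for GNU find only (-printf does not exist in FreeBSD find).
# list_extra_links is a hypothetical helper name, not part of any tool.
list_extra_links() {
    # Emit "<device>:<inode> <tab> <path>" for every regular file that has
    # more than one link, sort bytewise so that all links to the same inode
    # are adjacent, then print every path except the first one per inode.
    find "$1" -type f -links +1 -printf '%D:%i\t%p\n' |
        LC_ALL=C sort |
        awk -F'\t' '$1 == last { print substr($0, length($1) + 2) } { last = $1 }'
}

# Dry run: review the list before deleting anything.
#   list_extra_links rsnapshots
#
# Deletion (paths containing newlines are not handled by this sketch):
#   list_extra_links rsnapshots | while IFS= read -r path; do rm -f -- "$path"; done
```

Within each group of links the path that sorts first is the one that is kept, so with rsnapshot-style directory names the surviving copy ends up in whichever snapshot directory sorts first.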

Archived: 9 years, 3 months ago
Views: 5308
Last activity: 5 years, 3 months ago