ars*_*zer (0): Directory recursion in Perl
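This is my code. I am searching for duplicate directories, so I need a depth-first search. I use recursion: when DH sees a folder, the program descends into that folder. But once that folder is finished, the DH handle is closed and the program never looks at the remaining entries of the top-level folder.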
#! /usr/bin/perl
use Digest::MD5;
dtraverse(@ARGV) ;
sub dtraverse {
    my $fullpathname ;
    my @subdirlist ;
    my @filelist2 ;
    my $newpath ;
    my $name ;
    my $d ;
    print "entered nnnn\n";
    $fullpathname = $_[0];
    opendir(DH,$fullpathname) or die("Cannot open directory\n");
    @subdirlist = () ;
    @filelist2 = () ;
    while ($name = readdir(DH) ) {
        next if (($name eq ".") or ($name eq "..") );
        $newpath = $fullpathname . "/" . $name ;
        print "asdasd == $name\n";
        if (-d $newpath ) {
            push(@subdirlist,$newpath) ;
            $name2=$name;
            dtraverse($newpath) ;
            # After this call, DH has been closed and the remaining files are never read
            push @filelist2,$hashes{$newpath};
        }
        else {
            open (my $fh, '<', $newpath) or die "Can't open '$newpath': $!";
            binmode ($fh);
            $mumu= Digest::MD5->new->addfile($fh)->hexdigest, " $newpath\n";
            push(@filelist2,$mumu);
            $data {$newpath}=$mumu;
        }
    }
    $total="";
    foreach $mumus (sort @filelist2) {
        $total="$total" . "$mumus";
        $total2= Digest::MD5->new->add("$total")->hexdigest;
        $hashes{$fullpathname}=$total2;
    }
    closedir(DH) ;
    print "hash of $fullpathname= $total2 \n";
    #print "DIR:$fullpathname FILES:@filelist\n" ;
}
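You are using the bareword directory handle DH. Bareword handles like this are package variables, so every recursive call to dtraverse shares the same DH. Each time you execute opendir DH, the previously opened handle is closed. When the inner call then finishes with closedir(DH) and returns, the outer loop's next readdir(DH) is reading from a closed handle, gets undef, and the loop ends.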
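Using bareword symbols to refer to file handles is particularly evil, because they are global and you have no idea whether the symbol already points to some other file handle.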
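To see the failure in isolation, here is a minimal sketch that counts every entry below a directory twice: once with a bareword DH, as in the question, and once with a lexical $dh. The helper names count_bareword and count_lexical are hypothetical. On a tree whose top level holds two subdirectories, the bareword version should stop after descending into the first one, while the lexical version visits everything; with warnings enabled, Perl should also complain about readdir() on an invalid dirhandle.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: count all entries below $dir using a bareword
# handle, reproducing the bug from the question.
sub count_bareword {
    my ($dir) = @_;
    my $n = 0;
    opendir(DH, $dir) or die "Cannot open directory '$dir': $!\n";
    while (defined(my $name = readdir DH)) {
        next if $name eq '.' or $name eq '..';
        $n++;
        # The recursive call reopens the same global DH (closing this
        # directory's listing) and closes it again before returning.
        $n += count_bareword("$dir/$name") if -d "$dir/$name";
    }
    closedir(DH);
    return $n;
}

# Hypothetical helper: the same walk with a lexical handle; each call
# gets its own $dh, so recursion cannot disturb the caller's loop.
sub count_lexical {
    my ($dir) = @_;
    my $n = 0;
    opendir(my $dh, $dir) or die "Cannot open directory '$dir': $!\n";
    while (defined(my $name = readdir $dh)) {
        next if $name eq '.' or $name eq '..';
        $n++;
        $n += count_lexical("$dir/$name") if -d "$dir/$name";
    }
    closedir($dh);
    return $n;
}

# Only runs when executed directly with a directory argument.
if (!caller && @ARGV) {
    printf "bareword: %d entries, lexical: %d entries\n",
        count_bareword($ARGV[0]), count_lexical($ARGV[0]);
}
```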
So use a lexical directory handle, opendir my $dh, just as you already do for your file handle (open my $fh).
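Applied to the goal in the question (a hash per directory, computed depth-first from the hashes of its contents), a minimal sketch with a lexical handle could look like the following. It keeps the question's idea of hashing the sorted child hashes together. The function names dir_digest and duplicate_dirs are hypothetical, and skipping symbolic links (to avoid cycles) is an assumption of this sketch, not something the question specifies.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Digest::MD5;

# Hypothetical helper: return an MD5 digest for $dir built from the digests
# of everything inside it, recording every directory's digest in %$hashes.
# Each call opens its own lexical $dh, so recursing inside the readdir loop
# no longer closes the caller's directory handle.
sub dir_digest {
    my ($dir, $hashes) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory '$dir': $!\n";
    my @parts;
    while (defined(my $name = readdir $dh)) {
        next if $name eq '.' or $name eq '..';
        my $path = "$dir/$name";
        next if -l $path;    # assumption: skip symlinks to avoid cycles
        if (-d $path) {
            # Depth-first: the subdirectory is fully hashed before we continue.
            push @parts, dir_digest($path, $hashes);
        }
        elsif (-f $path) {
            open(my $fh, '<', $path) or die "Can't open '$path': $!\n";
            binmode $fh;
            push @parts, Digest::MD5->new->addfile($fh)->hexdigest;
            close $fh;
        }
    }
    closedir($dh);
    # Sort so the digest does not depend on readdir order.
    my $digest = Digest::MD5->new->add(sort @parts)->hexdigest;
    $hashes->{$dir} = $digest;
    return $digest;
}

# Hypothetical helper: group directories under $top whose digests match.
sub duplicate_dirs {
    my ($top) = @_;
    my %hashes;
    dir_digest($top, \%hashes);
    my %by_digest;
    push @{ $by_digest{ $hashes{$_} } }, $_ for sort keys %hashes;
    return grep { @$_ > 1 } values %by_digest;
}

# Only runs when executed directly with a directory argument.
if (!caller && @ARGV) {
    print join(' ', @$_), "\n" for duplicate_dirs($ARGV[0]);
}
```

Run it as perl script.pl DIR; each output line lists directories whose contents hashed identically. All empty directories share one digest, so they will be reported as duplicates of each other.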
Of course, I would probably use File::Find instead. Also, take a look at Yanick's entry in the DFW.pm Dedup Hackathon.
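With File::Find the same post-order walk needs no explicit recursion at all. In this sketch (the function name dir_digests_ff is hypothetical), bydepth => 1 makes find() call the wanted function for a directory only after everything inside it, and no_chdir => 1 keeps $File::Find::name a full path. Each digest is pushed onto a list keyed by its parent directory, so by the time a directory itself is visited, its children's digests are already waiting. Skipping symlinks is an assumption of this sketch.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Digest::MD5;
use File::Basename qw(dirname);
use File::Find qw(find);

# Hypothetical helper: return { directory => digest } for every directory
# under $top, where a directory's digest hashes its children's sorted digests.
sub dir_digests_ff {
    my ($top) = @_;
    # Drop a trailing slash so paths built by find() match dirname() keys.
    $top =~ s{/+\z}{} unless $top eq '/';
    my (%parts, %hashes);
    find({
        bydepth  => 1,   # post-order: contents are visited before their directory
        no_chdir => 1,   # $File::Find::name (and $_) stay full paths
        wanted   => sub {
            my $path = $File::Find::name;
            return if -l $path;    # assumption: skip symlinks
            my $digest;
            if (-d _) {
                # Every child has already pushed its digest onto $parts{$path}.
                $digest = Digest::MD5->new->add(sort @{ $parts{$path} || [] })->hexdigest;
                $hashes{$path} = $digest;
            }
            elsif (-f _) {
                open(my $fh, '<', $path) or die "Can't open '$path': $!\n";
                binmode $fh;
                $digest = Digest::MD5->new->addfile($fh)->hexdigest;
                close $fh;
            }
            else {
                return;
            }
            # Hand this digest to the parent directory, which is visited later.
            push @{ $parts{ dirname($path) } }, $digest;
        },
    }, $top);
    return \%hashes;
}

# Only runs when executed directly with a directory argument.
if (!caller && @ARGV) {
    my $hashes = dir_digests_ff($ARGV[0]);
    print "$hashes->{$_}  $_\n" for sort keys %$hashes;
}
```

Directories that print the same digest are candidate duplicates; grouping them works the same way as grouping file hashes in the script below.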
The following, possibly buggy, script uses Path::Class and Digest::xxHash; it took about 10 seconds to check the 5876 files in my Downloads folder:
#!/usr/bin/env perl

use strict;
use warnings;

use constant xxHASH_SEED => 0xDEADBEEF;
use feature 'say';

use Digest::xxHash qw(xxhash_hex);
use Path::Class;
use YAML::XS;

run(@ARGV) unless caller;

sub run {
    my $top = shift;
    die "Need top directory\n" unless defined $top;

    # dies if it cannot resolve
    $top = dir($top)->absolute->resolve;

    my $counter;
    my %dupes;

    $top->recurse(
        callback => sub {
            my $entry = shift;

            if (-d $entry and !(-x _)) {
                return $entry->PRUNE
            }

            return unless -r $entry;
            return unless -f _;

            $counter += 1;

            my $hash = xxhash_hex scalar($entry->slurp), xxHASH_SEED;

            # Don't stringify if you want to do
            # anything other than display file names
            push @{ $dupes{$hash} }, "$entry";
        },
        depthfirst => 1,
    );

    say "Hashed $counter files";

    my @dupes = grep @$_ > 1, values %dupes;
    if (@dupes) {
        print "Possible duplicates:\n", Dump \@dupes;
    }
}