perl -f check fails to identify file

azz*_*zid 12 perl file

I have a perl script that goes through a folder with a couple of thousand files.

When I started writing the script I was unaware of Perl's File::Find module, so to list all the files in the structure I used something along the lines of:

open (FILES, "$FIND $FOLDER -type f |");
while (my $line = <FILES>) {...}

Now, however, I figured I would try doing this from Perl instead of launching an external program. (No real reason for the change other than wanting to learn to use File::Find.)

Trying to learn the semantics of File::Find's find function, I tried a few things on the command line and compared the output to that of find.

Oddly enough, there is one file that the find program finds but the Perl function skips.

Find works:

machine:~# find /search/path -type f | grep UNIQ
/search/path/folder/folder/UNIQ/movie_file_015.MOV
/search/path/folder/folder/UNIQ/movie_file_145.MOV
/search/path/folder/folder/UNIQ/Thumbs.db

machine:~# find /search/path -type f | wc -l
    6439

Perl fails:

machine:~# perl -e 'use File::Find; find(sub { print $File::Find::name . "\n" if -f }, "/search/path");' | grep  UNIQ
/search/path/folder/folder/UNIQ/movie_file_145.MOV
/search/path/folder/folder/UNIQ/Thumbs.db

machine:~# perl -e 'use File::Find; find(sub { print $File::Find::name . "\n" if -f }, "/search/path");' | wc -l
    6438

Changing to exclude folders rather than include files works:

machine:~# perl -e 'use File::Find; find(sub { print $File::Find::name . "\n" unless -d }, "/search/path");' | grep  UNIQ
/search/path/folder/folder/UNIQ/movie_file_015.MOV
/search/path/folder/folder/UNIQ/movie_file_145.MOV
/search/path/folder/folder/UNIQ/Thumbs.db

The only difference between the files is the size:

machine:~# ls -l /search/path/folder/folder/UNIQ/
total 4213008
-rw-rw-r--    1 user users    4171336632 May 27  2012 movie_file_015.MOV
-rw-rw-r--    1 user users    141610616 May 27  2012 movie_file_145.MOV
-rw-rw-r--    1 user users       20992 May 27  2012 Thumbs.db

Perl on the machine in question is old but not ancient:

machine:~# perl -version

This is perl, v5.8.8 built for sparc-linux

Copyright 1987-2006, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl".  If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.

Is this a known bug or something?

Or am I hitting some size limit of '-f'? The file is almost 4 GB and is the largest in the selection.

Or is my test (if -f) poorly chosen?

EDIT [trying to stat files]:

Big file fails

machine:~# perl -e 'use Data::Dumper; print Dumper(stat("/search/path/folder/folder/UNIQ/movie_file_015.MOV"));'

Small file works

machine:~# perl -e 'use Data::Dumper; print Dumper(stat("/search/path/folder/folder/UNIQ/movie_file_145.MOV"));'
$VAR1 = 65024;
$VAR2 = 19989500;
$VAR3 = 33204;
$VAR4 = 1;
$VAR5 = 1004;
$VAR6 = 100;
$VAR7 = 0;
$VAR8 = 141610616;
$VAR9 = 1349281585;
$VAR10 = 1338096718;
$VAR11 = 1352403842;
$VAR12 = 16384;
$VAR13 = 276736;

Binary 'stat' works on both files

machine:~# stat /search/path/folder/folder/UNIQ/movie_file_015.MOV
  File: "/search/path/folder/folder/UNIQ/movie_file_015.MOV"
  Size: 4171336632  Blocks: 8149216    IO Block: 16384  Regular File
Device: fe00h/65024d        Inode: 19989499    Links: 1
Access: (0664/-rw-rw-r--)  Uid: ( 1004/user)   Gid: (  100/   users)
Access: 2012-10-03 18:11:05.000000000 +0200
Modify: 2012-05-27 07:23:34.000000000 +0200
Change: 2012-11-08 20:44:02.000000000 +0100

machine:~# stat /search/path/folder/folder/UNIQ/movie_file_145.MOV
  File: "/search/path/folder/folder/UNIQ/movie_file_145.MOV"
  Size: 141610616   Blocks: 276736     IO Block: 16384  Regular File
Device: fe00h/65024d        Inode: 19989500    Links: 1
Access: (0664/-rw-rw-r--)  Uid: ( 1004/user)   Gid: (  100/   users)
Access: 2012-10-03 18:26:25.000000000 +0200
Modify: 2012-05-27 07:31:58.000000000 +0200
Change: 2012-11-08 20:44:02.000000000 +0100

Also:

machine:~# perl -e 'stat("/search/path/folder/folder/UNIQ/movie_file_145.MOV"); print $! . "\n";'
Bad file descriptor

machine:~# perl -e 'stat("/search/path/folder/folder/UNIQ/movie_file_015.MOV"); print $! . "\n";'
Value too large for defined data type
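
A note on reading the two results above: `$!` is only meaningful immediately after a call that actually failed. The "Bad file descriptor" printed for the small file is most likely a stale value, since the Data::Dumper output earlier shows that stat succeeded on that file. A minimal sketch that checks the return value of stat before looking at `$!` (the helper name `stat_report` is hypothetical, introduced only for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper: report either the file size or the reason stat failed.
sub stat_report {
    my ($path) = @_;
    my @st = stat($path);          # empty list on failure, 13 fields on success
    return @st
        ? "ok, size $st[7]"        # field 7 is the size in bytes
        : "failed: $!";            # $! is trustworthy only on this branch
}

# Report on every path given on the command line.
print "$_: ", stat_report($_), "\n" for @ARGV;
```

On a perl built without large file support, the large file would be expected to take the "failed" branch and the small one the "ok" branch.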

EDIT2:

# perl -V | grep "uselargefiles\|FILE_OFFSET_BITS"
config_args='-Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=sparc-linux -Dprefix=/usr -Dprivlib=/usr/share/perl/5.8 -Darchlib=/usr/lib/perl/5.8 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.8.8 -Dsitearch=/usr/local/lib/perl/5.8.8 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Dstatic_ext=B ByteLoader GDBM_File POSIX re -Dusemymalloc -Uuselargefiles -Uafs -Ud_csh -Uusesfio -Uusenm -Duseshrplib -Dlibperl=libperl.so.5.8.8 -Dd_dosuid -des'
useperlio=define d_sfio=undef uselargefiles=undef usesocks=undef

Problem "solved":

machine:~# perl -e 'stat("/search/path/folder/folder/UNIQ/movie_file_015.MOV"); print $!{EOVERFLOW} . "\n";'
92
machine:~# perl -e 'stat("/search/path/folder/folder/UNIQ/movie_file_145.MOV"); print $!{EOVERFLOW} . "\n";'
0

Works:

# perl -e 'use File::Find; find(sub { print $File::Find::name . "\n" if -f or ( $!{EOVERFLOW} > 0 and not -d) }, "/search/path");' | grep UNIQ
/search/path/folder/folder/UNIQ/movie_file_015.MOV 
/search/path/folder/folder/UNIQ/movie_file_145.MOV 
/search/path/folder/folder/UNIQ/Thumbs.db

Ilm*_*nen 10

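Based on a bit of Googling, it looks like your perl interpreter has not been compiled with large file support, causing stat (and any file test that internally depends on it, including -f) to fail for files larger than 2 GB.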

To check whether this is the case, run:

perl -V | grep "uselargefiles\|FILE_OFFSET_BITS"

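If your perl has large file support, the output should show something like uselargefiles=define and -D_FILE_OFFSET_BITS=64. If it doesn't, then your perl likely does not support large files.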
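
The same build-time setting can also be read from inside Perl through the core Config module. The following is a minimal sketch, not a definitive test: the helper name `has_large_file_support` is hypothetical, and it assumes that `uselargefiles` being `define` is a sufficient indicator.

```perl
use strict;
use warnings;
use Config;   # core module exposing the build-time configuration as %Config

# Hypothetical helper: true if this perl was built with -Duselargefiles.
sub has_large_file_support {
    return (defined $Config{uselargefiles}
            && $Config{uselargefiles} eq 'define') ? 1 : 0;
}

# Print the raw settings, then the verdict.
printf "uselargefiles=%s lseeksize=%s\n",
    defined $Config{uselargefiles} ? $Config{uselargefiles} : 'undef',
    defined $Config{lseeksize}     ? $Config{lseeksize}     : 'undef';
print "large file support: ", (has_large_file_support() ? "yes" : "no"), "\n";
```

On the perl from the question (built with -Uuselargefiles), this would be expected to report "no".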

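It may seem a bit puzzling that large file support is needed just to stat a file. The underlying problem is that the 32-bit version of the stat(2) system call, rather than returning a bogus size, fails with EOVERFLOW when applied to a file larger than 2 GB: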

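"EOVERFLOW

(stat()) path refers to a file whose size cannot be represented in the type off_t. This can occur when an application compiled on a 32-bit platform without -D_FILE_OFFSET_BITS=64 calls stat() on a file whose size exceeds (1<<31)-1 bytes."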
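
To make the limit concrete, here is a small sketch that compares the three file sizes from the question's ls -l listing against (1<<31)-1. The helper name `overflows_off_t` is hypothetical, and the sketch assumes a signed 32-bit off_t, as the quoted man page describes.

```perl
use strict;
use warnings;

# Largest value a signed 32-bit off_t can hold: (1<<31)-1 = 2147483647.
my $max_off_t = 2**31 - 1;

# File sizes in bytes, copied from the ls -l output in the question.
my %size_of = (
    'movie_file_015.MOV' => 4171336632,
    'movie_file_145.MOV' => 141610616,
    'Thumbs.db'          => 20992,
);

# Hypothetical helper: 1 if the size cannot be represented, 0 otherwise.
sub overflows_off_t {
    my ($bytes) = @_;
    return $bytes > $max_off_t ? 1 : 0;
}

for my $name (sort keys %size_of) {
    printf "%-20s %10.0f bytes  %s\n", $name, $size_of{$name},
        overflows_off_t($size_of{$name}) ? 'too large for 32-bit off_t' : 'fits';
}
```

Only movie_file_015.MOV exceeds the limit, which matches the one file that -f missed.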

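Technically, receiving that error should be enough to indicate that the named file does exist (though I suppose it could also be a really, really humongous directory), but perl isn't smart enough to realize that; it just sees that the stat failed, and so returns nothing.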

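(Edit: As ikegami correctly notes in the comments, if the stat(2) call fails, -f returns undef rather than 0 or 1, and sets $! to the error code that caused the failure. So, if you don't mind assuming that every directory entry for which stat fails with EOVERFLOW is a file larger than 2 GB, you can check for it with something like -f $_ or (not defined -f _ and $!{EOVERFLOW}).)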
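
As a script rather than a one-liner, the idea might be sketched as follows. The helper names `looks_like_file` and `list_files` are hypothetical, and the sketch makes the same assumption as above: an entry whose stat fails with EOVERFLOW is treated as a plain file. It stores the result of -f in a variable instead of re-testing the `_` handle, so that `$!` is read right after the one call that set it.

```perl
use strict;
use warnings;
use File::Find;
use Errno;   # provides the %! hash used below

# Hypothetical helper: 1 if $path is a plain file, or if stat failed with
# EOVERFLOW (ASSUMED to mean "a file too large for this perl to stat").
sub looks_like_file {
    my ($path) = @_;
    my $is_file = -f $path;        # undef when the underlying stat fails
    return 1 if $is_file;
    return (!defined $is_file && $!{EOVERFLOW}) ? 1 : 0;
}

# Hypothetical helper: full paths of everything under $root that passes.
sub list_files {
    my ($root) = @_;
    my @found;
    # Inside the callback, File::Find has chdir'ed into the directory and
    # $_ holds the entry's basename; $File::Find::name is the full path.
    find(sub { push @found, $File::Find::name if looks_like_file($_) }, $root);
    return @found;
}

# List the files under each directory named on the command line.
print "$_\n" for map { list_files($_) } @ARGV;
```

Run with /search/path as the argument, this should give the same listing as find /search/path -type f, apart from ordering and the assumption about EOVERFLOW entries.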

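  • It doesn't return nothing; it returns undef (error) rather than 0 (not a plain file) and sets `$!` to `EOVERFLOW`. When `-f` returns undef, you can check for overflow by checking `$!{EOVERFLOW}`. (5 upvotes)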