How do I copy a file with a UTF-8 filename to another UTF-8 filename in Perl on Windows?

Question

For example, given an empty file テスト.txt, how can I make a copy of it named テスト.txt.copy?

My first crack manages to access the file and build the new filename, but the copy it produces is named ãƒ†ã‚¹ãƒˆ.txt.copy.

Here's my first crack:

#!/usr/bin/env perl

use strict;
use warnings;

use English '-no_match_vars';
use File::Basename;
use Getopt::Long;

use File::Copy;
use Win32;

my (
    $output_relfilepath,
   ) = process_command_line();

open my $fh, '>', $output_relfilepath or die $!;
binmode $fh, ':utf8';
foreach my $short_basename ( glob( '*.txt') ) {

  # skip the output basename if it's in the glob
  if ( $short_basename eq $output_relfilepath ) {
    next;
  }

  my $long_basename = Win32::GetLongPathName( $short_basename );
  my $new_basename  = $long_basename . '.copy';

  print {$fh} sprintf(
                      "short_basename = (%s)\n" .
                      " long_basename = (%s)\n" .
                      "  new_basename = (%s)\n",
                      $short_basename,
                      $long_basename,
                      $new_basename,
                     );
  copy( $short_basename, $new_basename );
}

printf(
       "\n%s done! (%d seconds elapsed)\n",
       basename( $0 ),
       time() - $BASETIME,
      );

# === subroutines ===

sub process_command_line {

  # default arguments
  my %args
    = (
       output_relfilepath => 'output.txt',
      );

  GetOptions(
             'help'                 => sub { print usage(); exit },
             'output_relfilepath=s' => \$args{output_relfilepath},
            );

  return (
          $args{output_relfilepath},
         );
}

sub usage {
  my $script_name = basename $0;

  my $usage = <<END_USAGE;
======================================================================

Test script to copy files with a UTF-8 filenames to files with
different UTF-8 filenames.  This example tries to make copies of all
.txt files with versions that end in .txt.copy.

  usage: ${script_name} (<options>)

options:

  -output_relfilepath <s>   set the output relative file path to <s>.
                            this file contains the short, long, and
                            new basenames.
                            (default: 'output.txt')

----------------------------------------------------------------------

examples:

  ${script_name}

======================================================================
END_USAGE

  return $usage;
}


Here are the contents of output.txt after running it:

short_basename = (BD9A~1.TXT)
 long_basename = (???.txt)
  new_basename = (???.txt.copy)


I tried replacing File::Copy's copy command with a system call:

my $cmd = "copy \"${short_basename}\" \"${new_basename}\"";
print `$cmd`;


and with Win32::CopyFile:

Win32::CopyFile( $short_basename, $new_basename, 'true' );


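Unfortunately, I got the same result (ãƒ†ã‚¹ãƒˆ.txt.copy) in both cases. For the system call, "1 file(s) copied." was printed as expected.

Notes:

I'm running Perl 5.10.0 via Strawberry Perl on Windows 7 Professional
I use the Win32 module to access long filenames
glob returns short filenames, which I have to use to access the file
テスト = test (tesuto) in katakana
I've read perlunitut and The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)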

Answer 1

bob*_*nce 2

You get the long filename using Win32, which gives you a UTF-8-encoded string.
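To make "UTF-8-encoded string" concrete: in Perl a filename can be held either as a character string or as the UTF-8 bytes that encode it, and the two are not interchangeable. A minimal sketch using the core Encode module; the filename is the example from the question, written with escapes so the script does not depend on its own source encoding.

```perl
use strict;
use warnings;
use Encode qw(encode);

# "テスト.txt" as a character string: one element per character.
my $chars = "\x{30C6}\x{30B9}\x{30C8}.txt";

# The same name as UTF-8 bytes: each katakana character becomes three bytes.
my $bytes = encode('UTF-8', $chars);

printf "characters: %d\n", length $chars;    # 7
printf "bytes:      %d\n", length $bytes;    # 13
# Hex dump of the byte string: E3 83 86 E3 82 B9 E3 83 88 2E 74 78 74
print join(' ', map { sprintf '%02X', ord } split //, $bytes), "\n";
```

Any API that expects characters in some other encoding will misread those 13 bytes.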


However, you then create the new file using plain copy, which uses the C stdlib IO functions. The stdlib functions use the default filesystem encoding.
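For contrast, here is what happens on a system whose filenames really are UTF-8 bytes. A minimal sketch, assuming a Linux machine with UTF-8 filenames (not Windows): the character string is encoded to UTF-8 and handed to plain File::Copy::copy, and the copy comes out with the intended name. The temporary directory is only there to keep the example self-contained.

```perl
use strict;
use warnings;
use Encode qw(encode);
use File::Copy qw(copy);
use File::Spec;
use File::Temp qw(tempdir);

# Assumption: POSIX system (e.g. Linux) where filenames are stored as UTF-8 bytes.
my $dir  = tempdir(CLEANUP => 1);
my $name = "\x{30C6}\x{30B9}\x{30C8}.txt";    # テスト.txt as characters

# Encode to UTF-8 bytes before passing the names to the stdlib-based IO.
my $src = File::Spec->catfile($dir, encode('UTF-8', $name));
my $dst = File::Spec->catfile($dir, encode('UTF-8', "$name.copy"));

# Create an empty source file, then copy it under the new name.
open my $fh, '>', $src or die "Cannot create $src: $!";
close $fh or die "Cannot close $src: $!";
copy($src, $dst) or die "copy failed: $!";

# readdir returns the raw byte names; on a UTF-8 terminal they display correctly.
opendir my $dh, $dir or die "Cannot open $dir: $!";
my @entries = sort grep { !/^\.\.?\z/ } readdir $dh;
closedir $dh;
print "$_\n" for @entries;
```

On Windows those same bytes would be passed through the ANSI code page instead, which is the problem described next.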


On modern Linux that is usually UTF-8, but on Windows it (sadly) never is, because the system default code page cannot be set to UTF-8. So on a Western European Windows install your UTF-8 string gets interpreted as a code page 1252 string, which is exactly what happened here. (On a Japanese machine it would be interpreted as code page 932, which is similar to Shift-JIS, and you would get something like 繝�せ繝�.)
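The cp1252 misreading can be reproduced on any platform with the core Encode module: encode the intended name to UTF-8, then decode those bytes as cp1252, which is roughly what the ANSI file APIs do on a Western European install. A minimal sketch that simulates the misinterpretation; it does not touch the filesystem.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $name  = "\x{30C6}\x{30B9}\x{30C8}.txt.copy";    # テスト.txt.copy
my $bytes = encode('UTF-8', $name);                 # what the script actually holds
my $seen  = decode('cp1252', $bytes);               # how a cp1252 system reads those bytes

binmode STDOUT, ':encoding(UTF-8)';
print "$seen\n";    # ãƒ†ã‚¹ãƒˆ.txt.copy
```

Each three-byte katakana character turns into three cp1252 characters (for example E3 83 86 becomes ã, ƒ, †), giving exactly the filename reported in the question.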


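I haven't done this in Perl myself, but I suspect the Win32::CopyFile function is more likely to be able to handle the kind of Unicode paths returned elsewhere in the Win32 module.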


Archived: 16 years ago
Views: 4,589
Last activity: 11 years, 1 month ago