Handling Unicode directory and file names in Perl on Windows

Kel*_*ian 8 directory perl encoding tk-toolkit

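I have an encoding problem with Perl on Windows. On Windows 7, running Perl (Strawberry 5.16) with a simple Tk GUI, I need to open files and/or access directories whose names or paths contain non-English characters. For opening files I came up with this solution, which seems to work fine: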

#!/usr/bin/perl -w

use strict;
use warnings;
use Win32::Unicode::File;
use Encode;
use Tk;

my $mw = Tk::MainWindow->new;
my $tissue_but = $mw->Button(
    -text => 'Open file',
    -command =>  [ \&select_unicode_file ],
);
$tissue_but->grid( -row => 3, -column => 1 );
Tk::MainLoop();

sub select_unicode_file{
my $types = [ ['Txt', '.txt'],
          ['All Files',   '*'],];
my $input_file= $mw->getOpenFile(-filetypes => $types);
my $fh = Win32::Unicode::File->new;
if ($fh->open('<', $input_file)){
  while (my $line = $fh->readline()){
    print "\n$line\n";
  }
   close $fh;
}
 else{
  print "Couldn't open file: $!\n";
}
}

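This correctly opens files such as Поиск/Поиск.txt.

What I cannot manage is to simply get a directory path and then process it. I think I should use Win32::Unicode::Dir, but I really cannot make sense of the documentation.

It should be something like this: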

#!/usr/bin/perl -w

use strict;
use warnings;
use Win32::Unicode::Dir;
use Encode;
use Tk;

my $mw = Tk::MainWindow->new;
my $tissue_but = $mw->Button(
    -text => 'Open file',
    -command =>  [ \&select_unicode_directory ],
);
$tissue_but->grid( -row => 3, -column => 1 );
Tk::MainLoop();

sub select_unicode_directory{
my $dir = $mw->chooseDirectory( );
my $wdir = Win32::Unicode::Dir->new;
my $dir = $wdir->open($dir) || die $wdir->error;
my $dir_complete = "$dir/a.txt";
open (MYFILE, $dir_complete );
    while (<MYFILE>) {
    chomp;
    print "$_\n";
}
close (MYFILE); 
}

Яро*_*лин 1

There is a logic error:

my $dir = $wdir->open($dir) || die $wdir->error;
my $dir_complete = "$dir/a.txt";

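$wdir->open('path') returns an object, not a string, so you cannot use it as a path. But that is not the worst of it. Unfortunately, the Tk implementation does not seem to support Unicode file names yet (chooseDirectory included). I suppose you would have to write a custom directory chooser, but I am not sure that is possible.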
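To make the first point concrete, here is a minimal sketch of what happens when an object is interpolated into a path string. Demo::Dir is a hypothetical stand-in class and C:/some/dir is a made-up path; the sketch only assumes that the directory object is a plain hash-based object without string overloading, which matches the Win32::Unicode::Dir=HASH(0x...) form that appears in the sample output later in this answer.

```perl
use strict;
use warnings;

# Demo::Dir is a hypothetical stand-in for a hash-based object
# (assumption: no string overloading on the class).
my $obj  = bless {}, 'Demo::Dir';
my $path = 'C:/some/dir';    # hypothetical path string chosen earlier

# Interpolating the object yields its default stringification,
# something like "Demo::Dir=HASH(0x...)/a.txt", which is not a path.
my $wrong = "$obj/a.txt";

# Interpolating the original string yields a usable path.
my $right = "$path/a.txt";

print "wrong: $wrong\n";
print "right: $right\n";
```

The practical consequence is to keep the chosen path in its own variable and not to overwrite it with the return value of open.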


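The following script can list the files in a folder whose path contains only ASCII characters (and ->fetch can list UTF-8 file names), but it crashes when opening a folder with UTF-8 characters in its name. Well, to be fair, it crashes when opening ??????.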

use strict;
use warnings;
use Win32::Unicode::Dir;
use Win32::Unicode::Console;
use Encode;
use Tk;

my $mw = Tk::MainWindow->new;
my $tissue_but = $mw->Button(
    -text => 'Select dir',
    -command =>  [ \&select_unicode_directory ],
);
$tissue_but->grid( -row => 3, -column => 1 );
Tk::MainLoop();

sub select_unicode_directory {
    my $wdir = Win32::Unicode::Dir->new;
    my $selected = $mw->chooseDirectory(-parent =>$mw);
       # http://search.cpan.org/dist/Tk/pod/chooseDirectory.pod#CAVEATS
       $selected = encode("utf-8", $selected);
    print "selected: $selected\n";

    $wdir->open($selected) || die $wdir->error;

    print "\$mw->chooseDirectory:    $selected\n";
    print "\$wdir->open(\$selected): $wdir\n";

# CRASH HERE, presumably because winders can't handle '?' in a file (dir) name
    for ($wdir->fetch) {
# http://search.cpan.org/~xaicron/Win32-Unicode-0.38/lib/Win32/Unicode/Dir.pm
        next if /^\.{1,2}$/;
        my $path = "$selected/$_";
        if (file_type('f', $path)) { print "file: $path\n"; }
        elsif (file_type('d', $path)) { print " dir: $path\n"; }
    }
    print "closing \n";
    $wdir->close || die $wdir->error;

}

Sample (opening Поиск/):

Both samples below were run with Strawberry Perl 5.12.3, built for MSWin32-x64-multi-thread.
selected: C:/cygwin/home/jaroslav/tmp/so/perl/open-file-tk/?????
$mw->chooseDirectory:    C:/cygwin/home/jaroslav/tmp/so/perl/open-file-tk/?????
$wdir->open($selected): Win32::Unicode::Dir=HASH(0x2e38158)
>>> perl crash <<<

Sample (opening the parent of Поиск):
selected: C:/cygwin/home/jaroslav/tmp/so/perl/open-file-tk
$mw->chooseDirectory:    C:/cygwin/home/jaroslav/tmp/so/perl/open-file-tk
$wdir->open($selected): Win32::Unicode::Dir=HASH(0x2b92c10)
file: C:/cygwin/home/jaroslav/tmp/so/perl/open-file-tk/.select_uni_dir.pl.swp
file: C:/cygwin/home/jaroslav/tmp/so/perl/open-file-tk/o
file: C:/cygwin/home/jaroslav/tmp/so/perl/open-file-tk/o.dir
file: C:/cygwin/home/jaroslav/tmp/so/perl/open-file-tk/select_uni_dir.pl
file: C:/cygwin/home/jaroslav/tmp/so/perl/open-file-tk/select_uni_file.pl
 dir: C:/cygwin/home/jaroslav/tmp/so/perl/open-file-tk/Поиск

Conclusion

The Tk directory chooser returns ?????? instead of Поиск. While looking for a way to enable Unicode in Tk, I found this:

http://search.cpan.org/dist/Tk/pod/UserGuide.pod#Perl/Tk_and_Unicode

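(...) Unfortunately, there are still places in Perl ignorant of Unicode. One of these places are filenames. Consequently, the file selectors in Perl/Tk do not handle encoding of filenames properly. Currently they suppose that filenames are in iso-8859-1 encoding, at least on Unix systems. As soon as Perl has a concept of filename encodings, then Perl/Tk will also implement such schemes.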

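So at first glance, it looks like what you are trying to do is not possible (unless you write or find a custom directory chooser). Actually, it may not be a bad idea to file a bug about this, because the GUI does display "Поиск", so the error is in the return value.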
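As a rough illustration of why calling encode("utf-8", ...) on the chooser's return value cannot repair it, the sketch below converts the name to a single-byte code page and back. The choice of cp1252 is an assumption made purely for illustration; nothing above establishes which conversion Tk actually performs. The point is only that once characters have been replaced by question marks, no later re-encoding can bring them back.

```perl
use strict;
use warnings;
use utf8;                          # the source file itself contains Cyrillic text
use Encode qw(encode decode);

my $name = "Поиск";

# Assumption for illustration: a lossy conversion to a single-byte code page.
# Characters that cp1252 cannot represent are replaced by '?'.
my $lossy = encode('cp1252', $name);

# Re-encoding the damaged value as UTF-8 only re-encodes the question marks.
my $again = encode('UTF-8', decode('cp1252', $lossy));

print "after lossy conversion: $lossy\n";
print "after re-encoding:      $again\n";
```

Both printed values consist solely of question marks, so the original name would have to be obtained some other way, for example from a chooser that returns it intact.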
