sid*_*com 4 unicode perl encoding utf-8 pathname
What strange encoding have I ended up with in this path name?
In my file manager (Dolphin), the path name looks fine.
#!/usr/local/bin/perl
use warnings;
use 5.014;
use utf8;
use open qw( :encoding(UTF-8) :std );
use File::Find;
use Devel::Peek;
use Encode qw(decode);
my $string;
find( sub { $string = $File::Find::name }, 'Delibes, Léo' );
$string =~ s|Delibes,\ ||;
$string =~ s|\..*\z||;
my ( $s1, $s2 ) = split m|/|, $string, 2;
say Dump $s1;
say Dump $s2;
# SV = PV(0x824b50) at 0x9346d8
# REFCNT = 1
# FLAGS = (PADMY,POK,pPOK,UTF8)
# PV = 0x93da30 "L\303\251o"\0 [UTF8 "L\x{e9}o"]
# CUR = 4
# LEN = 16
# SV = PV(0x7a7150) at 0x934c30
# REFCNT = 1
# FLAGS = (PADMY,POK,pPOK,UTF8)
# PV = 0x7781e0 "Lakm\303\203\302\251"\0 [UTF8 "Lakm\x{c3}\x{a9}"]
# CUR = 8
# LEN = 16
say $s1;
say $s2;
# Léo
# Lakmé
$s1 = decode( 'utf-8', $s1 );
$s2 = decode( 'utf-8', $s2 );
say $s1;
say $s2;
# L?o
# Lakmé
Eri*_*ikR 13
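Unfortunately, your operating system's pathname API is another "binary interface", so you have to use Encode::encode and Encode::decode to get predictable results.
Most operating systems treat pathnames as a sequence of octets (i.e. bytes). Whether that sequence should be interpreted as latin-1, UTF-8 or some other character encoding is an application decision. The value returned by readdir() is therefore just a sequence of octets, and File::Find does not know that you want the path name as Unicode code points. It forms $File::Find::name by simply concatenating the directory path (which you supplied) with the value the OS returned from readdir(), and that is how you ended up with code points mixed with octets.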
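To illustrate the point about readdir(), here is a minimal sketch that lists a directory and decodes each entry name. The helper name decoded_entries is hypothetical, and the sketch assumes the names on disk are UTF-8, which the OS itself does not guarantee.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: list a directory and return its entry names
# as character strings. Assumes the names on disk are UTF-8.
sub decoded_entries {
    my ($dir_octets) = @_;    # the directory name, already encoded to octets
    opendir my $dh, $dir_octets or die "Cannot open directory: $!";
    my @names = map  { decode( 'UTF-8', $_ ) }    # octets -> characters
                grep { $_ ne '.' && $_ ne '..' }
                readdir $dh;                      # returns raw octets
    closedir $dh;
    return @names;
}
```

Without the decode() step, each name would still be the raw octet sequence, exactly as File::Find receives it.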
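Rule of thumb: whenever you pass a path name to the OS, Encode::encode() it first so that it is a sequence of octets. Whenever you get a path name from the OS, Encode::decode() it into the character set your application wants.
You can make your program work by calling find this way: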
find( sub { ... }, Encode::encode('utf8', 'Delibes, Léo') );
And then call Encode::decode() when you use the value of $File::Find::name:
my $path = Encode::decode('utf8', $File::Find::name);
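Putting the two calls together, a minimal sketch might look like the following. The helper name find_decoded is hypothetical, and the code again assumes that the file names on disk are UTF-8.

```perl
use strict;
use warnings;
use Encode qw(encode decode);
use File::Find;

# Hypothetical helper: walk a directory given as a character string
# and return every path found as a decoded character string.
sub find_decoded {
    my ($dir_chars) = @_;
    my @paths;
    find(
        {
            no_chdir => 1,
            wanted   => sub {
                # The starting directory was encoded before the call, so
                # $File::Find::name consists of octets only at this point.
                my $octets = $File::Find::name;
                push @paths, decode( 'UTF-8', $octets );
            },
        },
        encode( 'UTF-8', $dir_chars ),    # characters -> octets for the OS
    );
    return @paths;
}
```

Called as find_decoded('Delibes, Léo') from a script with use utf8 in effect, this should return paths in which both the directory part and the leaf part are character strings.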
To make this clearer, here is how $File::Find::name was formed:
use feature 'say';
use Encode;
# This is a way to get $dir to be represented as a UTF-8 string
my $dir = 'L' .chr(233).'o'.chr(256);
chop $dir;
say "dir: ", d($dir); # length = 3
# This is what readdir() is returning:
my $leaf = encode('utf8', 'Lakem' . chr(233));
say "leaf: ", d($leaf); # length = 7
$File::Find::name = $dir . '/' . $leaf;
say "File::Find::name: ", d($File::Find::name);
sub d {
join(' ', map { sprintf("%02X", ord($_)) } split('', $_[0]))
}
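As a follow-up to the demonstration above, this sketch contrasts the mixed string with a path built consistently from octets and decoded once. The function names mixed_path and clean_path are hypothetical, and nothing here touches the file system: the leaf is encoded by hand to stand in for what readdir() would return.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Stand-in for what readdir() hands back: the UTF-8 octets of "Lakmé".
sub leaf_octets { return encode( 'UTF-8', "Lakm\x{e9}" ) }

# What happens in the question: a character-string directory is joined
# with octets, so each leaf octet is treated as a Latin-1 character.
sub mixed_path {
    my ($dir_chars) = @_;
    return $dir_chars . '/' . leaf_octets();
}

# The rule of thumb applied: encode the directory first, join octets
# with octets, then decode the whole path exactly once.
sub clean_path {
    my ($dir_chars) = @_;
    my $octets = encode( 'UTF-8', $dir_chars ) . '/' . leaf_octets();
    return decode( 'UTF-8', $octets );
}
```

With the directory "L\x{e9}o", mixed_path returns a 10-character string whose last two characters are U+00C3 and U+00A9, while clean_path returns the 9-character string ending in U+00E9.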