Perl: open the output file with the same endianness as the input file (UTF-16be vs. UTF-16le)

Han*_*zel 1 perl encoding utf-16 endianness

When Perl opens a UTF-16 encoded file,

open my $in, "< :encoding(UTF-16)", "text-utf16le.txt" or die "Error $!\n";

it detects the endianness automatically thanks to the byte order mark (BOM).

But when I open a file for writing,

open my $out, "> :encoding(UTF-16)", "output.txt" or die "Error $!\n";

Perl opens it as big-endian by default.

How can I specify that the output file should be opened with the same endianness as the input file?

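How can I get the endianness/encoding from the input file handle $in? PerlIO::get_layers($in) returns, among other layers, encoding(UTF-16), which does not say which byte order was detected.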

ike*_*ami 5

You have to read the BOM yourself.

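use IO::Unread qw( unread );

# Open without a decoding layer so the raw BOM bytes can be inspected.
open(my $fh_in, "<:raw", $qfn)
   or die("Can't open \"$qfn\": $!\n");

my $rv = read($fh_in, my $buf, 4);
defined($rv)
   or die("Can't read from \"$qfn\": $!\n");

my $encoding;
my $bom_present;
# Test the 4-byte UTF-32 BOMs before the 2-byte UTF-16 ones:
# the UTF-32le BOM begins with the UTF-16le BOM.
if    ($buf =~ s/^\x00\x00\xFE\xFF//) { $encoding = 'UTF-32be'; $bom_present = 1; }
elsif ($buf =~ s/^\xFF\xFE\x00\x00//) { $encoding = 'UTF-32le'; $bom_present = 1; }
elsif ($buf =~ s/^\xFE\xFF//        ) { $encoding = 'UTF-16be'; $bom_present = 1; }
elsif ($buf =~ s/^\xFF\xFE//        ) { $encoding = 'UTF-16le'; $bom_present = 1; }
elsif ($buf =~ s/^\xEF\xBB\xBF//    ) { $encoding = 'UTF-8';    $bom_present = 1; }
else {
   # No BOM found: fall back to UTF-8.
   $encoding = 'UTF-8';
   $bom_present = 0;
}

# Push back any bytes that were read past the BOM.
unread($fh_in, $buf) if length($buf);

# Decode the rest of the stream with the detected encoding.
binmode($fh_in, ":encoding($encoding)");
binmode($fh_in, ":crlf") if $^O eq 'MSWin32';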

But someone has already done this for you:

use File::BOM qw( open_bom );

my $encoding = open_bom(my $fh_in, $qfn, ':encoding(UTF-8)');
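
Once the encoding name is known, the output file can be opened with that same name, which settles the byte order. What follows is only a minimal sketch of that last step, not part of the answer above: detect_bom and copy_same_encoding are hypothetical helper names, and the sketch uses core Perl only (seek past the BOM instead of IO::Unread). One detail worth keeping in mind: the explicit-endian encodings such as UTF-16le and UTF-16be do not emit a BOM when writing, so the sketch prints U+FEFF itself when the input had one.

```perl
use strict;
use warnings;

# Hypothetical helper: sniff the BOM of $path and return
# (encoding name, BOM length in bytes). A length of 0 means "no BOM found".
sub detect_bom {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Can't open \"$path\": $!\n";
    my $rv = read($fh, my $buf, 4);
    defined($rv) or die "Can't read from \"$path\": $!\n";
    close($fh);

    # 4-byte BOMs first: the UTF-32le BOM begins with the UTF-16le BOM.
    return ('UTF-32be', 4) if $buf =~ /^\x00\x00\xFE\xFF/;
    return ('UTF-32le', 4) if $buf =~ /^\xFF\xFE\x00\x00/;
    return ('UTF-16be', 2) if $buf =~ /^\xFE\xFF/;
    return ('UTF-16le', 2) if $buf =~ /^\xFF\xFE/;
    return ('UTF-8',    3) if $buf =~ /^\xEF\xBB\xBF/;
    return ('UTF-8',    0);   # assumption: no BOM means UTF-8
}

# Hypothetical helper: copy $src to $dst, keeping the encoding,
# the byte order, and the presence or absence of a BOM.
sub copy_same_encoding {
    my ($src, $dst) = @_;
    my ($encoding, $bom_len) = detect_bom($src);

    open(my $in, '<:raw', $src) or die "Can't open \"$src\": $!\n";
    seek($in, $bom_len, 0) or die "Can't seek in \"$src\": $!\n";  # skip the BOM
    binmode($in, ":encoding($encoding)");

    open(my $out, '>:raw', $dst) or die "Can't create \"$dst\": $!\n";
    binmode($out, ":encoding($encoding)");

    # Explicit-endian encodings write no BOM on their own, so add it by hand.
    print {$out} "\x{FEFF}" if $bom_len;

    while (my $line = <$in>) {
        print {$out} $line;
    }

    close($out) or die "Can't write \"$dst\": $!\n";
    close($in);
    return $encoding;
}
```

Calling copy_same_encoding("text-utf16le.txt", "output.txt") would then produce a little-endian output file for a little-endian input. With File::BOM, the name returned by open_bom could be passed to the output's :encoding(...) layer in the same way.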