Mau*_*lin 25 string unicode perl
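I have a Unicode string and don't know what its encoding is. When a Perl program reads this string, is there a default encoding that Perl will use? If so, how can I find out what it is?
I am trying to remove non-ASCII characters from the input. I found this on some forum: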
my $line = encode('ascii', normalize('KD', $myutf), sub {$_[0] = ''});
How can the above work when no input encoding is specified? Should it be specified as follows?
my $line = encode('ascii', normalize('KD', decode('input-encoding', $myutf)), sub {$_[0] = ''});
dax*_*xim 32
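To find out which encoding some unknown data uses, you just have to try and look. The modules Encode::Detect and Encode::Guess automate that. (If you have trouble compiling Encode::Detect, try its fork Encode::Detective instead.)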
use Encode::Detect::Detector;
my $unknown = "\x{54}\x{68}\x{69}\x{73}\x{20}\x{79}\x{65}\x{61}\x{72}\x{20}".
"\x{49}\x{20}\x{77}\x{65}\x{6e}\x{74}\x{20}\x{74}\x{6f}\x{20}".
"\x{b1}\x{b1}\x{be}\x{a9}\x{20}\x{50}\x{65}\x{72}\x{6c}\x{20}".
"\x{77}\x{6f}\x{72}\x{6b}\x{73}\x{68}\x{6f}\x{70}\x{2e}";
my $encoding_name = Encode::Detect::Detector::detect($unknown);
print $encoding_name; # gb18030
use Encode;
my $string = decode($encoding_name, $unknown);
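Encode::Guess ships with Encode itself, but it only tests ASCII, UTF-8 and BOM-marked UTF-16/32 unless it is given a list of suspects. The following is a minimal sketch, assuming the caller already suspects EUC-CN (GB2312); the suspect list is an assumption for illustration, not something the module works out on its own.

```perl
use strict;
use warnings;
use Encode;
use Encode::Guess;

# The same non-ASCII octets as in the example above (0xB1B1 0xBEA9).
my $unknown = "This year I went to \xb1\xb1\xbe\xa9 Perl workshop.";

# Assumption: 'euc-cn' is the only extra candidate we want to test.
my $decoder = guess_encoding($unknown, 'euc-cn');

# On failure guess_encoding returns an error message string, not an object.
die "Cannot guess encoding: $decoder" unless ref $decoder;

my $string = $decoder->decode($unknown);

binmode STDOUT, ':encoding(UTF-8)';
print $decoder->name, "\n";
print $string, "\n";
```

With several similar suspects (for example euc-cn together with euc-jp and euc-kr) the same octets are often valid in more than one of them, and the guess fails as ambiguous, so keep the list short.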
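I find encode 'ascii' a lame solution for getting rid of non-ASCII characters. Every such character is replaced with a question mark; that is too lossy to be useful.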
# Bad example; don't do this.
use utf8;
use Encode;
my $string = 'This year I went to 北京 Perl workshop.';
print encode('ascii', $string); # This year I went to ?? Perl workshop.
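As for the snippet in the question: normalize() and encode() operate on character strings, so the octets have to be decoded first. The sketch below assumes the input is UTF-8 (swap in whatever the detection step reports) and compares the default substitution with the code-reference variant from the question.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use Unicode::Normalize qw(normalize);

# Assumed input: UTF-8 octets for "café 北京".
my $octets = "caf\xc3\xa9 \xe5\x8c\x97\xe4\xba\xac";

# Decode first, so that normalize() sees characters rather than bytes.
my $chars = decode('UTF-8', $octets);

# Default behaviour: every character ASCII cannot represent becomes '?'.
my $lossy = encode('ascii', $chars);

# NFKD splits "é" into "e" plus a combining accent; the code reference passed
# as the third argument then replaces each unmappable character with ''.
my $stripped = encode('ascii', normalize('KD', $chars), sub { '' });

print "$lossy\n";
print "$stripped\n";
```

The accent is removed cleanly, but the two Chinese characters vanish without a trace, which is the kind of loss described above.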
If you want readable ASCII text, I recommend Text::Unidecode. This is a lossy encoding too, but not as terrible as the plain encode above.
use utf8;
use Text::Unidecode;
my $string = 'This year I went to 北京 Perl workshop.';
print unidecode($string); # This year I went to Bei Jing Perl workshop.
However, avoid those lossy encodings if you can help it. If you want to reverse the operation later, pick either PERLQQ or XMLCREF.
use utf8;
use Encode qw(encode PERLQQ XMLCREF LEAVE_SRC);
my $string = 'This year I went to 北京 Perl workshop.';
# LEAVE_SRC keeps encode() from consuming $string, so it can be encoded twice.
print encode('ascii', $string, PERLQQ | LEAVE_SRC); # This year I went to \x{5317}\x{4eac} Perl workshop.
print encode('ascii', $string, XMLCREF | LEAVE_SRC); # This year I went to &#x5317;&#x4eac; Perl workshop.
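Reversing the XMLCREF form later takes one substitution. This is a rough sketch rather than a full XML unescaper: it assumes the original text contained no character references of its own, since those would be converted as well.

```perl
use strict;
use warnings;
use Encode qw(encode XMLCREF LEAVE_SRC);

my $string = "This year I went to \x{5317}\x{4eac} Perl workshop.";

# Escape every non-ASCII character as a hexadecimal character reference.
# LEAVE_SRC stops encode() from modifying $string in place.
my $ascii = encode('ascii', $string, XMLCREF | LEAVE_SRC);

# Reverse the escaping: each &#xHHHH; becomes the character it names.
(my $restored = $ascii) =~ s/&#x([0-9a-fA-F]+);/chr(hex($1))/ge;

print "$ascii\n";
print $restored eq $string ? "round trip ok\n" : "round trip failed\n";
```

The PERLQQ form can be reversed the same way with a pattern for \x{HHHH}, with the same caveat about literal escapes already present in the text.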