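I am trying to use the XML::RAI Perl module on UTF-8 encoded text, and I keep getting an error I really don't understand. Here is the code (it is not supposed to do anything useful):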
use HTTP::Request;
use LWP::UserAgent;
use XML::RAI;
use Encode;

my $ua = LWP::UserAgent->new;

sub readFromWeb {
    my $address = shift;
    my $request = HTTP::Request->new( GET => $address );
    my $response = $ua->request( $request );
    return unless $response->code == 200;
    return decode("utf8", $response->content());
}

sub readFromRSS {
    my $address = shift;
    my $content = readFromWeb $address;
    my $rai = XML::RAI->parse_string($content);
    # this line "causes" the error
}

readFromRSS("http://aktualne.centrum.cz/export/rss-hp.phtml");
# I am testing it on this particular RSS
The error is:
Cannot decode string with wide characters at /usr/lib/perl5/5.8.8/i686-linux/Encode.pm line 166.
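I don't know whether this is my fault or XML::RAI's. If $content has already been decoded from UTF-8, I don't see where these wide characters would come from...
Edit: for a reason I still don't understand, removing the "decode" part actually solved the problem.
The problem is double decoding. XML::RAI::parse_string() apparently expects a UTF-8 encoded document (raw octets) and decodes it itself. If you pass in a string that has already been decoded, the second decode fails, of course: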
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw( decode );
use LWP::Simple qw( get );
my $xml = get("http://aktualne.centrum.cz/export/rss-hp.phtml");
$xml = decode('UTF-8', $xml);
$xml = decode('UTF-8', $xml); # dies: Cannot decode string with wide characters ...
So skip the decode() step, pass the undecoded response content to parse_string(), and you'll be fine.
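The same failure can be reproduced without network access, which may make the cause easier to see. The following is only a minimal sketch: the sample string is a made-up stand-in for the feed body (an assumption, not data from the actual RSS), and the second decode() plays the role of the decoding that XML::RAI appears to do internally.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw( decode encode );

# Hypothetical sample data: a short Czech word standing in for the feed body.
# encode() turns it into UTF-8 octets, i.e. what would arrive over HTTP.
my $bytes = encode('UTF-8', "\x{10D}esk\x{FD}");

# First decode: octets -> characters. This works.
my $chars = decode('UTF-8', $bytes);

# Second decode on the already-decoded string. The string now contains a
# character above 0xFF, so it cannot be treated as octets, and decode() dies.
my $ok    = eval { decode('UTF-8', $chars); 1 };
my $error = $ok ? '' : $@;

print "first decode gave ", length($chars), " characters\n";
print "second decode failed: $error" unless $ok;
```

Note that the error only shows up when the decoded text contains at least one character above U+00FF, which is why a feed with Czech text triggers it while a pure-ASCII feed would not.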