Phi*_*son 6 perl encode entities file
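I'm not very good with encodings, and I'm trying to work out how to get data back in the same encoding it started in...
I have a file containing characters such as '»'; by the time I edit it and insert it into the database, they have turned into ».
decode_entities() does nothing, and encode_entities just encodes the characters again. So I wrote my own sub to deal with it, but when the data is read from the file it is not retrieved in the right format.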
my $file = "c:/perlscripts/" . md5_hex($md5Con) . "-code.php";
{
    local( $/ ); # undefine the record separator
    open FILE, "<", $file or die "Cannot open:$!\n";
    my $fileContents = unicodeConvert(<FILE>);
    ...
    ..
Is there an encoding option like this:
my $file = "c:/perlscripts/" . md5_hex($md5Con) . "-code.php";
{
    local( $/ ); # undefine the record separator
    open FILE, "<", $file or die "Cannot open:$!\n", "UTF-8";
    my $fileContents = unicodeConvert(<FILE>);
    ...
    ..
My sub is:
sub unicodeConvert($) {
my $str = shift;
my %entityRef = ("&" => "&amp;", '¢' => "&cent;", '¤' => "&curren;", '¦' => "&brvbar;", '¨' => "&uml;", 'ª' => "&ordf;", '¬' => "&not;", '®' => "&reg;", '°' => "&deg;", '²' => "&sup2;", '´' => "&acute;", '¶' => "&para;", '¸' => "&cedil;", 'º' => "&ordm;", '¼' => "&frac14;", '¾' => "&frac34;", 'À' => "&Agrave;", 'Â' => "&Acirc;", 'Ä' => "&Auml;", 'Æ' => "&AElig;", 'È' => "&Egrave;", 'Ê' => "&Ecirc;", 'Ì' => "&Igrave;", 'Î' => "&Icirc;", 'Ð' => "&ETH;", 'Ò' => "&Ograve;", 'Ô' => "&Ocirc;", 'Ö' => "&Ouml;", 'Ø' => "&Oslash;", 'Ú' => "&Uacute;", 'Ü' => "&Uuml;", 'Þ' => "&THORN;", 'à' => "&agrave;", 'â' => "&acirc;", 'ä' => "&auml;", 'æ' => "&aelig;", 'è' => "&egrave;", 'ê' => "&ecirc;", 'ì' => "&igrave;", 'î' => "&icirc;", 'ð' => "&eth;", 'ò' => "&ograve;", 'ô' => "&ocirc;", 'ö' => "&ouml;", 'ø' => "&oslash;", 'ú' => "&uacute;", 'ü' => "&uuml;", 'þ' => "&thorn;", '¡' => "&iexcl;", '£' => "&pound;", '¥' => "&yen;", '§' => "&sect;", '©' => "&copy;", '«' => "&laquo;", '¯' => "&macr;", '±' => "&plusmn;", '³' => "&sup3;", 'µ' => "&micro;", '·' => "&middot;", '¹' => "&sup1;", '»' => "&raquo;", '½' => "&frac12;", '¿' => "&iquest;", 'Á' => "&Aacute;", 'Ã' => "&Atilde;", 'Å' => "&Aring;", 'Ç' => "&Ccedil;", 'É' => "&Eacute;", 'Ë' => "&Euml;", 'Í' => "&Iacute;", 'Ï' => "&Iuml;", 'Ñ' => "&Ntilde;", 'Ó' => "&Oacute;", 'Õ' => "&Otilde;", '×' => "&times;", 'Ù' => "&Ugrave;", 'Û' => "&Ucirc;", 'Ý' => "&Yacute;", 'ß' => "&szlig;", 'á' => "&aacute;", 'ã' => "&atilde;", 'å' => "&aring;", 'ç' => "&ccedil;", 'é' => "&eacute;", 'ë' => "&euml;", 'í' => "&iacute;", 'ï' => "&iuml;", 'ñ' => "&ntilde;", 'ó' => "&oacute;", 'õ' => "&otilde;", '÷' => "&divide;", 'ù' => "&ugrave;", 'û' => "&ucirc;", 'ý' => "&yacute;", 'ÿ' => "&yuml;");
while( ( my $key, my $obj ) = each( %entityRef ) ) {
if( $key ne '&' ) {
$str =~ s/$key/$obj/gis
} else {
$str =~ s#&((?!(quot;)|(amp;)|(cent;)|(curren;)|(brvbar;)|(uml;)|(ordf;)|(not;)|(reg;)|(deg;)|(sup2;)|(acute;)|(para;)|(cedil;)|(ordm;)|(frac14;)|(frac34;)|(Agrave;)|(Acirc;)|(Auml;)|(AElig;)|(Egrave;)|(Ecirc;)|(Igrave;)|(Icirc;)|(ETH;)|(Ograve;)|(Ocirc;)|(Ouml;)|(Oslash;)|(Uacute;)|(Uuml;)|(THORN;)|(agrave;)|(acirc;)|(auml;)|(aelig;)|(egrave;)|(ecirc;)|(igrave;)|(icirc;)|(eth;)|(ograve;)|(ocirc;)|(ouml;)|(oslash;)|(uacute;)|(uuml;)|(thorn;)|(iexcl;)|(pound;)|(yen;)|(sect;)|(copy;)|(laquo;)|(macr;)|(plusmn;)|(sup3;)|(micro;)|(middot;)|(sup1;)|(raquo;)|(frac12;)|(iquest;)|(Aacute;)|(Atilde;)|(Aring;)|(Ccedil;)|(Eacute;)|(Euml;)|(Iacute;)|(Iuml;)|(Ntilde;)|(Oacute;)|(Otilde;)|(times;)|(Ugrave;)|(Ucirc;)|(Yacute;)|(szlig;)|(aacute;)|(atilde;)|(aring;)|(ccedil;)|(eacute;)|(euml;)|(iacute;)|(iuml;)|(ntilde;)|(oacute;)|(otilde;)|(divide;)|(ugrave;)|(ucirc;)|(yacute;)|(yuml;)|(nbsp;)))#$obj#gis;
}
}
return $str;
}
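As noted in the comments on your question, I'm not sure exactly what you are asking.
So I'll assume you are trying to convert Unicode characters to HTML entities. In that case you would be better off using one of the ready-made modules. If that doesn't work because of encoding problems (which are quite tricky in Perl), then the answer to your question: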
Is there an encoding option like this
open FILE, "<", $file or die "Cannot open:$!\n", "UTF-8";
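...may well fix it. It might also make your own attempt work, but it is better to use something ready-made ;-) (By the way, the way you wrote it there, the "UTF-8" option is passed to die, which makes it a little hard to tell what you are asking ;-)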
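To illustrate the remark about die: die takes a list and joins it into a single message, so a trailing "UTF-8" ends up in the error text and never reaches open. The following is only a minimal sketch; the path is a made-up placeholder chosen so that open fails.

```perl
use strict;
use warnings;

# Hypothetical path, used only so that open() fails.
my $file = '/nonexistent/dir/example-code.php';

# Same shape as the attempted line: the trailing "UTF-8" is part of die's
# argument list, so it is appended to the message, not passed to open.
my $message = '';
eval {
    open(my $fh, '<', $file) or die "Cannot open:$!\n", "UTF-8";
    1;
} or $message = $@;

print $message;
```

Because the joined message no longer ends in a newline, die also appends its usual "at ... line ..." suffix after the stray "UTF-8".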
Yes, there is a UTF-8 option, assuming you have a recent perl (>= v5.8):
open(my $fh,'<:encoding(UTF-8)', $file) or die "Error opening $file: $!";
(Example adapted from perluniintro)
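As a rough illustration of what the encoding layer changes, the sketch below writes the two UTF-8 bytes of '»' to a scratch file and reads it back twice, once raw and once through `:encoding(UTF-8)`. The `slurp` helper and the temporary file are assumptions made for the example, not part of the original code.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write the two UTF-8 bytes of '»' (U+00BB) plus a newline to a scratch file.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
binmode($tmp_fh);                      # raw bytes, no translation
print {$tmp_fh} "\xC2\xBB\n";
close($tmp_fh) or die "Error closing $tmp_path: $!";

# Hypothetical helper: read a whole file through the given I/O layer.
sub slurp {
    my ($path, $layer) = @_;
    open(my $fh, "<$layer", $path) or die "Error opening $path: $!";
    local $/;                          # undefine the record separator
    my $content = <$fh>;
    close($fh);
    return $content;
}

my $raw     = slurp($tmp_path, ':raw');             # bytes as stored on disk
my $decoded = slurp($tmp_path, ':encoding(UTF-8)'); # characters

printf "raw length: %d, decoded length: %d\n", length($raw), length($decoded);
```

The raw read sees '»' as two separate bytes, while the decoded read sees one character, which is what character-oriented code such as an entity encoder needs.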
You can also use binmode to change the encoding of a filehandle that is already open (e.g. STDIN/STDOUT).
binmode(STDOUT, ":encoding(UTF-8)");
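The same idea applied to handles that are already open might look like the sketch below. In-memory filehandles stand in for STDIN/STDOUT here so the effect can be inspected; that substitution is an assumption made for the example.

```perl
use strict;
use warnings;

# $bytes holds the UTF-8 encoding of '»' (U+00BB); the in-memory handle
# stands in for a handle that was opened without any encoding layer.
my $bytes = "\xC2\xBB";
open(my $in, '<', \$bytes) or die "Error opening in-memory input: $!";
binmode($in, ':encoding(UTF-8)');      # decode UTF-8 from now on
my $text = do { local $/; <$in> };
close($in);

# On the output side, the layer encodes characters back into UTF-8 bytes.
my $out_bytes = '';
open(my $out, '>', \$out_bytes) or die "Error opening in-memory output: $!";
binmode($out, ':encoding(UTF-8)');
print {$out} $text;
close($out) or die "Error closing in-memory output: $!";

printf "read %d character(s), wrote %d byte(s)\n", length($text), length($out_bytes);
```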
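You can also use the open pragma to set a default encoding.
But for this, I'd suggest trying binmode or changing your open line first, to see whether that fixes the problem.
If your version of perl is older than v5.8, things are trickier, but if you tell us the version it can probably still be worked out.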
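To tie this back to the entity functions mentioned in the question: encode_entities comes from HTML::Entities, which is assumed here to be installed from CPAN. The sketch below inlines the bytes instead of reading a file, and compares encoding raw UTF-8 bytes with encoding the decoded text.

```perl
use strict;
use warnings;
use Encode qw(decode);
use HTML::Entities qw(encode_entities);

my $bytes = "\xC2\xBB";                       # UTF-8 bytes of '»' (U+00BB)

# Encoding the undecoded bytes treats each byte as its own character.
my $from_bytes = encode_entities($bytes);     # &Acirc;&raquo;

# Decoding first gives a single character, which maps to a single entity.
my $text      = decode('UTF-8', $bytes);
my $from_text = encode_entities($text);       # &raquo;

print "$from_bytes\n$from_text\n";
```

Data read through an `:encoding(UTF-8)` layer is already decoded, so the explicit decode step is only needed when the input arrives as raw bytes.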
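By the way, a few other things I noticed:
- Lexical filehandles (my $fh rather than FILE) are considered better.
- When you put a newline at the end of the die string, it suppresses the line-number information that is normally added to help you find the problem.
- You probably don't want a prototype (sub unicodeConvert($)). Don't put $, @, % and so on there. It doesn't just check things; it can change the meaning in confusing ways. Prototypes are only needed for creating new "builtin-style" operators.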
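Two of these points are easy to check for yourself. The sketch below uses made-up sub names (`with_proto`, `without_proto`) purely for illustration.

```perl
use strict;
use warnings;

# A trailing newline makes die use the message exactly as written.
eval { die "Cannot open\n" };
my $with_newline = $@;

# Without it, die appends " at <script> line <N>.\n".
eval { die "Cannot open" };
my $without_newline = $@;

# A ($) prototype imposes scalar context on the argument, so an array
# is passed as its element count rather than as its elements.
sub with_proto($) { return $_[0] }
sub without_proto { return $_[0] }

my @items = ('a', 'b', 'c');
my $seen_with_proto    = with_proto(@items);     # 3, the number of elements
my $seen_without_proto = without_proto(@items);  # 'a', the first element

print $with_newline, $without_newline;
print "with prototype: $seen_with_proto, without: $seen_without_proto\n";
```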