How to handle the trademark character ™ in Perl

php*_*ete 1 perl encoding character-encoding

I am using Perl to fetch data from an SQLite database, and the WWW::Mechanize module to do some web scraping.

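The data I am posting (which comes from the database) contains some special characters, and when I look at the resulting text on the website it shows a few strange characters: â¢ instead of ™.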
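One plausible reading of the â¢, sketched under an assumption the post does not confirm: the UTF-8 bytes of ™ reached the page and were displayed as ISO-8859-1. The snippet below uses only the core Encode module and shows that the three UTF-8 bytes then turn into three separate characters, the middle one being an invisible control character.

```perl
use strict;
use warnings;
use Encode qw( encode decode );

# The trademark sign as a single decoded character (U+2122).
my $tm = "\x{2122}";

# Encoding to UTF-8 yields three bytes.
my $bytes = encode('UTF-8', $tm);
my $hex   = join ' ', map { sprintf '%02X', $_ } unpack 'C*', $bytes;

# Assumption: something downstream treats those bytes as ISO-8859-1.
# Each byte then becomes its own character: U+00E2 (a with circumflex),
# U+0084 (a control character) and U+00A2 (cent sign).
my $mojibake   = decode('ISO-8859-1', $bytes);
my $codepoints = sprintf '%vX', $mojibake;

print "UTF-8 bytes: $hex\n";
print "misread as:  $codepoints\n";
```

If that is what happened, the fix is not to change the data but to make sure every hop either passes decoded characters or declares UTF-8 correctly.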

I have the following set at the top of my Perl program. I use it to prevent the "Wide character" warnings in the terminal.

binmode(STDOUT, ":utf8");

I don't know much about encoding and decoding characters, so any help would be useful.

Edit: After reading up on Perl IO, I was able to find this stackoverflow answer, which solved my problem.

ike*_*ami 5

Decode inputs, encode outputs.

use open ':std', ':encoding(UTF-8)';  # Outputs are UTF-8
BEGIN { binmode STDIN; }              # ...but not the raw CGI request.

use CGI qw( -utf8 );                  # Decode parameters
use DBI qw( );

{
   my $cgi = CGI->new();
   print $cgi->header(
      -type    => "text/plain",  # Just cause it's shorter.
      -charset => "UTF-8",       # Tell browser encoding used.
   );

   my $dbh = DBI->connect(
      "dbi:SQLite:dbname=/tmp/tmp.sqlite", "", "",
      {
         AutoCommit     => 1,
         RaiseError     => 1,
         PrintError     => 0,
         PrintWarn      => 1,
         sqlite_unicode => 1,   # Encode and decode for us.
      },
   );

   $dbh->do("CREATE TABLE Testing ( str TEXT )");

   my $from_html_parser = "\x{2122}";

   # Should be 2122, since the trademark symbol is U+2122.
   printf("from_html_parser = %v04X\n", $from_html_parser);

   print("$from_html_parser\n");

   $dbh->do("INSERT INTO Testing VALUES (?)", undef, $from_html_parser);

   my $from_database = $dbh->selectrow_array("SELECT * FROM Testing");

   # Should be 2122, since the trademark symbol is U+2122.
   printf("from_database = %v04X\n", $from_database);

   print("$from_database\n");
}

END { unlink("/tmp/tmp.sqlite"); }
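
The same "decode inputs, encode outputs" rule can be applied by hand with the core Encode module when a handle or library does not do it for you. This is only a minimal sketch: the variable names are illustrative, and the hard-coded bytes stand in for whatever an undecoded source might return.

```perl
use strict;
use warnings;
use Encode qw( decode encode );

# Hypothetical raw input: the UTF-8 bytes of the trademark sign, as they
# might arrive from a source that does not decode for you.
my $raw_input = "\xE2\x84\xA2";

# Decode on the way in: three bytes become one character.
my $text = decode('UTF-8', $raw_input);

# Inside the program, work with characters, not bytes.
my $char_count = length($text);
my $codepoint  = sprintf '%04X', ord($text);

# Encode on the way out: back to bytes for a handle with no encoding layer.
my $raw_output = encode('UTF-8', $text);

binmode STDOUT;   # raw bytes; the encoding was done by hand above
print "characters: $char_count, code point: U+$codepoint\n";
print $raw_output, "\n";
```

Doing the encoding by hand and also putting an `:encoding(UTF-8)` layer on the same handle would encode twice, which is another way to end up with garbled output.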