相关疑难解决方法(0)

  header("Content-Type: text/plain; charset=utf-8"); // same if text/html
  //PHP Version 5.5.4-1+debphp.org~precise+1
  //using a .php file enconded as UTF8.

  $s = "THE UTF-8 NO-BREAK\xA0SPACE"; // a non-ASCII byte
  preg_match_all('/[-\'\p{L}]+/u',$s,$m);
  var_dump($m);            // empty! (corrupted)
  $m=str_word_count($s,1);
  var_dump($m);            // ok

  $s = "THE UTF-8 NO-BREAK\xC2\xA0SPACE";  // utf8-encoded nbsp
  preg_match_all('/[-\'\p{L}]+/u',$s,$m);
  var_dump($m);            // ok!
  $m=str_word_count($s,1);
  var_dump($m);            // ok

Run Code Online (Sandbox Code Playgroud)

php utf-8

Pet*_*uss

lucky-day

6
推荐指数

1
解决办法

899
查看次数

PHP cp1252/windows-1252转换为UTF-8

我正在尝试将我们的数据库从latin1转换为UTF-8.不幸的是,我无法进行大规模的单一切换,因为应用程序需要保持在线状态,我们有700GB的数据库进行转换.

所以我试图利用一个小的mysql hack将表转换为UTF-8而不是数据.我想要实时读取,转换和替换数据.(如果你愿意,可以进行JIT转换)

我们的php应用程序目前使用所有默认值,因此它使用latin1字符集连接到mysql,并丢弃以latin1编码的UTF-8数据.使用latin1查看数据时,UTF-8字符将按预期显示.当您使用UTF-8查看数据时,事情会变得混乱.

所以我建议将mysql字符集强制为UTF-8,然后在必要时进行数据的及时转换.现在,看到cp1252/windows-1252是UTF-8的子集,它不是那么直接(据我所知)来检测cp1252/windows-1252编码.

我编写了以下代码,试图检测cp1252/windows-1252编码并根据需要进行转换.它还应检测正确编码的UTF-8字符并且不执行任何操作.

$a = 'Cardâ˜ƒ'; //cp1252 encoded
$a_test = '?'.$a; //add known UTF8 character
$c = mb_convert_encoding($a_test, 'cp1252', 'UTF-8');
// attempt to detect known utf8 character after conversion
if (mb_strpos($c, '?') === false) {
    // not found, original string was not cp1252 encoded, so print
    var_dump($a);
} else {
    // found, original string was cp1252 encoded, remove test character and print
    // This case runs
    $c = mb_strcut($c, 1);
    var_dump($c);
}

$a = 'COD?'; //proper UTF8 …

Run Code Online (Sandbox Code Playgroud)

php mysql encoding utf-8 character-encoding

rna*_*rro

lucky-day

5
推荐指数

1
解决办法

1万
查看次数