Replacing UTF-8 characters in PHP 5.3

Ale*_*ber 2 php utf-8 preg-replace php-5.3

Why doesn't this test case work?

<?php
// cards with Cyrillic indices and suits in UTF-8 encoding
$a = array('7♠', '??', '??', '8♦', '??', '??', '10♣', '10♥', '??', '??');
foreach ($a as $card) {
        $suit = substr($card, -1);

        $card = preg_replace('/(\d+)♥/', '<span class="red">$1&hearts;</span>', $card);
        $card = preg_replace('/(\d+)♦/', '<span class="red">$1&diams;</span>', $card);
        $card = preg_replace('/(\d+)♠/', '<span class="black">$1&spades;</span>', $card);
        $card = preg_replace('/(\d+)♣/', '<span class="black">$1&clubs;</span>', $card);

        printf("suit: %s, html: %s\n", $suit, $card);
}
?>

Output:

suit: ?, html: <span class="black">7&spades;</span>
suit: ?, html: ??
suit: ?, html: ??
suit: ?, html: <span class="red">8&diams;</span>
suit: ?, html: ??
suit: ?, html: ??
suit: ?, html: <span class="black">10&clubs;</span>
suit: ?, html: <span class="red">10&hearts;</span>
suit: ?, html: ??
suit: ?, html: ??

That is, I'm facing 2 problems in my PHP script:

  1. Why isn't the last UTF-8 character extracted correctly?
  2. Why is only the first suit replaced by preg_replace?

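I'm using PHP 5.3.3 and PostgreSQL 8.4.12 on CentOS 6.2; the database holds UTF-8 JSON (with Russian text and card suits).

If 1. is a bug in PHP 5.3.3, is there a good workaround? (I don't want to upgrade the stock packages.)

Update: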

<?php
$a = array('7♠', '??', '??', '8♦', '??', '??', '10♣', '10♥', '??', '??');
foreach ($a as $card) {
        $suit = mb_substr($card, -1, 1, 'UTF-8');

        $card = preg_replace('/(\d+)♥/u', '<span class="red">$1&hearts;</span>', $card);
        $card = preg_replace('/(\d+)♦/u', '<span class="red">$1&diams;</span>', $card);
        $card = preg_replace('/(\d+)♠/u', '<span class="black">$1&spades;</span>', $card);
        $card = preg_replace('/(\d+)♣/u', '<span class="black">$1&clubs;</span>', $card);

        printf("suit: %s, html: %s\n", $suit, $card);
}
?>

New output:

suit: ?, html: <span class="black">7&spades;</span>
suit: ?, html: ??
suit: ?, html: ??
suit: ?, html: <span class="red">8&diams;</span>
suit: ?, html: ??
suit: ?, html: ??
suit: ?, html: <span class="black">10&clubs;</span>
suit: ?, html: <span class="red">10&hearts;</span>
suit: ?, html: ??

dec*_*eze 10

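substr is one of PHP's naive core functions that assume 1 byte = 1 character. substr(..., -1) extracts only the last byte of the string, but "♠" is longer than one byte in UTF-8. You should use mb_substr($card, -1, 1, 'UTF-8') instead.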
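
To make the byte-versus-character difference visible, here is a minimal sketch. It assumes the mbstring extension is installed, and it writes the suit as UTF-8 byte escapes so the result does not depend on how the source file itself is encoded. The variable names are only for illustration.

```php
<?php
// "♠" (U+2660) is encoded in UTF-8 as the three bytes E2 99 A0.
$card = "7\xE2\x99\xA0"; // the same string as '7♠' in a UTF-8 source file

// Byte-oriented: returns only the trailing byte 0xA0,
// which is not a valid UTF-8 sequence on its own.
$lastByte = substr($card, -1);

// Character-oriented: returns the complete three-byte suit symbol.
$lastChar = mb_substr($card, -1, 1, 'UTF-8');

// prints "bytes: 4, characters: 2"
printf("bytes: %d, characters: %d\n", strlen($card), mb_strlen($card, 'UTF-8'));

// prints "substr: a0, mb_substr: e299a0"
printf("substr: %s, mb_substr: %s\n", bin2hex($lastByte), bin2hex($lastChar));
```

The lone 0xA0 byte is why the suit in the question's output shows up as a broken character: this is substr behaving as documented, not a bug in PHP 5.3.3.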

You need to add the u (PCRE_UTF8) modifier to the regular expression so that it handles UTF-8 encoded expressions and strings correctly:

preg_replace('/(\d+)♥/u', ...
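
As a rough illustration of what the u modifier changes, the sketch below first shows that "." matches a single byte without it and a whole code point with it, then folds the four replacements into one call. The function card_to_html and its lookup table are hypothetical names introduced here, not part of the original post. Like the original pattern, it only handles numeric indices and leaves other cards unchanged.

```php
<?php
$spade = "\xE2\x99\xA0"; // "♠" as UTF-8 bytes

// Without /u, "." matches one byte, so a three-byte suit cannot match ^.$
var_dump(preg_match('/^.$/', $spade));  // int(0)
// With /u, "." matches one whole UTF-8 character.
var_dump(preg_match('/^.$/u', $spade)); // int(1)

// Hypothetical helper: wraps a numeric card such as "10♥" in a coloured span.
function card_to_html($card)
{
    // Suit symbol (as UTF-8 bytes) => CSS class and HTML entity.
    $suits = array(
        "\xE2\x99\xA5" => array('red',   '&hearts;'), // ♥ U+2665
        "\xE2\x99\xA6" => array('red',   '&diams;'),  // ♦ U+2666
        "\xE2\x99\xA0" => array('black', '&spades;'), // ♠ U+2660
        "\xE2\x99\xA3" => array('black', '&clubs;'),  // ♣ U+2663
    );

    // With /u the character class contains four code points;
    // without it, PCRE would not treat the pattern or subject as UTF-8.
    return preg_replace_callback(
        '/(\d+)([\x{2660}\x{2663}\x{2665}\x{2666}])/u',
        function ($m) use ($suits) {
            list($class, $entity) = $suits[$m[2]];
            return '<span class="' . $class . '">' . $m[1] . $entity . '</span>';
        },
        $card
    );
}

// prints <span class="red">10&hearts;</span>
echo card_to_html("10\xE2\x99\xA5"), "\n";
```

The closure and array() syntax are used so the sketch stays within what PHP 5.3 supports.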