How to reverse a Unicode string

OIS*_*OIS 13 php string unicode reverse

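A comment on an answer to this question implied that PHP cannot reverse Unicode strings.

As for Unicode, it works in PHP because most apps process it as binary. Yes, PHP is 8-bit clean. Try the equivalent of this in PHP: perl -Mutf8 -e'print scalar reverse("ほげほげ")' You will get garbage, not "げほげほ". - jrockway

Unfortunately, that is correct: PHP's Unicode support is at best "lacking" at the moment. This will hopefully change drastically with PHP6.

PHP's MultiByte (mbstring) functions do provide the basic functionality you need to deal with Unicode, but they are inconsistent and many functions are missing. One of the missing ones is a function to reverse a string.

I of course wanted to reverse this text, for no other reason than to find out whether it was possible. So I made a function to accomplish this enormously complex task of reversing Unicode text, so you can relax a bit longer until PHP6.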
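The function itself is not reproduced in this post. The following is only a minimal sketch of what such a helper could look like, assuming the mbstring extension is loaded. The optional `$encoding` parameter mirrors the two call styles used in the test code; the body is an assumption, not the original implementation.

```php
<?php
// Hypothetical sketch, not the original function from this post.
// Reverses a string one character (code point) at a time using mbstring.
function mb_strrev($text, $encoding = null)
{
    // Fall back to the current internal encoding when none is given.
    if ($encoding === null) {
        $encoding = mb_internal_encoding();
    }

    $reversed = '';
    // Walk from the last character to the first and append each one.
    for ($i = mb_strlen($text, $encoding) - 1; $i >= 0; $i--) {
        $reversed .= mb_substr($text, $i, 1, $encoding);
    }

    return $reversed;
}
```

This works on code points, so it is only as correct as the encoding it is told to use; calling it without `$encoding` while the internal encoding is not UTF-8 can still split multibyte characters.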

Test code:

$enc = 'UTF-8';
$text = "ほげほげ";
$defaultEnc = mb_internal_encoding();

echo "Showing results with encoding $defaultEnc.\n\n";

$revNormal = strrev($text);
$revInt = mb_strrev($text);
$revEnc = mb_strrev($text, $enc);

echo "Original text is: $text .\n";
echo "Normal strrev output: " . $revNormal . ".\n";
echo "mb_strrev without encoding output: $revInt.\n";
echo "mb_strrev with encoding $enc output: $revEnc.\n";

if (mb_internal_encoding($enc)) {
    echo "\nSetting internal encoding to $enc from $defaultEnc.\n\n";

    $revNormal = strrev($text);
    $revInt = mb_strrev($text);
    $revEnc = mb_strrev($text, $enc);

    echo "Original text is: $text .\n";
    echo "Normal strrev output: " . $revNormal . ".\n";
    echo "mb_strrev without encoding output: $revInt.\n";
    echo "mb_strrev with encoding $enc output: $revEnc.\n";

} else {
    echo "\nCould not set internal encoding to $enc!\n";
}

Fix*_*ree 9

Here is another approach using regular expressions:

function utf8_strrev($str){
 preg_match_all('/./us', $str, $ar);
 return implode(array_reverse($ar[0]));
}
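With the `u` modifier, `.` matches one UTF-8 code point, and the `s` modifier lets it match newlines too. One caveat is that `preg_match_all` returns `false` when the input is not valid UTF-8. A hedged variant that checks for this might look as follows; the name `utf8_strrev_checked` and the choice of exception are assumptions made for illustration.

```php
<?php
// Hypothetical variant of the regex approach with an explicit error check.
function utf8_strrev_checked($str)
{
    // '/./us': 'u' treats pattern and subject as UTF-8, 's' lets '.' match "\n".
    if (preg_match_all('/./us', $str, $matches) === false) {
        // With the 'u' modifier, invalid UTF-8 makes the match fail.
        throw new InvalidArgumentException('Input could not be matched as UTF-8.');
    }

    // $matches[0] holds one entry per code point, in original order.
    return implode('', array_reverse($matches[0]));
}
```

Whether to throw, return `false`, or fall back to `strrev` on invalid input is a design choice that depends on the caller.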


sea*_*lea 6

Here is another way. It seems to work without having to specify an output encoding (tested with a couple of different mb_internal_encoding settings):

function mb_strrev($text)
{
    return join('', array_reverse(
        preg_split('~~u', $text, -1, PREG_SPLIT_NO_EMPTY)
    ));
}


mas*_*tic 5

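The grapheme functions handle UTF-8 strings more correctly than the mbstring and PCRE functions. mbstring and PCRE split on code points, so they can break a character that is made up of a base letter plus combining marks. You can see the difference by running the following code.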

function str_to_array($string)
{
    $length = grapheme_strlen($string);
    $ret = [];

    for ($i = 0; $i < $length; $i += 1) {

        $ret[] = grapheme_substr($string, $i, 1);
    }

    return $ret;
}

function str_to_array2($string)
{
    $length = mb_strlen($string, "UTF-8");
    $ret = [];

    for ($i = 0; $i < $length; $i += 1) {

        $ret[] = mb_substr($string, $i, 1, "UTF-8");
    }

    return $ret;
}

function str_to_array3($string)
{
    return preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);
}

function utf8_strrev($string)
{
    return implode(array_reverse(str_to_array($string)));
}

function utf8_strrev2($string)
{
    return implode(array_reverse(str_to_array2($string)));
}

function utf8_strrev3($string)
{
    return implode(array_reverse(str_to_array3($string)));
}

// http://www.php.net/manual/en/function.grapheme-strlen.php
$string = "a\xCC\x8A"  // 'LATIN SMALL LETTER A WITH RING ABOVE' (U+00E5)
         ."o\xCC\x88"; // 'LATIN SMALL LETTER O WITH DIAERESIS'  (U+00F6)

var_dump(array_map(function($elem) { return strtoupper(bin2hex($elem)); },
[
  'should be' => "o\xCC\x88"."a\xCC\x8A",
  'grapheme' => utf8_strrev($string),
  'mbstring' => utf8_strrev2($string),
  'pcre' => utf8_strrev3($string)
]));

Here is the result.

array(4) {
  ["should be"]=>
  string(12) "6FCC8861CC8A"
  ["grapheme"]=>
  string(12) "6FCC8861CC8A"
  ["mbstring"]=>
  string(12) "CC886FCC8A61"
  ["pcre"]=>
  string(12) "CC886FCC8A61"
}
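The "pcre" result above comes from splitting on code points with `//u`. PCRE also offers the `\X` escape, which matches an extended grapheme cluster, so a PCRE-only version that keeps combining marks attached is possible. The sketch below assumes a PCRE build with Unicode support; the function name `utf8_strrev_clusters` is made up for this example.

```php
<?php
// Hypothetical helper: reverse by grapheme cluster using PCRE's \X escape.
function utf8_strrev_clusters($string)
{
    // \X matches a base character together with its combining marks.
    if (preg_match_all('/\X/u', $string, $matches) === false) {
        return false; // invalid UTF-8 or another PCRE error
    }

    return implode('', array_reverse($matches[0]));
}

// Same test string as above: "a" + combining ring, "o" + combining diaeresis.
$string = "a\xCC\x8A" . "o\xCC\x88";
echo strtoupper(bin2hex(utf8_strrev_clusters($string))), "\n";
```

For this input the hex dump should match the "should be" entry rather than the "pcre" entry in the output above.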

As of PHP 5.5 (intl 3.0), you can also use IntlBreakIterator:

function utf8_strrev($str)
{
    $it = IntlBreakIterator::createCodePointInstance();
    $it->setText($str);

    $ret = '';
    $pos = 0;
    $prev = 0;

    foreach ($it as $pos) {
        $ret = substr($str, $prev, $pos - $prev) . $ret;
        $prev = $pos;
    }

    return $ret;  
}
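Note that `createCodePointInstance()` iterates over code points, so it would detach combining marks in the same way as the mbstring result shown earlier. A possible grapheme-aware variant, assuming the intl extension is available, swaps in `createCharacterInstance()` and reads the pieces through `getPartsIterator()`. The function name `utf8_strrev_graphemes` is an assumption for this sketch.

```php
<?php
// Hypothetical variant: iterate over user-perceived characters instead of code points.
function utf8_strrev_graphemes($str)
{
    // A character break iterator puts boundaries between grapheme clusters.
    $it = IntlBreakIterator::createCharacterInstance();
    $it->setText($str);

    $ret = '';
    // getPartsIterator() yields the substrings between consecutive boundaries.
    foreach ($it->getPartsIterator() as $part) {
        $ret = $part . $ret; // prepend each part to build the reversed string
    }

    return $ret;
}
```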