OIS · 13 · tags: php, string, unicode, reverse
In the comments on an answer to this question, it was implied that PHP cannot reverse Unicode strings.
As for Unicode, it works in PHP because most apps process it as binary. Yes, PHP is 8-bit clean. Try the equivalent of this in PHP: perl -Mutf8 -e'print scalar reverse("ほげほげ")'. You will get garbage, not "げほげほ". - jrockway
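jrockway's point can be checked directly. The sketch below (PHP 7.0+; the helper name is_valid_utf8() is hypothetical) reverses a UTF-8 string with strrev(), which works on bytes. It then uses preg_match() with the /u modifier to show that the result is no longer valid UTF-8. With /u, preg_match() returns false rather than 0 when the subject is not valid UTF-8.

```php
<?php
// Hypothetical helper: preg_match() with /u returns false (not 0) on invalid UTF-8.
function is_valid_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

$text  = "ほげほげ";     // 4 characters, 3 bytes each in UTF-8
$bytes = strrev($text);  // strrev() reverses the 12 bytes, not the 4 characters

var_dump(strlen($text));          // int(12)
var_dump(is_valid_utf8($text));   // bool(true)
var_dump(is_valid_utf8($bytes));  // bool(false): the output now starts with a continuation byte
```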
Unfortunately, PHP's Unicode support is "lacking" at best at the moment. This will hopefully change drastically with PHP6.
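PHP's multibyte (mbstring) functions do provide the basic functionality needed to handle Unicode, but they are inconsistent and lack many features. One of the missing features is a function to reverse a string.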
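On newer PHP versions, mbstring still has no reverse function. Since PHP 7.4, however, it does include mb_str_split(), which splits a string into characters (code points) of a given encoding. Combined with array_reverse(), this gives a short code-point reversal.

This is a sketch that assumes PHP 7.4+ with mbstring loaded. Like the other code-point approaches on this page, it does not keep combining marks attached to their base letter.

```php
<?php
// Assumes PHP 7.4+ with the mbstring extension loaded.
$text = "ほげほげ";

// Split into one-character chunks (code points) of the given encoding, reverse, join.
$reversed = implode('', array_reverse(mb_str_split($text, 1, 'UTF-8')));

echo $reversed, "\n"; // げほげほ
```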
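Of course, I wanted to reverse this text for no other reason than to find out whether it was possible. So I wrote a function to accomplish the enormously complex task of reversing this Unicode text, and you can relax a little until PHP6.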
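The function itself is not reproduced on this page. The test code, however, calls it as mb_strrev($text) and mb_strrev($text, $enc).

A minimal sketch with that call shape could walk the string backwards one character at a time with mb_substr(). It assumes PHP 7.4+ and mbstring. The name mb_strrev_sketch() is hypothetical, and this is not the author's code.

```php
<?php
// Hypothetical sketch, not the question's original mb_strrev().
// Same call shape: the text plus an optional encoding.
function mb_strrev_sketch(string $text, ?string $encoding = null): string
{
    // Fall back to the current internal encoding when none is given.
    $encoding ??= mb_internal_encoding();
    $length = mb_strlen($text, $encoding);
    $out = '';
    // Append characters from the last one to the first one.
    for ($i = $length - 1; $i >= 0; $i--) {
        $out .= mb_substr($text, $i, 1, $encoding);
    }
    return $out;
}

echo mb_strrev_sketch("ほげほげ", 'UTF-8'), "\n"; // げほげほ
```

Because mb_substr() generally has to scan a UTF-8 string from the start to find an offset, this loop can be quadratic in the string length. The split-and-reverse approaches avoid that.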
Test code:
// mb_strrev() is the question's own function, which is not reproduced on this page.
$enc = 'UTF-8';
$text = "ほげほげ";
$defaultEnc = mb_internal_encoding();
echo "Showing results with encoding $defaultEnc.\n\n";
$revNormal = strrev($text);
$revInt = mb_strrev($text);
$revEnc = mb_strrev($text, $enc);
echo "Original text is: $text .\n";
echo "Normal strrev output: " . $revNormal . ".\n";
echo "mb_strrev without encoding output: $revInt.\n";
echo "mb_strrev with encoding $enc output: $revEnc.\n";
// Switch the internal encoding and run the same comparison again.
if (mb_internal_encoding($enc)) {
    echo "\nSetting internal encoding to $enc from $defaultEnc.\n\n";
    $revNormal = strrev($text);
    $revInt = mb_strrev($text);
    $revEnc = mb_strrev($text, $enc);
    echo "Original text is: $text .\n";
    echo "Normal strrev output: " . $revNormal . ".\n";
    echo "mb_strrev without encoding output: $revInt.\n";
    echo "mb_strrev with encoding $enc output: $revEnc.\n";
} else {
    echo "\nCould not set internal encoding to $enc!\n";
}
Here is another approach using a regular expression:
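function utf8_strrev($str)
{
    // /u: treat pattern and subject as UTF-8, so "." matches one code point.
    // /s: let "." match newlines too, so line breaks are not dropped.
    preg_match_all('/./us', $str, $ar);
    return implode(array_reverse($ar[0]));
}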
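The s modifier in /./us matters. Without it, "." does not match a newline, so line breaks would be silently dropped from the reversed string. Here is a small comparison; the helper all_matches() is hypothetical.

```php
<?php
// Hypothetical helper: return every match of $pattern in $str.
function all_matches(string $pattern, string $str): array
{
    preg_match_all($pattern, $str, $m);
    return $m[0];
}

$str = "ほ\nげ";
$withoutS = all_matches('/./u', $str);  // ["ほ", "げ"]: the newline is skipped
$withS    = all_matches('/./us', $str); // ["ほ", "\n", "げ"]

echo count($withoutS), ' ', count($withS), "\n"; // 2 3
```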
Here is another way. It seems to work without specifying the output encoding (tested with several different mb_internal_encoding() settings):
function mb_strrev($text)
{
    // An empty pattern with /u splits between code points;
    // PREG_SPLIT_NO_EMPTY drops the empty pieces at both ends.
    return join('', array_reverse(
        preg_split('~~u', $text, -1, PREG_SPLIT_NO_EMPTY)
    ));
}
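The empty pattern ~~u matches at every position between two characters. The u modifier makes those positions fall on code-point boundaries rather than byte boundaries.

The sketch below (PHP 7.0+ for the \u{...} escape) shows why PREG_SPLIT_NO_EMPTY is passed and what happens without u.

```php
<?php
$s = "a\u{00E9}"; // "a" followed by U+00E9 (2 bytes in UTF-8)

// An empty pattern also matches at the very start and end, producing empty pieces there.
$raw = preg_split('~~u', $s);                            // ["", "a", "é", ""]

// PREG_SPLIT_NO_EMPTY drops those empty pieces, leaving one element per code point.
$chars = preg_split('~~u', $s, -1, PREG_SPLIT_NO_EMPTY); // ["a", "é"]

// Without /u the split happens between bytes and cuts U+00E9 in half.
$bytes = preg_split('~~', $s, -1, PREG_SPLIT_NO_EMPTY);  // 3 one-byte pieces

echo count($raw), ' ', count($chars), ' ', count($bytes), "\n"; // 4 2 3
```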
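The grapheme functions handle UTF-8 strings more correctly than the mbstring and PCRE functions. mbstring and PCRE work on code points, so they can separate a combining mark from the letter it belongs to and break the character. You can see the difference by running the following code.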
// Split into grapheme clusters (intl grapheme_* functions).
function str_to_array($string)
{
    $length = grapheme_strlen($string);
    $ret = [];
    for ($i = 0; $i < $length; $i += 1) {
        $ret[] = grapheme_substr($string, $i, 1);
    }
    return $ret;
}
// Split into code points (mbstring).
function str_to_array2($string)
{
    $length = mb_strlen($string, "UTF-8");
    $ret = [];
    for ($i = 0; $i < $length; $i += 1) {
        $ret[] = mb_substr($string, $i, 1, "UTF-8");
    }
    return $ret;
}
// Split into code points (PCRE).
function str_to_array3($string)
{
    return preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);
}
function utf8_strrev($string)
{
    return implode(array_reverse(str_to_array($string)));
}
function utf8_strrev2($string)
{
    return implode(array_reverse(str_to_array2($string)));
}
function utf8_strrev3($string)
{
    return implode(array_reverse(str_to_array3($string)));
}
// http://www.php.net/manual/en/function.grapheme-strlen.php
$string = "a\xCC\x8A" // 'a' + U+030A COMBINING RING ABOVE, a decomposed 'LATIN SMALL LETTER A WITH RING ABOVE' (U+00E5)
    ."o\xCC\x88"; // 'o' + U+0308 COMBINING DIAERESIS, a decomposed 'LATIN SMALL LETTER O WITH DIAERESIS' (U+00F6)
var_dump(array_map(function($elem) { return strtoupper(bin2hex($elem)); },
[
'should be' => "o\xCC\x88"."a\xCC\x8A",
'grapheme' => utf8_strrev($string),
'mbstring' => utf8_strrev2($string),
'pcre' => utf8_strrev3($string)
]));
Here are the results:
array(4) {
["should be"]=>
string(12) "6FCC8861CC8A"
["grapheme"]=>
string(12) "6FCC8861CC8A"
["mbstring"]=>
string(12) "CC886FCC8A61"
["pcre"]=>
string(12) "CC886FCC8A61"
}
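If the intl extension is not available, PCRE itself can match whole grapheme clusters. The \X escape matches one extended grapheme cluster, so a base letter and its combining marks stay together.

Here is a sketch, checked against the same test string as above. The function name strrev_graphemes_pcre() is hypothetical.

```php
<?php
// Hypothetical helper: reverse a UTF-8 string by extended grapheme cluster.
function strrev_graphemes_pcre(string $str): string
{
    // \X matches one extended grapheme cluster; /u makes PCRE treat the subject as UTF-8.
    preg_match_all('/\X/u', $str, $m);
    return implode('', array_reverse($m[0]));
}

$string = "a\xCC\x8A" . "o\xCC\x88"; // 'a' + U+030A, 'o' + U+0308
echo strtoupper(bin2hex(strrev_graphemes_pcre($string))), "\n"; // 6FCC8861CC8A
```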
As of PHP 5.5 (intl 3.0), you can use IntlBreakIterator:
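function utf8_strrev($str)
{
    // Code-point break iterator: boundaries fall between Unicode code points.
    $it = IntlBreakIterator::createCodePointInstance();
    $it->setText($str);
    $ret = '';
    $prev = 0;
    // Each iteration yields the next boundary as an offset into $str.
    foreach ($it as $pos) {
        // Prepend the piece between the previous and the current boundary.
        $ret = substr($str, $prev, $pos - $prev) . $ret;
        $prev = $pos;
    }
    return $ret;
}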
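createCodePointInstance() breaks between code points. This version should therefore reverse "a\xCC\x8Ao\xCC\x88" the same way the mbstring and PCRE versions do.

IntlBreakIterator::createCharacterInstance() instead breaks between grapheme clusters (user-perceived characters), so swapping it into the same loop should give the grapheme result. This is a sketch that assumes the intl extension is loaded; the function name is hypothetical.

```php
<?php
// Requires the intl extension. Hypothetical name; same loop as the answer above,
// but with a character (grapheme cluster) break iterator.
function utf8_strrev_graphemes(string $str): string
{
    $it = IntlBreakIterator::createCharacterInstance();
    $it->setText($str);
    $ret = '';
    $prev = 0;
    // Each iteration yields the next boundary offset into $str.
    foreach ($it as $pos) {
        // Prepend the grapheme cluster between the previous and current boundary.
        $ret = substr($str, $prev, $pos - $prev) . $ret;
        $prev = $pos;
    }
    return $ret;
}

if (extension_loaded('intl')) {
    $string = "a\xCC\x8A" . "o\xCC\x88";
    echo strtoupper(bin2hex(utf8_strrev_graphemes($string))), "\n"; // 6FCC8861CC8A
}
```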