Ant*_*ony 27 php unicode utf-8
I have data in this format: U+597D
or like this: U+6211
I want to convert them to UTF-8 (the original characters are 好 and 我). How can I do that?
Mez*_*Mez 33
$utf8string = html_entity_decode(preg_replace("/U\+([0-9A-F]{4})/", "&#x\\1;", $string), ENT_NOQUOTES, 'UTF-8');
This is probably the simplest solution.
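The pattern above matches exactly four uppercase hex digits, so lowercase input and code points above U+FFFF (for example U+1F600) would pass through unconverted. Below is a minimal sketch of a more permissive variant, assuming PHP 7.2 or later with the mbstring extension, where mb_chr() is built in. The helper name decode_u_notation is hypothetical, and the sketch assumes each token is followed by a non-hex character or the end of the string.

```php
<?php
// Hypothetical helper: replaces every U+XXXX token (4 to 6 hex digits,
// either case) with the UTF-8 encoding of that code point.
function decode_u_notation(string $input): string
{
    return preg_replace_callback(
        '/U\+([0-9A-Fa-f]{4,6})/',
        function (array $m): string {
            $char = mb_chr(hexdec($m[1]), 'UTF-8');
            // mb_chr() returns false for invalid code points; keep the token unchanged then.
            return $char === false ? $m[0] : $char;
        },
        $input
    );
}

echo decode_u_notation('U+597D U+6211'), "\n"; // 好 我
```

Using a callback instead of HTML entities keeps the conversion independent of html_entity_decode() flags, at the cost of requiring mbstring.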
vel*_*row 25
// Encode a Unicode code point as a UTF-8 byte string (multibyte counterpart of chr()).
function utf8($num)
{
    if ($num <= 0x7F) return chr($num);
    if ($num <= 0x7FF) return chr(($num >> 6) + 192) . chr(($num & 63) + 128);
    if ($num <= 0xFFFF) return chr(($num >> 12) + 224) . chr((($num >> 6) & 63) + 128) . chr(($num & 63) + 128);
    if ($num <= 0x1FFFFF) return chr(($num >> 18) + 240) . chr((($num >> 12) & 63) + 128) . chr((($num >> 6) & 63) + 128) . chr(($num & 63) + 128);
    return '';
}

// Decode the first UTF-8 character of $c to its code point (multibyte counterpart of ord()).
// Byte offsets use $c[n]; the curly-brace form $c{n} was removed in PHP 8.
function uniord($c)
{
    $ord0 = ord($c[0]); if ($ord0 >= 0 && $ord0 <= 127) return $ord0;
    $ord1 = ord($c[1]); if ($ord0 >= 192 && $ord0 <= 223) return ($ord0 - 192) * 64 + ($ord1 - 128);
    $ord2 = ord($c[2]); if ($ord0 >= 224 && $ord0 <= 239) return ($ord0 - 224) * 4096 + ($ord1 - 128) * 64 + ($ord2 - 128);
    $ord3 = ord($c[3]); if ($ord0 >= 240 && $ord0 <= 247) return ($ord0 - 240) * 262144 + ($ord1 - 128) * 4096 + ($ord2 - 128) * 64 + ($ord3 - 128);
    return false;
}
utf8() and uniord() try to mirror PHP's chr() and ord() functions:
echo utf8(0x6211)."\n";
echo uniord(utf8(0x6211))."\n";
echo "U+".dechex(uniord(utf8(0x6211)))."\n";
//In your case:
$wo='U+6211';
$hao='U+597D';
echo utf8(hexdec(str_replace("U+","", $wo)))."\n";
echo utf8(hexdec(str_replace("U+","", $hao)))."\n";
Output:
我
25105
U+6211
我
好
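For the reverse direction (UTF-8 text back to U+XXXX notation), here is a minimal sketch. It assumes PHP 7.2 or later with the mbstring extension, so the built-in mb_ord() is available. The helper name encode_u_notation is hypothetical.

```php
<?php
// Hypothetical helper: returns the U+XXXX notation of each character in a UTF-8 string.
function encode_u_notation(string $utf8): array
{
    // Split into characters; the /u modifier makes PCRE treat the input as UTF-8.
    $chars = preg_split('//u', $utf8, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return []; // preg_split() fails on invalid UTF-8
    }
    return array_map(function (string $ch): string {
        // Pad to at least four hex digits, as in the U+597D form from the question.
        return sprintf('U+%04X', mb_ord($ch, 'UTF-8'));
    }, $chars);
}

echo implode(' ', encode_u_notation("\u{597D}\u{6211}")), "\n"; // U+597D U+6211
```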
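I just wrote a polyfill for the missing multibyte versions of ord and chr, with the following in mind:
It defines the functions mb_ord and mb_chr only if they do not already exist. If they do exist in your framework or in a future version of PHP, the polyfill is ignored.
It uses the widely used mbstring extension for the conversion. If the mbstring extension is not loaded, it falls back to the iconv extension.
I also added functions for HTML entity encoding/decoding and for encoding/decoding to JSON format, along with some demo code showing how to use these functions.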
// Encode a UTF-8 string as JSON \uXXXX escapes (without the surrounding quotes).
if (!function_exists('codepoint_encode')) {
    function codepoint_encode($str) {
        return substr(json_encode($str), 1, -1);
    }
}
// Decode a string of JSON \uXXXX escapes back to UTF-8.
if (!function_exists('codepoint_decode')) {
    function codepoint_decode($str) {
        return json_decode(sprintf('"%s"', $str));
    }
}
// iconv-based fallbacks, used only when mbstring is not loaded.
if (!function_exists('mb_internal_encoding')) {
    function mb_internal_encoding($encoding = NULL) {
        // Fixed: the original tested an undefined $from_encoding and called the iconv functions without the required type argument.
        return ($encoding === NULL) ? iconv_get_encoding('internal_encoding') : iconv_set_encoding('internal_encoding', $encoding);
    }
}
if (!function_exists('mb_convert_encoding')) {
    function mb_convert_encoding($str, $to_encoding, $from_encoding = NULL) {
        return iconv(($from_encoding === NULL) ? mb_internal_encoding() : $from_encoding, $to_encoding, $str);
    }
}
// Code point to character: pack as UCS-4BE, then convert to the target encoding.
if (!function_exists('mb_chr')) {
    function mb_chr($ord, $encoding = 'UTF-8') {
        if ($encoding === 'UCS-4BE') {
            return pack("N", $ord);
        } else {
            return mb_convert_encoding(mb_chr($ord, 'UCS-4BE'), $encoding, 'UCS-4BE');
        }
    }
}
// Character to code point: convert to UCS-4BE, then unpack the integer.
if (!function_exists('mb_ord')) {
    function mb_ord($char, $encoding = 'UTF-8') {
        if ($encoding === 'UCS-4BE') {
            list(, $ord) = (strlen($char) === 4) ? @unpack('N', $char) : @unpack('n', $char);
            return $ord;
        } else {
            return mb_ord(mb_convert_encoding($char, 'UCS-4BE', $encoding), 'UCS-4BE');
        }
    }
}
// Replace every non-ASCII character with a numeric HTML entity (hex by default).
if (!function_exists('mb_htmlentities')) {
    function mb_htmlentities($string, $hex = true, $encoding = 'UTF-8') {
        return preg_replace_callback('/[\x{80}-\x{10FFFF}]/u', function ($match) use ($hex) {
            return sprintf($hex ? '&#x%X;' : '&#%d;', mb_ord($match[0]));
        }, $string);
    }
}
if (!function_exists('mb_html_entity_decode')) {
    function mb_html_entity_decode($string, $flags = null, $encoding = 'UTF-8') {
        return html_entity_decode($string, ($flags === NULL) ? ENT_COMPAT | ENT_HTML401 : $flags, $encoding);
    }
}
echo "\nGet string from numeric DEC value\n";
var_dump(mb_chr(25105));
var_dump(mb_chr(22909));
echo "\nGet string from numeric HEX value\n";
var_dump(mb_chr(0x6211));
var_dump(mb_chr(0x597D));
echo "\nGet numeric value of character as DEC int\n";
var_dump(mb_ord('我'));
var_dump(mb_ord('好'));
echo "\nGet numeric value of character as HEX string\n";
var_dump(dechex(mb_ord('我')));
var_dump(dechex(mb_ord('好')));
echo "\nEncode / decode to DEC based HTML entities\n";
var_dump(mb_htmlentities('我好', false));
var_dump(mb_html_entity_decode('&#25105;&#22909;'));
echo "\nEncode / decode to HEX based HTML entities\n";
var_dump(mb_htmlentities('我好'));
var_dump(mb_html_entity_decode('&#x6211;&#x597D;'));
echo "\nUse JSON encoding / decoding\n";
var_dump(codepoint_encode("我好"));
var_dump(codepoint_decode('\u6211\u597d'));
Get string from numeric DEC value
string(3) "我"
string(3) "好"
Get string from numeric HEX value
string(3) "我"
string(3) "好"
Get numeric value of character as DEC int
int(25105)
int(22909)
Get numeric value of character as HEX string
string(4) "6211"
string(4) "597d"
Encode / decode to DEC based HTML entities
string(16) "&#25105;&#22909;"
string(6) "我好"
Encode / decode to HEX based HTML entities
string(16) "&#x6211;&#x597D;"
string(6) "我好"
Use JSON encoding / decoding
string(12) "\u6211\u597d"
string(6) "我好"
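One caveat with the JSON-based helpers: JSON \u escapes are UTF-16 code units, so a character above U+FFFF comes out as a surrogate pair rather than as a single code point. The sketch below illustrates this using only json_encode() and json_decode(), with the same expressions as codepoint_encode() and codepoint_decode(). It assumes PHP 7 or later for the "\u{...}" string escape.

```php
<?php
$emoji = "\u{1F600}"; // U+1F600, outside the Basic Multilingual Plane

// Same expression as codepoint_encode(): strip the surrounding quotes.
$escaped = substr(json_encode($emoji), 1, -1);
echo $escaped, "\n"; // \ud83d\ude00

// Same expression as codepoint_decode(): the pair decodes back to one character.
$decoded = json_decode(sprintf('"%s"', $escaped));
var_dump($decoded === $emoji); // bool(true)
```

If real code point values are needed for such characters, mb_ord() is probably the safer choice.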