我有一个简单的 PHP 一维数组。
当我执行 var dump ( echo var_dump($a)) 时,我将其作为输出:
array(3) { [0]=> string(3) "?" [1]=> string(21) "exhausted||to exhaust" [2]=> string(4) "jin3" }
Run Code Online (Sandbox Code Playgroud)
但是,当我对它进行 json_encode ( echo json_encode($a)) 时,我得到了这个:
["\u5c3d","exhausted||to exhaust","jin3"]
Run Code Online (Sandbox Code Playgroud)
它返回的十六进制值是正确的,但我不知道如何阻止它给我十六进制。我只是想让它显示字符。
如果我echo mb_internal_encoding()返回 UTF-8,这就是我设置的。我在所有字符串操作中都非常小心地使用 mb_ 函数,因此没有任何数据被弄乱。
我知道我可以编写一个修改过的 json_encode 函数来解决这个问题。但我想知道这里发生了什么。
我知道这个问题比较老,但我想我会借用我在中国的工作to_json和to_utf8函数——其中包括一些很好的格式 (JSON_PRETTY_PRINT) 在开发和缩小生产时。(适应你自己的环境/系统)
// Produces JSON with Chinese Characters fully un-encoded.
// NOT RFC4627 compliant
json_encode($data, JSON_UNESCAPED_UNICODE|JSON_UNESCAPED_SLASHES);
Run Code Online (Sandbox Code Playgroud)
function to_json($data, $pretty=null, $inculde_security=false, $try_to_recover=true) {
// @Note: json_encode() *REQUIRES* data to be in valid UTF8 format BEFORE
// trying to json_encode and since we are working with Chinese
// characters, we need to make sure that we explicitly allow:
// JSON_UNESCAPED_UNICODE|JSON_UNESCAPED_SLASHES
// *Unless a mode is explicitly passed into the function
$json_encoded = '{}';
if ($pretty === null && is_env_prod()) { // @NOTE: Substitute with your own Production env check
$json_encoded = json_encode( $data, JSON_UNESCAPED_UNICODE|JSON_UNESCAPED_SLASHES );
} else if ($pretty === null && is_env_dev()){ // @NOTE: Substitute with your own Development env check
$json_encoded = json_encode( $data, JSON_PRETTY_PRINT|JSON_UNESCAPED_UNICODE|JSON_UNESCAPED_SLASHES );
} else {
// PRODUCTION
$json_encoded = json_encode( $data, $pretty );
}
// (1) Do not return an error if the inital data was empty
// (2) Return an error if json_encode() failed
if (json_last_error() > 0) {
if (!!$data || !empty($data)) {
if (!$json_encoded == false || empty($json_encoded) || $json_encoded == '{}') {
$json_encoded = json_encode([
'status' => false,
'error' => [
'json_last_error' => json_last_error(),
'json_last_error_msg' => json_last_error_msg()
]
]);
} else if (!!$try_to_recover) {
// there was data in $data so lets try to forensically recover a little? by removing $k => $v pairs that fail to be JSON encoded
foreach (((array) $data) as $k => $v) {
if (!json_encode([$k => $v])) {
if (is_array($data)) {
unset($data[$k]);
} else if (is_object($data)) {
unset($data->{$k});
}
}
}
// if the data still is not empty, and there is a status set in the data
// then set it to false and add a error message/data
// ONLY for Array & Objects
if (!empty($json_encoded) && count($json_encoded) < 1) {
if (!json_encode($data)) {
if (is_array($json_encoded)) {
$json_encoded['status'] = false;
$json_encoded['message'] = "json_encoding_error";
$json_encoded['error'] = [
'json_last_error' => json_last_error(),
'json_last_error_msg' => json_last_error_msg()
];
} else if (is_object($json_encoded)) {
$json_encoded->status = false;
$json_encoded->message = "json_encoding_error";
$json_encoded->error = [
'json_last_error' => json_last_error(),
'json_last_error_msg' => json_last_error_msg()
];
}
} else {
// We have removed the offending data
return to_json($data, $pretty, $include_security, $try_to_recover);
}
}
// we've cleaned out any data that was causing the problem, and included
// false to indicate this is a one-time recursion recovery.
return $this->to_json($pretty, $include_security, false);
}
} else { } // don't do anything as the value is already false
}
return ( ($inculde_security) ? ")]}',\n" : '' ) . $json_encoded;
}
Run Code Online (Sandbox Code Playgroud)
另一个可能to_utf8()有用的函数是我的递归函数:
// @NOTE: Common Chinese GBK encoding: to_utf8($data, 'GB2312')
function to_utf8($in, $source_encoding='HTML-ENTITIES') {
if (is_string($in)) {
return mb_convert_encoding(
$in,
$source_encoding,
'UTF-8'
);
} else if (is_array($in) || is_object($in)) {
array_walk_recursive($in, function(&$item, &$key) {
$key = to_utf8($key);
if (is_object($item) || is_array($item)) {
$item = to_utf8($item);
} else {
if (!mb_detect_encoding($item, 'UTF-8', true)){
$item = utf8_encode($item);
}
}
});
$ret_object = is_object($in);
return ($ret_object) ? (object) $in : (array) $in;
}
return $in;
}
Run Code Online (Sandbox Code Playgroud)
$pcre_regex = '
/
(?(DEFINE)
(?<number> -? (?= [1-9]|0(?!\d) ) \d+ (\.\d+)? ([eE] [+-]? \d+)? )
(?<boolean> true | false | null )
(?<string> " ([^"\\\\]* | \\\\ ["\\\\bfnrt\/] | \\\\ u [0-9a-f]{4} )* " )
(?<array> \[ (?: (?&json) (?: , (?&json) )* )? \s* \] )
(?<pair> \s* (?&string) \s* : (?&json) )
(?<object> \{ (?: (?&pair) (?: , (?&pair) )* )? \s* \} )
(?<json> \s* (?: (?&number) | (?&boolean) | (?&string) | (?&array) | (?&object) ) \s* )
)
\A (?&json) \Z
/six
';
$matches = false;
preg_match($pcre_regex, trim($body), $matches);
var_dump('RFC4627 Verification (Regex) ', [
'has_passed' => (count($matches) == 1) ? 'YES' : 'NO',
'matches' => $matches
]);
Run Code Online (Sandbox Code Playgroud)
// One Liner
is_string($json_string) && !preg_match('/[^,:{}\\[\\]0-9.\\-+Eaeflnr-u \\n\\r\\t]/', preg_replace('/"(\\.|[^"\\\\])*"/', '', $json_string));
// Alt Function — more consistant
function is_json($json_string) {
if (!is_string($json_string) || is_numeric($json_string)) {
return false;
}
$val = @json_decode($json_string);
return ($val != null) && (json_last_error() === JSON_ERROR_NONE);
// Inconsistant results, reverted to json_decode() + JSON_ERROR_NONE check
// return is_string($json_string) && !preg_match('/[^,:{}\\[\\]0-9.\\-+Eaeflnr-u \\n\\r\\t]/', preg_replace('/"(\\.|[^"\\\\])*"/', '', $json_string));
}
Run Code Online (Sandbox Code Playgroud)
function is_utf8($str) {
if (is_array($str)) {
foreach ($str as $k=>$v) {
if (is_string($v) && !is_utf8($v)) {
return false;
}
}
}
return (is_string($str) && preg_match('//u', $str));
}
Run Code Online (Sandbox Code Playgroud)
我也做过同样的调查json_encode()。我的结论是,你无法改变这种行为,但它不会造成任何问题,所以我就这样离开。
如果您确实不喜欢它,请preg_replace_callback()在输出上进行json_encode()操作并将代码点十六进制转换回 UTF-8 字符。
| 归档时间: |
|
| 查看次数: |
10627 次 |
| 最近记录: |