Hip*_*pny 3 php special-characters
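Sorry about the title, I really don't know how to phrase this...
I often have a string that needs to be cut after X characters. My problem is that this string often contains special characters such as: &egrave;
So I would like to know whether there is a way in PHP to tell, without altering my string, whether I am in the middle of a special character at the point where I cut it.
Example: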
This is my string with a special char : &egrave; - and I want it to cut in the middle of the "&egrave;" but still keeping the string intact
So right now my substring result would be:
This is my string with a special char : &egra
But I would like something like this:
This is my string with a special char : &egrave;
The best practice here is to store your strings as UTF-8 without any HTML entities, and to use the mb_* family of functions with utf8 as the encoding.
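As a rough illustration of that advice, the sketch below compares a byte-based cut with a character-based cut on a UTF-8 string. It assumes the source file is saved as UTF-8 and that the mbstring extension is loaded; the variable names are only for illustration.

```php
<?php
// Assumption: this file is saved as UTF-8, so "è" is the two bytes C3 A8.
$s = 'This is my string with a special char : è - and more';

// substr() counts bytes: byte 41 is only the first half of "è".
$bytes = substr($s, 0, 41);

// mb_substr() counts characters once it is told the encoding.
$chars = mb_substr($s, 0, 41, 'UTF-8');

// The byte cut leaves a dangling lead byte, so it is no longer valid UTF-8.
var_dump(mb_check_encoding($bytes, 'UTF-8')); // bool(false)
var_dump($chars === 'This is my string with a special char : è'); // bool(true)
```

With plain UTF-8 and no entities, the question of landing in the middle of a character goes away as long as every cut goes through mb_substr.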
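However, if your string is ASCII or iso-8859-1/win1252, you can use the special HTML-ENTITIES encoding of the mbstring library (note that mbstring deprecated this pseudo-encoding in PHP 8.2):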
$s = 'This is my string with a special char : &egrave; - and I want it to cut in the middle of the "&egrave;" but still keeping the string intact';
echo mb_substr($s, 0, 40, 'HTML-ENTITIES');
echo mb_substr($s, 0, 41, 'HTML-ENTITIES');
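However, if your underlying string is UTF-8 or some other multibyte encoding, using HTML-ENTITIES is not safe! This is because HTML-ENTITIES really means "win1252 with high-bit characters as html entities". Each byte of a multibyte character is then counted as a separate character, so a cut can consume the bytes of one character and stop short of where you expected. Here is an example of how it can go wrong: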
// Assuming that é is in utf8:
mb_substr('é ', 0, 2, 'HTML-ENTITIES') === 'é'
// should be 'é '
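The miscount can be seen without HTML-ENTITIES at all. The sketch below reads the same UTF-8 bytes once as UTF-8 and once as Windows-1252, which is an assumed stand-in for the single-byte reading described above.

```php
<?php
// "é " encoded as UTF-8: bytes C3 A9 20.
$s = "\u{e9} ";

var_dump(strlen($s));                    // int(3): three bytes
var_dump(mb_strlen($s, 'UTF-8'));        // int(2): two characters
var_dump(mb_strlen($s, 'Windows-1252')); // int(3): each byte counted as a character

// Read as UTF-8, a two-character cut keeps the trailing space.
var_dump(mb_substr($s, 0, 2, 'UTF-8') === $s); // bool(true)
```

Under the single-byte reading, a two-character cut covers only the two bytes of é, which is why the space goes missing in the example above.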
When your string is in a multibyte encoding, you must instead convert all HTML entities to a common encoding before splitting. For example:
// Use the canonical name 'UTF-8': html_entity_decode() may not recognise the 'utf8' alias.
$strings_actual_encoding = 'UTF-8';
$s_noentities = html_entity_decode($s, ENT_QUOTES, $strings_actual_encoding);
$s_trunc_noentities = mb_substr($s_noentities, 0, 41, $strings_actual_encoding);
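If the truncated text has to go back into HTML, the decode-then-cut steps can be wrapped up and followed by a re-encode. This is only a sketch: truncate_entities is a hypothetical helper name, and it assumes the input is plain text with entities (not markup) whose raw bytes are UTF-8.

```php
<?php
/**
 * Hypothetical helper: cut $html to $length characters without
 * splitting an entity. Assumes any raw bytes in $html are UTF-8.
 */
function truncate_entities(string $html, int $length): string
{
    // Turn every entity into the character it stands for.
    $plain = html_entity_decode($html, ENT_QUOTES, 'UTF-8');
    // Cut on character boundaries.
    $cut = mb_substr($plain, 0, $length, 'UTF-8');
    // Re-encode so the result can be printed into HTML again.
    return htmlentities($cut, ENT_QUOTES, 'UTF-8');
}

$s = 'This is my string with a special char : &egrave; - and I want it cut';
echo truncate_entities($s, 41), "\n";
// This is my string with a special char : &egrave;
```

Because the result is re-encoded with htmlentities, any literal < or > in the input would come back as &lt; and &gt;, so this approach is not suitable for strings that contain real tags.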