PHP - Substring after X characters with special characters

Hip*_*pny 3 php special-characters

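Sorry about the title, I really don't know how to phrase this...

I often have a string that I need to cut after X characters. My problem is that this string often contains special characters such as: è

So I'd like to know whether there is a way in PHP to tell, without transforming my string, whether I am in the middle of a special character when I cut it.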

This is my string with a special char : &egrave; - and I want it to cut in the middle of the "&egrave;" but still keeping the string intact

So right now the result of my substring would be:

This is my string with a special char : &egra

But I want something like this:

This is my string with a special char : è

Fra*_*ila 7

The best practice here is to store your strings as UTF-8 without any HTML entities, and to use the mb_* family of functions with utf8 as the encoding.
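As a minimal sketch of that recommendation, assuming the text is already UTF-8 and contains no entities: substr() counts bytes and can split "è" in half, while mb_substr() counts characters. The sample text and variable names are only illustrative.

```php
<?php
// "\u{E8}" is "è" encoded as UTF-8 (two bytes), independent of how this file is saved.
$text = "char : \u{E8} - rest";

// substr() counts bytes: the 8th byte is only the first half of "è".
$byteCut = substr($text, 0, 8);

// mb_substr() counts characters when told the real encoding.
$charCut = mb_substr($text, 0, 8, 'UTF-8');

var_dump(mb_check_encoding($byteCut, 'UTF-8')); // bool(false): the sequence is broken
var_dump($charCut === "char : \u{E8}");         // bool(true)
```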

However, if your string is ASCII or iso-8859-1/win1252, you can use the special HTML-ENTITIES encoding of the mb_string library:

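$s = 'This is my string with a special char : &egrave; - and I want it to cut in the middle of the "&egrave;" but still keeping the string intact';
// With HTML-ENTITIES, each entity counts as a single character.
echo mb_substr($s, 0, 40, 'HTML-ENTITIES'); // stops just before the entity
echo mb_substr($s, 0, 41, 'HTML-ENTITIES'); // includes the whole entity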

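But if your underlying string is UTF-8 or some other multibyte encoding, using HTML-ENTITIES is not safe! This is because HTML-ENTITIES really means "win1252 with high-bit characters as HTML entities". Here is an example of how it can go wrong: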

// Assuming that é is in utf8:
mb_substr('é ', 0, 2, 'HTML-ENTITIES') === 'é'
// should be 'é '

When your string is in a multibyte encoding, you must convert all HTML entities to a common encoding before splitting. For example:

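// html_entity_decode() expects the canonical name 'UTF-8'; mb_substr() accepts it too.
$strings_actual_encoding = 'UTF-8';
// Turn entities into real characters first...
$s_noentities = html_entity_decode($s, ENT_QUOTES, $strings_actual_encoding);
// ...then cut by characters in that same encoding.
$s_trunc_noentities = mb_substr($s_noentities, 0, 41, $strings_actual_encoding);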
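The decode-then-truncate steps can be wrapped into a small helper. This is only a sketch: the function name truncate_with_entities is hypothetical, it assumes the input is UTF-8 text that may contain entities, and it re-escapes the result with htmlspecialchars() on the assumption that the output is going into HTML.

```php
<?php
// Hypothetical helper: truncate text that may contain HTML entities
// without cutting an entity or a multibyte character in half.
function truncate_with_entities(string $s, int $length, string $encoding = 'UTF-8'): string
{
    // 1. Turn entities such as &egrave; into real characters.
    $decoded = html_entity_decode($s, ENT_QUOTES | ENT_HTML5, $encoding);

    // 2. Cut by characters, not bytes.
    $cut = mb_substr($decoded, 0, $length, $encoding);

    // 3. Escape again for HTML output (only &, <, > and quotes are re-encoded).
    return htmlspecialchars($cut, ENT_QUOTES, $encoding);
}

$s = 'This is my string with a special char : &egrave; - and more';
echo truncate_with_entities($s, 41), "\n";
```

Because "è" is left as a literal UTF-8 character in the result, the page that prints it has to be served as UTF-8.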