Rac*_*mad 6 php regex unicode preg-split
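Given a sample document like this:
“Chess helps us overcome difficulties and sufferings,” said Unnikrishnan, taking my queen. “On a chess board you are fighting. as we are also fighting the hardships in our daily life.” he said.
I want to get output like this: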
Array
(
[0] =>"Chess helps us overcome difficulties and sufferings," said Unnikrishnan, taking my queen.
[1] =>"On a chess board you are fighting. as we are also fighting the hardships in our daily life," he said.
)
My code still breaks the text at every period.
function sample($string)
{
$data=array();
$break=explode(".", $string);
array_push($data, $break);
print_r($data);
}
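I am still confused about how to split on two delimiters, the double quote and the period, because a quoted passage can itself contain a sentence that ends in a period.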
Here is a simpler pattern that uses preg_split() followed by preg_replace() to fix the left and right double quotes:
$in = '“Chess helps us overcome difficulties and sufferings,” said Unnikrishnan, taking my queen. “On a chess board you are fighting. as we are also fighting the hardships in our daily life.” he said.';

$out = preg_split('/ (?=“)/', $in, 0, PREG_SPLIT_NO_EMPTY);
//$out = preg_match_all('/“.+?(?= “|$)/', $in, $out) ? $out[0] : null;

$find = '/[“”]/u'; // unicode flag is essential
$replace = '"';
$out = preg_replace($find, $replace, $out); // replace curly quotes with standard double quotes

var_export($out);

Output:
array (
  0 => '"Chess helps us overcome difficulties and sufferings," said Unnikrishnan, taking my queen.',
  1 => '"On a chess board you are fighting. as we are also fighting the hardships in our daily life." he said.',
)

The preg_split() pattern matches a space that is followed by “ (a left double quote). Because the quote sits inside a lookahead, only the space is consumed.
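To illustrate why the lookahead matters, here is a minimal sketch comparing the pattern with and without it. The sample string and variable names are made up for this illustration and do not come from the original post; the file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical sample text, not from the original question.
$text = 'One. “Two.” said A. “Three.” said B.';

// Lookahead: only the space is consumed, so each piece keeps its opening quote.
$kept = preg_split('/ (?=“)/', $text);

// No lookahead: the quote is part of the delimiter and is dropped from the pieces.
$lost = preg_split('/ “/', $text);

var_export($kept);
var_export($lost);
```

With the lookahead, the second and third pieces still begin with “; without it, they begin with the first letter after the quote.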
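The preg_replace() step needs the u modifier on its pattern so that the left and right double quotes inside the character class are recognized as whole characters. Using '/“|”/' instead means you can drop the u modifier, but it doubles the number of steps the regex engine has to perform (in this case, my character class takes just 189 steps, while the pipe version takes 372).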
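The effect of the u modifier can be seen in a small sketch. The sample string and variable names below are hypothetical, and the file is assumed to be saved as UTF-8, where “ is the bytes E2 80 9C and ” is E2 80 9D.

```php
<?php
// Hypothetical sample, assuming UTF-8 source encoding.
$text = '“Hi”';

// With u, the class holds two code points: U+201C and U+201D.
$withFlag = preg_replace('/[“”]/u', '"', $text);

// Without u, the class holds the individual bytes E2, 80, 9C and 9D,
// so every byte of each curly quote is replaced on its own.
$withoutFlag = preg_replace('/[“”]/', '"', $text);

// Alternation compares whole byte sequences, so it works without u.
$alternation = preg_replace('/“|”/', '"', $text);

var_export([$withFlag, $withoutFlag, $alternation]);
```

Here $withFlag and $alternation both come out as "Hi" wrapped in one straight quote on each side, while $withoutFlag ends up with three straight quotes on each side, one per byte of the original curly quote.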
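As for the choice between preg_split() and preg_match_all(), I chose preg_split() because the goal is simply to split the string on the space that is followed by a left double quote. preg_match_all() would be the more practical choice if the goal were to omit substrings that are not adjacent to the delimiting space character.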
Despite my reasoning, if you still want to use preg_match_all(), my preg_split() line can be replaced with:
$out = preg_match_all('/“.+?(?= “|$)/', $in, $out) ? $out[0] : null;
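The practical difference between the two approaches shows up when the input contains text that does not start with a left double quote. The sketch below uses a made-up input with an unquoted lead-in sentence; the string and variable names are assumptions for illustration only, and the file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical input with a leading sentence that is not quoted.
$text = 'Intro text. “First quote.” he said. “Second quote.” she said.';

// preg_split() keeps everything, including the unquoted lead-in.
$split = preg_split('/ (?=“)/', $text, 0, PREG_SPLIT_NO_EMPTY);

// preg_match_all() only captures pieces that begin with a left double quote.
$matched = preg_match_all('/“.+?(?= “|$)/', $text, $m) ? $m[0] : [];

var_export($split);
var_export($matched);
```

In this sketch $split has three elements, starting with 'Intro text.', while $matched has only the two quoted pieces.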