the*_*cat 9 php regex string preg-split
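How do I split text into a series of sentences?
Sample text:
Fry me a Beaver. Fry me a Beaver! Fry me a Beaver? Fry me Beaver no. 4?! Fry me many Beavers... End
Expected output: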
0 => Fry me a Beaver.
1 => Fry me a Beaver!
2 => Fry me a Beaver?
3 => Fry me Beaver no. 4?!
4 => Fry me many Beavers...
5 => End
I tried several solutions that I found by searching on SO, but they all fail, especially on sentence 4.
/(?<=[!?.])./
/\.|\?|!/
/((?<=[a-z0-9)][.?!])|(?<=[a-z0-9][.?!]\"))(\s|\r\n)(?=\"?[A-Z])/
/(?<=[.!?]|[.!?][\'"])\s+/ // <- closest one
Ham*_*mZa 28
Since you want to "split" the sentences, why are you trying to match them?
For this case, let's use preg_split().
Code:
$str = 'Fry me a Beaver. Fry me a Beaver! Fry me a Beaver? Fry me Beaver no. 4?! Fry me many Beavers... End';
$sentences = preg_split('/(?<=[.?!])\s+(?=[a-z])/i', $str);
print_r($sentences);
Output:
Array
(
[0] => Fry me a Beaver.
[1] => Fry me a Beaver!
[2] => Fry me a Beaver?
[3] => Fry me Beaver no. 4?!
[4] => Fry me many Beavers...
[5] => End
)
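Explanation:
Put simply, we split on runs of whitespace (\s+) and require two things:
(?<=[.?!]) is a positive lookbehind assertion: the whitespace must be preceded by a period, a question mark, or an exclamation mark.
(?=[a-z]) is a positive lookahead assertion: the whitespace must be followed by a letter (the i flag makes this match uppercase letters too). This is the workaround for the "no. 4" problem, because the space after "no." is followed by a digit and therefore is not a split point.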
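
To see what the lookahead contributes, here is a minimal sketch that runs the sample text through the pattern with and without it. The helper name split_sentences is hypothetical and used only for illustration; the patterns are the ones discussed above.

```php
<?php
// Hypothetical helper: splits on whitespace that follows . ? or !
// and is followed by a letter (case-insensitive because of the i flag).
function split_sentences(string $text): array
{
    $parts = preg_split('/(?<=[.?!])\s+(?=[a-z])/i', $text);
    return $parts === false ? [] : $parts; // preg_split returns false on error
}

$str = 'Fry me a Beaver. Fry me a Beaver! Fry me a Beaver? Fry me Beaver no. 4?! Fry me many Beavers... End';

// Lookbehind only: also splits between "no." and "4?!".
$withoutLookahead = preg_split('/(?<=[.?!])\s+/', $str);

// Lookbehind plus lookahead: "no. 4?!" stays in one sentence.
$withLookahead = split_sentences($str);

echo count($withoutLookahead), "\n"; // 7
echo count($withLookahead), "\n";    // 6
```

Note that the lookahead is a heuristic tuned to this sample: a sentence that begins with a digit or a quote character will not be split off, and an abbreviation followed by a letter (for example "Mr. Smith") will still be split.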