How can I find all spaces except those between quotes?

alt*_*ern 7 php regex

I need to split a string on spaces, but phrases inside quotes should stay unsplit. Example:

  word1 word2 "this is a phrase" word3 word4 "this is a second phrase" word5

After preg_split, this should produce the following array:

array(
 [0] => 'word1',
 [1] => 'word2',
 [2] => 'this is a phrase',
 [3] => 'word3',
 [4] => 'word4',
 [5] => 'this is a second phrase',
 [6] => 'word5'
)

How should I construct my regular expression?

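P.S. There is a related question, but I don't think it applies to my case. The accepted answer there provides a regexp that finds words rather than spaces.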

alt*_*ern 9

I found a solution with the help of user MizardX on the #regex IRC channel (irc.freenode.net). It even supports single quotes.

$str= 'word1 word2 \'this is a phrase\' word3 word4 "this is a second phrase" word5 word1 word2 "this is a phrase" word3 word4 "this is a second phrase" word5';

$regexp = '/\G(?:"[^"]*"|\'[^\']*\'|[^"\'\s]+)*\K\s+/';

$arr = preg_split($regexp, $str);

print_r($arr);

The result is:

Array (
    [0] => word1
    [1] => word2
    [2] => 'this is a phrase'
    [3] => word3
    [4] => word4
    [5] => "this is a second phrase"
    [6] => word5
    [7] => word1
    [8] => word2
    [9] => "this is a phrase"
    [10] => word3
    [11] => word4
    [12] => "this is a second phrase"
    [13] => word5  
)
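The pattern above may be easier to follow when written with the x (extended) flag, which allows whitespace and comments inside the regexp. The sketch below is the same pattern, annotated part by part. The shortened sample string and the quote-stripping step are illustrative additions, not part of the original answer; the stripping is included only because the array asked for in the question has no quotes around the phrases. Like the original, it relies on \K, so it assumes a PCRE version that supports it.

```php
<?php
// Same pattern as above, spread out with the x flag so each part can be commented.
$regexp = '/
    \G                  # continue where the previous match ended (or at the start)
    (?:
        "[^"]*"         # a double-quoted phrase
      | \'[^\']*\'      # a single-quoted phrase
      | [^"\'\s]+       # a run of characters that are neither quotes nor whitespace
    )*
    \K                  # drop everything matched so far from the reported match
    \s+                 # the whitespace that actually acts as the delimiter
/x';

// Shortened sample input (an assumption made for this example)
$str = 'word1 "this is a phrase" word2 \'single quoted\' word3';

$parts = preg_split($regexp, $str);

// Optional extra step: remove one pair of matching surrounding quotes per piece
$clean = array_map(function ($piece) {
    return preg_replace('/^(["\'])(.*)\1$/s', '$2', $piece);
}, $parts);

print_r($clean);
```

Because \G forces every match to start exactly where the previous one ended, the engine always walks over whole quoted phrases before it reaches \s+, so whitespace inside quotes is never reported as a delimiter.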

P.S. The only drawback is that this regexp works only with PCRE 7.

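It turns out that my production server has no PCRE 7 support; only PCRE 6 is installed there. Although it is not as flexible as the PCRE 7 version above, here is a regexp that does work (with \G and \K removed):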

/(?:"[^"]*"|\'[^\']*\'|[^"\'\s]+)+/

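For the given input, the result is the same as above. Note that this pattern matches the tokens themselves rather than the whitespace between them, so it presumably has to be used with preg_match_all instead of preg_split.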
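The answer does not show how the PCRE 6 pattern is called. A minimal sketch, assuming it is used with preg_match_all to collect the tokens (the variable name $tokens and the shortened input are choices made for this example):

```php
<?php
$str = 'word1 word2 \'this is a phrase\' word3 word4 "this is a second phrase" word5';

// The pattern without \G and \K: it matches each token, not the separators
$regexp = '/(?:"[^"]*"|\'[^\']*\'|[^"\'\s]+)+/';

// preg_match_all collects every match; $matches[0] holds the full matches
preg_match_all($regexp, $str, $matches);
$tokens = $matches[0];

print_r($tokens);
```

On well-formed input this yields the same pieces as the preg_split version, quotes included. The two approaches can differ on malformed input: with an unbalanced quote, this pattern skips the stray quote character and keeps matching the words after it, while the \G version stops splitting at that point.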