Regex to split a string while preserving quoted substrings

Question

I need to split the string below using spaces as the delimiter, but any spaces inside quotes should be preserved.

research library "not available" author:"Bernard Shaw"

into

research
library
"not available"
author:"Bernard Shaw"

I am trying to do this in C#, and I have this regex from another SO post: @"(?<="")|\w[\w\s]*(?="")|\w+|""[\w\s]*""", which splits the string into

research
library
"not available"
author
"Bernard Shaw"

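Unfortunately, this does not meet my exact requirements: the author:"Bernard Shaw" token is broken into two pieces.

I am looking for any regular expression that would do the trick.

Any help is appreciated.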

Answer 1

Tim*_*ker 27

As long as there are no escaped quotes inside the quoted strings, the following should work:

splitArray = Regex.Split(subjectString, "(?<=^[^\"]*(?:\"[^\"]*\"[^\"]*)*) (?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

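This regex splits on a space character only if it is both preceded and followed by an even number of quotes.

The regex, written without all those escaped quotes and explained: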

(?<=      # Assert that it's possible to match this before the current position (positive lookbehind):
 ^        # The start of the string
 [^"]*    # Any number of non-quote characters
 (?:      # Match the following group...
  "[^"]*  # a quote, followed by any number of non-quote characters
  "[^"]*  # the same
 )*       # ...zero or more times (so 0, 2, 4, ... quotes will match)
)         # End of lookbehind assertion.
[ ]       # Match a space
(?=       # Assert that it's possible to match this after the current position (positive lookahead):
 (?:      # Match the following group...
  [^"]*"  # see above
  [^"]*"  # see above
 )*       # ...zero or more times.
 [^"]*    # Match any number of non-quote characters
 $        # Match the end of the string
)         # End of lookahead assertion
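
To see the pattern applied to the string from the question, here is a minimal, self-contained sketch. The class name QuoteAwareSplitter and its Split helper are made up for illustration and are not part of the original answer. The pattern is the same one, written as a verbatim string so that each literal quote appears as "" instead of \". It assumes balanced quotes with no escaped quotes inside them, as stated above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class QuoteAwareSplitter
{
    // Same pattern as in the answer. In a verbatim string (@"..."),
    // a doubled quote ("") stands for one literal quote character.
    private static readonly Regex SpaceOutsideQuotes = new Regex(
        @"(?<=^[^""]*(?:""[^""]*""[^""]*)*) (?=(?:[^""]*""[^""]*"")*[^""]*$)");

    // Splits on spaces that have an even number of quotes on both sides.
    public static string[] Split(string input)
    {
        return SpaceOutsideQuotes.Split(input);
    }

    public static void Main()
    {
        string input = "research library \"not available\" author:\"Bernard Shaw\"";

        // Prints one token per line:
        //   research
        //   library
        //   "not available"
        //   author:"Bernard Shaw"
        foreach (string token in Split(input))
        {
            Console.WriteLine(token);
        }
    }
}
```

Note that the pattern matches a single space, so two adjacent spaces outside quotes would produce an empty entry in the result. The variable-length lookbehind is supported by the .NET regex engine, but many other regex engines reject it.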

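@TimPietzcker Well, I don't know why, but I asked almost the same question (http://stackoverflow.com/questions/33886103/how-to-find-recurring-word-groups-in-text-with-c) and received far too many reactions along the lines of "this is not a code-writing service" or "unclear", so I am trying my luck in the comments instead. (2 upvotes)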
