Split on whitespace, except between certain characters

MxL*_*evs 3 python string-parsing

I am parsing a file that contains lines like this:

type("book") title("golden apples") pages(10-35 70 200-234) comments("good read")

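I want to split it into its separate fields.

In my example there are four fields: type, title, pages, and comments.

The desired result after splitting is: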

['type("book")', 'title("golden apples")', 'pages(10-35 70 200-234)', 'comments("good read")']

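Clearly a simple string split won't work, because it splits at every space. I want to split on spaces but keep anything between parentheses and quotes intact.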
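To illustrate the problem, here is a minimal sketch of what a plain `str.split()` does to the sample line. The names `line` and `pieces` are only for illustration.

```python
# The sample line from above.
line = 'type("book") title("golden apples") pages(10-35 70 200-234) comments("good read")'

# str.split() with no arguments splits on every run of whitespace,
# including the spaces inside the quotes and parentheses.
pieces = line.split()

print(len(pieces))
for piece in pieces:
    print(piece)
```

This gives eight pieces instead of the four fields, with fragments such as `title("golden` and `apples")` separated from each other.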

How can I split this?

Nar*_*ala 13

This regex should work for you: \s+(?=[^()]*(?:\(|$))

import re
result = re.split(r"\s+(?=[^()]*(?:\(|$))", subject)
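As a rough end-to-end check, the pattern can be applied to the sample line from the question. This is only a sketch: the helper name `split_fields` and the constant `FIELD_SEP` are made up for illustration, and the `strip()` call is an added assumption to cope with the trailing newline that lines read from a file usually carry.

```python
import re

# Pattern from the answer: split on whitespace only when the next
# parenthesis ahead is an opening one, or when no parenthesis is left.
FIELD_SEP = re.compile(r"\s+(?=[^()]*(?:\(|$))")

def split_fields(line):
    # strip() first: a trailing newline would otherwise match \s+ followed
    # by $ and leave an empty string at the end of the result.
    return FIELD_SEP.split(line.strip())

# Sample line from the question, with a trailing newline as read from a file.
subject = 'type("book") title("golden apples") pages(10-35 70 200-234) comments("good read")\n'
fields = split_fields(subject)
print(fields)
```

With the sample line this yields the four fields listed in the question.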

Explanation

r"""
\s             # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
   +              # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
(?=            # Assert that the regex below can be matched, starting at this position (positive lookahead)
   [^()]          # Match a single character NOT present in the list “()”
      *              # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
   (?:              # Match the regular expression below
                     # Match either the regular expression below (attempting the next alternative only if this one fails)
         \(             # Match the character “(” literally
      |              # Or match regular expression number 2 below (the entire group fails if this one fails to match)
         $              # Assert position at the end of a line (at the end of the string or before a line break character)
   )
)
"""
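The annotated breakdown can also be kept next to the pattern itself by compiling it with `re.VERBOSE`, which ignores whitespace and `#` comments inside the pattern. The following is a sketch with shortened comments; the name `FIELD_SEP` is hypothetical.

```python
import re

FIELD_SEP = re.compile(
    r"""
    \s+          # one or more whitespace characters (the separator itself)
    (?=          # lookahead: succeed only if, from this position on,
        [^()]*   #   a run of non-parenthesis characters
        (?:
            \(   #   is followed by an opening parenthesis,
          |
            $    #   or reaches the end of the string
        )
    )
    """,
    re.VERBOSE,
)

line = 'type("book") title("golden apples") pages(10-35 70 200-234) comments("good read")'
print(FIELD_SEP.split(line))

# The lookahead only inspects parentheses, not quotes, so a quoted value
# that itself contains "(" after a space is still split.
print(FIELD_SEP.split('title("see (note)") x(1)'))
```

Note that the pattern decides purely by the next parenthesis it sees. It fits the flat `name(...)` format in the question, but nested parentheses or parentheses inside quoted text would still be split and would need a different approach.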

  • Try this: `re.split(r"\s+(?=[^()]*(?:\(|$))", subject)` (2 upvotes)