Split a string at commas, except inside a bracket environment

Nic*_*mer 9 python regex parsing pyparsing

I want to split a multi-line Python string at its commas, except where a comma sits inside a bracketed expression. For example, the string

{J. Doe, R. Starr}, {Lorem
{i}psum dolor }, Dol. sit., am. et.

should be split into

['{J. Doe, R. Starr}', '{Lorem\n{i}psum dolor }', 'Dol. sit.', 'am. et.']

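This involves bracket matching, so regular expressions probably won't help here. PyParsing's commaSeparatedList does almost what I need, except that it protects quoted (") environments rather than {}-delimited ones.

Any hints?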

dee*_*ets 12

Write your own custom split function:

input_string = """{J. Doe, R. Starr}, {Lorem
{i}psum dolor }, Dol. sit., am. et."""


expected = ['{J. Doe, R. Starr}', '{Lorem\n{i}psum dolor }', 'Dol. sit.', 'am. et.']

def split(s):
    parts = []
    bracket_level = 0
    current = []
    # trick to remove special-case of trailing chars
    for c in (s + ","):
        if c == "," and bracket_level == 0:
            # top-level comma: close the current part, dropping
            # the whitespace that follows each separator
            parts.append("".join(current).strip())
            current = []
        else:
            # track nesting depth so commas inside {...} are kept
            if c == "{":
                bracket_level += 1
            elif c == "}":
                bracket_level -= 1
            current.append(c)
    return parts

# compare with ==; "assert x, y" only checks that x is truthy
assert split(input_string) == expected
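The function above silently accepts unbalanced braces. As a rough sketch (not part of the original answer), the same depth counter can also report them. The name `split_checked` and its `sep`, `open_ch` and `close_ch` parameters are made up for this illustration.

```python
def split_checked(s, sep=",", open_ch="{", close_ch="}"):
    """Split s at top-level separators; raise ValueError on unbalanced braces.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    parts, current, depth = [], [], 0
    for c in s:
        if c == sep and depth == 0:
            # separator outside any braces: finish the current part
            parts.append("".join(current).strip())
            current = []
            continue
        if c == open_ch:
            depth += 1
        elif c == close_ch:
            depth -= 1
            if depth < 0:
                raise ValueError("closing brace without a matching opening brace")
        current.append(c)
    if depth != 0:
        raise ValueError("opening brace was never closed")
    # whatever is left after the last separator is the final part
    parts.append("".join(current).strip())
    return parts


print(split_checked("{J. Doe, R. Starr}, {Lorem\n{i}psum dolor }, Dol. sit., am. et."))
```

Whether raising is the right reaction to malformed input depends on where the data comes from; returning the partial result instead would be an equally reasonable choice.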


iCo*_*dez 6

You can use re.split in this case:

>>> from re import split
>>> data = '''\
... {J. Doe, R. Starr}, {Lorem
... {i}psum dolor }, Dol. sit., am. et.'''
>>> split(r',\s*(?![^{}]*\})', data)
['{J. Doe, R. Starr}', '{Lorem\n{i}psum dolor }', 'Dol. sit.', 'am. et.']
>>>

Here is an explanation of what the regex pattern matches:

,       # Matches ,
\s*     # Matches zero or more whitespace characters
(?!     # Starts a negative look-ahead assertion
[^{}]*  # Matches zero or more characters that are not { or }
\}      # Matches }
)       # Closes the look-ahead assertion
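The annotated pattern can also be kept in that form in the code itself by compiling it with `re.VERBOSE`, which ignores whitespace and `#` comments outside character classes. A minimal sketch, assuming the same sample data as in the question (the name `TOP_LEVEL_COMMA` is just an illustrative choice):

```python
import re

# Same pattern as above, written with inline comments via re.VERBOSE.
TOP_LEVEL_COMMA = re.compile(r"""
    ,          # a literal comma
    \s*        # zero or more whitespace characters
    (?!        # negative look-ahead: do not split if what follows is ...
      [^{}]*   # ... any run of characters that are not { or }
      \}       # ... and then a closing brace
    )
""", re.VERBOSE)

data = "{J. Doe, R. Starr}, {Lorem\n{i}psum dolor }, Dol. sit., am. et."
print(TOP_LEVEL_COMMA.split(data))
```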

  • Wouldn't this fail for a slightly more complex example with nested brackets? For example `"{J. Doe,R. Starr {x,{y}}},{Lorem {i}psum dolor },Dol.sit.,am.et."`? (2 upvotes)
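To check the concern raised in this comment, here is a small comparison on the commenter's nested string. The look-ahead only tests whether the next brace after a comma is a closing one, so a comma that is followed by an opening brace of an inner group is treated as top-level even when it is nested. A depth counter does not have this problem. The helper name `split_by_depth` is made up for this sketch.

```python
import re

def split_by_depth(s):
    """Split at commas that are outside all {...} groups (illustrative helper)."""
    parts, current, depth = [], [], 0
    for c in s + ",":  # trailing comma flushes the last part
        if c == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            if c == "{":
                depth += 1
            elif c == "}":
                depth -= 1
            current.append(c)
    return parts

nested = "{J. Doe,R. Starr {x,{y}}},{Lorem {i}psum dolor },Dol.sit.,am.et."

by_regex = re.split(r',\s*(?![^{}]*\})', nested)
by_depth = split_by_depth(nested)

print(by_regex)  # the first group is cut apart at its inner commas
print(by_depth)  # four parts, with the nested group kept whole
```

So the regex is fine for a single level of braces, as in the question, but arbitrarily nested input needs a counter or a real parser.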