Nic*_*mer 9 python regex parsing pyparsing
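I want to split a Python multi-line string at its commas, except when a comma sits inside a brace-delimited expression. For example, the string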
{J. Doe, R. Starr}, {Lorem
{i}psum dolor }, Dol. sit., am. et.
should be split into
['{J. Doe, R. Starr}', '{Lorem\n{i}psum dolor }', 'Dol. sit.', 'am. et.']
This involves bracket matching, so regular expressions probably won't help here. PyParsing has commaSeparatedList, which does almost what I need, except that it protects quoted (") environments rather than {}-delimited ones.
Any hints?
dee*_*ets 12
Write your own custom split function:
input_string = """{J. Doe, R. Starr}, {Lorem
{i}psum dolor }, Dol. sit., am. et."""
expected = ['{J. Doe, R. Starr}', '{Lorem\n{i}psum dolor }', 'Dol. sit.', 'am. et.']


def split(s):
    parts = []
    bracket_level = 0
    current = []
    # Appending a trailing comma flushes the last part without a special case.
    for c in (s + ","):
        if c == "," and bracket_level == 0:
            # Strip the whitespace that follows each separating comma.
            parts.append("".join(current).strip())
            current = []
        else:
            if c == "{":
                bracket_level += 1
            elif c == "}":
                bracket_level -= 1
            current.append(c)
    return parts


# Compare against the expected list; "assert x, y" would only test truthiness.
assert split(input_string) == expected
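As a possible extension of the function above, here is a minimal sketch that also reports unbalanced braces instead of silently producing odd pieces. The name `split_top_level` and its `sep`, `open_ch` and `close_ch` parameters are made up for this illustration; they are not part of any library.

```python
def split_top_level(text, sep=",", open_ch="{", close_ch="}"):
    """Split text on sep, skipping separators nested inside open_ch/close_ch."""
    parts = []
    depth = 0   # current brace nesting level
    start = 0   # index where the current piece begins
    for i, ch in enumerate(text):
        if ch == open_ch:
            depth += 1
        elif ch == close_ch:
            depth -= 1
            if depth < 0:
                raise ValueError(f"unmatched {close_ch!r} at index {i}")
        elif ch == sep and depth == 0:
            # Top-level separator: emit the piece collected so far.
            parts.append(text[start:i].strip())
            start = i + 1
    if depth != 0:
        raise ValueError(f"{depth} unclosed {open_ch!r}")
    parts.append(text[start:].strip())
    return parts


sample = "{J. Doe, R. Starr}, {Lorem\n{i}psum dolor }, Dol. sit., am. et."
print(split_top_level(sample))
```

Slicing the input rather than collecting characters one by one keeps the loop short; whether to raise on unbalanced input or to tolerate it is a design choice that depends on how trustworthy the data is.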
You can use re.split in this case:
>>> from re import split
>>> data = '''\
... {J. Doe, R. Starr}, {Lorem
... {i}psum dolor }, Dol. sit., am. et.'''
>>> split(r',\s*(?![^{}]*\})', data)
['{J. Doe, R. Starr}', '{Lorem\n{i}psum dolor }', 'Dol. sit.', 'am. et.']
>>>
Here is an explanation of what the regex pattern matches:
,         # Matches ,
\s*       # Matches zero or more whitespace characters
(?!       # Starts a negative look-ahead assertion
[^{}]*    # Matches zero or more characters that are not { or }
\}        # Matches }
)         # Closes the look-ahead assertion
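The annotated pattern can be compiled as-is with `re.VERBOSE`, which lets the comments live inside the regex. The sketch below does that and also probes a case the question hints at: the look-ahead only looks as far as the next brace, so it appears to mis-split when a comma inside braces is followed by a nested `{...}` group. The names `TOP_LEVEL_COMMA`, `data` and `nested` are chosen here for illustration.

```python
import re

# Same pattern as above, written in verbose mode.
TOP_LEVEL_COMMA = re.compile(
    r"""
    ,           # a comma
    \s*         # optional whitespace after it
    (?!         # not followed by ...
        [^{}]*  #   a run of non-brace characters
        \}      #   and then a closing brace
    )
    """,
    re.VERBOSE,
)

data = "{J. Doe, R. Starr}, {Lorem\n{i}psum dolor }, Dol. sit., am. et."
print(TOP_LEVEL_COMMA.split(data))
# ['{J. Doe, R. Starr}', '{Lorem\n{i}psum dolor }', 'Dol. sit.', 'am. et.']

# The comma after "a" is inside braces, but the next brace seen is "{",
# so the negative look-ahead succeeds and the string is split there anyway.
nested = "{a, {b} c}, d"
print(TOP_LEVEL_COMMA.split(nested))
# ['{a', '{b} c}', 'd']
```

For input where nesting can follow a comma, a counting approach that tracks brace depth is likely the safer choice; the regex is fine when groups are flat or nested groups never come after a comma.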