Multiline Python regular expression

Question

jma*_*gue 6 python regex multiline regex-greedy

I have a file structured like this:

A: some text
B: more text
even more text
on several lines
A: and we start again
B: more text
more
multiline text

I am trying to find a regex that splits my file like this:

>>>re.findall(regex,f.read())
[('some text','more text','even more text\non several lines'),
 ('and we start again','more text', 'more\nmultiline text')]

So far, I have ended up with the following:

>>>re.findall('A:(.*?)\nB:(.*?)\n(.*?)',f.read(),re.DOTALL)
[(' some text', ' more text', ''), (' and we start again', ' more text', '')]

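The multiline text is not captured. I guess this is because the trailing lazy quantifier is so lazy that it matches nothing, but when I remove the laziness, the regex becomes far too greedy: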

>>>re.findall('A:(.*?)\nB:(.*?)\n(.*)',f.read(),re.DOTALL)
[(' some text',
' more text',
'even more text\non several lines\nA: and we start again\nB: more text\nmore\nmultiline text')]

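Both behaviours can be reproduced without the file. The following is a minimal sketch in which the `text` variable is an assumption standing in for `f.read()`, with the sample content copied from above. A lazy `(.*?)` at the very end of a pattern has nothing after it that forces it to grow, so it settles for the empty string; a greedy `(.*)` under `re.DOTALL` runs to the end of the input, swallowing the second record.

```python
import re

# Stand-in for f.read(); the content mirrors the sample file above.
text = (
    "A: some text\n"
    "B: more text\n"
    "even more text\n"
    "on several lines\n"
    "A: and we start again\n"
    "B: more text\n"
    "more\n"
    "multiline text\n"
)

# Trailing lazy group: nothing follows it, so it matches zero characters.
lazy = re.findall(r'A:(.*?)\nB:(.*?)\n(.*?)', text, re.DOTALL)

# Trailing greedy group: with DOTALL, '.' also matches newlines,
# so the group consumes everything up to the end of the string.
greedy = re.findall(r'A:(.*?)\nB:(.*?)\n(.*)', text, re.DOTALL)

print(lazy)
print(greedy)
```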

Does anyone have an idea? Thanks!

Answer 1

Tim*_*ker 5

You can tell the regex to stop matching at the next line that starts with A: (or at the end of the string):

re.findall(r'A:(.*?)\nB:(.*?)\n(.*?)(?=^A:|\Z)', f.read(), re.DOTALL|re.MULTILINE)

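As a self-contained sketch of the lookahead approach, the example below applies the same pattern to an in-memory string. The `text` variable (standing in for `f.read()`) and the `records` name are assumptions made for illustration. The pattern captures the leading space after each label and the newline before the next record, so the sketch also strips each field, which is an addition to the regex above, to arrive at the output requested in the question.

```python
import re

# Stand-in for f.read(); the content mirrors the sample file in the question.
text = (
    "A: some text\n"
    "B: more text\n"
    "even more text\n"
    "on several lines\n"
    "A: and we start again\n"
    "B: more text\n"
    "more\n"
    "multiline text\n"
)

# DOTALL lets '.' cross newlines; MULTILINE lets '^' match at each line start.
# The lookahead stops the last lazy group at the next "A:" line or at the end
# of the string without consuming it.
pattern = re.compile(
    r'A:(.*?)\nB:(.*?)\n(.*?)(?=^A:|\Z)',
    re.DOTALL | re.MULTILINE,
)

# Strip the surrounding whitespace that the raw groups still contain.
records = [
    tuple(field.strip() for field in match)
    for match in pattern.findall(text)
]

print(records)
```

If the surrounding whitespace matters, drop the `strip()` step and use the result of `pattern.findall(text)` directly.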

@user1731620 Don't forget to "accept" the answer that helped you. (5 upvotes)
