Python regex only matches once

Question


I'm trying to write a simple Markdown-to-LaTeX converter, just to learn Python and basic regex, but I'm stuck trying to figure out why the code below doesn't work:

re.sub(r'\[\*\](.*?)\[\*\]: ?(.*?)$', r'\\footnote{\2}\1', s, flags=re.MULTILINE|re.DOTALL)


I want to convert something like:

s = """This is a note[*] and this is another[*]
[*]: some text
[*]: other text"""


to:

This is a note\footnote{some text} and this is another\footnote{other text}


This is what I get (using the regex above):

This is a note\footnote{some text} and this is another[*]

[*]: other text


Why does the pattern only match once?

EDIT:

I tried the following lookahead assertion:

re.sub(r'\[\*\](?!:)(?=.+?\[\*\]: ?(.+?)$)', r'\\footnote{\1}', s, flags=re.DOTALL|re.MULTILINE)
# (?!:) prevents [*]: from being matched


Now it matches all the footnotes, but they are not paired up correctly.

s = """This is a note[*] and this is another[*]
[*]: some text
[*]: other text"""


gives me

This is a note\footnote{some text} and this is another\footnote{some text}
[*]: some text
[*]: other text


Any ideas?

Answer 1

Cas*_*yte 2

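The reason is that you can't match the same characters more than once. Once a character has been matched, it is consumed by the regex engine and can't be reused for another match.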
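To see the consumption at work, here is a minimal sketch that runs the pattern from the question through `re.finditer` and prints what each match captured. The variable names `pattern` and `matches` are only illustrative. The single match starts at the first placeholder and runs to the end of the first definition line, so the second placeholder is swallowed into group 1 and the remaining text no longer contains a placeholder followed by a definition.

```python
import re

s = """This is a note[*] and this is another[*]
[*]: some text
[*]: other text"""

# The pattern from the question, compiled with the same flags.
pattern = re.compile(r'\[\*\](.*?)\[\*\]: ?(.*?)$',
                     flags=re.MULTILINE | re.DOTALL)

matches = list(pattern.finditer(s))
print(len(matches))  # only one match is found

for m in matches:
    # group(1) already contains the second "[*]" placeholder,
    # so it is no longer available to start another match.
    print(repr(m.group(1)))
    print(repr(m.group(2)))
```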

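The (general) workaround consists of capturing the overlapping parts with a capture group inside a lookahead assertion. That can't be done in your case, though, because there is no way to tell which note is associated with which placeholder.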
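As a generic illustration of that workaround (on a toy string, not on the footnote problem), the sketch below compares a plain pattern with the same pattern wrapped in a lookahead. Because a lookahead consumes nothing, the engine can try again at every position and the capture group still reports the overlapping text.

```python
import re

text = 'abcd'

# Plain pattern: each match consumes two characters, so "bc" is never seen.
plain = re.findall(r'\w\w', text)

# Lookahead with a capture group: nothing is consumed, so every
# starting position is tried and overlapping pairs are reported.
overlapping = re.findall(r'(?=(\w\w))', text)

print(plain)        # ['ab', 'cd']
print(overlapping)  # ['ab', 'bc', 'cd']
```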

A simpler way is to first extract all the notes into a list, then replace each placeholder using a callback. Example:

import re

s='''This is a note[*] and this is another[*]
[*]: note 1
[*]: note 2'''

# text and notes are separated
[text,notes] = re.split(r'((?:\r?\n\[\*\]:[^\r\n]*)+$)', s)[:-1]

# this generator gives the next replacement string 
def getnote(notes):
    for note in re.split(r'\r?\n\[\*\]: ', notes)[1:]:
        yield r'\footnote{{{}}}'.format(note)

note = getnote(notes)

# next(note) and print(res) work on Python 3 (and on Python 2.6+)
res = re.sub(r'\[\*\]', lambda m: next(note), text)
print(res)

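The same idea can also be wrapped in a function. The following is only a sketch under some assumptions, not the answerer's code: the name `convert_footnotes` is hypothetical, definitions are collected line by line rather than with `re.split`, and a placeholder with no remaining definition is left unchanged instead of raising `StopIteration`.

```python
import re

def convert_footnotes(source):
    """Replace each [*] placeholder with the next [*]: definition, in order."""
    body_lines, notes = [], []
    for line in source.splitlines():
        # A definition line starts with "[*]:" followed by the note text.
        m = re.match(r'\[\*\]: ?(.*)$', line)
        if m:
            notes.append(m.group(1))
        else:
            body_lines.append(line)

    pending = iter(notes)

    def replace(match):
        note = next(pending, None)
        # Assumption: keep the placeholder if the definitions have run out.
        if note is None:
            return match.group(0)
        return r'\footnote{%s}' % note

    return re.sub(r'\[\*\]', replace, '\n'.join(body_lines))

s = """This is a note[*] and this is another[*]
[*]: some text
[*]: other text"""

print(convert_footnotes(s))
```

Because the replacement is returned from a function, `re.sub` inserts it literally, so the backslash in `\footnote` does not need to be doubled as it does in a replacement string.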
