Python:如何在re.sub()的替换参数中添加计数器

Nem*_*XXX 1 html python regex auto-increment html-parsing

我想将id添加到html标签.例如,我想改变:

<p>First paragraph</p>
<p>Second paragraph</p>
<p>Third paragraph</p>
Run Code Online (Sandbox Code Playgroud)

<p id="1">First paragraph</p>
<p id="2">Second paragraph</p>
<p id="3">Third paragraph</p>
Run Code Online (Sandbox Code Playgroud)

IIRC,可以使用lambda函数来实现此功能,但我不记得确切的语法.

ale*_*cxe 6

我会使用HTML解析器,比如BeautifulSoup.

我们的想法是使用enumerate()索引迭代所有段落,从以下开始1:

from bs4 import BeautifulSoup

data = """
<p>First paragraph</p>
<p>Second paragraph</p>
<p>Third paragraph</p>
"""

soup = BeautifulSoup(data, 'html.parser')
for index, p in enumerate(soup.find_all('p'), start=1):
    p['id'] = index

print soup
Run Code Online (Sandbox Code Playgroud)

打印:

<p id="1">First paragraph</p>
<p id="2">Second paragraph</p>
<p id="3">Third paragraph</p>
Run Code Online (Sandbox Code Playgroud)


Mic*_*x2a 5

如果您想使用正则表达式,快速但肮脏的解决方案是使用全局变量,如下所示:

i = 0

def replace(match):
    global i
    i += 1
    return '<p id="{0}">'.format(i)

re.sub(pattern, replace, your_string)
Run Code Online (Sandbox Code Playgroud)

或者,您可以创建一个自定义类,“假装”为一个函数,使用__call__并将计数器定义为一个字段:

class Replace(object):
    def __init__(self):
        self.counter = 0

    def __call__(self, match):
        self.counter += 1
        return '<p id="{0}">'.format(self.counter)

replace = Replace()
re.sub(pattern, replace, your_string)
Run Code Online (Sandbox Code Playgroud)