Beautiful Soup 4: How do I replace a tag with text and another tag?

Nem*_*XXX 7 html python replace beautifulsoup html-parsing

I want to replace a tag with another tag and put the old tag's contents before the new tag. For example:

I want to change this:

<html>
<body>
<p>This is the <span id="1">first</span> paragraph</p>
<p>This is the <span id="2">second</span> paragraph</p>
</body>
</html>

into this:

<html>
<body>
<p>This is the first<sup>1</sup> paragraph</p>
<p>This is the second<sup>2</sup> paragraph</p>
</body>
</html>
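
I can easily find all the spans with find_all(), get the number from the id attribute, and replace one tag with another using replace_with(). But how do I replace a tag with text followed by a new tag, or insert text before the replacement tag?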

ale*_*cxe 6

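The idea is to find every span tag that has an id attribute (the span[id] CSS selector), use insert_after() to insert a sup tag after it, and then call unwrap() to replace the span with its contents: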

from bs4 import BeautifulSoup

data = """
<html>
<body>
<p>This is the <span id="1">first</span> paragraph</p>
<p>This is the <span id="2">second</span> paragraph</p>
</body>
</html>
"""

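# name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(data, "html.parser")
for span in soup.select('span[id]'):
    # insert a sup tag holding the span's id right after the span
    sup = soup.new_tag('sup')
    sup.string = span['id']
    span.insert_after(sup)

    # replace the span tag with its contents
    span.unwrap()

print(soup)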

Prints:

<html>
<body>
<p>This is the first<sup>1</sup> paragraph</p>
<p>This is the second<sup>2</sup> paragraph</p>
</body>
</html>
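
For comparison, here is a minimal sketch of the route the question started from: find_all() plus replace_with(). It is not the answer's method, just an alternative under a few assumptions. The helper name span_to_footnote is made up for illustration, the input is cut down to one paragraph, and get_text() is used on the assumption that each span holds plain text only (nested markup inside the span would be flattened).

```python
from bs4 import BeautifulSoup, NavigableString

html = '<p>This is the <span id="1">first</span> paragraph</p>'
soup = BeautifulSoup(html, "html.parser")


def span_to_footnote(soup, span):
    """Hypothetical helper: turn <span id="N">text</span> into text<sup>N</sup>."""
    sup = soup.new_tag("sup")
    sup.string = span["id"]

    # Build the replacement text node first and keep a reference to it,
    # so the sup tag can be inserted right after it.
    text = NavigableString(span.get_text())
    span.replace_with(text)
    text.insert_after(sup)


# id=True matches any span that has an id attribute, whatever its value.
for span in soup.find_all("span", id=True):
    span_to_footnote(soup, span)

result = str(soup)
print(result)
```

The unwrap() version above is shorter and keeps any markup nested inside the span, so it is probably the better default. This variant mainly shows that replace_with() can do the job too, as long as you hold on to the new text node.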

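  • Thanks for the very helpful answer. I had read the BS documentation, but I obviously missed the part about **wrap()** and **unwrap()**, which was the key to solving this. (2 upvotes)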
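
Since the comment mentions wrap() as well, here is a small sketch of how the two methods relate. The markup and variable names are invented for illustration and are not taken from the question.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>Hello <b>world</b></p>", "html.parser")

# unwrap() removes the tag itself but leaves its children in place.
soup.b.unwrap()
after_unwrap = str(soup)

# wrap() goes the other way: it encloses an element in the given tag.
soup.p.wrap(soup.new_tag("div"))
after_wrap = str(soup)

print(after_unwrap)
print(after_wrap)
```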