How do I change a tag name with BeautifulSoup?

dap*_*hez 13 python beautifulsoup html-parsing

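I am using Python + BeautifulSoup to parse HTML documents.

Now I need to replace every <h2 class="someclass"> element in the HTML document with <h1 class="someclass">.

How can I change the tag name without changing anything else in the document?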

Man*_*res 19

I don't know how you are accessing the tag, but the following works for me:

import BeautifulSoup

if __name__ == "__main__":
    data = """
<html>
<h2 class='someclass'>some title</h2>
<ul>
   <li>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.</li>
   <li>Aliquam tincidunt mauris eu risus.</li>
   <li>Vestibulum auctor dapibus neque.</li>
</ul>
</html>

    """
    soup = BeautifulSoup.BeautifulSoup(data)
    h2 = soup.find('h2')   # first <h2> in the document
    h2.name = 'h1'         # rename in place; attributes and children are kept
    print soup

The output of print soup is:

<html>
<h1 class='someclass'>some title</h1>
<ul>
<li>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.</li>
<li>Aliquam tincidunt mauris eu risus.</li>
<li>Vestibulum auctor dapibus neque.</li>
</ul>
</html>

As you can see, the h2 became an h1, and nothing else in the document changed. I am using Python 2.6 and BeautifulSoup 3.2.0.
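The snippet above targets Python 2 and BeautifulSoup 3. For readers on Python 3, here is a minimal sketch of the same rename, assuming BeautifulSoup 4 (the bs4 package) and Python's built-in "html.parser" backend. Renaming works the same way: assigning to the tag's name attribute changes only the tag name, not its attributes or contents.

```python
from bs4 import BeautifulSoup

data = """
<html>
<h2 class="someclass">some title</h2>
<ul>
   <li>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.</li>
</ul>
</html>
"""

# "html.parser" ships with Python, so no extra parser needs to be installed
soup = BeautifulSoup(data, "html.parser")

h2 = soup.find("h2")      # first <h2> in the document, or None if there is none
if h2 is not None:
    h2.name = "h1"        # rename in place; class and text are left untouched

print(soup)
```

The None check guards against documents with no h2. The original BS3 code would raise an AttributeError in that case.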

If you have several h2 elements and want to change all of them, you can do this:

soup = BeautifulSoup.BeautifulSoup(your_data)
while True: 
    h2 = soup.find('h2')
    if not h2:
        break
    h2.name = 'h1'  # no longer an h2, so the next find() returns the following one
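The while loop renames every h2, whatever its class, and restarts the search from the top of the document on each pass. The question only asked about h2 elements with class="someclass". The sketch below assumes BeautifulSoup 4 (bs4) and collects just those elements once with find_all, then renames them. Other h2 elements are left alone.

```python
from bs4 import BeautifulSoup

data = """
<h2 class="someclass">first</h2>
<h2 class="other">second</h2>
<p>text</p>
<h2 class="someclass">third</h2>
"""

soup = BeautifulSoup(data, "html.parser")

# find_all() builds the full list before the loop starts, so renaming inside
# the loop is safe and each tag is visited once. class_ matches any element
# whose class list contains "someclass".
for tag in soup.find_all("h2", class_="someclass"):
    tag.name = "h1"

print(soup)
```

In BeautifulSoup 3, the equivalent call is soup.findAll('h2', {'class': 'someclass'}). Recent bs4 versions also accept the CSS selector form soup.select("h2.someclass").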