dap*_*hez 13 python beautifulsoup html-parsing
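I am using Python + BeautifulSoup to parse an HTML document.
Now I need to replace all <h2 class="someclass"> elements in the HTML document with <h1 class="someclass">.
How can I change the tag name without changing anything else in the document?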
Man*_*res 19
I don't know how you are accessing the tag, but the following works for me:
import BeautifulSoup

if __name__ == "__main__":
    data = """
<html>
<h2 class='someclass'>some title</h2>
<ul>
<li>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.</li>
<li>Aliquam tincidunt mauris eu risus.</li>
<li>Vestibulum auctor dapibus neque.</li>
</ul>
</html>
"""
    soup = BeautifulSoup.BeautifulSoup(data)
    h2 = soup.find('h2')
    h2.name = 'h1'  # renaming a tag is a plain attribute assignment
    print soup
The output of the print soup command is:
<html>
<h1 class='someclass'>some title</h1>
<ul>
<li>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.</li>
<li>Aliquam tincidunt mauris eu risus.</li>
<li>Vestibulum auctor dapibus neque.</li>
</ul>
</html>
As you can see, the h2 became an h1, and nothing else in the document changed. I am using Python 2.6 and BeautifulSoup 3.2.0.
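For readers on Python 3, the same rename can be sketched with the newer bs4 package (BeautifulSoup 4). This goes beyond what the answer tested, which was BeautifulSoup 3.2.0 only; the "html.parser" backend and the shortened sample markup are assumptions chosen for illustration. The lookups at the end show that the attributes and text stay attached to the renamed tag.

```python
from bs4 import BeautifulSoup  # assumes BeautifulSoup 4; the answer used version 3

# Shortened sample markup, made up for this sketch.
data = "<html><h2 class='someclass'>some title</h2><p>unchanged</p></html>"

soup = BeautifulSoup(data, "html.parser")
h2 = soup.find("h2")
h2.name = "h1"  # rename the tag in place

# The same Tag object is now an h1; its attributes and text are untouched.
print(h2.get("class"))
print(h2.get_text())
print(soup)
```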
If you have more than one h2 and want to change all of them, you can simply do:
soup = BeautifulSoup.BeautifulSoup(your_data)
while True:
    # find() returns None once no h2 tags remain
    h2 = soup.find('h2')
    if not h2:
        break
    h2.name = 'h1'
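Since the question asks only about h2 elements with class "someclass", looping over the matching tags may be simpler than calling find repeatedly. The sketch below assumes BeautifulSoup 4 (the bs4 package) on Python 3 rather than the 3.2.0 release used above. `rename_tags` is a hypothetical helper name, not part of the library, and the sample markup is made up for illustration.

```python
from bs4 import BeautifulSoup  # assumes BeautifulSoup 4, not the 3.x used in the answer


def rename_tags(soup, old_name, new_name, css_class=None):
    """Rename every old_name tag in place, optionally only those with css_class.

    Hypothetical helper for illustration; returns the number of tags renamed.
    """
    if css_class is None:
        matches = soup.find_all(old_name)
    else:
        matches = soup.find_all(old_name, class_=css_class)
    # find_all returns a list-like snapshot, so renaming while looping is safe.
    for tag in matches:
        tag.name = new_name
    return len(matches)


data = """
<h2 class="someclass">first</h2>
<h2 class="other">second</h2>
<h2 class="someclass">third</h2>
"""

soup = BeautifulSoup(data, "html.parser")
count = rename_tags(soup, "h2", "h1", css_class="someclass")
print(count)
print(soup)
```

Only the two tags carrying the class are renamed; the h2 with class "other" is left alone.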