Cha*_*les · 2 · Tags: html, python, beautifulsoup, html-parsing
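Suppose I have a soup and I want to remove the style attribute from every paragraph. That is, throughout the soup I want to turn `<p style='blah' id='bla' class=...>` into `<p id='bla' class=...>`. But I don't want to touch `<img style='...'>` tags. How can I do this?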
The idea is to iterate over all `p` tags with `find_all('p')` and delete the `style` attribute from each one:
```python
from bs4 import BeautifulSoup

data = """
<body>
<p style='blah' id='bla1'>paragraph1</p>
<p style='blah' id='bla2'>paragraph2</p>
<p style='blah' id='bla3'>paragraph3</p>
<img style="awesome_image"/>
</body>"""

soup = BeautifulSoup(data, 'html.parser')

# Only <p> tags are visited, so <img> keeps its style attribute.
for p in soup.find_all('p'):
    if 'style' in p.attrs:
        del p.attrs['style']

print(soup.prettify())
```
Prints:
```
<body>
 <p id="bla1">
  paragraph1
 </p>
 <p id="bla2">
  paragraph2
 </p>
 <p id="bla3">
  paragraph3
 </p>
 <img style="awesome_image"/>
</body>
```
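A slightly shorter variant, offered as a sketch that assumes Beautiful Soup 4: passing `style=True` to `find_all` restricts the match to `<p>` tags that actually have a `style` attribute, so the membership check inside the loop is no longer needed. The sample markup here is a trimmed, hypothetical version of the data above with one unstyled paragraph added to show the filter at work.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: one styled <p>, one unstyled <p>, and a styled <img>.
data = """
<body>
<p style='blah' id='bla1'>paragraph1</p>
<p id='bla2'>paragraph2</p>
<img style="awesome_image"/>
</body>"""

soup = BeautifulSoup(data, "html.parser")

# style=True matches only <p> tags that carry a style attribute (any value),
# so every tag yielded here is safe to delete the attribute from.
for p in soup.find_all("p", style=True):
    del p["style"]

print(soup)
```

The `<img>` tag is never matched because the search is limited to the `p` tag name. A CSS selector such as `soup.select("p[style]")` expresses the same filter if you prefer selector syntax.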