Remove style from specific tags with BeautifulSoup/Python

Cha*_*les 2 html python beautifulsoup html-parsing

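Suppose I have a soup and I want to remove the style attribute from every paragraph. That is, throughout the whole soup I want to turn <p style='blah' id='bla' class=...> into <p id='bla' class=...>, but I don't want to touch <img style='...'> tags. How can I do this?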

ale*_*cxe 5

The idea is to iterate over all p tags with find_all('p') and delete the style attribute from each one:

from bs4 import BeautifulSoup


data = """
<body>
    <p style='blah' id='bla1'>paragraph1</p>
    <p style='blah' id='bla2'>paragraph2</p>
    <p style='blah' id='bla3'>paragraph3</p>
    <img style="awesome_image"/>
</body>"""


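soup = BeautifulSoup(data, 'html.parser')
# Only <p> tags are visited, so the <img> style attribute is left alone
for p in soup.find_all('p'):
    if 'style' in p.attrs:
        del p.attrs['style']

# print() is a function in Python 3
print(soup.prettify())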

Prints:

<body>
 <p id="bla1">
  paragraph1
 </p>
 <p id="bla2">
  paragraph2
 </p>
 <p id="bla3">
  paragraph3
 </p>
 <img style="awesome_image"/>
</body>
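The same idea can be wrapped in a small reusable function. The sketch below is only an illustration: strip_attribute is a hypothetical helper name, not part of BeautifulSoup, and the sample HTML is a shortened version of the data above. It relies on find_all accepting attrs={name: True} to match only tags that actually carry the attribute, so no membership check is needed before deleting.

```python
from bs4 import BeautifulSoup


def strip_attribute(soup, tag_name, attr_name):
    """Hypothetical helper: delete attr_name from every tag_name element.

    Returns the number of tags that were modified.
    """
    # attrs={attr_name: True} matches only tags that have the attribute set
    tags = soup.find_all(tag_name, attrs={attr_name: True})
    for tag in tags:
        del tag[attr_name]
    return len(tags)


html = "<body><p style='blah' id='bla1'>paragraph1</p><img style='awesome_image'/></body>"
soup = BeautifulSoup(html, "html.parser")

changed = strip_attribute(soup, "p", "style")
print(changed)  # number of <p> tags that lost their style attribute
print(soup)     # the <img> tag still has its style attribute
```

Passing the tag and attribute names as arguments keeps the filter explicit, so other elements such as <img> are never touched unless they are asked for.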