How to skip <span> with Beautiful Soup

Question


tar*_*san 3 python beautifulsoup python-3.x

This is the output of my code:

<h1 class="it-ttl" id="itemTitle" itemprop="name"><span class="g-hdn">Details about   </span>item name goes here</h1>


I only want to get the item name, without the "Details about" part.

My Python code that selects the element with that id (an <h1>, not a div) is:

for content in soup.select('#itemTitle'):
    print(content.text)

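A minimal, self-contained reproduction of the problem, assuming the HTML shown above is hard-coded as a string and parsed with Python's built-in "html.parser" (the question does not say which parser was used). Tag.text joins every string inside the <h1>, including the one nested in the <span>, which is why "Details about" shows up:

```python
from bs4 import BeautifulSoup

# The HTML from the question, hard-coded so the example runs on its own.
html = ('<h1 class="it-ttl" id="itemTitle" itemprop="name">'
        '<span class="g-hdn">Details about   </span>item name goes here</h1>')

# "html.parser" ships with Python, so no extra parser package is needed.
soup = BeautifulSoup(html, 'html.parser')

# .text concatenates all descendant strings, the <span>'s text included.
titles = [content.text for content in soup.select('#itemTitle')]
for title in titles:
    print(title)  # prints "Details about   item name goes here"
```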

Answer 1

jua*_*lgo 7

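You can use decompose(), clear() or extract(). According to the documentation:

Tag.decompose() removes a tag from the tree, then completely destroys it and its contents.

Tag.clear() removes the contents of a tag.

PageElement.extract() removes a tag or string from the tree. It returns the tag or string that was extracted.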
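A small sketch comparing the three methods on the same HTML. It assumes the built-in "html.parser", and the helper name strip_span is hypothetical, used only for illustration. It re-parses a fresh tree for each method so the results do not affect each other. All three leave "item name goes here" as the <h1> text. The difference is what remains afterwards: clear() keeps an empty <span> in the markup, and extract() hands the removed <span> back so it can still be used:

```python
from bs4 import BeautifulSoup

HTML = ('<h1 class="it-ttl" id="itemTitle" itemprop="name">'
        '<span class="g-hdn">Details about   </span>item name goes here</h1>')


def strip_span(method):
    """Hypothetical helper: parse a fresh tree and drop the <span> text with `method`.

    Returns (text of the <h1>, markup of the <h1>, value returned by the call).
    """
    soup = BeautifulSoup(HTML, 'html.parser')
    h1 = soup.select_one('#itemTitle')
    span = h1.span
    if method == 'decompose':
        result = span.decompose()  # removes the span from the tree and destroys it
    elif method == 'clear':
        result = span.clear()      # empties the span; the empty <span> tag stays in the <h1>
    elif method == 'extract':
        result = span.extract()    # detaches the span and returns it for later use
    else:
        raise ValueError(f'unknown method: {method}')
    return h1.text, str(h1), result


for method in ('decompose', 'clear', 'extract'):
    text, markup, result = strip_span(method)
    print(method, repr(text), markup)
```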

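from bs4 import BeautifulSoup
html = '''<h1 class="it-ttl" id="itemTitle" itemprop="name"><span class="g-hdn">Details about   </span>item name goes here</h1>'''

# 'lxml' needs the lxml package installed; 'html.parser' works without it
soup = BeautifulSoup(html, 'lxml')
for content in soup.select('#itemTitle'):
    content.span.decompose()  # remove the <span> and its "Details about" text from the tree
    print(content.text)       # only the text directly inside <h1> is left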


Output:

item name goes here

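If the parsed tree should stay unmodified, one alternative (not part of the answer above) is to collect only the strings that are direct children of the <h1>. This is a sketch assuming the item name is always a direct text child of the <h1> and that the built-in "html.parser" is used. recursive=False stops find_all() from descending into the <span>:

```python
from bs4 import BeautifulSoup

HTML = ('<h1 class="it-ttl" id="itemTitle" itemprop="name">'
        '<span class="g-hdn">Details about   </span>item name goes here</h1>')

soup = BeautifulSoup(HTML, 'html.parser')

names = []
for content in soup.select('#itemTitle'):
    # string=True matches text nodes; recursive=False limits the search to the
    # <h1>'s direct children, so the string nested in <span> is skipped.
    own_text = content.find_all(string=True, recursive=False)
    names.append(''.join(own_text).strip())

print(names)
```

Because nothing is removed, the <span> is still available afterwards, for example via soup.select_one('#itemTitle').span.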
