Python BeautifulSoup: find an element that contains text

Question


<div class="info">
       <h3> Height:
            <span>1.1</span>
       </h3>
</div>

<div class="info">
       <h3> Number:
            <span>111111111</span>
       </h3>
</div>


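This is part of the website. Ultimately, I want to extract 111111111. I know I can use soup.find_all("div", { "class" : "info" }) to get a list of both divs. However, I'd rather not have to loop over them to check which one contains the text "Number".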

Is there a more elegant way to extract "111111111", something that does soup.find_all("div", { "class" : "info" }) but also requires the div to contain "Number"?

I also tried numberSoup = soup.find('h3', text='Number'), but it returns None.

Answer 1

dok*_*ung 6

You can write your own filter function and pass it as the argument to find_all.

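from bs4 import BeautifulSoup

def number_span(tag):
    # Match a <span> whose parent's first child (the label text before it) contains "Number:"
    return tag.name == 'span' and 'Number:' in tag.parent.contents[0]

# html holds the page source shown in the question
soup = BeautifulSoup(html, 'html.parser')
tags = soup.find_all(number_span)
print(tags[0].get_text())  # 111111111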
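The snippet above assumes an `html` variable holding the page source. A self-contained sketch follows, using the markup from the question. It moves the filter onto the `<div class="info">` itself, which is closer to what the question asked for: one call that checks both the class and the "Number" text. The function name `info_div_with_number` is illustrative only.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question
HTML = """
<div class="info">
       <h3> Height:
            <span>1.1</span>
       </h3>
</div>
<div class="info">
       <h3> Number:
            <span>111111111</span>
       </h3>
</div>
"""

def info_div_with_number(tag):
    # Keep only <div> tags with class "info" whose text mentions "Number".
    # class is a multi-valued attribute, so tag.get("class") returns a list.
    return (
        tag.name == "div"
        and "info" in tag.get("class", [])
        and "Number" in tag.get_text()
    )

soup = BeautifulSoup(HTML, "html.parser")
div = soup.find(info_div_with_number)
# div.span is the first <span> descendant of the matched div
number = div.span.get_text(strip=True)
print(number)  # 111111111
```

Use `find_all` instead of `find` if several matching divs can appear on the page.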


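By the way, the reason you can't get the tag with the text argument is this: text finds tags whose .string value equals the given value. If a tag contains more than one child, it is unclear what .string should refer to, so .string is defined as None. Here the <h3> contains both a text node and a <span>, so its .string is None and it never matches.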
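The `.string` behaviour can be checked directly. The sketch below also shows a common workaround: search for the text node itself with the `string` argument (the newer name for `text`, introduced in Beautiful Soup 4.4.0), using a regular expression for a substring match. A plain string such as `'Number'` is compared for exact equality, so it would not match the node `" Number:\n "` either. From the matched node, `find_next` walks forward to the `<span>`. The compact `HTML` string is a trimmed version of the question's markup.

```python
import re
from bs4 import BeautifulSoup

HTML = '<div class="info"><h3> Number:\n <span>111111111</span></h3></div>'
soup = BeautifulSoup(HTML, "html.parser")

h3 = soup.find("h3")
# <h3> has two children (a text node and a <span>), so .string is None
print(h3.string)  # None

# string= matches text nodes (NavigableString objects), not tags;
# a compiled regex gives a substring match instead of exact equality
label = soup.find(string=re.compile("Number"))

# Navigate from the matched text node forward to the next <span>
value = label.find_next("span").get_text(strip=True)
print(value)  # 111111111
```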

See the Beautiful Soup documentation for details.
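An alternative not covered in the answer is a CSS selector. Beautiful Soup's `select_one` delegates to the Soup Sieve library, which provides a non-standard `:-soup-contains()` pseudo-class (available in Soup Sieve 2.1 and later) that keeps elements whose text contains a given substring. This sketch assumes a Soup Sieve version that supports it. It expresses "div with class info, whose h3 contains Number" in a single selector.

```python
from bs4 import BeautifulSoup

HTML = """
<div class="info"><h3> Height: <span>1.1</span></h3></div>
<div class="info"><h3> Number: <span>111111111</span></h3></div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# :-soup-contains() is a Soup Sieve extension, not standard CSS;
# it matches the <h3> whose text includes "Number"
span = soup.select_one('div.info h3:-soup-contains("Number") span')
number = span.get_text(strip=True)
print(number)  # 111111111
```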

Archived: 10 years ago
Views: 9,874
Last activity: 3 years, 10 months ago