Beautiful Soup nested tag search

Question

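I'm trying to write a Python program that counts the words on a web page. I scraped the page with Beautiful Soup 4, but I'm having trouble accessing nested HTML tags (for example, a <p class="hello"> inside a <div>).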

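Whenever I call page.findAll() (page is the Beautiful Soup object holding the whole page) to find such tags, it finds nothing, even though they are there. Is there a simple way, or any other way, to do this?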

Answer 1

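My guess is that you want to find the specific div tag first, then search inside it for all the p tags and count them, or do whatever else you need. For example: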

import bs4

# content is assumed to hold the page's HTML as a string
soup = bs4.BeautifulSoup(content, 'html.parser')

# Get the first div with the given class
div_container = soup.find('div', class_='some_class')

# Then search inside that div for all p tags with class "hello"
for ptag in div_container.find_all('p', class_='hello'):
    # Print the text inside the p tag
    print(ptag.text)
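
Since the goal in the question is counting words, here is a rough, self-contained sketch of the same idea run against a made-up page. The HTML, the class names, and the variable names are placeholders for illustration, not taken from the asker's page. It guards against find() returning None when the div is missing, and it also shows a CSS selector via select(), which expresses the same nesting in one call.

```python
from collections import Counter

from bs4 import BeautifulSoup

# Hypothetical sample page; the class names are placeholders for illustration.
html = """
<html><body>
  <div class="some_class">
    <p class="hello">hello nested world</p>
    <p class="other">ignored text</p>
    <p class="hello">hello again</p>
  </div>
  <p class="hello">outside the div</p>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns None when nothing matches, so guard before searching inside it.
div_container = soup.find("div", class_="some_class")
nested = div_container.find_all("p", class_="hello") if div_container else []

# A CSS selector matches the same nested paragraphs in a single call.
selected = soup.select("div.some_class p.hello")

# Count whitespace-separated words across the matched paragraphs.
word_counts = Counter()
for ptag in nested:
    word_counts.update(ptag.get_text().split())

print(word_counts)
```

Note that class is a reserved word in Python, which is why the keyword argument is spelled class_. Splitting on whitespace is only one simple way to define a "word"; punctuation handling is left out of this sketch.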

Hope that helps.
