How to extract the strong element inside a div tag

Question

Puj*_*apu 4 python beautifulsoup web-scraping

I am new to web scraping and am using Python to scrape data. Can someone help me extract the data from the following markup?

<div class="dept"><strong>LENGTH:</strong> 15 credits</div>

My output should be LENGTH: 15 credits

Here is my code:

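from urllib.request import urlopen
from bs4 import BeautifulSoup

# Fetch and parse the page (the URL given at the end of the question)
html = urlopen("http://www.mastersindatascience.org/specialties/business-analytics/")
bsObj = BeautifulSoup(html, "html.parser")

# Prints every <strong> label together with the text that follows it
length = bsObj.findAll("strong")
for leng in length:
    print(leng.text, leng.next_sibling)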

Output:

DELIVERY:  Campus
LENGTH:  2 years
OFFERED BY:  Olin Business School

But I only want the LENGTH values.

Website: http://www.mastersindatascience.org/specialties/business-analytics/

Answer 1

ale*_*cxe 5

You should adjust your code slightly so that it locates the strong element by its text:

soup.find("strong", text="LENGTH:").next_sibling

Or, for multiple lengths:

for length in soup.find_all("strong", text="LENGTH:"):
    print(length.next_sibling.strip())

Demo:

>>> import requests
>>> from bs4 import BeautifulSoup
>>>
>>> url = "http://www.mastersindatascience.org/specialties/business-analytics/"
>>> response = requests.get(url)
>>> soup = BeautifulSoup(response.content, "html.parser")
>>> for length in soup.find_all("strong", text="LENGTH:"):
...     print(length.next_sibling.strip())
...
33 credit hours
15 months
48 Credits
...
12 months
1 year
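
For trying the same lookup without hitting the live site, here is a minimal offline sketch. It assumes a reasonably recent Beautiful Soup, where the `text` argument used above is also available under the newer name `string`. The helper name `extract_field` and the second sample `<div>` are made up for illustration; only the first `<div>` comes from the question.

```python
from bs4 import BeautifulSoup

# Sample markup: the first <div> is from the question; the second is a
# hypothetical entry added so that there is something to filter out.
html = """
<div class="dept"><strong>LENGTH:</strong> 15 credits</div>
<div class="dept"><strong>DELIVERY:</strong> Campus</div>
"""

soup = BeautifulSoup(html, "html.parser")


def extract_field(soup, label):
    """Return the text following each <strong> whose text equals `label`.

    Hypothetical helper, not part of Beautiful Soup.
    """
    values = []
    for strong in soup.find_all("strong", string=label):
        sibling = strong.next_sibling
        # next_sibling may be None or another tag; keep only text nodes.
        if isinstance(sibling, str):
            values.append(sibling.strip())
    return values


lengths = extract_field(soup, "LENGTH:")
for value in lengths:
    print("LENGTH:", value)
```

The `isinstance` check is there because a `<strong>` at the end of its parent has no next sibling, and calling `.strip()` on `None` would raise an error. Matching is exact, so a label with different casing or extra whitespace would need a regular expression or a function passed as `string` instead.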

Asked:	9 years, 5 months ago
Viewed:	8721 times
Last active:	5 years, 10 months ago