Getting the value after a specific tag with BeautifulSoup

Question

kna*_*mes 5 python beautifulsoup html-parsing web-scraping

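I'm having trouble getting BeautifulSoup to collect some data for me. What is the best way to access the date (the actual number, 2008) in this code sample? This is my first time using BeautifulSoup. I have figured out how to scrape the URLs off a page, but I can't narrow it down to select only the word Date and then return whatever numeric date follows it (inside the dd tags). Is what I'm asking even possible?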

<div class='dl_item_container clearfix detail_date'>
    <dt>Date</dt>
    <dd>
        2008
    </dd>
</div>

Answer 1

ale*_*cxe 11

Find the dt tag by its text, then find its next dd sibling:

soup.find('div', class_='detail_date').find('dt', text='Date').find_next_sibling('dd').text

Complete code:

from bs4 import BeautifulSoup

data = """
<div class='dl_item_container clearfix detail_date'>
    <dt>Date</dt>
    <dd>
    2008
    </dd>
</div>
"""

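# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(data, 'html.parser')
# Locate the <dt> whose text is exactly 'Date' inside the detail_date container
date_field = soup.find('div', class_='detail_date').find('dt', text='Date')
# The value lives in the <dd> that follows; strip the surrounding whitespace
print(date_field.find_next_sibling('dd').text.strip())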

This prints 2008.
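The chained lookup above raises AttributeError if any step finds nothing, because find() and find_next_sibling() return None on a miss. As a rough sketch of one way to guard against that, the helper below wraps the same dt-then-dd lookup and returns None when the label is absent. The name dd_value_for is hypothetical, not part of BeautifulSoup, and the code assumes Beautiful Soup 4.4.0 or later, where the string argument is the newer name for text.

```python
from bs4 import BeautifulSoup


def dd_value_for(soup, label):
    """Hypothetical helper: return the stripped text of the <dd> that
    follows the <dt> whose text equals `label`, or None if not found."""
    dt = soup.find('dt', string=label)  # `string=` replaces the older `text=`
    if dt is None:
        return None
    dd = dt.find_next_sibling('dd')  # skips the whitespace between the tags
    if dd is None:
        return None
    return dd.get_text(strip=True)


html = """
<div class='dl_item_container clearfix detail_date'>
    <dt>Date</dt>
    <dd>
        2008
    </dd>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')
print(dd_value_for(soup, 'Date'))    # value for a label that is present
print(dd_value_for(soup, 'Author'))  # a label that is absent gives None
```

Returning None instead of raising lets the caller decide how to treat pages where the field is missing, which tends to matter when scraping many pages with slightly different markup.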
