Trouble getting href from a BeautifulSoup ResultSet object, but can get "style" and "class"?

Question

bla*_*ite 1 python beautifulsoup web-scraping

import urllib2
from bs4 import BeautifulSoup

next_page = 'https://research.stlouisfed.org/fred2/tags/series?et=&pageID=1&t='
opened_url = urllib2.urlopen(next_page).read()

soup = BeautifulSoup(opened_url)

hrefs = soup.find_all("div", {"class": "col-xs-12 col-sm-10"})

hrefs now looks like this:

[<div class="col-xs-12 col-sm-10">
<a class="series-title" href="/fred2/series/GDPC1" style="font-size:1.2em">Real Gross Domestic Product</a>
</div>, <div class="col-xs-12 col-sm-10">
<a class="series-title" href="/fred2/series/CPIAUCSL" style="font-size:1.2em">Consumer Price Index for All Urban Consumers: All Items</a>
</div>,...

I tried to pull the href out with something like hrefs[1]['href'], but I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/Library/Python/2.7/site-packages/bs4/element.py", line 958, in __getitem__
    return self.attrs[key]
KeyError: 'href'

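All I want is to pull out all 18 links on this page. I suppose I could convert each element of hrefs to a string and then do a find for href inside it, but that defeats the purpose of bs4.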

Answer 1

Avi*_*Raj 5

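You need to get the href from the a tag. Each item in hrefs is a div whose only attribute is class; the href belongs to the a tag nested inside it, so look up that tag first: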

hrefs = soup.find_all("div",{"class":"col-xs-12 col-sm-10"})
print hrefs[1].find('a')['href']
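
To see why the original lookup fails, here is a minimal sketch in Python 3 syntax. It parses a hard-coded snippet copied from the output in the question instead of making a live request; the `html` string and the `"html.parser"` argument are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Hard-coded sample modelled on the output shown in the question
# (assumption: the live page has the same structure).
html = """
<div class="col-xs-12 col-sm-10">
<a class="series-title" href="/fred2/series/GDPC1" style="font-size:1.2em">Real Gross Domestic Product</a>
</div>
<div class="col-xs-12 col-sm-10">
<a class="series-title" href="/fred2/series/CPIAUCSL" style="font-size:1.2em">Consumer Price Index for All Urban Consumers: All Items</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
divs = soup.find_all("div", {"class": "col-xs-12 col-sm-10"})

div = divs[1]
# The div itself carries only a class attribute, so div["href"] raises KeyError.
div_attr_names = sorted(div.attrs)

# The href (along with class and style) lives on the nested <a> tag.
link = div.find("a")
link_attr_names = sorted(link.attrs)
second_href = link["href"]

print(div_attr_names)
print(link_attr_names)
print(second_href)
```

Printing the attribute names of each tag is a quick way to check which element actually owns the attribute you are after.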

To get the href of every a tag inside the div tags, you can use

for tag in hrefs:
    print tag.find('a', href=True)['href']
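
One caveat with the loop: if a div contains no link, `find` returns None and the subscript fails. As an alternative sketch (Python 3 syntax; the sample `html`, the extra link-less div, and the `base` variable are assumptions for illustration), a CSS selector collects only the a tags that have an href, and `urljoin` from the standard library turns the relative paths into absolute URLs.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hard-coded sample modelled on the question's output. The third div is a
# hypothetical link-less entry added to show that it is skipped safely.
html = """
<div class="col-xs-12 col-sm-10">
<a class="series-title" href="/fred2/series/GDPC1" style="font-size:1.2em">Real Gross Domestic Product</a>
</div>
<div class="col-xs-12 col-sm-10">
<a class="series-title" href="/fred2/series/CPIAUCSL" style="font-size:1.2em">Consumer Price Index for All Urban Consumers: All Items</a>
</div>
<div class="col-xs-12 col-sm-10">no link here</div>
"""

# Scheme and host taken from the URL in the question.
base = "https://research.stlouisfed.org"

soup = BeautifulSoup(html, "html.parser")

# Select <a> tags that have an href and sit inside a div with both classes.
links = soup.select("div.col-xs-12.col-sm-10 a[href]")
hrefs = [a["href"] for a in links]

# Resolve the site-relative paths against the base URL.
absolute_urls = [urljoin(base, href) for href in hrefs]

for url in absolute_urls:
    print(url)
```

The selector approach trades the explicit loop for a single query, which may be easier to read when the nesting is simple.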

Archived:	9 years, 9 months ago
Views:	2424
Last activity:	9 years, 9 months ago