AttributeError: 'ResultSet' object has no attribute 'find_all' (BeautifulSoup)

Question

Imo*_*Imo 1 python beautifulsoup web-scraping

I don't understand why I'm getting this error.

I have a fairly simple function:

def scrape_a(url):
  r = requests.get(url)
  soup = BeautifulSoup(r.content)
  news =  soup.find_all("div", attrs={"class": "news"})
  for links in news:
    link = news.find_all("href")
    return link

This is the structure of the web page I'm trying to scrape:

<div class="news">
<a href="www.link.com">
<h2 class="heading">
heading
</h2>
<div class="teaserImg">
<img alt="" border="0" height="124" src="/image">
</div>
<p> text </p>
</a>
</div>

Answer 1

Mar*_*ers 5

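You are doing two things wrong:

You are calling find_all on the news result set; presumably you meant to call it on the links object, which is one element of that result set.
There are no <href ...> tags in your document, so searching with find_all('href') won't return anything. You only have tags with an href attribute.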
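
Both points can be checked without touching the network. The following is a minimal sketch, assuming the page looks like the sample markup from the question (trimmed here and embedded as a string); the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Sample markup modeled on the page structure shown in the question.
html = """
<div class="news">
  <a href="www.link.com">
    <h2 class="heading">heading</h2>
    <p> text </p>
  </a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
news = soup.find_all("div", attrs={"class": "news"})

# find_all returns a ResultSet (a list subclass); it has no find_all of its own.
try:
    news.find_all("a")
    raised = False
except AttributeError:
    raised = True

first = news[0]  # a single Tag, which does support find_all

# A positional "href" is treated as a tag name, and no <href> tag exists.
by_tag_name = first.find_all("href")

# href=True matches any tag that carries an href attribute.
by_attribute = first.find_all(href=True)

print(raised)                              # True
print(len(by_tag_name))                    # 0
print([tag.name for tag in by_attribute])  # ['a']
```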

You could correct your code to:

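import requests
from bs4 import BeautifulSoup

def scrape_a(url):
    r = requests.get(url)
    # Name the parser explicitly to avoid bs4's "no parser specified" warning.
    soup = BeautifulSoup(r.content, "html.parser")
    news = soup.find_all("div", attrs={"class": "news"})
    for links in news:
        # Call find_all on each element, not on the ResultSet itself;
        # href=True matches any tag that has an href attribute.
        link = links.find_all(href=True)
        return link  # returns after the first div.news element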

to do what I think you tried to do.

I'd use a CSS selector instead:

def scrape_a(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.content, "html.parser")
    # Any element with an href attribute nested inside a div with class "news".
    news_links = soup.select("div.news [href]")
    if news_links:
        return news_links[0]
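
To see what the selector matches, here is a small sketch run against inline markup rather than a live page. The second and third div elements and their URLs are made up for illustration, and select_one is used as a shortcut for "first match or None"; it is available in current BeautifulSoup 4 releases.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two div.news blocks and one unrelated div.
html = """
<div class="news"><a href="www.link.com"><h2 class="heading">heading</h2></a></div>
<div class="other"><a href="www.elsewhere.com">ignored</a></div>
<div class="news"><a href="www.second.com">second</a></div>
"""

soup = BeautifulSoup(html, "html.parser")

# "div.news [href]": any descendant of a div with class "news"
# that carries an href attribute.
matches = soup.select("div.news [href]")

# select_one returns the first match, or None when nothing matches.
first = soup.select_one("div.news [href]")
missing = soup.select_one("div.sports [href]")

print(len(matches))   # 2
print(first["href"])  # www.link.com
print(missing)        # None
```

Because select returns an empty list when nothing matches, the `if news_links:` guard in the function above is what prevents an IndexError on pages without a div.news element.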

If you want to return the value of the href attribute (the link itself), you need to extract that too, of course:

return news_links[0]['href']

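And if you need all the link objects, not just the first, simply return news_links for the link objects, or use a list comprehension to extract the URLs: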

return [link['href'] for link in news_links]
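
Put together, the list-comprehension variant might look like the sketch below. It assumes the HTML has already been fetched; extract_news_links is a hypothetical helper name, not part of BeautifulSoup, and the second div in the sample is invented for illustration.

```python
from bs4 import BeautifulSoup


def extract_news_links(html):
    """Return the href value of every element with an href inside div.news."""
    soup = BeautifulSoup(html, "html.parser")
    return [link["href"] for link in soup.select("div.news [href]")]


# Sample markup modeled on the question, plus one made-up extra block.
sample = """
<div class="news"><a href="www.link.com"><h2 class="heading">heading</h2></a></div>
<div class="news"><a href="www.second.com"><p> text </p></a></div>
"""

urls = extract_news_links(sample)
no_urls = extract_news_links("<p>no news here</p>")

print(urls)     # ['www.link.com', 'www.second.com']
print(no_urls)  # []
```

Keeping the parsing separate from the requests.get call makes the function easy to try on a saved page, and an empty list is returned when no div.news element is present.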

Archived:	10 years, 1 month ago
Views:	8169
Last activity:	10 years, 1 month ago