BeautifulSoup not working, getting NoneType error

Joh*_*mbo 2 html python beautifulsoup html-parsing python-3.x

I am using the following code (taken from "retrieve links from web page using python and BeautifulSoup"):

import httplib2
from BeautifulSoup import BeautifulSoup, SoupStrainer

http = httplib2.Http()
status, response = http.request('http://www.nytimes.com')

for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')):
    if link.has_attr('href'):
        print link['href']

However, I don't understand why I get the following error message:

Traceback (most recent call last):
  File "C:\Users\EANUAMA\workspace\PatternExtractor\src\SourceCodeExtractor.py", line 13, in <module>
    if link.has_attr('href'):
TypeError: 'NoneType' object is not callable

BeautifulSoup 3.2.0 Python 2.7

Edit:

I tried the solution from a similar question (TypeError on "if link.has_attr('href')": TypeError: 'NoneType' object is not callable), but it gives me the following error:

Traceback (most recent call last):
  File "C:\Users\EANUAMA\workspace\PatternExtractor\src\SourceCodeExtractor.py", line 12, in <module>
    for link in BeautifulSoup(response).find_all('a', href=True):
TypeError: 'NoneType' object is not callable

ale*_*cxe 5

First of all:

from BeautifulSoup import BeautifulSoup, SoupStrainer

You are using BeautifulSoup version 3, which is no longer maintained. Switch to BeautifulSoup version 4. Install it via:

pip install beautifulsoup4

And change your import to:

from bs4 import BeautifulSoup, SoupStrainer

Also:

Traceback (most recent call last):
  File "C:\Users\EANUAMA\workspace\PatternExtractor\src\SourceCodeExtractor.py", line 13, in <module>
    if link.has_attr('href'):
TypeError: 'NoneType' object is not callable

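In BeautifulSoup 3, link is a Tag instance, which has no has_attr method. Recall what dot notation means in BeautifulSoup: link.has_attr makes it search for an element named has_attr inside the link element, and nothing is found. In other words, link.has_attr is None, and None('href') obviously raises an error. The find_all call in your edit fails for the same reason: BeautifulSoup 3 spells that method findAll.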
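To make the mechanism concrete, here is a minimal sketch of how an attribute fallback can turn a missing method into None. ToyTag is a hypothetical stand-in written for this illustration, not BeautifulSoup's actual class; it only mimics the "unknown attribute means child tag lookup" behaviour described above.

```python
class ToyTag:
    """Toy stand-in for a BeautifulSoup 3 Tag (hypothetical, not the real class)."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def find(self, name):
        # Return the first direct child with the given tag name, or None.
        for child in self.children:
            if child.name == name:
                return child
        return None

    def __getattr__(self, attr):
        # Python calls this only when normal attribute lookup fails.
        # Unknown names are treated as child tag names (dot notation).
        if attr.startswith("__"):
            raise AttributeError(attr)
        return self.find(attr)


link = ToyTag("a", children=[ToyTag("b")])

print(link.b.name)    # a child named "b" exists, so it is returned
print(link.has_attr)  # no child named "has_attr", so the lookup yields None

try:
    link.has_attr("href")  # equivalent to None("href")
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

Because the fallback never raises AttributeError for ordinary names, a misspelled or missing method fails only when it is called, which is why the traceback mentions NoneType rather than a missing attribute.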

Instead, do:

soup = BeautifulSoup(response, parse_only=SoupStrainer('a', href=True))
for link in soup.find_all("a", href=True):
    print(link['href'])

FYI, here is the complete working code I used to debug your problem (using requests):

import requests
from bs4 import BeautifulSoup, SoupStrainer


response = requests.get('http://www.nytimes.com').content
for link in BeautifulSoup(response, parse_only=SoupStrainer('a', href=True)).find_all("a", href=True):
    print(link['href'])
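If you want to check the approach without hitting the network, the following sketch runs the same extraction on an inline HTML string. It assumes beautifulsoup4 is installed; the sample markup and the explicit "html.parser" argument are my own choices for the illustration, not part of the original code.

```python
from bs4 import BeautifulSoup, SoupStrainer

# Made-up sample page: two links with href, one anchor without.
html = """
<html><body>
<a href="/news">News</a>
<a name="top">anchor without href</a>
<p><a href="https://example.com/sports">Sports</a></p>
</body></html>
"""

# Parse only <a> tags that carry an href attribute.
only_links = SoupStrainer("a", href=True)
soup = BeautifulSoup(html, "html.parser", parse_only=only_links)

hrefs = [link["href"] for link in soup.find_all("a", href=True)]
print(hrefs)
```

Naming the parser explicitly keeps the result the same across machines, since BeautifulSoup 4 otherwise picks whichever parser happens to be installed.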