I am writing some scraping code and ran into the error shown below. My code is as follows:
# -*- coding: utf-8 -*-
import scrapy
from myproject.items import Headline


class NewsSpider(scrapy.Spider):
    name = 'IC'
    allowed_domains = ['kosoku.jp']
    start_urls = ['http://kosoku.jp/ic.php']

    def parse(self, response):
        """
        extract target urls and combine them with the main domain
        """
        for url in response.css('table a::attr("href")'):
            yield(scrapy.Request(response.urljoin(url), self.parse_topics))

    def parse_topics(self, response):
        """
        pick up necessary information
        """
        item = Headline()
        item["name"] = response.css("h2#page-name ::text").re(r'.*??????????')
        item["road"] = response.css("div.ic-basic-info-left div:last-of-type ::text").re(r'.*?$')
        yield item
When I run these selectors one at a time in the shell, I get the correct results, but once they are inside the spider and it runs, they no longer work.
2017-11-27 18:26:17 [scrapy.core.scraper] ERROR: Spider error processing <GET http://kosoku.jp/ic.php> (referer: None)
Traceback (most recent call …
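One likely cause, offered as an assumption because the traceback above is cut off: iterating over `response.css(...)` yields `Selector` objects rather than strings, and `response.urljoin()` hands its argument to the standard library's `urllib.parse.urljoin`, which refuses to combine a `str` base URL with a non-`str` value. The sketch below reproduces that with the standard library only. `FakeSelector`, `join_selector`, `join_extracted` and the href `ic/detail.php?id=1` are hypothetical stand-ins for illustration, not Scrapy's real classes or the site's real links.

```python
from urllib.parse import urljoin

BASE = "http://kosoku.jp/ic.php"


class FakeSelector:
    """Hypothetical stand-in for a Scrapy attribute Selector (not the real class)."""

    def __init__(self, value):
        self._value = value

    def get(self):
        # Like Scrapy's Selector.get(), return the extracted text as a plain str
        return self._value


def join_selector(sel):
    # Mirrors the spider's parse(): passes the selector object itself to urljoin
    return urljoin(BASE, sel)


def join_extracted(sel):
    # Extract the href string first, then resolve it against the page URL
    return urljoin(BASE, sel.get())


sel = FakeSelector("ic/detail.php?id=1")  # hypothetical relative href

try:
    join_selector(sel)  # raises TypeError: a str base cannot be mixed with a non-str URL
except TypeError as exc:
    print("TypeError:", exc)

print(join_extracted(sel))  # http://kosoku.jp/ic/detail.php?id=1
```

In the spider itself, the corresponding change would be to iterate over `response.css('table a::attr("href")').extract()` (or its newer alias `.getall()`) so that each `url` is a plain string. Alternatively, in recent Scrapy versions `response.follow(url, self.parse_topics)` accepts an attribute `Selector` directly and resolves relative URLs itself.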