Python, Scrapy, Pipeline: function "process_item" not getting called

Question

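I have a very simple piece of code, shown below. The scraping works: all the print statements output the correct data. In the pipeline, initialization also works fine. However, process_item is never called; the print statement at the start of that function never executes.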

Spider: comosham.py

import scrapy
from scrapy.spider import Spider
from scrapy.selector import Selector
from scrapy.http import Request
from activityadvisor.items import ComoShamLocation
from activityadvisor.items import ComoShamActivity
from activityadvisor.items import ComoShamRates
import re


class ComoSham(Spider):
    name = "comosham"
    allowed_domains = ["www.comoshambhala.com"]
    start_urls = [
        "http://www.comoshambhala.com/singapore/classes/schedules",
        "http://www.comoshambhala.com/singapore/about/location-contact",
        "http://www.comoshambhala.com/singapore/rates-and-offers/rates-classes",
        "http://www.comoshambhala.com/singapore/rates-and-offers/rates-classes/rates-private-classes"
    ]

    def parse(self, response):  
        category = (response.url)[39:44]
        print 'in parse'
        if category == 'class':
            pass
            """self.gen_req_class(response)"""
        elif category == 'about':
            print 'about to call parse_location'
            self.parse_location(response)
        elif category == 'rates':
            pass
            """self.parse_rates(response)"""
        else:
            print 'Cant find appropriate category! check check check!! Am raising Level 5 ALARM - You are a MORON :D'


    def parse_location(self, response):
        print 'in parse_location'       
        item = ComoShamLocation()
        item['category'] = 'location'
        loc = Selector(response).xpath('((//div[@id = "node-2266"]/div/div/div)[1]/div/div/p//text())').extract()
        item['address'] = loc[2]+loc[3]+loc[4]+(loc[5])[1:11]
        item['pin'] = (loc[5])[11:18]
        item['phone'] = (loc[9])[6:20]
        item['fax'] = (loc[10])[6:20]
        item['email'] = loc[12]
        print item['address'],item['pin'],item['phone'],item['fax'],item['email']
        return item
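The dispatch in parse depends on a fixed character offset into the URL. A quick check in plain Python 3, outside Scrapy, shows which branch each start URL takes. Only the location-contact page reaches parse_location, since the 'class' and 'rates' branches are currently just `pass`, so that page is the only one that should ever produce an item.

```python
# The four start URLs from the spider above
start_urls = [
    "http://www.comoshambhala.com/singapore/classes/schedules",
    "http://www.comoshambhala.com/singapore/about/location-contact",
    "http://www.comoshambhala.com/singapore/rates-and-offers/rates-classes",
    "http://www.comoshambhala.com/singapore/rates-and-offers/rates-classes/rates-private-classes",
]

# "http://www.comoshambhala.com/singapore/" is exactly 39 characters long,
# so [39:44] picks the first five characters of the next path segment
prefix = "http://www.comoshambhala.com/singapore/"
categories = [url[39:44] for url in start_urls]
print(len(prefix), categories)  # 39 ['class', 'about', 'rates', 'rates']
```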

Items file:

import scrapy
from scrapy.item import Item, Field

class ComoShamLocation(Item):
    address = Field()
    pin = Field()
    phone = Field()
    fax = Field()
    email = Field()
    category = Field()

Pipeline file:

import csv


class ComoShamPipeline(object):
    def __init__(self):
        self.locationdump = csv.writer(open('./scraped data/ComoSham/ComoshamLocation.csv','wb'))
        self.locationdump.writerow(['Address','Pin','Phone','Fax','Email'])


    def process_item(self,item,spider):
        print 'processing item now'
        if item['category'] == 'location':
            print item['address'],item['pin'],item['phone'],item['fax'],item['email']
            self.locationdump.writerow([item['address'],item['pin'],item['phone'],item['fax'],item['email']])
        # process_item must return the item (or raise DropItem) so later stages receive it
        return item
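A Python 3 variant of the pipeline, written as a sketch: it uses Scrapy's open_spider/close_spider hooks so the CSV file is opened when the spider starts and closed (and flushed) when it finishes, and process_item always returns the item. The class name ComoShamCsvPipeline and the default output path are hypothetical. The demo at the end feeds it a plain dict with placeholder values instead of a ComoShamLocation, which works because the code only uses mapping access.

```python
import csv
import os
import tempfile


class ComoShamCsvPipeline:
    """Hypothetical Python 3 variant of ComoShamPipeline (a sketch, not the asker's code)."""

    def __init__(self, path="ComoshamLocation.csv"):
        # Assumed default output path; adjust to the project layout
        self.path = path
        self.file = None
        self.writer = None

    def open_spider(self, spider):
        # Scrapy calls this once when the spider opens; newline="" is what csv.writer expects
        self.file = open(self.path, "w", newline="", encoding="utf-8")
        self.writer = csv.writer(self.file)
        self.writer.writerow(["Address", "Pin", "Phone", "Fax", "Email"])

    def close_spider(self, spider):
        # Closing the file flushes any buffered rows to disk
        self.file.close()

    def process_item(self, item, spider):
        if item.get("category") == "location":
            self.writer.writerow(
                [item["address"], item["pin"], item["phone"], item["fax"], item["email"]]
            )
        # Always hand the item on to the next pipeline stage
        return item


# Demo outside Scrapy: a plain dict with placeholder values stands in for a ComoShamLocation item
demo_path = os.path.join(tempfile.mkdtemp(), "ComoshamLocation.csv")
pipeline = ComoShamCsvPipeline(demo_path)
pipeline.open_spider(spider=None)
returned = pipeline.process_item(
    {"category": "location", "address": "addr", "pin": "123456",
     "phone": "555", "fax": "556", "email": "info@example.com"},
    spider=None,
)
pipeline.close_spider(spider=None)

with open(demo_path, newline="", encoding="utf-8") as f:
    rows = list(csv.reader(f))
```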

Answer 1

roc*_*m4l 10

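Your problem is that you never actually yield the item. parse_location returns the item, but parse never yields it, so the returned item is simply discarded.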

The solution is to replace:

self.parse_location(response)

with

yield self.parse_location(response)

More specifically, if no items are yielded, process_item is never called.
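The effect can be reproduced without Scrapy. The sketch below is a hypothetical, heavily simplified stand-in for the engine (run_callback and RecordingPipeline are invented for illustration and are not Scrapy APIs): it forwards to process_item only what a callback returns or yields. A parse that merely calls parse_location returns None, so nothing reaches the pipeline, while the yield version hands the item over.

```python
from types import SimpleNamespace

URL = "http://www.comoshambhala.com/singapore/about/location-contact"


class RecordingPipeline:
    """Hypothetical stand-in for ComoShamPipeline: records every item it is handed."""

    def __init__(self):
        self.seen = []

    def process_item(self, item, spider):
        self.seen.append(item)
        return item


def run_callback(callback, response, pipeline, spider=None):
    """Rough mimic of the engine: only returned or yielded output reaches the pipeline."""
    result = callback(response)
    if result is None:
        return  # nothing produced, so process_item is never called
    if isinstance(result, dict):
        result = [result]  # a single returned item
    for output in result:
        pipeline.process_item(output, spider)


def parse_location(response):
    return {"category": "location", "url": response.url}


def parse_broken(response):
    # The item is built and then thrown away: this function returns None
    parse_location(response)


def parse_fixed(response):
    # yield hands the item back to the caller (the engine)
    yield parse_location(response)


response = SimpleNamespace(url=URL)
broken = RecordingPipeline()
run_callback(parse_broken, response, broken)
fixed = RecordingPipeline()
run_callback(parse_fixed, response, fixed)
print(len(broken.seen), len(fixed.seen))  # 0 1
```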

Answer 2

Gan*_*esh 5

Enable the pipeline with ITEM_PIPELINES in settings.py:

ITEM_PIPELINES = ['project_name.pipelines.pipeline_class']
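Current Scrapy documentation describes ITEM_PIPELINES as a dict that maps each pipeline's import path to an integer order, rather than the list shown above. A sketch for this project, assuming ComoShamPipeline lives in activityadvisor/pipelines.py (the package name comes from the spider's imports; the module path and the value 300 are assumptions):

```python
# settings.py (sketch). Assumes ComoShamPipeline is defined in activityadvisor/pipelines.py.
# The integer (0-1000 by convention) sets the order: lower values run first.
ITEM_PIPELINES = {
    "activityadvisor.pipelines.ComoShamPipeline": 300,
}
```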

Archived: 10 years, 4 months ago
Views: 4160
Last activity: 8 years ago