I know you can access the spider variable in process_item(), but how do I access it in the pipeline's __init__ function?
class SiteSpider(CrawlSpider):
    def __init__(self):
        self.id = 10

class MyPipeline(object):
    def __init__(self):
        ...
I also need to access CUSTOM_SETTINGS_VARIABLE in MyPipeline.
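You can't access the spider instance there, because pipelines are initialized when the engine starts, before any spider is opened. In fact, you should think of your pipeline as handling multiple spiders rather than just one.
That said, you can hook into the spider_opened signal to get access to the spider instance when it starts.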
from scrapy import signals


class MyPipeline(object):

    def __init__(self, mysetting):
        # do stuff with the arguments...
        self.mysetting = mysetting

    @classmethod
    def from_crawler(cls, crawler):
        # Read the custom setting and pass it to the constructor.
        settings = crawler.settings
        instance = cls(settings['CUSTOM_SETTINGS_VARIABLE'])
        # Call spider_opened() each time a spider is opened.
        crawler.signals.connect(instance.spider_opened, signal=signals.spider_opened)
        return instance

    def spider_opened(self, spider):
        # do stuff with the spider: initialize resources, etc.
        spider.log("[MyPipeline] Initializing resources for %s" % spider.name)

    def process_item(self, item, spider):
        return item
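As a possible alternative to connecting the signal by hand, item pipelines can also define an open_spider(spider) method, which Scrapy calls when the spider is opened. The sketch below assumes that hook is available in your Scrapy version with this signature (worth checking against the docs for your release), and it assumes the spider carries the id attribute from the question. The spider_id attribute name is made up for illustration.

```python
class MyPipeline:
    """Sketch: read a custom setting at construction time and pick up
    spider-specific state later, once the spider is actually opened."""

    def __init__(self, mysetting):
        self.mysetting = mysetting
        # Hypothetical attribute: unknown until a spider is opened.
        self.spider_id = None

    @classmethod
    def from_crawler(cls, crawler):
        # Settings are available here, but the spider instance is not.
        return cls(crawler.settings.get("CUSTOM_SETTINGS_VARIABLE"))

    def open_spider(self, spider):
        # Assumption: the spider defines `id`, as SiteSpider does in the
        # question; fall back to None if it does not.
        self.spider_id = getattr(spider, "id", None)

    def process_item(self, item, spider):
        return item
```

The class does not import Scrapy itself, so it can be exercised with simple stand-in objects. Whichever hook you use, the split stays the same: settings-dependent setup goes in from_crawler(), spider-dependent setup is deferred until the spider is opened.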