fcm_max 7 python scrapy web-scraping
I can't change the spider settings from inside the parse method, but there surely has to be a way.
For example:
class SomeSpider(BaseSpider):
    name = 'mySpider'
    allowed_domains = ['example.com']
    start_urls = ['http://example.com']

    settings.overrides['ITEM_PIPELINES'] = ['myproject.pipelines.FirstPipeline']
    print settings['ITEM_PIPELINES'][0]  # printed 'myproject.pipelines.FirstPipeline'

    def parse(self, response):
        # ...some code
        settings.overrides['ITEM_PIPELINES'] = ['myproject.pipelines.SecondPipeline']
        print settings['ITEM_PIPELINES'][0]  # printed 'myproject.pipelines.SecondPipeline'
        item = MyItem()
        item['name'] = 'Name for SecondPipeline'
But! The item still gets processed by FirstPipeline; the new ITEM_PIPELINES value has no effect. How can I change settings after the crawl has started? Thanks in advance!
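Note: the item pipeline is assembled when the crawler starts, so overriding ITEM_PIPELINES from inside parse() comes too late. In newer Scrapy versions, per-spider pipelines are normally declared up front via the custom_settings class attribute instead. A minimal sketch, with illustrative pipeline paths:

# Sketch only: assumes Scrapy >= 1.0; 'myproject.pipelines.SecondPipeline' is a placeholder path.
import scrapy

class SomeSpider(scrapy.Spider):
    name = 'mySpider'
    allowed_domains = ['example.com']
    start_urls = ['http://example.com']

    # Read once when the crawler is created; it cannot be changed from parse().
    custom_settings = {
        'ITEM_PIPELINES': {'myproject.pipelines.SecondPipeline': 300},
    }

    def parse(self, response):
        ...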
If you want different spiders to use different pipelines, you can set a pipelines attribute on the spider listing the pipelines that apply to it, and then check for it inside each pipeline:
class MyPipeline(object):
    def process_item(self, item, spider):
        # Skip items from spiders that did not opt in to this pipeline.
        if self.__class__.__name__ not in getattr(spider, 'pipelines', []):
            return item
        ...
        return item

class MySpider(CrawlSpider):
    pipelines = set([
        'MyPipeline',
        'MyPipeline3',
    ])
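For this to work, every pipeline still has to be enabled project-wide; each one then decides per spider whether to act, using the guard shown above. A sketch of the corresponding settings.py entry (module path and priorities are assumptions):

# settings.py -- both pipelines stay registered globally (dict form used by current Scrapy).
ITEM_PIPELINES = {
    'myproject.pipelines.MyPipeline': 300,
    'myproject.pipelines.MyPipeline3': 400,
}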
If you want different items to be processed by different pipelines, you can do this:
class MyPipeline2(object):
    def process_item(self, item, spider):
        # Only act on items of the type this pipeline is meant to handle.
        if isinstance(item, MyItem):
            ...
            return item
        return item
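To show how the isinstance check is driven from the spider side, here is a minimal sketch; the item classes, field names, and URLs are illustrative, not from the original post:

# Sketch: two item types yielded by one spider; MyPipeline2 only touches MyItem.
import scrapy

class MyItem(scrapy.Item):
    name = scrapy.Field()

class OtherItem(scrapy.Item):
    name = scrapy.Field()

class MySpider(scrapy.Spider):
    name = 'mySpider'
    start_urls = ['http://example.com']

    def parse(self, response):
        yield MyItem(name='handled by MyPipeline2')
        yield OtherItem(name='passed through untouched')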