I have a class that runs some code before __init__:
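# Legacy Scrapy import paths (pre-1.0), matching SgmlLinkExtractor
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor

class NoFollowSpider(CrawlSpider):
    rules = (
        Rule(SgmlLinkExtractor(allow=("",)), callback="parse_items", follow=True),
    )

    def __init__(self, moreparams=None, *args, **kwargs):
        super(NoFollowSpider, self).__init__(*args, **kwargs)
        self.moreparams = moreparams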
I run this scrapy code with the following command:
> scrapy runspider my_spider.py -a moreparams="more parameters" -o output.txt
Now I want the static variable named rules to be configurable from the command line:
> scrapy runspider my_spider.py -a crawl_pages=True -a moreparams="more parameters" -o output.txt
Changing __init__ to:
    def __init__(self, crawl_pages=False, moreparams=None, *args, **kwargs):
        if (crawl_pages is True):
            self.rules = (
                Rule(SgmlLinkExtractor(allow=("",)), callback="parse_items", follow=True),
            )
        self.moreparams = moreparams
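However, if I switch the static variable rules inside __init__, scrapy no longer takes it into account: it runs, but only crawls the given start_urls rather than the whole domain. It seems that rules must be a static class variable.
So, how can I set a static variable dynamically?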
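One likely explanation, sketched here with a simplified stand-in rather than Scrapy's real classes: CrawlSpider reads rules once inside its own __init__ and keeps a compiled copy, so assigning self.rules afterwards has no effect unless the rules are compiled again. The names FakeCrawlSpider, LateSpider and EarlySpider are hypothetical and exist only for this illustration.

```python
class FakeCrawlSpider(object):
    """Simplified stand-in for CrawlSpider (assumption: not Scrapy's actual code)."""
    rules = ()

    def __init__(self, *args, **kwargs):
        self._compile_rules()

    def _compile_rules(self):
        # Take a snapshot of whatever `rules` resolves to right now.
        self._rules = list(self.rules)


class LateSpider(FakeCrawlSpider):
    def __init__(self, *args, **kwargs):
        super(LateSpider, self).__init__(*args, **kwargs)
        # Too late: the snapshot was already taken by the base __init__.
        self.rules = ("follow-everything",)


class EarlySpider(FakeCrawlSpider):
    def __init__(self, *args, **kwargs):
        # Set before the base __init__ takes its snapshot.
        self.rules = ("follow-everything",)
        super(EarlySpider, self).__init__(*args, **kwargs)


late = LateSpider()
early = EarlySpider()
print(late._rules)   # []
print(early._rules)  # ['follow-everything']
```

In this sketch, either setting the rules before calling the base __init__ or calling _compile_rules() again after changing them makes the new rules take effect.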
So here is how I solved the problem with the help of @Not_a_Golfer and @nramirezuy; I simply used the two points they suggested:
class NoFollowSpider(CrawlSpider):
    def __init__(self, crawl_pages=False, moreparams=None, *args, **kwargs):
        super(NoFollowSpider, self).__init__(*args, **kwargs)
        # Set the class member from here.
        # Values passed with -a arrive as strings, so accept "True" as well.
        if crawl_pages in (True, "True"):
            NoFollowSpider.rules = (
                Rule(SgmlLinkExtractor(allow=("",)), callback="parse_items", follow=True),
            )
            # Then recompile the rules
            super(NoFollowSpider, self)._compile_rules()
        # Keep going as before
        self.moreparams = moreparams
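One detail worth checking: values passed with -a reach __init__ as strings, so a strict identity check such as crawl_pages is True would not fire for -a crawl_pages=True. A minimal sketch of a converter follows; the helper name as_bool and the set of accepted spellings are assumptions made for this example, not part of Scrapy.

```python
def as_bool(value):
    """Interpret a command-line style value as a boolean (hypothetical helper)."""
    if isinstance(value, bool):
        return value
    # Accept a few common spellings; anything else counts as False.
    return str(value).strip().lower() in ("1", "true", "yes", "on")


arg = "True"             # what __init__ would receive from -a crawl_pages=True
print(arg is True)       # False
print(as_bool(arg))      # True
print(as_bool("false"))  # False
```

Inside __init__ the condition could then read if as_bool(crawl_pages): so that both the default False and the command-line string behave as expected.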
Thanks everyone for the help!