How to dynamically set Scrapy rules?

Ant*_*nel 3 python scrapy

I have a spider class whose rules are defined at class level, before `__init__` runs:

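# Import paths used by older Scrapy releases that still ship SgmlLinkExtractor
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor


class NoFollowSpider(CrawlSpider):
    # Follow every link and hand each response to parse_items
    rules = (
        Rule(SgmlLinkExtractor(allow=("",)), callback="parse_items", follow=True),
    )

    def __init__(self, moreparams=None, *args, **kwargs):
        super(NoFollowSpider, self).__init__(*args, **kwargs)
        self.moreparams = moreparams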

I run this Scrapy code with the following command:

> scrapy runspider my_spider.py -a moreparams="more parameters" -o output.txt 

Now I want the static variable named `rules` to be configurable from the command line:

> scrapy runspider my_spider.py -a crawl_pages=True -a moreparams="more parameters" -o output.txt

So I changed `__init__` to:

def __init__(self, crawl_pages=False, moreparams=None, *args, **kwargs):
    if (crawl_pages is True):
        self.rules = (
            Rule(SgmlLinkExtractor(allow=("",)), callback="parse_items", follow=True),
        )
    self.moreparams = moreparams

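However, if I set `rules` inside `__init__`, Scrapy no longer takes it into account: the spider runs, but it only crawls the given start_urls and not the whole domain. It seems that `rules` must be a static class variable.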

So, how can I set a static variable dynamically?
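
For what it's worth, the symptom is easier to see in a toy model that has no Scrapy dependency. The sketch below assumes that the base spider copies `rules` into a private `_rules` list once, inside `__init__`, and that crawling only consults that copy. `ToyCrawlSpider`, `LateRulesSpider`, `active_rules` and the `recompile` flag are hypothetical names made up for illustration; only `rules`, `_rules` and `_compile_rules` mirror names from CrawlSpider.

```python
import copy


class ToyCrawlSpider:
    """Hypothetical stand-in for CrawlSpider; this is not Scrapy code."""

    rules = ()

    def __init__(self):
        # Snapshot the rules once, at construction time.
        self._compile_rules()

    def _compile_rules(self):
        self._rules = [copy.copy(rule) for rule in self.rules]

    def active_rules(self):
        # The crawl loop in this model only looks at the compiled snapshot.
        return list(self._rules)


class LateRulesSpider(ToyCrawlSpider):
    def __init__(self, crawl_pages=False, recompile=False):
        super().__init__()
        if crawl_pages:
            # Placeholder string standing in for a real Rule object.
            self.rules = ("follow-everything",)
            if recompile:
                self._compile_rules()


stale = LateRulesSpider(crawl_pages=True)
fresh = LateRulesSpider(crawl_pages=True, recompile=True)
print(stale.active_rules())  # []
print(fresh.active_rules())  # ['follow-everything']
```

In this model, assigning `rules` after the base `__init__` has run changes the attribute but not the snapshot, so the spider behaves as if it had no rules until the snapshot is rebuilt.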

Ant*_*nel 7

Here is how I resolved the problem with the help of @Not_a_Golfer and @nramirezuy. I simply combined the two things they suggested:

class NoFollowSpider(CrawlSpider):

    def __init__(self, crawl_pages=False, moreparams=None, *args, **kwargs):
        super(NoFollowSpider, self).__init__(*args, **kwargs)
        # Set the class member from here. Values passed with -a arrive as
        # strings, so accept "True" as well as the boolean True.
        if crawl_pages in (True, "True"):
            NoFollowSpider.rules = (
                Rule(SgmlLinkExtractor(allow=("",)), callback="parse_items", follow=True),
            )
            # Then recompile the rules, because the parent __init__ already compiled them
            self._compile_rules()

        # Keep going as before
        self.moreparams = moreparams
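
One caveat when driving this from the command line: spider arguments passed with `-a` reach `__init__` as strings, so a flag such as `crawl_pages` needs to be interpreted before it is tested. A minimal sketch of one way to do that is below; `parse_bool_arg` is a hypothetical helper, not part of Scrapy, and the set of accepted spellings is an assumption.

```python
def parse_bool_arg(value, default=False):
    """Hypothetical helper: interpret a spider argument as a boolean."""
    if value is None:
        return default
    if isinstance(value, bool):
        # Already a real boolean, e.g. when the spider is built from Python code.
        return value
    # Command-line values are text, so compare case-insensitively.
    return str(value).strip().lower() in ("1", "true", "yes", "on")


print(parse_bool_arg("True"))   # True
print(parse_bool_arg("false"))  # False
print(parse_bool_arg(True))     # True
```

Inside `__init__` the check would then read `if parse_bool_arg(crawl_pages):`, which works whether the spider is started from the command line or instantiated directly.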

Thanks, everyone, for your help!