The from_crawler class method in Scrapy

Question

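After asking my last question (How to pass a parameter to a Scrapy pipeline object), I am trying to better understand the relationship between pipelines and crawlers in Scrapy.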

One of the answers was:

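from sqlalchemy import Column, Integer, MetaData, Table, Text, create_engine

# Methods of the pipeline class (SQLlitePipeline):
@classmethod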
def from_crawler(cls, crawler):
    # Here, you get whatever value was passed through the "table" parameter
    settings = crawler.settings
    table = settings.get('table')

    # Instantiate the pipeline with your table
    return cls(table)

def __init__(self, table):
    _engine = create_engine("sqlite:///data.db")
    _connection = _engine.connect()
    _metadata = MetaData()
    _stack_items = Table(table, _metadata,
                         Column("id", Integer, primary_key=True),
                         Column("detail_url", Text))
    _metadata.create_all(_engine)
    self.connection = _connection
    self.stack_items = _stack_items

What confuses me is this part:

@classmethod
def from_crawler(cls, crawler):
    # Here, you get whatever value was passed through the "table" parameter
    settings = crawler.settings
    table = settings.get('table')

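Does the crawler object already exist, or are we creating it here? Can someone explain in more detail what is happening? I have been reading a number of sources, including http://scrapy.readthedocs.io/en/latest/topics/api.html#crawler-api and http://scrapy.readthedocs.io/en/latest/topics/architecture.html, but I haven't put the pieces together yet.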

Answer 1

luc*_*tti 5

It's me again :)

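Maybe what you didn't get is the meaning of classmethod in Python. In your case, it is a method that belongs to your SQLlitePipeline class rather than to an instance of it. Thus, cls is the SQLlitePipeline class itself.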
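
To make the meaning of cls concrete, here is a minimal sketch that does not involve Scrapy at all. The names Base, Child and create are hypothetical and chosen only for illustration; the point is that cls is bound to whichever class the method was called on.

```python
class Base:
    @classmethod
    def create(cls):
        # cls is the class the method was called on, not an instance
        return cls()


class Child(Base):
    pass


obj = Base.create()     # cls is Base here
child = Child.create()  # cls is Child here

print(type(obj).__name__)
print(type(child).__name__)
```

No instance has to exist before create is called; the method itself builds and returns one.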

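Scrapy calls this pipeline method and passes it the crawler object, which Scrapy instantiates by itself. At this point we don't have a SQLlitePipeline instance yet. In other words, the pipeline flow hasn't started.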

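After getting the desired parameter (table) from the crawler's settings, from_crawler finally returns an instance of the pipeline by doing cls(table) (remember what cls is, right? So it is the same as doing SQLlitePipeline(table)).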

This is a plain Python object instantiation, so __init__ will be called with the table name it expects, and then the pipeline flow will start.
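
The hand-off from from_crawler to __init__ can be tried without running Scrapy. The sketch below is an assumption-laden stand-in: StubCrawler is a hypothetical class that only exposes a settings attribute (a plain dict here, whereas Scrapy provides a Settings object), the table name "stack_items" is made up, and __init__ only stores the name instead of opening a database.

```python
class StubCrawler:
    """Hypothetical stand-in for Scrapy's crawler; it only exposes .settings."""

    def __init__(self, settings):
        self.settings = settings  # a plain dict in this sketch


class SQLlitePipeline:
    def __init__(self, table):
        # The real pipeline would set up the database here; this sketch only stores the name
        self.table = table

    @classmethod
    def from_crawler(cls, crawler):
        table = crawler.settings.get("table")
        return cls(table)  # same as SQLlitePipeline(table)


crawler = StubCrawler({"table": "stack_items"})      # the crawler exists first
pipeline = SQLlitePipeline.from_crawler(crawler)     # the pipeline is created here

print(type(pipeline).__name__, pipeline.table)
```

The crawler is already built when from_crawler runs; the only object created inside the method is the pipeline instance.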

EDIT

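It is probably useful to have a step-by-step overview of what Scrapy does when it runs. Of course, the real process is much more complex than what I'm going to illustrate, but hopefully it will give you a better understanding.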

1) You call Scrapy

2) Scrapy instantiates a crawler object

crawler = Crawler(...)

3) Scrapy identifies the pipeline class you want to use (SQLlitePipeline) and calls its from_crawler method.

# Note that SQLlitePipeline is not instantiated here, as from_crawler is a class method
# However, as we saw before, this method returns an instance of the pipeline class
pipeline_instance = SQLlitePipeline.from_crawler(crawler)

4) From this point on, it calls the pipeline instance's methods

pipeline_instance.open_spider(...)
pipeline_instance.process_item(...)
pipeline_instance.close_spider(...)
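
The four steps can be simulated end to end in plain Python. This is only a rough sketch of the ordering, not how Scrapy is implemented: RecordingPipeline, run, the SimpleNamespace crawler and the "demo" table name are all hypothetical, and the real engine drives these calls asynchronously.

```python
from types import SimpleNamespace


class RecordingPipeline:
    """Hypothetical pipeline that records which methods were called, in order."""

    def __init__(self, table):
        self.table = table
        self.calls = []

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler.settings.get("table"))

    def open_spider(self, spider):
        self.calls.append("open_spider")

    def process_item(self, item, spider):
        self.calls.append("process_item")
        return item

    def close_spider(self, spider):
        self.calls.append("close_spider")


def run(pipeline_cls, crawler, spider, items):
    # Step 3: the class method builds the instance from the existing crawler
    pipeline = pipeline_cls.from_crawler(crawler)
    # Step 4: instance methods are called from here on
    pipeline.open_spider(spider)
    for item in items:
        pipeline.process_item(item, spider)
    pipeline.close_spider(spider)
    return pipeline


# Step 2: a stand-in crawler object with a dict for settings
crawler = SimpleNamespace(settings={"table": "demo"})
pipeline = run(RecordingPipeline, crawler, spider=None, items=[{"id": 1}, {"id": 2}])
print(pipeline.calls)
```

The recorded call list shows from_crawler running once before any instance method, with process_item called once per item between open_spider and close_spider.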
