Is it possible to pass a variable from start_requests() to parse() for each individual request?

Chi*_*Abs 18 scrapy

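I'm using a loop to generate my requests in start_requests(), and I'd like to pass the loop index to parse() so that it can be stored in the item. However, when I use self.i, every item ends up with the maximum value of i (the value from the last loop iteration). I could use response.url.re('regex to extract the index'), but I'd like to know whether there is a clean way to pass a variable from start_requests() to parse().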

Gra*_*rus 32

You can use the meta attribute of scrapy.Request:

import scrapy

class MySpider(scrapy.Spider):
    name = 'myspider'

    def start_requests(self):
        urls = [...]
        for index, url in enumerate(urls):
            yield scrapy.Request(url, meta={'index':index})

    def parse(self, response):
        print(response.url)
        print(response.meta['index'])
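
To illustrate why per-request data works where self.i does not, here is a toy model in plain Python. It is not Scrapy: FakeRequest, ToySpider and run() are hypothetical stand-ins, and the sketch assumes that every request is scheduled before any callback fires, which is roughly the situation described in the question.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FakeRequest:
    """Hypothetical stand-in for a request; carries its own meta dict."""
    url: str
    callback: Callable
    meta: dict = field(default_factory=dict)


class ToySpider:
    def __init__(self, urls):
        self.urls = urls
        self.i = None   # shared state, overwritten on every loop turn
        self.seen = []  # (url, self.i at callback time, index from meta)

    def start_requests(self):
        for index, url in enumerate(self.urls):
            self.i = index  # one attribute shared by all requests
            yield FakeRequest(url, self.parse, meta={"index": index})

    def parse(self, response):
        # In this toy model the "response" is simply the request object.
        self.seen.append((response.url, self.i, response.meta["index"]))


def run(spider):
    # Assumption: all requests are created before any callback runs.
    pending = list(spider.start_requests())
    for request in pending:
        request.callback(request)
    return spider.seen


results = run(ToySpider(["a", "b", "c"]))
for url, shared, per_request in results:
    print(url, shared, per_request)
```

The shared attribute shows the last index for every URL, while the value stored on each request stays distinct.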


jay*_*iya 5

You can pass the cb_kwargs argument to scrapy.Request():

https://docs.scrapy.org/en/latest/topics/request-response.html#scrapy.http.Request.cb_kwargs

import scrapy

class MySpider(scrapy.Spider):
    name = 'myspider'

    def start_requests(self):
        urls = [...]
        for index, url in enumerate(urls):
            yield scrapy.Request(url, callback=self.parse, cb_kwargs={'index':index})

    def parse(self, response, index):
        pass