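I'm using a loop to generate my requests in start_requests(), and I want to pass the loop index to parse() so that it can be stored in the item. But when I store the index as self.i and read it back in parse(), every item gets the maximum value of i (the value from the last loop iteration). I could use response.url.re('regex to extract the index'), but I'd like to know whether there is a clean way to pass a variable from start_requests to parse.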
Gra*_*rus 32
You can use the meta attribute of scrapy.Request:
    import scrapy

    class MySpider(scrapy.Spider):
        name = 'myspider'

        def start_requests(self):
            urls = [...]
            for index, url in enumerate(urls):
                yield scrapy.Request(url, meta={'index': index})

        def parse(self, response):
            print(response.url)
            print(response.meta['index'])
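You can also pass the cb_kwargs argument to scrapy.Request() (available since Scrapy 1.7). Its entries are passed to the callback as keyword arguments: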
https://docs.scrapy.org/en/latest/topics/request-response.html#scrapy.http.Request.cb_kwargs
    import scrapy

    class MySpider(scrapy.Spider):
        name = 'myspider'

        def start_requests(self):
            urls = [...]
            for index, url in enumerate(urls):
                yield scrapy.Request(url, callback=self.parse, cb_kwargs={'index': index})

        def parse(self, response, index):
            pass
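As for why self.i always ends up holding the last value: the callbacks run later than the loop that creates the requests, so by then the shared attribute has already been overwritten. Below is a minimal pure-Python sketch of that effect, without Scrapy. FakeRequest and run_spider are hypothetical stand-ins for scrapy.Request and the engine, and the sketch assumes, as a simplification, that all start requests are created before any callback runs (real Scrapy interleaves scheduling and downloading, but the symptom is the same).

```python
from dataclasses import dataclass, field


@dataclass
class FakeRequest:
    """Hypothetical stand-in for scrapy.Request: a URL plus per-request data."""
    url: str
    cb_kwargs: dict = field(default_factory=dict)


class SharedStateSpider:
    """Keeps the loop index on the spider itself (the problematic approach)."""

    def start_requests(self):
        for index, url in enumerate(["a", "b", "c"]):
            self.i = index  # overwritten on every iteration
            yield FakeRequest(url)

    def parse(self, url):
        # Reads whatever self.i holds *now*, not when the request was created.
        return {"url": url, "index": self.i}


class PerRequestSpider:
    """Attaches the index to each request, as meta or cb_kwargs would."""

    def start_requests(self):
        for index, url in enumerate(["a", "b", "c"]):
            yield FakeRequest(url, cb_kwargs={"index": index})

    def parse(self, url, index):
        return {"url": url, "index": index}


def run_spider(spider):
    """Hypothetical engine: create every request first, run callbacks afterwards."""
    requests = list(spider.start_requests())
    return [spider.parse(request.url, **request.cb_kwargs) for request in requests]


shared_items = run_spider(SharedStateSpider())
per_request_items = run_spider(PerRequestSpider())

print([item["index"] for item in shared_items])       # [2, 2, 2]
print([item["index"] for item in per_request_items])  # [0, 1, 2]
```

Since the value travels with each request rather than living on the spider, it no longer matters in what order, or how much later, the callbacks are invoked.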