vic*_*vic 15 python arguments callback scrapy
def parse(self, response):
    for sel in response.xpath('//tbody/tr'):
        item = HeroItem()
        item['hclass'] = response.request.url.split("/")[8].split('-')[-1]
        item['server'] = response.request.url.split('/')[2].split('.')[0]
        item['hardcore'] = len(response.request.url.split("/")[8].split('-')) == 3
        item['seasonal'] = response.request.url.split("/")[6] == 'season'
        item['rank'] = sel.xpath('td[@class="cell-Rank"]/text()').extract()[0].strip()
        item['battle_tag'] = sel.xpath('td[@class="cell-BattleTag"]//a/text()').extract()[1].strip()
        item['grift'] = sel.xpath('td[@class="cell-RiftLevel"]/text()').extract()[0].strip()
        item['time'] = sel.xpath('td[@class="cell-RiftTime"]/text()').extract()[0].strip()
        item['date'] = sel.xpath('td[@class="cell-RiftTime"]/text()').extract()[0].strip()

        url = 'https://' + item['server'] + '.battle.net/' + sel.xpath('td[@class="cell-BattleTag"]//a/@href').extract()[0].strip()
        yield Request(url, callback=self.parse_profile)

def parse_profile(self, response):
    sel = Selector(response)
    item = HeroItem()
    item['weapon'] = sel.xpath('//li[@class="slot-mainHand"]/a[@class="slot-link"]/@href').extract()[0].split('/')[4]
    return item
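For reference, the index-based `split` calls above assume a leaderboard URL with a specific path layout. A stdlib-only sketch with a made-up URL (the exact path shape is an assumption for illustration, not taken from the question) shows what each index extracts:

```python
# Hypothetical leaderboard URL -- the path layout is assumed for illustration.
url = 'https://eu.battle.net/d3/en/rankings/era/4/rift-barbarian'

parts = url.split('/')
server = parts[2].split('.')[0]            # 'eu' (the subdomain)
seasonal = parts[6] == 'season'            # False here ('era')
hclass = parts[8].split('-')[-1]           # 'barbarian' (last dash-separated token)
hardcore = len(parts[8].split('-')) == 3   # False; a 3-token slug would mark hardcore

print(server, seasonal, hclass, hardcore)  # eu False barbarian False
```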
OK, so I'm scraping a whole table in the main parse method, and I take several fields from that table. One of those fields is a URL, which I want to follow to extract a whole bunch of additional fields. How can I pass the already-created item object to the callback function so that the final item keeps all the fields?

As the code above stands, I can either save the fields from the URL (the current code) or save only the fields from the table (by just writing yield item), but I can't yield a single object with all the fields together.

I tried this, but obviously it doesn't work:
yield Request(url, callback=self.parse_profile(item))

def parse_profile(self, response, item):
    sel = Selector(response)
    item['weapon'] = sel.xpath('//li[@class="slot-mainHand"]/a[@class="slot-link"]/@href').extract()[0].split('/')[4]
    return item
Rej*_*ted 35
This is exactly what the meta keyword is for:
def parse(self, response):
    for sel in response.xpath('//tbody/tr'):
        item = HeroItem()
        # Item assignment here
        url = 'https://' + item['server'] + '.battle.net/' + sel.xpath('td[@class="cell-BattleTag"]//a/@href').extract()[0].strip()
        yield Request(url, callback=self.parse_profile, meta={'hero_item': item})

def parse_profile(self, response):
    item = response.meta.get('hero_item')
    item['weapon'] = response.xpath('//li[@class="slot-mainHand"]/a[@class="slot-link"]/@href').extract()[0].split('/')[4]
    yield item
Also note that doing sel = Selector(response) is a waste of resources and differs from what you did in parse, so I changed it. Scrapy automatically maps a selector onto the response as response.selector, which also has the convenience shortcut response.xpath.
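The meta handoff itself can be illustrated without Scrapy at all. The sketch below is stdlib-only; FakeRequest and FakeResponse are made-up stand-ins for Scrapy's Request/Response, showing only the pattern of attaching state to a request and reading it back in the callback:

```python
# Stdlib-only sketch of the meta handoff; FakeRequest/FakeResponse are
# made-up stand-ins for scrapy.Request and scrapy.http.Response.
class FakeRequest:
    def __init__(self, url, callback, meta=None):
        self.url = url
        self.callback = callback
        self.meta = meta or {}

class FakeResponse:
    def __init__(self, request):
        # Scrapy copies request.meta onto the response handed to the callback.
        self.meta = request.meta

def parse_profile(response):
    item = response.meta.get('hero_item')  # retrieve the half-filled item
    item['weapon'] = 'doombringer'         # fill in the field from the second page
    return item

item = {'battle_tag': 'SomePlayer#1234'}   # fields scraped from the table
request = FakeRequest('https://example.invalid/profile',
                      callback=parse_profile,
                      meta={'hero_item': item})

# The downloader would fetch the URL, then invoke the callback:
result = request.callback(FakeResponse(request))
print(result)  # both the table field and the profile field survive
```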
pen*_*Dev 13
Here is a better way to pass arguments to the callback function:
def parse(self, response):
    request = scrapy.Request('http://www.example.com/index.html',
                             callback=self.parse_page2,
                             cb_kwargs=dict(main_url=response.url))
    request.cb_kwargs['foo'] = 'bar'  # add more arguments for the callback
    yield request

def parse_page2(self, response, main_url, foo):
    yield dict(
        main_url=main_url,
        other_url=response.url,
        foo=foo,
    )
Source: https://docs.scrapy.org/en/latest/topics/request-response.html#topics-request-response-ref-request-callback-arguments
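In the same Scrapy-free spirit as before, a minimal sketch (FakeRequest is again a made-up stand-in for scrapy.Request) shows why cb_kwargs is convenient: the extra arguments arrive as plain keyword parameters of the callback instead of being dug out of a meta dict:

```python
# Stdlib-only sketch of the cb_kwargs pattern; FakeRequest is a made-up
# stand-in for scrapy.Request.
class FakeRequest:
    def __init__(self, url, callback, cb_kwargs=None):
        self.url = url
        self.callback = callback
        self.cb_kwargs = cb_kwargs or {}

def parse_page2(response_url, main_url, foo):
    # Scrapy would pass a Response object; a bare URL stands in here.
    return {'main_url': main_url, 'other_url': response_url, 'foo': foo}

request = FakeRequest('http://www.example.com/index.html',
                      callback=parse_page2,
                      cb_kwargs=dict(main_url='http://www.example.com/'))
request.cb_kwargs['foo'] = 'bar'  # extra arguments can still be added later

# The engine would fetch request.url, then call back with the kwargs unpacked:
result = request.callback(request.url, **request.cb_kwargs)
print(result)
```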