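I have an item object that I need to pass along across many pages so that the data from all of them is stored in a single item.
My item looks like this: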
from scrapy.item import Item, Field

class DmozItem(Item):
    title = Field()
    description1 = Field()
    description2 = Field()
    description3 = Field()
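The three descriptions live on three separate pages, and I want to collect all of them into the same item.
Right now this works fine for parseDescription1: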
# Methods of the spider class; Request is scrapy.http.Request.
def page_parser(self, response):
    sites = response.xpath('//div[@class="row"]')
    item = DmozItem()
    request = Request("http://www.example.com/lin1.cpp", callback=self.parseDescription1)
    request.meta['item'] = item
    return request

def parseDescription1(self, response):
    item = response.meta['item']
    # The key must be a field declared on DmozItem.
    item['description1'] = "test"
    return item
But I want something like this:
def page_parser(self, response):
    sites = response.xpath('//div[@class="row"]')
    item = DmozItem()
    request = Request("http://www.example.com/lin1.cpp", callback=self.parseDescription1)
    request.meta['item'] = item
    request = Request("http://www.example.com/lin2.cpp", callback=self.parseDescription2)
    request.meta['item'] = item
    request = Request("http://www.example.com/lin3.cpp", callback=self.parseDescription3)
    request.meta['item'] = item
    # Only this last request is actually returned.
    return request

def parseDescription1(self, response):
    item = response.meta['item']
    item['description1'] = "test"
    return item

def parseDescription2(self, response):
    item = response.meta['item']
    item['description2'] = "test2"
    return item

def parseDescription3(self, response):
    item = response.meta['item']
    item['description3'] = "test3"
    return item
war*_*iuc 30
No problem. Instead of returning a single request from page_parser, as in the question, yield each request in turn:
def page_parser(self, response):
    sites = response.xpath('//div[@class="row"]')
    item = DmozItem()

    # One request per description page; all three carry the same item.
    request = Request("http://www.example.com/lin1.cpp", callback=self.parseDescription1)
    request.meta['item'] = item
    yield request

    request = Request("http://www.example.com/lin2.cpp", callback=self.parseDescription2, meta={'item': item})
    yield request

    yield Request("http://www.example.com/lin3.cpp", callback=self.parseDescription3, meta={'item': item})

def parseDescription1(self, response):
    item = response.meta['item']
    item['description1'] = "test"
    return item

def parseDescription2(self, response):
    item = response.meta['item']
    item['description2'] = "test2"
    return item

def parseDescription3(self, response):
    item = response.meta['item']
    item['description3'] = "test3"
    return item
小智 27
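To guarantee the ordering of the requests/callbacks, and to end up with only one item being returned, you need to chain your requests using a form like this: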
def page_parser(self, response):
    sites = response.xpath('//div[@class="row"]')
    request = Request("http://www.example.com/lin1.cpp", callback=self.parseDescription1)
    # Use the item class that declares the fields, not the bare Item base class.
    request.meta['item'] = DmozItem()
    return [request]

def parseDescription1(self, response):
    item = response.meta['item']
    item['description1'] = "test"
    # Hand the item on to the next page instead of returning it.
    return [Request("http://www.example.com/lin2.cpp", callback=self.parseDescription2, meta={'item': item})]

def parseDescription2(self, response):
    item = response.meta['item']
    item['description2'] = "test2"
    return [Request("http://www.example.com/lin3.cpp", callback=self.parseDescription3, meta={'item': item})]

def parseDescription3(self, response):
    item = response.meta['item']
    item['description3'] = "test3"
    # Last link in the chain: the item is complete, so return it.
    return [item]
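Each callback function returns an iterable of items or requests; requests are scheduled, and items are run through your item pipeline.
If you return an item from each of the callbacks, you'll end up with 4 items in various states of completeness in your pipeline. If you return the next request instead, you can guarantee the order of the requests, and you will have exactly one item at the end of execution.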
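The difference between the two approaches can be sketched without Scrapy at all. The toy scheduler below is only an illustration under stated assumptions: `ToyRequest`, `run_spider`, `make_fanout` and `make_chain` are hypothetical names, plain dicts stand in for items, and requests are processed strictly in queue order (a real crawl may deliver responses in any order).

```python
# Toy model of a request/callback loop (no Scrapy needed).
# All names here are hypothetical and exist only for this sketch.
from collections import deque


class ToyRequest:
    def __init__(self, url, callback, meta):
        self.url = url
        self.callback = callback
        self.meta = meta


def run_spider(start_requests):
    """Run callbacks until the queue is empty; collect every emitted item."""
    queue = deque(start_requests)
    emitted = []
    while queue:
        request = queue.popleft()
        # A callback returns an iterable of requests and/or items.
        for result in request.callback(request):
            if isinstance(result, ToyRequest):
                queue.append(result)          # requests get scheduled
            else:
                emitted.append(dict(result))  # snapshot of the item as emitted
    return emitted


def make_fanout():
    """Three independent requests; every callback returns the shared item."""
    def parse(field, value):
        def callback(request):
            request.meta["item"][field] = value
            return [request.meta["item"]]
        return callback

    item = {}
    return [
        ToyRequest("lin1.cpp", parse("description1", "test"), {"item": item}),
        ToyRequest("lin2.cpp", parse("description2", "test2"), {"item": item}),
        ToyRequest("lin3.cpp", parse("description3", "test3"), {"item": item}),
    ]


def make_chain():
    """Each callback returns the next request; only the last returns the item."""
    def parse3(request):
        request.meta["item"]["description3"] = "test3"
        return [request.meta["item"]]

    def parse2(request):
        request.meta["item"]["description2"] = "test2"
        return [ToyRequest("lin3.cpp", parse3, request.meta)]

    def parse1(request):
        request.meta["item"]["description1"] = "test"
        return [ToyRequest("lin2.cpp", parse2, request.meta)]

    return [ToyRequest("lin1.cpp", parse1, {"item": {}})]


fanout_items = run_spider(make_fanout())
chained_items = run_spider(make_chain())
print(len(fanout_items), "items from fan-out")
print(len(chained_items), "item from chaining:", chained_items[0])
```

In this model the fan-out version emits the item once per page, each time in a different state of completeness, while the chained version emits it once, after the last page has filled it in.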
oli*_*her 19
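The accepted answer returns a total of three items [with desc(i) set for i = 1, 2, 3].
If you want to return a single item, Dave McLain's answer does the job, but it requires parseDescription1, parseDescription2, and parseDescription3 all to succeed and run without errors in order to return the item.
For my use case, some of the sub-requests may randomly return HTTP 403/404 errors, so I lost some of the items even though I could have scraped them partially.
Therefore, I currently use the following workaround: instead of only passing the item around in the request.meta dict, pass around a call stack that knows which request to call next. The dispatcher calls the next entry on the stack (as long as the stack isn't empty), and returns the item once the stack is empty.
The errback request parameter is used to return to the dispatcher method when an error occurs, so that it simply continues with the next entry on the stack.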
def callnext(self, response):
    ''' Call next target for the item loader, or yield it if completed. '''

    # Read the meta dict from the attached request: when this method runs
    # as an errback, the argument is a failure object rather than a response.
    meta = response.request.meta

    # Items remaining in the stack? Execute them
    if meta['callstack']:
        target = meta['callstack'].pop(0)
        yield Request(target['url'], meta=meta, callback=target['callback'], errback=self.callnext)
    else:
        yield meta['loader'].load_item()

def parseDescription1(self, response):
    # Recover item(loader)
    l = response.meta['loader']

    # Use just as before
    l.add_css(...)

    # Build the call stack
    callstack = [
        {'url': "http://www.example.com/lin2.cpp",
         'callback': self.parseDescription2},
        {'url': "http://www.example.com/lin3.cpp",
         'callback': self.parseDescription3},
    ]
    # Store it in meta so that callnext can find it.
    response.meta['callstack'] = callstack

    return self.callnext(response)

def parseDescription2(self, response):
    # Recover item(loader)
    l = response.meta['loader']

    # Use just as before
    l.add_css(...)

    return self.callnext(response)

def parseDescription3(self, response):
    # ...
    return self.callnext(response)
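This solution is still synchronous, and it will still fail if there is any exception inside a callback.
For more information, see the blog post I wrote about this solution.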
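The control flow of the call-stack idea can be tried out in isolation. The following is a simplified sketch, not the code from the answer: it uses a plain loop instead of Scrapy callbacks, a raised exception plays the role of the errback, and `fetch`, `FAILING_URLS`, `call_next` and the `parse_*` helpers are hypothetical stand-ins.

```python
# Toy model of the call-stack dispatcher (no Scrapy needed).
# All names here are hypothetical and exist only for this sketch.

# Assumption for the demo: pretend this page answers with HTTP 404.
FAILING_URLS = {"http://www.example.com/lin2.cpp"}


def fetch(url):
    """Stand-in for a download: raise on failure, return page text otherwise."""
    if url in FAILING_URLS:
        raise IOError("HTTP 404 for " + url)
    return "body of " + url


def call_next(meta):
    """Pop targets until the stack is empty, then return the item."""
    while meta["callstack"]:
        target = meta["callstack"].pop(0)
        try:
            body = fetch(target["url"])
        except IOError:
            # Errback path: keep the partial item and move on to the next target.
            continue
        target["callback"](meta["item"], body)
    return meta["item"]


def parse_description2(item, body):
    item["description2"] = body


def parse_description3(item, body):
    item["description3"] = body


meta = {
    "item": {"description1": "test"},
    "callstack": [
        {"url": "http://www.example.com/lin2.cpp", "callback": parse_description2},
        {"url": "http://www.example.com/lin3.cpp", "callback": parse_description3},
    ],
}
partial_item = call_next(meta)
print(partial_item)
```

Here the second page fails, so the returned item has description1 and description3 but no description2. As in the answer, an exception raised inside one of the callbacks themselves is not caught and would still abort the run.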