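I have an item object that I need to pass along through many pages so that the data from all of them is stored in a single item.
My item looks like this: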
class DmozItem(Item):
    title = Field()
    description1 = Field()
    description2 = Field()
    description3 = Field()
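The three descriptions live on three separate pages, and I want to collect all of them into the same item.
Right now this works fine for parseDescription1: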
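def page_parser(self, response):
    hxs = HtmlXPathSelector(response)  # assumed: the selector was created outside the original snippet
    sites = hxs.select('//div[@class="row"]')
    items = []
    item = DmozItem()  # assumed: the item was created outside the original snippet
    request = Request("http://www.example.com/lin1.cpp", callback=self.parseDescription1)
    request.meta['item'] = item
    return request

def parseDescription1(self, response):
    item = response.meta['item']
    item['description1'] = "test"  # 'desc1' is not a declared field of DmozItem
    return item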
But I want something like this:
def page_parser(self, response):
    sites = hxs.select('//div[@class="row"]')
    items = []
    request = Request("http://www.example.com/lin1.cpp", callback=self.parseDescription1)
    request.meta['item'] = item
    request = Request("http://www.example.com/lin1.cpp", callback=self.parseDescription2)
    request.meta['item'] = item
    request = Request("http://www.example.com/lin1.cpp", callback …
This is my simple code, and I cannot get it to work.
I am subclassing InitSpider.
Here is my code:
class MytestSpider(InitSpider):
    name = 'mytest'
    allowed_domains = ['example.com']
    login_page = 'http://www.example.com'
    start_urls = ["http://www.example.com/ist.php"]

    def init_request(self):
        # This function is called before crawling starts.
        return Request(url=self.login_page, callback=self.parse)

    def parse(self, response):
        item = MyItem()
        item['username'] = "mytest"
        return item
class TestPipeline(object):
    def process_item(self, item, spider):
        print item['username']
If I try to print the item, I get the same error.
The error I get is:
File "crawler/pipelines.py", line 35, in process_item
myitem.username = item['username']
exceptions.TypeError: 'NoneType' object has no attribute '__getitem__'
My problem is with InitSpider: my pipelines are not receiving the item object.
class MyItem(Item):
    username = Field()
BOT_NAME = …
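The traceback shows `item` arriving as `None` inside `process_item`. In Scrapy, each pipeline's `process_item` is expected to return the item (or raise `DropItem`), and whatever it returns is handed to the next pipeline, so a pipeline that only prints passes `None` onward. The sketch below illustrates this, assuming `TestPipeline` runs before the pipeline named in the traceback. `StorePipeline` and `run_pipelines` are hypothetical stand-ins for illustration and are not part of Scrapy.

```python
class TestPipeline:
    def process_item(self, item, spider):
        print(item["username"])
        # Return the item so the next pipeline receives it instead of None.
        return item


class StorePipeline:
    """Hypothetical second pipeline that reads item['username']."""

    def __init__(self):
        self.usernames = []

    def process_item(self, item, spider):
        self.usernames.append(item["username"])
        return item


def run_pipelines(item, pipelines, spider=None):
    """Simplified stand-in for how pipelines are chained: each one
    receives whatever the previous one returned."""
    for pipeline in pipelines:
        item = pipeline.process_item(item, spider)
    return item


store = StorePipeline()
result = run_pipelines({"username": "mytest"}, [TestPipeline(), store])
```

If `TestPipeline.process_item` had no `return item`, `StorePipeline` would receive `None` and fail with the same kind of `TypeError` as in the traceback.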
Is there any way to get response.body from the Request function in Scrapy?
I have this:
request = Request("http://www.example.com", callback=self.mytest)

def mytest(self, response):
    return response.body
Now I want to put response.body into a Python variable. How can I do that?
I want something like this:
myresponse = Request("http://www.example.com").get('response')