Here is my simple code, which I cannot get to work.
I am subclassing InitSpider.
This is my code:
class MytestSpider(InitSpider):
    name = 'mytest'
    allowed_domains = ['example.com']
    login_page = 'http://www.example.com'
    start_urls = ["http://www.example.com/ist.php"]

    def init_request(self):
        # This function is called before crawling starts.
        return Request(url=self.login_page, callback=self.parse)

    def parse(self, response):
        item = MyItem()
        item['username'] = "mytest"
        return item
class TestPipeline(object):
    def process_item(self, item, spider):
        print item['username']
If I try to print the item, I get the same error.
The error I get is:
  File "crawler/pipelines.py", line 35, in process_item
    myitem.username = item['username']
exceptions.TypeError: 'NoneType' object has no attribute '__getitem__'
My problem is with InitSpider: my pipelines are not receiving the item object.
class MyItem(Item):
    username = Field()
BOT_NAME = 'crawler'
SPIDER_MODULES = ['spiders']
NEWSPIDER_MODULE = 'spiders'
DOWNLOADER_MIDDLEWARES = {
    'scrapy.contrib.downloadermiddleware.cookies.CookiesMiddleware': 700  # <-
}
COOKIES_ENABLED = True
COOKIES_DEBUG = True
ITEM_PIPELINES = [
    'pipelines.TestPipeline',
]
IMAGES_STORE = '/var/www/htmlimages'
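
One thing that may be worth checking, sketched here without Scrapy itself: an item pipeline stage is expected to return the item so that the next stage receives it, and the TestPipeline above only prints. The following is a simplified stand-in, not Scrapy's actual machinery; `run_chain`, `PrintOnlyPipeline`, `ReturningPipeline`, and `StorePipeline` are hypothetical names, and it assumes a second stage (like the one at pipelines.py line 35 in the traceback) runs after TestPipeline. It is written for Python 3, so `print` is a function.

```python
# Simplified, hypothetical model of a pipeline chain: each stage receives
# whatever the previous stage's process_item returned.

class PrintOnlyPipeline:
    """Mirrors TestPipeline above: prints the field but returns nothing (None)."""

    def process_item(self, item, spider):
        print(item['username'])


class ReturningPipeline:
    """Same behaviour, but hands the item on to the next stage."""

    def process_item(self, item, spider):
        print(item['username'])
        return item


class StorePipeline:
    """Hypothetical later stage that reads the item, as pipelines.py line 35 does."""

    def process_item(self, item, spider):
        return item['username']


def run_chain(pipelines, item, spider=None):
    """Feed the item through each stage in order (assumed, simplified behaviour)."""
    for pipeline in pipelines:
        item = pipeline.process_item(item, spider)
    return item
```

With `[ReturningPipeline(), StorePipeline()]` the last stage can read `item['username']`. With `[PrintOnlyPipeline(), StorePipeline()]` the last stage receives `None` and raises a `TypeError`, which resembles the error reported above. Whether this is the actual cause depends on the parts of pipelines.py not shown here.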