Vin*_*tro 5 scrapy python-2.7 scrapy-spider
I'm new to programming and I'm trying to learn Scrapy by following the Scrapy tutorial: http://doc.scrapy.org/en/latest/intro/tutorial.html
I ran the "scrapy crawl dmoz" command and got this error:
2015-07-14 16:11:02 [scrapy] INFO: Scrapy 1.0.1 started (bot: tutorial)
2015-07-14 16:11:02 [scrapy] INFO: Optional features available: ssl, http11
2015-07-14 16:11:02 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'tutorial.spiders', 'SPIDER_MODULES': ['tutorial.spiders'], 'BOT_NAME': 'tutorial'}
2015-07-14 16:11:05 [scrapy] INFO: Enabled extensions: CloseSpider, TelnetConsole, LogStats, CoreStats, SpiderState
Unhandled error in Deferred:
2015-07-14 16:11:06 [twisted] CRITICAL: Unhandled error in Deferred:
2015-07-14 16:11:07 [twisted] CRITICAL:
I'm using Windows 7 and Python 2.7. Does anyone know what the problem is? How can I fix it?
Edit: my spider file code is:
# This package will contain the spiders of your Scrapy project
#
# Please refer to the documentation for information on how to create and manage
# your spiders.
import scrapy


class DmozSpider(scrapy.Spider):
    name = "dmoz"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/computers/programming/languages/python/books/",
        "http://www.dmoz.org/computer/programming/languages/python/resources/"
    ]

    def parse(self, response):
        filename = response.url.split("/")[-2] + '.html'
        with open(filename, 'wb') as f:
            f.write(response.body)
items.py code:
import scrapy


class DmozItem(scrapy.Item):
    title = scrapy.Field()
    link = scrapy.Field()
    desc = scrapy.Field()
pip list:
Thanks for your attention, and sorry for my poor English; it is not my native language.
小智 2
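I also just started learning Scrapy and ran into the same problem. After struggling with it for an afternoon, I finally found the cause: the pywin32 module had been downloaded but never installed. Try running the following command in cmd to finish installing pywin32, then run the crawl again: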
python python27\scripts\pywin32_postinstall.py -install
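To confirm whether this is the cause before or after running the post-install script, a minimal sketch like the one below may help. It simply tries to import a few modules and reports the ones that fail. The helper name `missing_modules` is hypothetical (it is not part of Scrapy or pywin32), and using `win32api` as the representative pywin32 module is an assumption.

```python
import importlib


def missing_modules(names):
    """Return the names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # Also covers "DLL load failed" errors, which are raised as ImportError
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumption: win32api stands in for pywin32 as a whole
    print(missing_modules(["win32api", "twisted", "scrapy"]))
```

If `win32api` shows up in the printed list on Windows, pywin32 is probably not fully installed, which would be consistent with the empty "Unhandled error in Deferred" messages in the question.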
I hope it helps!