I am working through the Scrapy tutorial from the Scrapy documentation. This is my current directory layout:
.
├── scrapy.cfg
└── tutorial
    ├── __init__.py
    ├── __init__.pyc
    ├── items.py
    ├── pipelines.py
    ├── settings.py
    ├── settings.pyc
    └── spiders
        ├── __init__.py
        ├── __init__.pyc
        └── dmoz_spider
dmoz_spider.py is identical to the one described on the Scrapy tutorial page.
import scrapy


class DmozSpider(scrapy.Spider):
    name = "dmoz"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/"
    ]

    def parse(self, response):
        filename = response.url.split("/")[-2] + '.html'
        with open(filename, 'wb') as f:
            f.write(response.body)
Then I run this command from the current directory:
scrapy crawl dmoz
But I get this error message:
2015-12-17 12:23:22 [scrapy] INFO: Scrapy 1.0.3 started (bot: tutorial)
2015-12-17 12:23:22 [scrapy] INFO: Optional features available: ssl, http11
2015-12-17 12:23:22 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'tutorial.spiders', 'SPIDER_MODULES': ['tutorial.spiders'], 'BOT_NAME': 'tutorial'}
...
raise KeyError("Spider not found: {}".format(spider_name))
KeyError: 'Spider not found: dmoz'
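Any suggestions as to what I am doing wrong? I have already checked similar questions on Stack Overflow and followed the solutions given there, but I still get the error.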
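One way to narrow this down is to check which files in the spiders directory Python can actually import. This is only a rough sketch, assuming Scrapy finds spiders by importing the modules under the packages named in SPIDER_MODULES (here `tutorial.spiders`, per the log above). The helper names `importable_modules` and `files_without_py_suffix` are made up for illustration, and the path is taken from the directory layout shown earlier.

```python
import pkgutil
from pathlib import Path


def importable_modules(package_dir):
    """Return the module names Python's import machinery finds in package_dir."""
    return sorted(info.name for info in pkgutil.iter_modules([str(package_dir)]))


def files_without_py_suffix(package_dir):
    """Return regular files in package_dir that end in neither .py nor .pyc."""
    return sorted(
        p.name
        for p in Path(package_dir).iterdir()
        if p.is_file() and p.suffix not in {".py", ".pyc"}
    )


if __name__ == "__main__":
    # Assumed path, based on the directory tree in the question.
    spiders_dir = Path("tutorial") / "spiders"
    if spiders_dir.is_dir():
        print("importable modules:", importable_modules(spiders_dir))
        print("files Python will not import:", files_without_py_suffix(spiders_dir))
```

Any spider file that shows up in the second list rather than the first would not be visible to the import system. Running `scrapy list` from the project root should likewise print the names of the spiders Scrapy can see.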