How do I organize my spiders into nested directories in Scrapy?

Question

I have the following directory structure:

my_project/
  __init__.py
  spiders/
    __init__.py
    my_spider.py
    other_spider.py
  pipelines.py
  # other files

Right now I can cd into the my_project directory and start a crawl with scrapy crawl my_spider.

What I want is to be able to run scrapy crawl my_spider with this updated structure:

my_project/
  __init__.py
  spiders/
    __init__.py
    subtopic1/
      __init__.py # <-- I get the same error whether this is present or not
      my_spider.py
    subtopicx/
      other_spider.py
  pipelines.py
  # other files

But now I get this error:

KeyError: 'Spider not found: my_spider'

What is the proper way to organize Scrapy spiders into directories?

Answer 1

par*_*ola 13

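I know this is long overdue, but this is the right way to organize spiders in nested directories. You can set the module locations in the project settings through SPIDER_MODULES.

Example: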

SPIDER_MODULES = ['my_project.spiders', 'my_project.spiders.subtopic1', 'my_project.spiders.subtopicx']

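I am a Python beginner and could not apply your answer on the first try. After some testing I found that an empty "__init__.py" file has to be created in each folder for it to be recognized as a module. You may want to add this to your answer so that beginners can follow it more easily; if you think it is too basic, please ignore my comment. (2 upvotes)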
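
To illustrate the point in the comment above, here is a small sketch that recreates the question's layout in a temporary directory and lists the modules that Python's standard pkgutil can see under my_project/spiders. It is not Scrapy's own loader, only a stand-in for how package discovery behaves, and the helper names build_layout and discover_modules are made up for this example.

```python
import pkgutil
import tempfile
from pathlib import Path
from typing import List


def build_layout(root: Path) -> None:
    """Create the nested layout from the question under `root` (hypothetical helper)."""
    files = [
        "my_project/__init__.py",
        "my_project/spiders/__init__.py",
        "my_project/spiders/subtopic1/__init__.py",
        "my_project/spiders/subtopic1/my_spider.py",
        # subtopicx deliberately has no __init__.py, as in the question
        "my_project/spiders/subtopicx/other_spider.py",
    ]
    for rel in files:
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()


def discover_modules(pkg_dir: Path, package: str) -> List[str]:
    """Recursively list dotted module names that pkgutil finds under `pkg_dir`."""
    found = []
    for info in pkgutil.iter_modules([str(pkg_dir)]):
        name = f"{package}.{info.name}"
        found.append(name)
        if info.ispkg:
            # Only directories containing __init__.py are reported as packages.
            found.extend(discover_modules(pkg_dir / info.name, name))
    return found


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        build_layout(root)
        spiders_dir = root / "my_project" / "spiders"

        # subtopicx is missing from this listing because it has no __init__.py
        print(discover_modules(spiders_dir, "my_project.spiders"))

        # After adding the empty file, subtopicx and its spider module appear
        (spiders_dir / "subtopicx" / "__init__.py").touch()
        print(discover_modules(spiders_dir, "my_project.spiders"))
```

Once a subdirectory shows up in this listing, its dotted path (for example my_project.spiders.subtopicx) is importable and can be listed in SPIDER_MODULES.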

Answer 2

小智 1

You have to run scrapy crawl my_spider from the directory that contains scrapy.cfg. Then you will not get any error.

my_project/
  __init__.py
  spiders/
    __init__.py
    my_spider.py
    sub_directory/
      __init__.py
      other_spider.py
  pipelines.py
  scrapy.cfg
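
Why the working directory matters can be sketched as follows. As far as I understand, the scrapy command finds the project by looking for scrapy.cfg in the current directory and its parents; the function below imitates that search with the standard library only. The name find_scrapy_cfg is hypothetical and this is not Scrapy's actual code.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_scrapy_cfg(start: Path) -> Optional[Path]:
    """Return the nearest scrapy.cfg at or above `start`, or None if there is none."""
    start = start.resolve()
    for directory in (start, *start.parents):
        candidate = directory / "scrapy.cfg"
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp) / "my_project"
        nested = project / "spiders" / "sub_directory"
        nested.mkdir(parents=True)
        (project / "scrapy.cfg").touch()

        # Searching from a nested directory still reaches the project's scrapy.cfg
        print(find_scrapy_cfg(nested))
```

If a search like this returns nothing from your current directory, scrapy crawl has no project settings to read, so it cannot know where your spiders are.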

Asked: 9 years, 3 months ago
Viewed: 786 times
Last active: 6 years, 4 months ago