
Locally run all of the spiders in Scrapy

Is there a way to run all of the spiders in a Scrapy project without using the Scrapy daemon? There used to be a way to run multiple spiders with scrapy crawl, but that syntax was removed and Scrapy's code has changed quite a bit since then.

I tried creating my own command:

from scrapy.command import ScrapyCommand
from scrapy.utils.misc import load_object
from scrapy.conf import settings

class Command(ScrapyCommand):
    requires_project = True

    def syntax(self):
        return '[options]'

    def short_desc(self):
        return 'Runs all of the spiders'

    def run(self, args, opts):
        spman_cls = load_object(settings['SPIDER_MANAGER_CLASS'])
        spiders = spman_cls.from_settings(settings)

        for spider_name in spiders.list():
            spider = self.crawler.spiders.create(spider_name)
            self.crawler.crawl(spider)

        self.crawler.start()

But as soon as a spider is registered with self.crawler.crawl(), I get an assertion error for all of the other spiders:

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/scrapy/cmdline.py", line 138, in _run_command
    cmd.run(args, opts)
  File "/home/blender/Projects/scrapers/store_crawler/store_crawler/commands/crawlall.py", line 22, in run
    self.crawler.crawl(spider)
  File "/usr/lib/python2.7/site-packages/scrapy/crawler.py", line 47, …
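For comparison, on newer Scrapy releases the same "queue every spider, then start once" idea can be written as a standalone script instead of a custom command. This is only a sketch assuming a version that exposes scrapy.crawler.CrawlerProcess, get_project_settings and the spider_loader attribute, not the command-based approach above:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

# Load the project's settings and build a process that can host several crawls
settings = get_project_settings()
process = CrawlerProcess(settings)

# Queue every spider known to the project, then start the reactor once
for spider_name in process.spider_loader.list():
    process.crawl(spider_name)

process.start()

Passing the spider name string to process.crawl() leaves the lookup to the project's spider loader, so there is no separate create() step like in the command above.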

python web-crawler scrapy
