Related questions

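Running multiple Scrapy spiders at once using scrapyd

I'm using Scrapy for a project in which I want to scrape a number of sites, possibly hundreds, and I have to write a specific spider for each site. I can schedule one spider in a project deployed to scrapyd using: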

curl http://localhost:6800/schedule.json -d project=myproject -d spider=spider2


But how do I schedule all the spiders in a project at once?

All help much appreciated!
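One way to approach this, sketched here under assumptions rather than taken from the original thread, is to ask scrapyd for the project's spider names through its `listspiders.json` endpoint and then post to `schedule.json` once per spider. The helper names `fetch_json` and `schedule_all_spiders` are hypothetical, and the base URL and project name simply reuse the values from the curl command above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_json(url, data=None):
    """GET `url`, or POST form-encoded `data` to it, and decode the JSON reply."""
    body = urlencode(data).encode() if data is not None else None
    with urlopen(url, data=body) as response:
        return json.load(response)


def schedule_all_spiders(base_url, project, fetch=fetch_json):
    """Schedule every spider in `project` and return {spider_name: jobid}.

    `fetch` is injectable so the logic can be exercised without a live scrapyd.
    """
    # Ask scrapyd which spiders the deployed project contains.
    listing = fetch(f"{base_url}/listspiders.json?" + urlencode({"project": project}))
    if listing.get("status") != "ok":
        raise RuntimeError(f"listspiders.json failed: {listing}")

    jobs = {}
    for spider in listing["spiders"]:
        # Same request as the curl command, repeated once per spider.
        reply = fetch(f"{base_url}/schedule.json", {"project": project, "spider": spider})
        jobs[spider] = reply.get("jobid")
    return jobs


# Example call (assumes scrapyd is listening on localhost:6800):
# jobs = schedule_all_spiders("http://localhost:6800", "myproject")
```

scrapyd queues the scheduled jobs and runs them as process slots become available, so with hundreds of spiders they will not necessarily all start at the same moment.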

python screen-scraping scrapy scrapyd

use*_*453

2012-05-29

10
votes

1
answer

6271
views

Easiest way to run a Scrapy crawler so that it doesn't block the script

The official docs describe several ways to run Scrapy crawlers from code:

import scrapy
from scrapy.crawler import CrawlerProcess

class MySpider(scrapy.Spider):
    # Your spider definition
    ...

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})

process.crawl(MySpider)
process.start() # the script will block here until the crawling is finished


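But all of them block the script until the crawl is finished. What is the easiest way in Python to run a crawler in a non-blocking, asynchronous way?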
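One possible approach, offered as a sketch and not as the accepted answer, is to start the crawl in a separate OS process so the calling script keeps running. The example below assumes the spider lives in a Scrapy project and that the `scrapy` command is on the PATH; `start_crawl` and its `command` override are hypothetical names introduced for illustration.

```python
import subprocess
import sys


def start_crawl(spider_name, project_dir=".", command=None):
    """Start a crawl in a child process and return immediately.

    By default this runs `scrapy crawl <spider_name>` inside `project_dir`.
    `command` lets the caller substitute another argument list.
    """
    if command is None:
        command = ["scrapy", "crawl", spider_name]
    # Popen does not wait for the child, so the caller is not blocked.
    return subprocess.Popen(command, cwd=project_dir)


# Example use (assumes a project with a spider named "myspider"):
# proc = start_crawl("myspider")
# ... do other work while the crawl runs ...
# if proc.poll() is None:   # None means the crawl is still running
#     proc.wait()           # block only when the result is actually needed
```

The trade-off is that the crawl runs in its own interpreter, so results have to come back through files, a feed export, or a database instead of in-memory objects.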

python scrapy

net*_*men

lucky-day

3
votes

1
answer

2371
views