Scrapy process.crawl(): exporting data to JSON

Question

Car*_*ele 4 python json web-crawler scrapy

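This may be a sub-question of "Passing arguments to process.crawl in Scrapy python", but the author of that question accepted an answer that does not address the sub-question I am asking here.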

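Here is my problem: I cannot use scrapy crawl mySpider -a start_urls(myUrl) -o myData.json
Instead I want (and need) to use crawlerProcess.crawl(spider). I have already figured out several ways to pass the arguments (and that part is answered in the question I linked anyway), but I cannot work out how to tell it to dump the data into myData.json, that is, the -o myData.json part.
Does anyone have a suggestion? Or am I just misunderstanding how this is supposed to work?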

Here is the code:

crawlerProcess = CrawlerProcess(settings)
crawlerProcess.install()
crawlerProcess.configure()

spider = challenges(start_urls=["http://www.myUrl.html"])
crawlerProcess.crawl(spider)
# For now I am just trying to get this bit of code to work; it will obviously become a loop later.

dispatcher.connect(handleSpiderIdle, signals.spider_idle)

log.start()
print "Starting crawler."
crawlerProcess.start()
print "Crawler stopped."


Answer 1

eLR*_*uLL 5

You need to specify it in the settings:

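from scrapy.crawler import CrawlerProcess

process = CrawlerProcess({
    'FEED_URI': 'file:///tmp/export.json',
    'FEED_FORMAT': 'json',  # assumption: without this the feed defaults to JSON lines
})

# MySpider is your own spider class
process.crawl(MySpider)
process.start()  # blocks until the crawl finishes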
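
On newer Scrapy releases (2.1 and later, as far as I know) the FEED_URI / FEED_FORMAT pair is deprecated in favour of a single FEEDS setting. Below is a minimal sketch of the equivalent of -o myData.json under that assumption. The names feed_settings and run are hypothetical helpers made up for illustration, not part of Scrapy, and the spider class is whatever you already have.

```python
from pathlib import Path


def feed_settings(output_path, feed_format="json"):
    """Build settings roughly equivalent to `-o output_path` (hypothetical helper)."""
    # FEEDS maps a feed URI to its options; "format" selects the exporter.
    uri = Path(output_path).resolve().as_uri()
    return {"FEEDS": {uri: {"format": feed_format}}}


def run(spider_cls, output_path, **spider_kwargs):
    """Run one crawl and export its items to output_path (hypothetical helper)."""
    # Imported lazily so feed_settings() stays usable without Scrapy installed.
    from scrapy.crawler import CrawlerProcess

    process = CrawlerProcess(settings=feed_settings(output_path))
    # Keyword arguments are forwarded to the spider's __init__,
    # which is how start_urls can be supplied without `-a`.
    process.crawl(spider_cls, **spider_kwargs)
    process.start()  # blocks until the crawl finishes
```

Usage would look something like run(MySpider, "myData.json", start_urls=["http://www.myUrl.html"]). Note that process.start() can only be called once per Python process, so a later loop over several URLs should queue every process.crawl(...) call first and then start the process a single time.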


Archived:	9 years, 7 months ago
Views:	924
Last activity:	9 years, 7 months ago