What I need: to run my spider repeatedly in a loop, waiting 60 seconds between runs.

I tried this:
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from time import sleep

while True:
    process = CrawlerProcess(get_project_settings())
    process.crawl('spider_name')
    process.start()
    sleep(60)
But I get this error:

twisted.internet.error.ReactorNotRestartable

Please help me get this right.
Python 3.6
Scrapy 1.3.2
Linux
I think I found the solution: since the Twisted reactor cannot be restarted once it has stopped, keep a single reactor running and schedule the crawls with CrawlerRunner and a LoopingCall:
from scrapy.utils.project import get_project_settings
from scrapy.crawler import CrawlerRunner
from twisted.internet import reactor
from twisted.internet import task

timeout = 60

def run_spider():
    # Pause the loop while the crawl is running.
    l.stop()
    runner = CrawlerRunner(get_project_settings())
    d = runner.crawl('spider_name')
    # When the crawl finishes, restart the loop; now=False waits
    # `timeout` seconds before calling run_spider again.
    d.addBoth(lambda _: l.start(timeout, False))

l = task.LoopingCall(run_spider)
l.start(timeout)

reactor.run()
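An equivalent pattern, shown here only as a minimal sketch under the same assumptions (a regular Scrapy project and the placeholder spider name 'spider_name'), chains each run off the previous crawl's Deferred with reactor.callLater instead of stopping and restarting a LoopingCall:

from scrapy.utils.project import get_project_settings
from scrapy.crawler import CrawlerRunner
from twisted.internet import reactor

timeout = 60

def run_spider():
    runner = CrawlerRunner(get_project_settings())
    d = runner.crawl('spider_name')
    # Schedule the next run `timeout` seconds after this crawl finishes.
    d.addBoth(lambda _: reactor.callLater(timeout, run_spider))

run_spider()
reactor.run()

In this variant the delay is measured from the end of one crawl to the start of the next, so runs never overlap even if a crawl takes longer than `timeout`.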
To avoid the ReactorNotRestartable error, you can try creating a main.py file and launching the spider from there with subprocesses.

This main.py file could look like this:
from time import sleep
import subprocess

timeout = 60

while True:
    command = 'scrapy crawl yourSpiderName'
    subprocess.run(command, shell=True)
    sleep(timeout)
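If you prefer to avoid shell=True, a minimal variation of the same loop (still assuming the placeholder spider name yourSpiderName) passes the command as an argument list and reports a non-zero exit code:

from time import sleep
import subprocess

timeout = 60

while True:
    # Argument-list form: no shell is involved.
    result = subprocess.run(['scrapy', 'crawl', 'yourSpiderName'])
    if result.returncode != 0:
        print('scrapy crawl exited with code', result.returncode)
    sleep(timeout)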