Ami*_*der 3 python splash-screen scrapy redis
I'm trying to run a function from Redis (rq) that creates a CrawlerProcess, but I get
工作马进程意外终止(waitpid 返回 11)
Console log:
Moving job to 'failed' queue (work-horse terminated unexpectedly; waitpid returned 11)
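The number rq prints here appears to be the raw status returned by os.waitpid(). Decoding it with the standard os helpers shows that the work horse did not exit on its own: it was killed by signal 11, SIGSEGV (a segmentation fault in native code), which is why no Python traceback is shown. A small sketch, where describe_wait_status is a hypothetical helper name:

```python
import os
import signal


def describe_wait_status(status):
    """Turn a raw os.waitpid() status into a readable description (POSIX only)."""
    if os.WIFSIGNALED(status):
        # The child was terminated by a signal; WTERMSIG gives its number.
        return "killed by " + signal.Signals(os.WTERMSIG(status)).name
    if os.WIFEXITED(status):
        # The child called exit(); WEXITSTATUS gives the exit code.
        return f"exited with code {os.WEXITSTATUS(status)}"
    return f"other status {status}"


print(describe_wait_status(11))  # → killed by SIGSEGV
```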
It dies on the line I marked with the comment "THIS LINE KILLS THE PROGRAM".
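Because the crash is a native segfault, the marked line only tells you that something inside process.crawl() faults. Python's built-in faulthandler module can dump the Python stack of every thread when the process receives SIGSEGV (and SIGFPE, SIGABRT, SIGBUS, SIGILL), which usually narrows down the offending call. A minimal sketch, where enable_crash_tracebacks is a hypothetical helper name:

```python
import faulthandler
import sys


def enable_crash_tracebacks():
    """Print every thread's Python traceback to stderr on a fatal signal
    instead of letting the process die silently."""
    faulthandler.enable(file=sys.stderr, all_threads=True)
    return faulthandler.is_enabled()
```

Calling it as the first statement of custom_executor, which runs inside the work horse, should make the next crash print a traceback in the worker's console. Starting the worker as `PYTHONFAULTHANDLER=1 rq worker` should have a similar effect, since the forked work horse inherits the handler installed at interpreter startup.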
What am I doing wrong, and how can I fix it?
RQ picks up this function without problems:
from scrapy.crawler import CrawlerProcess

# ExtractorSpider is shown below; es_get_connection() and
# redis_get_connection() are my own helpers (not shown).


def custom_executor(url):
    process = CrawlerProcess({
        'USER_AGENT': "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.75 Safari/537.36",
        'DOWNLOAD_TIMEOUT': 20000,  # 100
        'ROBOTSTXT_OBEY': False,
        'HTTPCACHE_ENABLED': False,
        'REDIRECT_ENABLED': False,
        'SPLASH_URL': 'http://localhost:8050/',
        'DUPEFILTER_CLASS': 'scrapy_splash.SplashAwareDupeFilter',
        'HTTPCACHE_STORAGE': 'scrapy_splash.SplashAwareFSCacheStorage',
        'DOWNLOADER_MIDDLEWARES': {
            'scrapy_splash.SplashCookiesMiddleware': 723,
            'scrapy_splash.SplashMiddleware': 725,
            'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
        },
        'SPIDER_MIDDLEWARES': {
            'scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware': True,
            'scrapy.spidermiddlewares.httperror.HttpErrorMiddleware': True,
            'scrapy.downloadermiddlewares.httpcache.HttpCacheMiddleware': True,
            'scrapy.extensions.closespider.CloseSpider': True,
            'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
        }
    })
    ### THIS LINE KILLS THE PROGRAM
    process.crawl(ExtractorSpider,
                  start_urls=[url, ], es_client=es_get_connection(),
                  redis_conn=redis_get_connection())
    process.start()
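One possible explanation, which is an assumption and not verified for this setup, is that Scrapy/Twisted or a native library it loads misbehaves inside rq's work horse, which is a fork() of the worker process with no subsequent exec. A workaround often used in that situation is to have the rq job start the crawl in a fresh Python interpreter through subprocess and only inspect its exit status. The sketch below assumes a hypothetical standalone script crawl_one.py that builds the CrawlerProcess the same way custom_executor does and takes the URL as its first argument; run_isolated is also a hypothetical name. Because the es_client and redis_conn objects cannot be handed to another interpreter, that script would have to open its own connections.

```python
import signal
import subprocess
import sys


def run_isolated(argv, timeout=None):
    """Run the current Python interpreter with `argv` in a brand-new process.

    Returns (returncode, stderr). On POSIX a negative return code -N means
    the child was killed by signal N, e.g. -11 for SIGSEGV.
    """
    result = subprocess.run(
        [sys.executable, *argv],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stderr


def custom_executor(url):
    # Hypothetical script: creates the CrawlerProcess plus its own
    # Elasticsearch/Redis connections and crawls the URL in sys.argv[1].
    code, err = run_isolated(["crawl_one.py", url])
    if code < 0:
        raise RuntimeError(
            f"crawler killed by {signal.Signals(-code).name}:\n{err[-2000:]}"
        )
    if code != 0:
        raise RuntimeError(f"crawler exited with code {code}:\n{err[-2000:]}")
    return code
```

With this layout a segfault in the crawler no longer takes down the work horse itself: subprocess reports it as a negative return code, and the job fails with an ordinary Python exception that includes the tail of the crawler's stderr.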
Here is my ExtractorSpider:
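from scrapy import Spider
from scrapy_splash import SplashRequest

# process_screenshot and SPLASH_ARGS are defined elsewhere (not shown).

class ExtractorSpider(Spider):
    name = "Extractor Spider"
    handle_httpstatus_list = [301, 302, 303]

    def parse(self, response):
        yield SplashRequest(url=response.url, callback=process_screenshot,
                            endpoint='execute', args=SPLASH_ARGS)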
Thanks