How do I run a Scrapy spider inside an asyncio event loop?

Question

It seems I've hit a dead end. Is there a way to run a Scrapy spider inside an asyncio loop? For example, in the code below:

import asyncio
from scrapy.crawler import CrawlerProcess
from myscrapy import MySpider
import scrapy

async def do_some_work():
    process = CrawlerProcess()
    await process.crawl(MySpider)

loop = asyncio.get_event_loop()
loop.run_until_complete(do_some_work())

This gives me the following error:

raise TypeError('A Future, a coroutine or an awaitable is required')
TypeError: A Future, a coroutine or an awaitable is required

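I do understand that await has to be followed by another coroutine. Is there any way to get around this and still have it work asynchronously? Thanks.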
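On the point about await needing an awaitable: CrawlerProcess.crawl returns a Twisted Deferred, which is a callback-based object rather than an asyncio future. As a rough illustration only, the sketch below wraps a callback-style result in an asyncio.Future. CallbackResult and as_future are hypothetical stand-ins written for this example, not Twisted or Scrapy APIs. Twisted's own Deferred.asFuture(loop) plays a similar role, but it generally also needs Twisted to be running on its asyncio reactor.

```python
import asyncio


class CallbackResult:
    """Hypothetical stand-in for a callback-based result (NOT Twisted's Deferred)."""

    def __init__(self):
        self._callbacks = []

    def add_callback(self, fn):
        # Register a function to be called with the final value.
        self._callbacks.append(fn)

    def fire(self, value):
        # Deliver the final value to every registered callback.
        for fn in self._callbacks:
            fn(value)


def as_future(result, loop):
    """Wrap a CallbackResult in an asyncio.Future bound to `loop`."""
    future = loop.create_future()
    result.add_callback(future.set_result)
    return future


async def main():
    loop = asyncio.get_running_loop()
    result = CallbackResult()
    # Simulate the "crawl" finishing a little later on the same loop.
    loop.call_later(0.01, result.fire, "crawl finished")
    # Awaiting the wrapped future works; awaiting `result` directly would not.
    return await as_future(result, loop)


def run_demo():
    return asyncio.run(main())


if __name__ == "__main__":
    print(run_demo())
```

The bridge only translates "callback fired" into "future resolved". Something still has to drive the callback-based side, which is why the choice of reactor matters in the real Scrapy case.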

Answer 1

小智 0

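All of Scrapy is synchronous code. Whenever something blocks, there is no asynchronous mechanism (coroutines) to hand control back to the select loop. The main source of blocking is network requests, and the libraries Scrapy uses do not support asyncio. So you could perhaps open up the Scrapy source and replace the original networking library with asyncio or aiohttp, and that would do it. However, on top of those libraries there is also the complex Twisted module (similar to asyncio, though it dates from Python 2 and is not as fast as asyncio). That would probably be harder than building a new framework from scratch on top of asyncio.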

This is incorrect. Scrapy uses Twisted for event-driven networking, so it is not entirely synchronous. [Source](https://doc.scrapy.org/en/latest/topics/architecture.html#event-driven-networking) (2 upvotes)
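Since Scrapy runs on Twisted's event loop rather than asyncio's, one workaround that avoids mixing the two in a single process is to start the crawl as a child process and await its exit from asyncio. The following is a minimal sketch under that assumption, not something taken from the answer above. run_command is a helper written for this example, and the scrapy crawl myspider command mentioned in the comment uses a hypothetical spider name. A trivial Python one-liner stands in for the crawl so the sketch runs without a Scrapy project.

```python
import asyncio
import sys


async def run_command(*args):
    """Run a command as a child process and wait for it without blocking the loop."""
    proc = await asyncio.create_subprocess_exec(
        *args,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,  # merge stderr into stdout
    )
    output, _ = await proc.communicate()
    return proc.returncode, output.decode(errors="replace")


async def main():
    # Assumption: in a real project the command would be something like
    # ("scrapy", "crawl", "myspider"), where "myspider" is a hypothetical name.
    # A Python one-liner stands in here so the sketch is self-contained.
    return await run_command(sys.executable, "-c", "print('crawl done')")


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Other coroutines on the same loop keep running while the child process works. The trade-off is that results have to come back through the process boundary, for example via the captured output or a file the spider writes.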

Archived:	8 years ago
Views:	2,279
Last activity:	6 years, 10 months ago