Scrapy spider not working on Django after implementing WebSockets with Channels (cannot call it from an async context)

Ask*_*kew 5 django websocket scrapy scrapyd django-channels

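I'm opening a new question because I'm having an issue with Scrapy and Channels in a Django application, and I would appreciate it if someone could point me in the right direction.

I'm using Channels because I want to retrieve the crawl status from the Scrapyd API in real time, without polling via setInterval all the time, since this is meant to become a SaaS service that may be used by many users.

I have implemented Channels correctly. If I run: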

python manage.py runserver

I can see that the system is now using ASGI:

System check identified no issues (0 silenced).
September 01, 2020 - 15:12:33
Django version 3.0.7, using settings 'seotoolkit.settings'
Starting ASGI/Channels version 2.4.0 development server at http://127.0.0.1:8000/
Quit the server with CONTROL-C.

In addition, the client and server connect correctly over the WebSocket:

WebSocket HANDSHAKING /crawler/22/ [127.0.0.1:50264]
connected {'type': 'websocket.connect'}
WebSocket CONNECT /crawler/22/ [127.0.0.1:50264]

So far so good. The problem appears when I run Scrapy through the Scrapyd API:

2020-09-01 15:31:25 [scrapy.core.scraper] ERROR: Error processing {'url': 'https://www.example.com'}
Traceback (most recent call last):
  File "/Users/Andrea/anaconda3/envs/DjangoScrape/lib/python3.6/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/Users/Andrea/anaconda3/envs/DjangoScrape/lib/python3.6/site-packages/scrapy/utils/defer.py", line 157, in f
    return deferred_from_coro(coro_f(*coro_args, **coro_kwargs))
  File "/private/var/folders/qz/ytk7wml54zd6rssxygt512hc0000gn/T/crawler-1597767314-spxv81dy.egg/webspider/pipelines.py", line 67, in process_item
  File "/Users/Andrea/anaconda3/envs/DjangoScrape/lib/python3.6/site-packages/django/db/models/manager.py", line 82, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/Users/Andrea/anaconda3/envs/DjangoScrape/lib/python3.6/site-packages/django/db/models/query.py", line 411, in get
    num = len(clone)
  File "/Users/Andrea/anaconda3/envs/DjangoScrape/lib/python3.6/site-packages/django/db/models/query.py", line 258, in __len__
    self._fetch_all()
  File "/Users/Andrea/anaconda3/envs/DjangoScrape/lib/python3.6/site-packages/django/db/models/query.py", line 1261, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "/Users/Andrea/anaconda3/envs/DjangoScrape/lib/python3.6/site-packages/django/db/models/query.py", line 57, in __iter__
    results = compiler.execute_sql(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size)
  File "/Users/Andrea/anaconda3/envs/DjangoScrape/lib/python3.6/site-packages/django/db/models/sql/compiler.py", line 1150, in execute_sql
    cursor = self.connection.cursor()
  File "/Users/Andrea/anaconda3/envs/DjangoScrape/lib/python3.6/site-packages/django/utils/asyncio.py", line 24, in inner
    raise SynchronousOnlyOperation(message)
django.core.exceptions.SynchronousOnlyOperation: You cannot call this from an async context - use a thread or sync_to_async.

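I think the error message is quite clear: "You cannot call this from an async context - use a thread or sync_to_async". My guess is that enabling ASGI conflicts with the Scrapy library and prevents it from working properly.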
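For illustration: the last frame of the traceback is in django/utils/asyncio.py, where Django refuses to run the ORM call. Below is a minimal sketch of that kind of guard, assuming it amounts to checking whether the current thread has a running event loop. `fake_orm_query` and the local exception class are hypothetical stand-ins, not Django's own code, and the sketch needs Python 3.7 or newer for `asyncio.run`.

```python
import asyncio


class SynchronousOnlyOperation(Exception):
    """Local stand-in for django.core.exceptions.SynchronousOnlyOperation."""


def fake_orm_query():
    # Hypothetical stand-in for a blocking ORM call such as Model.objects.get(...).
    # It refuses to run when the current thread has a running event loop.
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return "row"  # no loop is running in this thread, so blocking is fine
    raise SynchronousOnlyOperation("You cannot call this from an async context")


def call_from_sync_code():
    # Plain synchronous caller: the guard lets the call through.
    return fake_orm_query()


async def call_from_coroutine():
    # This body runs on the event loop's thread, so the guard trips.
    try:
        return fake_orm_query()
    except SynchronousOnlyOperation as exc:
        return "blocked: " + str(exc)


sync_result = call_from_sync_code()
async_result = asyncio.run(call_from_coroutine())
print(sync_result)
print(async_result)
```

The same function succeeds or fails depending only on where it is called from, which matches the traceback: `process_item` is invoked through Scrapy's coroutine wrapper (`deferred_from_coro`), and the ORM call inside it hits the guard.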

Unfortunately, I can't work out the reason behind this, nor where I'm supposed to use the suggested "thread or sync_to_async".

Note that the WebSocket is used only to check the crawl status and nothing else.

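Could anyone explain the reason behind this incompatibility and give me some hints on how to get past it? I've spent a lot of time looking for an answer but couldn't find one.

Thanks a lot.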

Dau*_*med 2

You can resolve this error in your pipelines.py file. Import sync_to_async from asgiref.sync:

from asgiref.sync import sync_to_async

After importing sync_to_async, use it as a decorator on the function that stores data in the database.

For example:

from itemadapter import ItemAdapter
from crawler.models import Movie
from asgiref.sync import sync_to_async


class MovieSpiderPipeline:
    @sync_to_async
    def process_item(self, item, spider):
        movie = Movie(**item)
        movie.save()
        return item
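The decorator above moves the whole of `process_item` off the event loop thread. As a rough illustration of the same idea using only the standard library, the sketch below keeps `process_item` as a coroutine and offloads just the blocking save to a worker thread. `FakeMovie` and `MoviePipelineSketch` are hypothetical stand-ins so the snippet runs without Django or Scrapy. Whether Scrapy will await such a coroutine depends on the Scrapy version and reactor configuration, which the thread does not state.

```python
import asyncio
import threading


class FakeMovie:
    """Hypothetical stand-in for a Django model; save() plays the blocking DB write."""

    saved = []

    def __init__(self, **fields):
        self.fields = fields

    def save(self):
        # Record which thread performed the blocking "write".
        FakeMovie.saved.append((self.fields, threading.current_thread().name))


class MoviePipelineSketch:
    async def process_item(self, item, spider):
        loop = asyncio.get_running_loop()
        movie = FakeMovie(**item)
        # Offload only the blocking call to the loop's default thread pool.
        await loop.run_in_executor(None, movie.save)
        return item


pipeline = MoviePipelineSketch()
item = asyncio.run(pipeline.process_item({"title": "Alien"}, spider=None))
print(item, FakeMovie.saved)
```

Because the save runs in a worker thread that has no running event loop, a guard like the one that raised `SynchronousOnlyOperation` would not trip there. With a real Django model, `sync_to_async` is the more usual choice, as in the answer.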
