The installed reactor does not match the requested one

Asked by Ome*_*001 — tags: python, scrapy, python-asyncio, playwright

I am trying to run the infinite-scroll example from the scrapy-playwright documentation against quotes.toscrape.com/scroll, but a reactor problem keeps the scrape from even starting:

URL SPIDER TEST
***********************
SCRAPE STARTED
***********************
2022-08-11 15:47:38 [scrapy.crawler] INFO: Overridden settings: {'TWISTED_REACTOR': 'twisted.internet.asyncioreactor.AsyncioSelectorReactor'} crawled: <Deferred at 0x11ef17e50 current result: <twisted.python.failure.Failure builtins.Exception: The installed reactor (twisted.internet.selectreactor.SelectReactor) does not match the requested one (twisted.internet.asyncioreactor.AsyncioSelectorReactor)>>

The code is:

import csv
import json
import pygsheets
import scrapy
from scrapy_playwright.page import PageMethod
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings
from scrapy.utils.reactor import install_reactor
from scrapy.crawler import CrawlerProcess
from scrapy.crawler import CrawlerRunner
import datetime as dt
from datetime import date
from twisted.internet import reactor, defer
import tempfile

def breaker(comment):
    print('***********************')
    print(comment)
    print('***********************')

class UrlSpider(scrapy.Spider):
    name = "Url"

    custom_settings={
        'DOWNLOAD_HANDLERS':{
            "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
            "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
        },
        "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
    }

    def start_requests(self):
        yield scrapy.Request(
            url='http://quotes.toscrape.com/scroll',
            meta=dict(
                playwright=True,
                playwright_include_page=True,
                playwright_page_methods=[
                    PageMethod('wait_for_selector','div.quote'),
                    PageMethod('evaluate','window.scrollBy(0, document.body.scrollHeight)'),
                    PageMethod('wait_for_selector','div.quote:nth-child(11)'),
                ],
            ),
        )
    async def parse(self, response):
        page = response.meta['playwright_page']
        await page.screenshot(path='quotes.png', full_page=True)
        await page.close()
        return {'quotes_count': len(response.css('div.quote'))}

print('URL SPIDER TEST')

configure_logging()
settings=get_project_settings()
runner = CrawlerRunner(settings)

@defer.inlineCallbacks
def crawl():
    breaker('SCRAPE STARTED')
    bug=runner.crawl(UrlSpider)
    reactor.close()
    yield bug

url_list=crawl()
print('crawled: '+str(url_list))
reactor.run()

I have spent hours trying to find a solution, without success. I am using CrawlerRunner because I want to automate this code at some point, but I get the error even with CrawlerProcess.

I am also using custom_settings because I ran into problems with project settings not being picked up via get_project_settings; custom_settings lets me make sure they are actually applied.

If I remove the TWISTED_REACTOR entry from custom_settings, the spider scrapes and yields, but the reactor error occurs again and nothing is retrieved.

Answered by 小智 (score: -2)

In the settings, find and comment out: REQUEST_FINGERPRINTER_IMPLEMENTATION = '2.7' and TWISTED_REACTOR = 'twisted.internet.asyncioreactor.AsyncioSelectorReactor'
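For context, the answer appears to refer to the lines that `scrapy startproject` generates in a project's settings.py; commenting them out makes Scrapy fall back to the default reactor. A sketch of that file fragment (not the asker's actual settings):

```python
# settings.py (project settings fragment) -- with these two lines
# commented out, Scrapy installs its default SelectReactor instead
# of the asyncio reactor:
# REQUEST_FINGERPRINTER_IMPLEMENTATION = "2.7"
# TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
```

Note that this avoids the mismatch rather than enabling the asyncio reactor that scrapy-playwright relies on.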

These are my last few lines.