Error R14 (Memory quota exceeded) leads to TimeoutException when using Selenium with Python, FastAPI and Celery on Heroku

Kac*_*per 3 python selenium heroku out-of-memory celery

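I built a scraper that collects data from a page, formats it, and adds it to a database. It then uses the scraped data, except for one scraped value, to build a model. Everything is wrapped in Celery so that the tasks run in the background.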

@router.post("/run/{id}")
async def create(id: str):
    wallet_reputation.delay(id)

    return {"Status": "Task successfully add to execute"}

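The endpoint above works fine. Each ID passed to it is unique, and there are about 100 such values. To build a model for every ID automatically, I created the following endpoint, which I call from time to time (the scraped data changes, so I need to update my models).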

@router.post("/run")
async def create_all():
    for address in all_addresses_generator():
        wallet_reputation.delay(address)

    return {"Status": "Tasks successfully add to execute"}

With it, I get this error:

2022-03-26T15:25:52.051854+00:00 heroku[worker.1]: Process running mem=543M(104.1%)
2022-03-26T15:25:52.073256+00:00 heroku[worker.1]: Error R14 (Memory quota exceeded)
2022-03-26T15:26:02.875701+00:00 app[worker.1]: [2022-03-26 15:26:02,871: ERROR/ForkPoolWorker-8] Task walletReputation[2cca3c3e-8c58-4983-bbae-e55e52f33c1a] raised unexpected: TimeoutException('', None, ['#0 0x556bcd4bc7d3 <unknown>', '#1 0x556bcd218688 <unknown>', '#2 0x556bcd24ec21 <unknown>', '#3 0x556bcd24ede1 <unknown>', '#4 0x556bcd281d74 <unknown>', '#5 0x556bcd26c6dd <unknown>', '#6 0x556bcd27fa0c <unknown>', '#7 0x556bcd26c5a3 <unknown>', '#8 0x556bcd241ddc <unknown>', '#9 0x556bcd242de5 <unknown>', '#10 0x556bcd4ed49d <unknown>', '#11 0x556bcd50660c <unknown>', '#12 0x556bcd4ef205 <unknown>', '#13 0x556bcd506ee5 <unknown>', '#14 0x556bcd4e3070 <unknown>', '#15 0x556bcd522488 <unknown>', '#16 0x556bcd52260c <unknown>', '#17 0x556bcd53bc6d <unknown>', '#18 0x7f8e32957609 <unknown>', ''])
2022-03-26T15:26:02.875723+00:00 app[worker.1]: Traceback (most recent call last):
2022-03-26T15:26:02.875724+00:00 app[worker.1]:   File "/app/.heroku/python/lib/python3.9/site-packages/celery/app/trace.py", line 451, in trace_task
2022-03-26T15:26:02.875724+00:00 app[worker.1]:     R = retval = fun(*args, **kwargs)
2022-03-26T15:26:02.875724+00:00 app[worker.1]:   File "/app/.heroku/python/lib/python3.9/site-packages/celery/app/trace.py", line 734, in __protected_call__
2022-03-26T15:26:02.875725+00:00 app[worker.1]:     return self.run(*args, **kwargs)
2022-03-26T15:26:02.875725+00:00 app[worker.1]:   File "/app/tasks.py", line 40, in wallet_reputation
2022-03-26T15:26:02.875725+00:00 app[worker.1]:     WalletReputation(id).add_reputation_to_db()
2022-03-26T15:26:02.875727+00:00 app[worker.1]:   File "/app/agents/walletReputation.py", line 261, in add_reputation_to_db
2022-03-26T15:26:02.875727+00:00 app[worker.1]:     nc_balance=self.nc_balance(),
2022-03-26T15:26:02.875727+00:00 app[worker.1]:   File "/app/agents/walletReputation.py", line 162, in nc_balance
2022-03-26T15:26:02.875727+00:00 app[worker.1]:     WebDriverWait(self.driver, 20)
2022-03-26T15:26:02.875727+00:00 app[worker.1]:   File "/app/.heroku/python/lib/python3.9/site-packages/selenium/webdriver/support/wait.py", line 89, in until
2022-03-26T15:26:02.875728+00:00 app[worker.1]:     raise TimeoutException(message, screen, stacktrace)
2022-03-26T15:26:02.875728+00:00 app[worker.1]: selenium.common.exceptions.TimeoutException: Message: 
2022-03-26T15:26:02.875729+00:00 app[worker.1]: Stacktrace:
2022-03-26T15:26:02.875729+00:00 app[worker.1]: #0 0x556bcd4bc7d3 <unknown>
2022-03-26T15:26:02.875729+00:00 app[worker.1]: #1 0x556bcd218688 <unknown>
2022-03-26T15:26:02.875730+00:00 app[worker.1]: #2 0x556bcd24ec21 <unknown>
2022-03-26T15:26:02.875730+00:00 app[worker.1]: #3 0x556bcd24ede1 <unknown>
2022-03-26T15:26:02.875730+00:00 app[worker.1]: #4 0x556bcd281d74 <unknown>
2022-03-26T15:26:02.875730+00:00 app[worker.1]: #5 0x556bcd26c6dd <unknown>
2022-03-26T15:26:02.875730+00:00 app[worker.1]: #6 0x556bcd27fa0c <unknown>
2022-03-26T15:26:02.875731+00:00 app[worker.1]: #7 0x556bcd26c5a3 <unknown>
2022-03-26T15:26:02.875731+00:00 app[worker.1]: #8 0x556bcd241ddc <unknown>
2022-03-26T15:26:02.875731+00:00 app[worker.1]: #9 0x556bcd242de5 <unknown>
2022-03-26T15:26:02.875731+00:00 app[worker.1]: #10 0x556bcd4ed49d <unknown>
2022-03-26T15:26:02.875732+00:00 app[worker.1]: #11 0x556bcd50660c <unknown>
2022-03-26T15:26:02.875732+00:00 app[worker.1]: #12 0x556bcd4ef205 <unknown>
2022-03-26T15:26:02.875732+00:00 app[worker.1]: #13 0x556bcd506ee5 <unknown>
2022-03-26T15:26:02.875732+00:00 app[worker.1]: #14 0x556bcd4e3070 <unknown>
2022-03-26T15:26:02.875733+00:00 app[worker.1]: #15 0x556bcd522488 <unknown>
2022-03-26T15:26:02.875733+00:00 app[worker.1]: #16 0x556bcd52260c <unknown>
2022-03-26T15:26:02.875733+00:00 app[worker.1]: #17 0x556bcd53bc6d <unknown>
2022-03-26T15:26:02.875733+00:00 app[worker.1]: #18 0x7f8e32957609 <unknown>

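I don't understand why I suddenly get an error when the previous endpoint, which runs the same task in Celery, works fine. Below I've pasted the code of the generator and of the class method where the error pops up.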

def all_addresses_generator():
    for row in session.query(DbNcTransaction).all():
        yield row.to

How can I deal with this?

Deb*_*anB 5

This error message...

2022-03-26T15:25:52.051854+00:00 heroku[worker.1]: Process running mem=543M(104.1%)
2022-03-26T15:25:52.073256+00:00 heroku[worker.1]: Error R14 (Memory quota exceeded)
2022-03-26T15:26:02.875701+00:00 app[worker.1]: [2022-03-26 15:26:02,871: ERROR/ForkPoolWorker-8] Task walletReputation[2cca3c3e-8c58-4983-bbae-e55e52f33c1a] raised unexpected: TimeoutException

...means that ForkPoolWorker-8 raised a TimeoutException because the program ran into an error while initializing after exceeding its memory quota.


Deep dive

This is a classic example of an out-of-memory error, where memory usage has exceeded the maximum allowed level.

Process running mem=543M(104.1%)

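With 543M in use, the dyno reports memory usage of 104.1%, presumably measured against the memory specification of the dyno type you are running:

Free, Hobby, and Standard-1X dynos have 512 MB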


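Dynos

The Heroku platform uses a container model to run and scale all Heroku apps; these containers are called dynos. Dynos are isolated, virtualized Linux containers designed to execute code based on a user-specified command. An app can scale to any specified number of dynos depending on its resource demands.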


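Error R14 (Memory quota exceeded)

Sometimes a dyno needs more memory than its allotted quota. In these cases the dyno pages to swap space to keep running, which can degrade process performance. When this happens the dyno starts logging R14 errors, and the reported figure is calculated from total memory swap, RSS, and cache, as in this example: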

2011-05-03T17:40:10+00:00 app[worker.1]: Working
2011-05-03T17:40:10+00:00 heroku[worker.1]: Process running mem=1028MB(103.3%)
2011-05-03T17:40:11+00:00 heroku[worker.1]: Error R14 (Memory quota exceeded)
2011-05-03T17:41:52+00:00 app[worker.1]: Working

Resolving R14 memory errors

In these cases you want your application to use less memory, and you may need to tune one of the following factors:

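  • Thread count
  • Maximum possible request size
  • Distribution of incoming requests
  • Reducing the thread count to lower memory requirements (although this may reduce throughput)
  • Increasing capacity by scaling out horizontally, e.g. adding more dynos/servers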

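In general, increasing capacity works very well: as more servers/dynos come online, requests are spread out, and it becomes rarer for all threads on a single machine to be handling maximum-size requests at the same time. In the long run, however, the best way to reduce overall memory requirements is to reduce object allocation.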
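For a Celery worker, the closest equivalent of "thread count" is the number of pool processes, and the name ForkPoolWorker-8 in the log suggests several of them were running on one dyno. The following is a minimal sketch of a settings module that lowers that number. The module name celeryconfig.py and all of the values are assumptions chosen for illustration, not measured optima; the setting names themselves are standard Celery settings.

```python
# celeryconfig.py (hypothetical module name); a Celery app can load it with
# app.config_from_object("celeryconfig").

# Run fewer pool processes per dyno. Each process may hold its own browser,
# so this is the main lever on peak memory. 2 is an illustrative value.
worker_concurrency = 2

# Let each pool process reserve only one task at a time instead of
# prefetching several from the broker.
worker_prefetch_multiplier = 1

# Replace a pool process after it has executed this many tasks, so memory
# it has accumulated is returned to the system.
worker_max_tasks_per_child = 10

# Replace a pool process once its resident memory exceeds this many
# kilobytes (about 150 MB here). The running task finishes first.
worker_max_memory_per_child = 150_000
```

Lower concurrency trades throughput for memory, as the list above notes. The per-child memory limit covers the Python worker process itself, so memory used by a separately launched browser process would most likely not be counted against it.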


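This use case

In this use case, it seems that with the first code block, i.e. def create(id: str), the application copes when a model is built for one of the roughly 100 ID values at a time, but once you call def create_all() you start seeing the error.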


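Solution

Instead of creating the models for every ID in one go, you can take a different approach. If possible, split the ID values into batches and run them batch by batch, with each batch containing a number of models small enough that memory usage stays below the threshold.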
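The batching idea can be sketched as follows. This is only an illustration under stated assumptions: the helper names batched and enqueue_in_batches, the batch size, and the delay between batches are all hypothetical and would need tuning against the dyno's actual memory usage. The enqueue callable is passed in so the sketch runs without Celery.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items from `items`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def enqueue_in_batches(
    addresses: Iterable[str],
    enqueue: Callable[[str, int], None],
    batch_size: int = 5,               # illustrative value
    delay_between_batches: int = 120,  # seconds, illustrative value
) -> int:
    """Schedule every address, starting each batch later than the previous one.

    `enqueue(address, countdown)` is expected to queue one task that becomes
    eligible to run `countdown` seconds from now. Returns the number of
    tasks scheduled.
    """
    scheduled = 0
    for index, batch in enumerate(batched(addresses, batch_size)):
        countdown = index * delay_between_batches
        for address in batch:
            enqueue(address, countdown)
            scheduled += 1
    return scheduled


# Possible use inside create_all(), assuming the task from the question:
#
#   enqueue_in_batches(
#       all_addresses_generator(),
#       lambda address, countdown: wallet_reputation.apply_async(
#           args=[address], countdown=countdown
#       ),
#   )
```

Staggering with countdown only spaces out when tasks become eligible to run. It does not guarantee that one batch has finished before the next starts, so the delay should comfortably exceed the time a batch usually takes.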