How to start a Celery worker with a variable that is used during worker initialization

Kon*_*arz 3 python celery

Is there a way to pass a variable to a celery worker at startup and use it inside the worker while it runs?

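I'm writing a server responsible for machine learning training and evaluation. I want to start new worker instances dynamically and pass each one a variable that it will use internally to load a specific model.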

I found out how to start a worker with the worker_main method from an answer here.

I'm considering two solutions:

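  1. Set it as an environment variable. The problem with this solution is that it could break when two worker instances are created at the same time.

  2. Pass it in argv, but I don't know how to read the variable inside the worker.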
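Regarding option 1: the collision only happens if the launching process sets its own os.environ before each start. If every worker is launched as a separate OS process, each one can receive a private copy of the environment, so simultaneous launches cannot overwrite each other. A minimal sketch, assuming workers are started through the celery command line with subprocess; the app name proj, the variable name MODEL_ID and the helper names are hypothetical.

```python
import os
import subprocess
from typing import Dict, List, Tuple


def worker_launch_spec(model_id: str, app: str = "proj") -> Tuple[List[str], Dict[str, str]]:
    """Return the command and a private environment for one worker process.

    "proj" and MODEL_ID are hypothetical placeholders.
    """
    cmd = ["celery", "-A", app, "worker", "--loglevel=INFO"]
    # Copy the parent's environment instead of mutating os.environ, so two
    # launches running at the same time each keep their own MODEL_ID.
    env = {**os.environ, "MODEL_ID": model_id}
    return cmd, env


def launch_worker(model_id: str) -> subprocess.Popen:
    """Start the worker as a child process with its own environment."""
    cmd, env = worker_launch_spec(model_id)
    return subprocess.Popen(cmd, env=env)
```

Inside the worker, for example in a worker_process_init handler, the value would then be read with os.environ["MODEL_ID"].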


Edit

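I found this thread, but it only covers accessing custom arguments inside tasks. My question is about accessing the value when the worker initializes.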

Inspired by this thread, I'll try using Celery signals. http://docs.celeryproject.org/en/latest/userguide/signals.html#worker-init

Kon*_*arz 5

Maybe my question wasn't precise enough, but I found the answer myself using the docs and Stack Overflow threads.

I wanted to run a separate worker for each Keras model. During worker initialization I needed to load the model into memory, and in tasks the model was used for prediction.
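A model that is loaded once per process and then reused by every task can be kept in class-level (static) fields. A minimal sketch of that pattern; ModelHolder, load_model, predict and the toy "model" are hypothetical stand-ins, not the author's Grasper class and not the Keras API.

```python
from typing import Callable, ClassVar, Optional


class ModelHolder:
    """Per-process cache: class attributes are shared by all code in one process."""

    model_id: ClassVar[Optional[str]] = None
    model: ClassVar[Optional[Callable[[float], float]]] = None
    load_count: ClassVar[int] = 0  # counts real loads, to show caching works

    @classmethod
    def load_model(cls, model_id: str) -> None:
        if cls.model_id == model_id and cls.model is not None:
            return  # already loaded in this process, nothing to do
        # Hypothetical stand-in for an expensive load such as
        # keras.models.load_model(path); here the "model" just scales its input.
        scale = float(len(model_id))
        cls.model = lambda x: x * scale
        cls.model_id = model_id
        cls.load_count += 1

    @classmethod
    def predict(cls, x: float) -> float:
        if cls.model is None:
            raise RuntimeError("load_model() has not been called in this process")
        return cls.model(x)
```

Calling ModelHolder.load_model("abcd") once at process start and then ModelHolder.predict(2.0) from any task returns 8.0 without reloading.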

My solution:

  1. Name the worker with the model_id (the id is unique and I need only one worker per model).
  2. In a celeryd_after_setup signal handler, parse the name and set a global variable in the worker.
  3. In a worker_process_init signal handler, load the model; in my case it is stored in static fields of the Grasper class.
  4. In the task, use the static fields of the Grasper class.
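Steps 1 and 2 rely on the node name carrying the model id after the '@'. A minimal sketch of both sides, assuming workers are started with the celery command line; the app name proj and both helper names are hypothetical. It uses str.partition so that a name without '@' produces a clear error instead of an IndexError.

```python
from typing import List


def worker_command(model_id: str, app: str = "proj") -> List[str]:
    # -A/--app, -l/--loglevel and -n/--hostname are standard celery worker options.
    # Node names have the form name@host; here the host part carries the model id.
    return ["celery", "-A", app, "worker", "-l", "INFO", "-n", f"worker@{model_id}"]


def model_id_from_nodename(nodename: str) -> str:
    # Same result as nodename.split('@')[1] for names with a single '@',
    # but raises ValueError with a message when the '@' part is missing.
    _name, sep, host = nodename.partition("@")
    if not sep or not host:
        raise ValueError(f"node name {nodename!r} does not contain '@<model_id>'")
    return host
```

A celeryd_after_setup handler could then set model_id = model_id_from_nodename(instance.hostname).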

Below is the code for this solution.

from celery.signals import worker_process_init, celeryd_after_setup
from celery.concurrency import asynpool

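# My custom class holding the model and tokenizer in static (class-level) fields;
# they could also be module-level globals, like model_id
from myapp.ml import Grasper

# Give each pool child process enough time to load the model; otherwise the pool
# can kill a child whose worker_process_init handler has not finished yet
asynpool.PROC_ALIVE_TIMEOUT = 100.0
model_id = None

# Runs in the main worker process once setup is done; the worker is started with
# a node name of the form "<name>@<model_id>", so the part after '@' is the model id
@celeryd_after_setup.connect()
def set_model_id(sender, instance, **kwargs):
    global model_id
    model_id = instance.hostname.split('@')[1]

# Runs once in every pool child process; with the prefork pool the children are
# forked after the handler above has run, so they inherit the model_id it set
@worker_process_init.connect()
def configure_worker(signal=None, sender=None, **kwargs):
    Grasper.load_model(model_id)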

Then, in a Celery task, you can use the Grasper class with the loaded model. This solution works, but I know there is room for improvement, so if you have any ideas, please comment.