与芹菜正在进行的任务互动

Question

与芹菜正在进行的任务互动

Jul*_*lio 27 python rabbitmq celery python-rq

我们有一个基于rabbitMQ和的分布式架构Celery.我们可以并行启动多个任务而不会出现任何问题.可扩展性很好.

现在我们需要远程控制任务:PAUSE,RESUME,CANCEL.我们发现的唯一解决方案是在Celery任务中对DB请求后回复命令的另一个任务进行RPC调用.Celery任务和RPC任务不在同一台机器上,只有RPC任务可以访问数据库.

您是否有任何建议如何改进它并轻松地与正在进行的任务沟通？谢谢

编辑: 事实上,我们想做的事情如下图所示.这很容易进行Blue配置或者Orange,但是我们不知道如何同时进行这两种配置. 在此输入图像描述工人正在订阅一个共同点Jobs queue,每个工人都有自己Admin queue在交易所申报的.

编辑: 如果这是不可能的Celery,我愿意接受其他框架的解决方案python-rq.

Answer 1

Nic*_*rot 5

它看起来像Control Bus pattern.

为了获得更好的可伸缩性并为了减少RPC调用,我建议颠倒逻辑.PAUSE, RESUME, CANCEL当状态发生变化时,该命令通过控制总线推送到Celery任务.Celery应用程序将Celery应用程序的当前状态存储在商店中(可以在内存中,在文件系统上......).如果即使在停止/启动应用程序之后也必须保留任务状态,则将涉及更多工作以使两个应用程序保持同步(例如,启动时的同步).

Answer 2

Cha*_*ase 0

我想演示通过工作流模式实现可暂停（和可恢复）正在进行的celery 任务的通用方法。注：原始答案写在这里。由于这篇文章非常相关，所以在这里重写。

概念

使用celery 工作流程- 您可以将整个操作设计为分为chain多个任务。它不一定是纯粹的链，但它应该遵循一个任务在另一个任务（或任务group）完成后的一般概念。

一旦您拥有这样的工作流程，您最终可以定义整个工作流程的暂停点。在每个点，您都可以检查前端用户是否已请求操作暂停并相应地继续。这个概念是这样的：-

一个复杂且耗时的操作 O 被分为 5 个 celery 任务 - T1、T2、T3、T4 和 T5 - 每个任务（第一个任务除外）都依赖于前一个任务的返回值。

假设我们定义了每个任务之后的暂停点，因此工作流程如下所示：

T1执行
T1完成，检查用户是否请求暂停
- 如果用户未请求暂停 - 继续
- 如果用户请求暂停，则序列化剩余的工作流程链并将其存储在某个地方以便稍后继续

... 等等。由于每个任务之后都有一个暂停点，因此在每个任务之后都会执行该检查（当然最后一个任务除外）。

但这只是理论，我很难在网上的任何地方找到它的实现，所以这就是我想出的-

执行

from typing import Any, Optional

from celery import shared_task
from celery.canvas import Signature, chain, signature

@shared_task(bind=True)
def pause_or_continue(
    self, retval: Optional[Any] = None, clause: dict = None, callback: dict = None
):
    # Task to use for deciding whether to pause the operation chain
    if signature(clause)(retval):
        # Pause requested, call given callback with retval and remaining chain
        # chain should be reversed as the order of execution follows from end to start
        signature(callback)(retval, self.request.chain[::-1])
        self.request.chain = None
    else:
        # Continue to the next task in chain
        return retval


def tappable(ch: chain, clause: Signature, callback: Signature, nth: Optional[int] = 1):
    '''
    Make a operation workflow chain pause-able/resume-able by inserting
    the pause_or_continue task for every nth task in given chain

    ch: chain
        The workflow chain

    clause: Signature
        Signature of a task that takes one argument - return value of
        last executed task in workflow (if any - othewise `None` is passsed)
        - and returns a boolean, indicating whether or not the operation should continue

        Should return True if operation should continue normally, or be paused

    callback: Signature
        Signature of a task that takes 2 arguments - return value of
        last executed task in workflow (if any - othewise `None` is passsed) and
        remaining chain of the operation workflow as a json dict object
        No return value is expected

        This task will be called when `clause` returns `True` (i.e task is pausing)
        The return value and the remaining chain can be handled accordingly by this task

    nth: Int
        Check `clause` after every nth task in the chain
        Default value is 1, i.e check `clause` after every task
        Hence, by default, user given `clause` is called and checked
        after every task

    NOTE: The passed in chain is mutated in place
    Returns the mutated chain
    '''
    newch = []
    for n, sig in enumerate(ch.tasks):
        if n != 0 and n % nth == nth - 1:
            newch.append(pause_or_continue.s(clause=clause, callback=callback))
        newch.append(sig)
    ch.tasks = tuple(newch)
    return ch

Run Code Online (Sandbox Code Playgroud)

解释 -`pause_or_continue`

这里pause_or_continue就是前面提到的暂停点。这是一个将以特定间隔调用的任务（间隔如任务间隔，而不是时间间隔）。然后，该任务调用用户提供的函数（实际上是一个任务）——clause以检查该任务是否应该继续。

如果clause函数（实际上是一个任务）返回，则调用True用户提供的函数，最新的返回值（如果有 -否则）将传递到此回调，以及剩余的任务链。执行它需要执行的操作并设置为，这告诉 celery“任务链现在是空的 - 一切都已完成”。callbackNonecallbackpause_or_continueself.request.chainNone

如果clause函数（实际上是一个任务）返回False，则前一个任务的返回值（如果有的话 -None否则）将被返回以供下一个任务接收 - 并且链会继续下去。因此工作流程继续进行。

为什么是clause任务callback签名而不是常规函数？

和clause都callback被直接调用- 没有delayor apply_async。它在当前进程、当前上下文中执行。所以它的行为与普通函数完全相同，那么为什么要使用呢signatures？

答案是序列化。您无法方便地将常规函数对象传递给 celery 任务。但您可以传递任务签名。这正是我在这里所做的。和clause都callback应该是celery 任务的常规对象。 signature

什么是self.request.chain？

self.request.chain存储一个字典列表（代表 json，因为 celery 任务序列化器默认为 json）——每个字典代表一个任务签名。该列表中的每个任务都按相反顺序执行。这就是为什么在传递给用户提供的callback函数（实际上是一个任务）之前列表被颠倒的原因 - 用户可能希望任务的顺序从左到右。

快速说明：与本讨论无关，但如果您使用link参数 fromapply_async来构造链而不是chain基元本身。self.request.callback是要修改的属性（即设置None为删除回调和停止链）而不是self.request.chain

解释 -tappable

tappable只是一个基本函数，它采用一个链（为简洁起见，这是这里介绍的唯一工作流原语）并pause_or_continue在每个nth任务之后插入。您可以将它们插入到您真正想要的任何位置，由您在操作中定义暂停点。这只是一个例子！

对于每个chain对象，任务的实际签名（按从左到右的顺序）存储在属性中.tasks。它是任务签名的元组。因此，我们所要做的就是将此元组转换为列表，插入暂停点并转换回元组以分配给链。然后返回修改后的链对象。

和clause也callback附在pause_or_continue签名上。正常的芹菜东西。

这涵盖了主要概念，但为了展示使用此模式的真实项目（以及展示暂停任务的恢复部分），这里有一个包含所有必要资源的小演示

用法

此示例使用假设具有数据库的基本 Web 服务器的概念。每当一个操作（即工作流链）启动时，都会为其分配一个 ID并存储到数据库中。该表的架构看起来像 -

-- Create operations table -- Keeps track of operations and the users that started them CREATE TABLE operations ( id INTEGER PRIMARY KEY AUTOINCREMENT, requester_id INTEGER NOT NULL, completion TEXT NOT NULL, workflow_store TEXT, result TEXT, FOREIGN KEY (requester_id) REFERENCES user (id) );
Run Code Online (Sandbox Code Playgroud)
目前唯一需要了解的领域是completion. 它只存储操作的状态-

当操作开始并创建数据库条目时，它被设置为IN PROGRESS

当用户请求暂停时，路由控制器（即视图）将其修改为REQUESTING PAUSE

当操作实际暂停并调用callback(from tappable, inside ) 时，应该将其修改为pause_or_continuecallbackPAUSED

任务完成后，应将其修改为COMPLETED

一个例子clause

@celery.task() def should_pause(_, operation_id: int): # This is the `clause` to be used for `tappable` # i.e it lets celery know whether to pause or continue db = get_db() # Check the database to see if user has requested pause on the operation operation = db.execute( "SELECT * FROM operations WHERE id = ?", (operation_id,) ).fetchone() return operation["completion"] == "REQUESTING PAUSE"
Run Code Online (Sandbox Code Playgroud)
这是在暂停点调用的任务，以确定是否暂停。这是一个需要 2 个参数的函数......好吧。第一个是强制性的，tappable 要求有一个clause（并且恰好一个）参数 - 因此它可以将前一个任务的返回值传递给它（即使该返回值是None）。在这个例子中，不需要使用返回值 - 所以我们可以忽略它。

第二个参数是操作id。看，这一切clause所做的就是检查数据库中的操作（工作流）条目并查看它是否具有状态REQUESTING PAUSE。为此，它需要知道操作 ID。但clause应该是一项只有一个参数的任务，什么给出呢？

好在签名可以是部分的。当任务第一次启动并tappable创建链时。操作 id是已知的，因此我们可以获取带有一个should_pause.s(operation_id)参数的任务的签名，该参数是前一个任务的返回值。这符合!clause

一个例子callback

import os import json from typing import Any, List @celery.task() def save_state(retval: Any, chains: dict, operation_id: int): # This is the `callback` to be used for `tappable` # i.e this is called when an operation is pausing db = get_db() # Prepare directories to store the workflow operation_dir = os.path.join(app.config["OPERATIONS"], f"{operation_id}") workflow_file = os.path.join(operation_dir, "workflow.json") if not os.path.isdir(operation_dir): os.makedirs(operation_dir, exist_ok=True) # Store the remaining workflow chain, serialized into json with open(workflow_file, "w") as f: json.dump(chains, f) # Store the result from the last task and the workflow json path db.execute( """ UPDATE operations SET completion = ?, workflow_store = ?, result = ? WHERE id = ? """, ("PAUSED", workflow_file, f"{retval}", operation_id), ) db.commit()
Run Code Online (Sandbox Code Playgroud)
这是任务暂停时要调用的任务。请记住，这应该采用最后执行的任务的返回值和剩余的签名列表（按从左到右的顺序）。又多了一个参数 - operation_id- 。对此的解释与的解释相同clause。

此函数将剩余的链存储在 json 文件中（因为它是字典列表）。请记住，您可以使用不同的序列化器 - 我使用 json，因为它是 celery 使用的默认任务序列化器。

存储剩余的链后，它将completion状态更新为PAUSED并将 json 文件的路径记录到数据库中。

现在，让我们看看这些的实际效果——

启动工作流程的示例

def start_operation(user_id, *operation_args, **operation_kwargs): db = get_db() operation_id: int = db.execute( "INSERT INTO operations (requester_id, completion) VALUES (?, ?)", (user_id, "IN PROGRESS"), ).lastrowid # Convert a regular workflow chain to a tappable one tappable_workflow = tappable( (T1.s() | T2.s() | T3.s() | T4.s() | T5.s(operation_id)), should_pause.s(operation_id), save_state.s(operation_id), ) # Start the chain (i.e send task to celery to run asynchronously) tappable_workflow(*operation_args, **operation_kwargs) db.commit() return operation_id
Run Code Online (Sandbox Code Playgroud)
接收用户 ID 并启动操作工作流的函数。这或多或少是一个围绕视图/路由控制器建模的不切实际的虚拟函数。但我认为它传达了总体想法。

假设T[1-4]是操作的所有单元任务，每个任务都将前一个任务的返回作为参数。这只是普通芹菜链的一个例子，您可以随意使用您的芹菜链。

T5T4是将最终结果（来自的结果）保存到数据库的任务。因此，除了它的返回值之外，还T4需要operation_id. 哪个被传递到签名中。

暂停工作流程的示例

def pause(operation_id): db = get_db() operation = db.execute( "SELECT * FROM operations WHERE id = ?", (operation_id,) ).fetchone() if operation and operation["completion"] == "IN PROGRESS": # Pause only if the operation is in progress db.execute( """ UPDATE operations SET completion = ? WHERE id = ? """, ("REQUESTING PAUSE", operation_id), ) db.commit() return 'success' return 'invalid id'
Run Code Online (Sandbox Code Playgroud)
这采用了前面提到的修改数据库条目以更改为的completion概念REQUESTING PAUSE。一旦提交，下次pause_or_continue调用时should_pause，它就会知道用户已请求操作暂停，并且会相应地执行此操作。

恢复工作流程的示例

def resume(operation_id): db = get_db() operation = db.execute( "SELECT * FROM operations WHERE id = ?", (operation_id,) ).fetchone() if operation and operation["completion"] == "PAUSED": # Resume only if the operation is paused with open(operation["workflow_store"]) as f: # Load the remaining workflow from the json workflow_json = json.load(f) # Load the chain from the json (i.e deserialize) workflow_chain = chain(signature(x) for x in serialized_ch) # Start the chain and feed in the last executed task result workflow_chain(operation["result"]) db.execute( """ UPDATE operations SET completion = ? WHERE id = ? """, ("IN PROGRESS", operation_id), ) db.commit() return 'success' return 'invalid id'
Run Code Online (Sandbox Code Playgroud)
回想一下，当操作暂停时 - 剩余的工作流程存储在 json 中。由于我们当前将工作流程限制为chain对象。我们知道这个 json 是一个签名列表，应该将其转换为chain. 因此，我们相应地反序列化它并将其发送给 celery worker。

请注意，这个剩余的工作流程仍然具有pause_or_continue原来的任务 - 因此这个工作流程本身再次是可暂停/可恢复的。当它暂停时，workflow.json将简单地更新。

归档时间：	10 年，9 月前
查看次数：	1073 次
最近记录：	10 年，8 月前

与芹菜正在进行的任务互动

概念

执行

解释 -`pause_or_continue`

为什么是`clause`任务`callback`签名而不是常规函数？

什么是`self.request.chain`？

解释 -`tappable`

用法

一个例子`clause`

一个例子`callback`

启动工作流程的示例

暂停工作流程的示例

恢复工作流程的示例

与芹菜正在进行的任务互动

概念

执行

解释 -pause_or_continue

为什么是clause任务callback签名而不是常规函数？

什么是self.request.chain？

解释 -tappable

用法

一个例子clause

一个例子callback

启动工作流程的示例

暂停工作流程的示例

恢复工作流程的示例

解释 -`pause_or_continue`

为什么是`clause`任务`callback`签名而不是常规函数？

什么是`self.request.chain`？

解释 -`tappable`

一个例子`clause`

一个例子`callback`