使用插件导入DAG时出现气流错误 - 只能在操作员之间设置关系

Jas*_*ith 6 airflow google-cloud-composer

我编写了一个气流插件,它只包含一个自定义运算符(以支持BigQuery中的CMEK).我可以使用单个任务创建一个简单的DAG,该任务使用此运算符并且执行正常.

但是,如果我尝试在DAG中从DummyOperator任务创建依赖关系到我的自定义操作员任务,则DAG无法在UI中加载并抛出以下错误,我无法理解为什么会抛出此错误?

破坏的DAG:[/home/airflow/gcs/dags/js_bq_custom_plugin_v2.py]关系只能在运营商之间设置; 收到BQCMEKOperator

到目前为止,我已经在composer-1.4.2-airflow-1.9.0,composer-1.4.2-airflow-1.10.0和composer-1.4.1-airflow-1.10.0上进行了测试.

每个任务的运行气流测试都可以顺利完成.

在DAG中单独使用它可以正常工作(如下所示)所以我不相信插件本身存在任何错误

import datetime
import logging
from airflow.models import DAG
from airflow.operators.bq_cmek_plugin import BQCMEKOperator


default_dag_args = {
    'start_date': datetime.datetime(2019,1,1),
    'retries': 0
}


dag = DAG(
    'js_bq_custom_plugin',
    schedule_interval=None,
    catchup=False,
    concurrency=1,
    max_active_runs=1,
    default_args=default_dag_args)

run_this = BQCMEKOperator(
    task_id     = 'cmek_plugin_test',
    sql         = 'select * from ds.foo LIMIT 15',
    project     = 'xxx',
    dataset     = 'js_dev',
    table       = 'cmek_test10',
    key         = 'xxx',
    dag     = dag
)
Run Code Online (Sandbox Code Playgroud)

然而,如果我引入DummyOperator和依赖项,则会发生错误

import datetime
import logging
from airflow.models import DAG
from airflow.operators.bq_cmek_plugin import BQCMEKOperator
from airflow.operators.dummy_operator import DummyOperator

default_dag_args = {
    'start_date': datetime.datetime(2019,1,1),
    'retries': 0
}

dag = DAG(
    'js_bq_custom_plugin_v2',
    schedule_interval=None,
    catchup=False,
    concurrency=1,
    max_active_runs=1,
    default_args=default_dag_args)

etl_start = DummyOperator(task_id='etl_start', dag=dag)

extract = BQCMEKOperator(
    task_id     = 'extract',
    sql         = 'select * from foo.bar LIMIT 15',
    project     = 'xxx',
    dataset     = 'js_dev',
    table       = 'cmek_test5',
    key         = 'xxx',
    dag         = dag
)

etl_start.set_downstream(extract)
Run Code Online (Sandbox Code Playgroud)

操作员本身很简单,我可以用最简单的自定义操作符(如下面的操作符)重现问题

import logging
from airflow.models import BaseOperator
from airflow.utils.decorators import apply_defaults


class TestOperator(BaseOperator):

    @apply_defaults
    def __init__(self,
                *args,
                **kwargs):
        super(TestOperator, self).__init__(*args, **kwargs)


    def execute(self, context):
        logging.info("Executed by TestOperator")
Run Code Online (Sandbox Code Playgroud)

使用init .py中的以下插件定义

from airflow.plugins_manager import AirflowPlugin
from test_plugin.operators.test_operator import TestOperator

class TestPlugin(AirflowPlugin):
    name = "test_plugin"
    operators = [TestOperator]
    hooks = []
    executors = []
    macros = []
    admin_views = []
    flask_blueprints = []
    menu_links = []
Run Code Online (Sandbox Code Playgroud)

还查看了models.py中生成此错误的气流代码,它使用isinstance(t,BaseOperator),当我在python中运行它时,使用我的自定义运算符返回true,所以我不知道发生了什么?

for t in task_list:
    if not isinstance(t, BaseOperator):
        raise AirflowException(
            "Relationships can only be set between "
            "Operators; received {}".format(t.__class__.__name__))
Run Code Online (Sandbox Code Playgroud)

Fen*_* Lu 5

我们已经修复了composer-1.4.2版本中引入的一个错误,尝试创建一个新的Composer环境并且DAG错误应该消失.同时,我们还将在接下来的几天内自动在现有1.4.2环境中应用该修复程序.