big*_*add 4 python bigdata airflow
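I'm new to Airflow, so I have a question.
I want the DAG to continue only if the condition in the first task is met. If the condition is not met, I want the DAG to stop after the first task.
Example: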
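from datetime import datetime
from random import randint

from airflow.models import DAG
from airflow.operators.python import BranchPythonOperator, PythonOperator

# Minimal DAG object so the operators below have one to attach to
# (illustrative; the original snippet passed the DAG class itself via dag=DAG).
dag = DAG(
    dag_id="branch_example",
    start_date=datetime(2021, 4, 26),
    schedule_interval=None,
)

# first task
def get_number_func(**kwargs):
    number = randint(0, 10)
    print(number)
    if number >= 5:
        print('A')
        return 'continue_task'
    else:
        # STOP DAG: what should go here?
        pass

# second task, to run only if the number is greater than or equal to 5
def continue_func(**kwargs):
    # note: `number` is local to get_number_func, so it is not defined here
    print("The number is " + str(number))

# first task declaration
start_op = BranchPythonOperator(
    task_id='get_number',
    provide_context=True,
    python_callable=get_number_func,
    op_kwargs={},
    dag=dag,
)

# second task declaration
continue_op = PythonOperator(
    task_id='continue_task',
    provide_context=True,
    python_callable=continue_func,
    op_kwargs={},
    dag=dag,
)

start_op >> continue_op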
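I only want the second task to run if the number condition is met. If the condition is not met, the DAG should not run the second task.
How can I achieve this? I don't want to use XCom, global variables, or dummy tasks.
Thanks in advance!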
Jos*_*ell 10
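Have you looked at ShortCircuitOperator? It controls your task flow based on whether a condition is True or False: the operator calls its python_callable, and if the result is truthy, the downstream tasks proceed. Otherwise, all downstream tasks are skipped. Try changing your first task to a ShortCircuitOperator and updating the get_number_func function to return True or False.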
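To make the skip semantics concrete, here is a toy model in plain Python. It is not Airflow's implementation: the helper name short_circuit, the graph dict, and the extra report task are hypothetical. It only mirrors the behavior described above, where any falsy return value skips every task reachable downstream, not just the direct child.

```python
from collections import deque


def short_circuit(condition, task_id, downstream):
    """Toy model of short-circuit skipping (hypothetical helper, not Airflow code).

    `downstream` maps each task id to the ids of its direct downstream tasks.
    Returns (tasks_to_run, tasks_skipped) for everything reachable from task_id.
    """
    # Breadth-first walk collecting every task downstream of task_id.
    reachable, seen = [], set()
    queue = deque(downstream.get(task_id, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        reachable.append(current)
        queue.extend(downstream.get(current, []))
    # Truthiness decides: any falsy value (False, 0, None, "") skips them all.
    if condition:
        return reachable, []
    return [], reachable


# Hypothetical graph: get_number >> continue_task >> report
graph = {"get_number": ["continue_task"], "continue_task": ["report"]}
print(short_circuit(7, "get_number", graph))  # (['continue_task', 'report'], [])
print(short_circuit(0, "get_number", graph))  # ([], ['continue_task', 'report'])
```

The second call shows why returning False from get_number_func is enough: nothing after the first task runs, including tasks further down the chain.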
Here is the test I ran using your code:
from datetime import datetime

from airflow.models import DAG
from airflow.operators.python import PythonOperator, ShortCircuitOperator

default_args = dict(
    start_date=datetime(2021, 4, 26),
    owner="me",
    retries=0,
)

dag_args = dict(
    dag_id="short_circuit",
    schedule_interval=None,
    default_args=default_args,
    catchup=False,
)


def get_number_func(**kwargs):
    from random import randint

    number = randint(0, 10)
    print(number)
    if number >= 5:
        print("A")
        return True  # truthy: downstream tasks run
    else:
        # STOP DAG
        return False  # falsy: all downstream tasks are skipped


def continue_func(**kwargs):
    pass


with DAG(**dag_args) as dag:
    # first task declaration
    # (provide_context is deprecated in Airflow 2 and dropped here;
    # the context is passed to **kwargs automatically)
    start_op = ShortCircuitOperator(
        task_id="get_number",
        python_callable=get_number_func,
        op_kwargs={},
    )

    # second task declaration
    continue_op = PythonOperator(
        task_id="continue_task",
        python_callable=continue_func,
        op_kwargs={},
    )

    start_op >> continue_op
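Because get_number_func is plain Python, both outcomes can be checked without a running scheduler. A minimal sketch using only the standard library's unittest.mock (the fixed values 3, 5, and 10 are arbitrary choices): patching random.randint works here because the callable imports randint inside the function body, so the lookup happens at call time and sees the patch.

```python
from unittest import mock


def get_number_func(**kwargs):
    # Copy of the answer's callable, without the Airflow wiring.
    from random import randint

    number = randint(0, 10)
    print(number)
    if number >= 5:
        print("A")
        return True
    else:
        # STOP DAG
        return False


# Pin randint to fixed values so each branch runs deterministically.
for fixed, expected in [(3, False), (5, True), (10, True)]:
    with mock.patch("random.randint", return_value=fixed):
        assert get_number_func() is expected
```

The boundary value 5 is included to confirm that number >= 5 lets the DAG continue.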