How do I define a timeout for an Apache Airflow DAG?

Bab*_*ani 9 airflow

I am using Airflow 1.10.2, but Airflow seems to ignore the timeout I set for my DAG.

I set a timeout for the DAG with the `dagrun_timeout` parameter (e.g. 20 seconds), and I have a task that takes 2 minutes to run, but Airflow marks the DAG run as successful!

import time
from datetime import timedelta

import airflow
from airflow import DAG
from airflow.operators.python_operator import PythonOperator

args = {
    'owner': 'me',
    'start_date': airflow.utils.dates.days_ago(2),
    'provide_context': True
}

dag = DAG('test_timeout',
          schedule_interval=None,
          default_args=args,
          dagrun_timeout=timedelta(seconds=20))

def this_passes(**kwargs):
    return

def this_passes_with_delay(**kwargs):
    time.sleep(120)
    return

would_succeed = PythonOperator(task_id='would_succeed',
                               dag=dag,
                               python_callable=this_passes,
                               email=to)

would_succeed_with_delay = PythonOperator(task_id='would_succeed_with_delay',
                                          dag=dag,
                                          python_callable=this_passes_with_delay,
                                          email=to)

would_succeed >> would_succeed_with_delay

No error message is thrown. Am I using the wrong parameter?

Luc*_*cas 23

As the source code states:

:param dagrun_timeout: specify how long a DagRun should be up before
    timing out / failing, so that new DagRuns can be created. The timeout
    is only enforced for scheduled DagRuns, and only once the
    # of active DagRuns == max_active_runs.

So this is likely the expected behavior, given that you set `schedule_interval=None`. The idea here is to make sure a scheduled DAG does not run forever and block subsequent run instances.
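In other words, `dagrun_timeout` would only kick in for a DAG that is actually scheduled and whose active runs have reached `max_active_runs`. A sketch of a configuration where it would apply (the DAG id, interval, and parameter values here are illustrative, not from the question):

```python
from datetime import timedelta

from airflow import DAG
from airflow.utils.dates import days_ago

# dagrun_timeout is only enforced for *scheduled* DagRuns, and only once
# the number of active runs has reached max_active_runs.
dag = DAG(
    'test_timeout_scheduled',          # hypothetical DAG id
    start_date=days_ago(2),
    schedule_interval='*/5 * * * *',   # a real schedule, not None
    max_active_runs=1,                 # timeout is checked once this cap is hit
    dagrun_timeout=timedelta(seconds=20),
)
```

With this setup, a run that exceeds 20 seconds while a new run is due would be failed so the next run can start.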

What you are probably interested in instead is `execution_timeout`, which is available on all operators. For example, you can set a 60-second timeout on your `PythonOperator` like this:

would_succeed_with_delay = PythonOperator(task_id='would_succeed_with_delay',
                                          dag=dag,
                                          python_callable=this_passes_with_delay,
                                          email=to,
                                          execution_timeout=timedelta(seconds=60))
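With `execution_timeout`, a task running longer than its budget is interrupted and fails. On Unix, Airflow enforces this kind of per-task timeout with a SIGALRM-based mechanism; a minimal standalone sketch of that idea (the `run_with_timeout` helper and `TaskTimeout` exception are hypothetical names, not Airflow API):

```python
import signal
import time

class TaskTimeout(Exception):
    """Raised when the wrapped callable exceeds its time budget."""

def run_with_timeout(fn, seconds):
    # Sketch of a SIGALRM-based execution timeout (Unix only):
    # arm an alarm before calling fn, and raise if it fires first.
    def _handler(signum, frame):
        raise TaskTimeout("task exceeded %ds" % seconds)

    old_handler = signal.signal(signal.SIGALRM, _handler)
    signal.alarm(seconds)
    try:
        return fn()
    finally:
        signal.alarm(0)                       # cancel any pending alarm
        signal.signal(signal.SIGALRM, old_handler)

# A task that sleeps longer than its budget gets interrupted:
try:
    run_with_timeout(lambda: time.sleep(5), 1)
    timed_out = False
except TaskTimeout:
    timed_out = True
```

This is why `execution_timeout` fails the slow task itself, whereas `dagrun_timeout` only governs whole scheduled runs.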

  • @VendableFall Kind of funny that a comment about a typo itself has a typo. (3 upvotes)