Airflow SSH task receives SIGTERM after warning: Recorded pid 1098 does not match the current pid 31631

Ale*_*par 5 ssh paramiko airflow

I need some help here.

I am running Airflow in a Docker container with the LocalExecutor.

The Airflow version is 2.0.0 (https://pypi.org/project/apache-airflow/2.0.0/).

I am running a long-running task through a wrapper around SSHOperator. Basically, I open an SSH session to run a spark-submit job on a Spark edge node. (The YARN job succeeds, but the Airflow task fails.)
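For context, the traceback further down shows the wrapper blocked in `select([channel], [], [], self.timeout)` when the signal arrives. Below is a minimal sketch of that wait pattern. It assumes a plain `socket.socketpair()` as a stand-in for the paramiko channel; the function name `read_until_closed`, the buffer size, and the timeout value are hypothetical and are not taken from the actual operator.

```python
import socket
from select import select


def read_until_closed(channel, timeout):
    """Collect output from a socket-like channel until the peer closes it.

    Raises TimeoutError if nothing arrives within `timeout` seconds.
    (Hypothetical helper; the real wrapper's logic is not shown in the post.)
    """
    chunks = []
    while True:
        # Block until the channel is readable or the timeout expires.
        readq, _, _ = select([channel], [], [], timeout)
        if not readq:
            raise TimeoutError("no data received within %s seconds" % timeout)
        data = channel.recv(4096)
        if not data:
            # An empty read means the remote side closed the channel.
            break
        chunks.append(data)
    return b"".join(chunks)


# Stand-in for the SSH channel: a connected pair of local sockets.
local_end, remote_end = socket.socketpair()
remote_end.sendall(b"spark-submit finished\n")
remote_end.close()
output = read_until_closed(local_end, timeout=5.0)
local_end.close()
```

While a task is blocked in a `select` call like this, a SIGTERM delivered to the process runs the installed signal handler, and an exception raised by that handler surfaces from inside the `select` line, which matches the first traceback below.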

The task starts with PID 31675:

[2021-06-24 18:29:09,664] {standard_task_runner.py:51} INFO - Started process 31675 to run task

Then, after a while, I get this warning:

Recorded pid 1098 does not match the current pid 31631

Then the task fails:

[2021-06-24 19:45:44,493] {local_task_job.py:166} WARNING - Recorded pid 1098 does not match the current pid 31631
[2021-06-24 19:45:44,496] {process_utils.py:95} INFO - Sending Signals.SIGTERM to GPID 31675
[2021-06-24 19:45:44,496] {taskinstance.py:1214} ERROR - Received SIGTERM. Terminating subprocesses.
[2021-06-24 19:45:44,528] {taskinstance.py:1396} ERROR - LatamSSH operator error: Task received SIGTERM signal
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/latamairflow/operators/latam_ssh_operator.py", line 453, in execute
    readq, _, _ = select([channel], [], [], self.timeout)
  File "/usr/local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 1216, in signal_handler
    raise AirflowException("Task received SIGTERM signal")
airflow.exceptions.AirflowException: Task received SIGTERM signal
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 1086, in _run_raw_task
    self._prepare_and_execute_task_with_callbacks(context, task)
  File "/usr/local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 1260, in _prepare_and_execute_task_with_callbacks
    result = self._execute_task(context, task_copy)
  File "/usr/local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 1300, in _execute_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/python3.6/site-packages/latamairflow/operators/latam_ssh_operator.py", line 502, in execute
    raise AirflowException("LatamSSH operator error: {0}".format(str(e)))
airflow.exceptions.AirflowException: LatamSSH operator error: Task received SIGTERM signal
[2021-06-24 19:45:44,529] {taskinstance.py:1440} INFO - Marking task as FAILED.
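Three different PIDs appear in these logs: the task process started as 31675, the recorded pid is 1098, and the current pid is 31631. The sketch below pulls those numbers out of a task log so they can be compared across runs. The regular expressions and the helper name `extract_pids` are illustrative assumptions based only on the log lines shown in this post, not on any Airflow API.

```python
import re

# Log lines copied from the post above.
LOG = """\
[2021-06-24 18:29:09,664] {standard_task_runner.py:51} INFO - Started process 31675 to run task
[2021-06-24 19:45:44,493] {local_task_job.py:166} WARNING - Recorded pid 1098 does not match the current pid 31631
[2021-06-24 19:45:44,496] {process_utils.py:95} INFO - Sending Signals.SIGTERM to GPID 31675
"""

STARTED = re.compile(r"Started process (\d+) to run task")
MISMATCH = re.compile(r"Recorded pid (\d+) does not match the current pid (\d+)")
SIGTERM = re.compile(r"Sending Signals\.SIGTERM to GPID (\d+)")


def extract_pids(log_text):
    """Return the PIDs mentioned in the known log messages (hypothetical helper)."""
    pids = {}
    for line in log_text.splitlines():
        match = STARTED.search(line)
        if match:
            pids["started"] = int(match.group(1))
        match = MISMATCH.search(line)
        if match:
            pids["recorded"] = int(match.group(1))
            pids["current"] = int(match.group(2))
        match = SIGTERM.search(line)
        if match:
            pids["sigterm_gpid"] = int(match.group(1))
    return pids


pids = extract_pids(LOG)
```

For the logs in this post, the SIGTERM goes to the same id as the started task process (31675), while neither the recorded pid (1098) nor the current pid (31631) matches it.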