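I am using a clustered Airflow environment with four AWS EC2 instances as servers.
EC2 instances
My setup has been working perfectly for three months, but sporadically, about once a week, I get a broken pipe exception when Airflow tries to log something.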
*** Log file isn't local.
*** Fetching here: http://ip-1-2-3-4:8793/log/foobar/task_1/2018-07-13T00:00:00/1.log
[2018-07-16 00:00:15,521] {cli.py:374} INFO - Running on host ip-1-2-3-4
[2018-07-16 00:00:15,698] {models.py:1197} INFO - Dependencies all met for <TaskInstance: foobar.task_1 2018-07-13 00:00:00 [queued]>
[2018-07-16 00:00:15,710] {models.py:1197} INFO - Dependencies all met for <TaskInstance: foobar.task_1 2018-07-13 00:00:00 [queued]>
[2018-07-16 00:00:15,710] {models.py:1407} INFO -
--------------------------------------------------------------------------------
Starting attempt 1 of 1
--------------------------------------------------------------------------------
[2018-07-16 00:00:15,719] {models.py:1428} INFO - Executing <Task(OmegaFileSensor): task_1> on 2018-07-13 00:00:00
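[2018-07-16 00:00:15,720] {base_task_runner.py:115} …

I have an Airflow environment running Airflow version 1.9 on an Amazon EC2 instance. I need to upgrade to the latest version of Airflow, 1.10. I can either upgrade from version 1.9 or do a fresh install of 1.10 on a new server. Airflow version 1.10 is not listed on pip, so I installed it from Git with this command: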
pip-3.6 install git+git://github.com/apache/incubator-airflow.git@v1-10-stable
This command successfully installs Airflow version 1.10. You can verify this by running the command airflow version and checking the output:
  ____________       _____________
 ____    |__( )_________  __/__  /________      __
____  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
 _/_/  |_/_/  /_/    /_/    /_/  \____/____/|__/
v1.10.0
When I try to start the Airflow scheduler with airflow scheduler, I get the following exception:
ModuleNotFoundError: No module named 'MySQLdb'
[2018-08-14 14:03:16,195] {celery_executor.py:112} ERROR - Error syncing the celery executor, ignoring it:
[2018-08-14 14:03:16,195] {celery_executor.py:113} ERROR - …
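
A quick way to confirm what the traceback reports is to ask the same Python interpreter that runs Airflow whether it can locate the module at all. The sketch below is only a diagnostic aid, not part of Airflow: the helper name missing_modules is hypothetical, and the list of names passed to it is an assumption based on the error above. On Python 3 the MySQLdb module is usually provided by the mysqlclient distribution, so a "missing" result would suggest that package is not installed in this environment.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found on the import path.

    Hypothetical helper for illustration; pass top-level names only
    (no dotted paths), since find_spec imports parent packages.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "MySQLdb" is the module named in the traceback; "celery" is checked
    # only for comparison, since the Celery executor is in use.
    for name in missing_modules(["MySQLdb", "celery"]):
        print(f"missing: {name}")
```

Run it with the interpreter that pip-3.6 installs into; if several Python versions are present on the instance, a module can be installed for one and still be missing for another.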