Apache Airflow - scheduler slowness


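We are running Airflow 1.7.1.3 with the CeleryExecutor. The Airflow scheduler is set up as a systemd service with --num_runs set to 10, so that it stops and gets restarted after every 10 runs (as indicated here).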

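We have noticed that every 9th loop of the scheduler takes considerably longer (about 160 seconds or more) than a regular loop (~16 seconds). According to the logs, this is the loop in which the scheduler fills the DagBag by refreshing all DAGs. This time grows as the number of DAGs/tasks in our Airflow installation increases.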
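To put those numbers in proportion, here is a rough back-of-the-envelope sketch in Python. It assumes a restart cycle of nine regular loops followed by one slow loop (the pattern visible in the log excerpt further down) and uses the approximate figures of 16 and 160 seconds quoted above. The function name slow_loop_share is made up for illustration.

```python
def slow_loop_share(regular_loops, regular_secs, slow_secs):
    """Return the fraction of one restart cycle spent in the slow loop.

    Assumption: a cycle is `regular_loops` ordinary loops plus exactly
    one slow (DagBag refresh) loop.
    """
    total_secs = regular_loops * regular_secs + slow_secs
    return slow_secs / total_secs


# Approximate figures from the post: ~16 s regular loops, ~160 s slow loop.
share = slow_loop_share(regular_loops=9, regular_secs=16.0, slow_secs=160.0)
print(f"{share:.1%} of each cycle is spent in the slow loop")
```

With these inputs the slow loop accounts for a little over half of each cycle, which matches the impression that tasks spend much of their time waiting to be queued.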

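Most of our tasks are very small and take only a few seconds to run, but they get stuck in the "undefined" state and are not queued while the scheduler is busy "filling up the DagBag". Meanwhile, the Celery workers sit idle. We have tried the following: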

  • Increased celeryd_concurrency (this lets us push more tasks to the workers)
  • Increased non_pooled_task_slot_count (so that more tasks can be queued)
  • Also increased parallelism and dag_concurrency
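For reference, a minimal Python sketch of how these four settings could be read back from a config file to confirm what is actually in effect. It only uses the standard-library configparser. The section names ([core] and [celery]) are an assumption about where these options live in airflow.cfg, and the values in SAMPLE_CFG are illustrative placeholders, not our real configuration.

```python
import configparser

# Illustrative config text; the values are placeholders, and the section
# placement of each option is an assumption about airflow.cfg.
SAMPLE_CFG = """
[core]
parallelism = 32
dag_concurrency = 16
non_pooled_task_slot_count = 128

[celery]
celeryd_concurrency = 16
"""

# (section, option) pairs for the settings listed above.
SETTINGS = [
    ("core", "parallelism"),
    ("core", "dag_concurrency"),
    ("core", "non_pooled_task_slot_count"),
    ("celery", "celeryd_concurrency"),
]


def read_settings(cfg_text):
    """Parse config text and return the concurrency-related settings."""
    # Interpolation is disabled so that '%' characters elsewhere in a real
    # config file do not raise errors.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return {
        f"{section}.{option}": parser.getint(section, option, fallback=None)
        for section, option in SETTINGS
    }


settings = read_settings(SAMPLE_CFG)
for name, value in settings.items():
    print(f"{name} = {value}")
```

A missing section or option comes back as None instead of raising, which makes it easy to spot a setting that was never written to the file.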

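All of these measures allow more tasks to be started, but only once the scheduler has queued them, so they have no effect while the scheduler is in the refresh phase. Here are the timings for each scheduler loop: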

[2016-11-07 23:18:28,106] {jobs.py:680} INFO - Starting the scheduler  
[2016-11-07 23:21:26,515] {jobs.py:744} INFO - Loop took: 16.422769 seconds  
[2016-11-07 23:21:46,186] {jobs.py:744} INFO - Loop took: 16.058172 seconds  
[2016-11-07 23:22:02,800] {jobs.py:744} INFO - Loop took: 14.410493 seconds  
[2016-11-07 23:22:21,310] {jobs.py:744} INFO - Loop took: 16.275255 seconds  
[2016-11-07 23:22:41,470] {jobs.py:744} INFO - Loop took: 17.93543 seconds  
[2016-11-07 23:22:59,176] {jobs.py:744} INFO - Loop took: 15.484449 seconds  
[2016-11-07 23:23:17,455] {jobs.py:744} INFO - Loop took: 16.130971 seconds  
[2016-11-07 23:23:35,948] {jobs.py:744} INFO - Loop took: 16.311113 seconds  
[2016-11-07 23:23:55,043] {jobs.py:744} INFO - Loop took: 16.830728 seconds  
[2016-11-07 23:26:57,044] {jobs.py:744} INFO - Loop took: 179.613778 seconds  
[2016-11-07 23:27:09,328] {jobs.py:680} INFO - Starting the scheduler  
[2016-11-07 23:29:57,988] {jobs.py:744} INFO - Loop took: 16.881139 seconds  
[2016-11-07 23:30:17,584] {jobs.py:744} INFO - Loop took: 17.021958 seconds  
[2016-11-07 23:30:36,062] {jobs.py:744} INFO - Loop took: 16.148552 seconds  
[2016-11-07 23:30:56,975] {jobs.py:744} INFO - Loop took: 18.532384 seconds  
[2016-11-07 23:31:16,214] {jobs.py:744} INFO - Loop took: 16.907037 seconds  
[2016-11-07 23:31:39,060] {jobs.py:744} INFO - Loop took: 15.637057 seconds  
[2016-11-07 23:31:56,231] {jobs.py:744} INFO - Loop took: 15.003683 seconds  
[2016-11-07 23:32:13,618] {jobs.py:744} INFO - Loop took: 15.215657 seconds  
[2016-11-07 23:32:35,738] {jobs.py:744} INFO - Loop took: 19.938704 seconds  
[2016-11-07 23:35:33,905] {jobs.py:744} INFO - Loop took: 176.030812 seconds  
[2016-11-07 23:35:45,908] {jobs.py:680} INFO - Starting the scheduler 
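The slow loops in a log like this can also be picked out programmatically. The following is a minimal Python sketch, assuming the "Loop took: N seconds" line format shown above. The helper names and the factor of 5 used as the "slow" threshold are arbitrary choices for illustration, not anything defined by Airflow.

```python
import re
from statistics import median

# Matches lines such as:
# [2016-11-07 23:26:57,044] {jobs.py:744} INFO - Loop took: 179.613778 seconds
LOOP_RE = re.compile(r"^\[(?P<ts>[^\]]+)\].*Loop took: (?P<secs>[\d.]+) seconds")


def loop_durations(lines):
    """Return (timestamp, seconds) for every 'Loop took' line."""
    found = []
    for line in lines:
        match = LOOP_RE.search(line)
        if match:
            found.append((match.group("ts"), float(match.group("secs"))))
    return found


def slow_loops(durations, factor=5.0):
    """Return the loops that took more than `factor` times the median."""
    typical = median(secs for _, secs in durations)
    return [(ts, secs) for ts, secs in durations if secs > factor * typical]


# A few lines copied from the log above.
SAMPLE = [
    "[2016-11-07 23:23:35,948] {jobs.py:744} INFO - Loop took: 16.311113 seconds",
    "[2016-11-07 23:23:55,043] {jobs.py:744} INFO - Loop took: 16.830728 seconds",
    "[2016-11-07 23:26:57,044] {jobs.py:744} INFO - Loop took: 179.613778 seconds",
    "[2016-11-07 23:27:09,328] {jobs.py:680} INFO - Starting the scheduler",
    "[2016-11-07 23:29:57,988] {jobs.py:744} INFO - Loop took: 16.881139 seconds",
]

durations = loop_durations(SAMPLE)
flagged = slow_loops(durations)
for ts, secs in flagged:
    print(f"slow loop at {ts}: {secs:.1f} s")
```

Comparing against the median rather than the mean keeps the occasional very long loop from skewing the baseline it is measured against.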

Questions:

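  • Is --num_runs still needed in version 1.7.1.3 (as described in the Common Pitfalls page: https://cwiki.apache.org/confluence/display/AIRFLOW/Common+Pitfalls )? Should we still be restarting the scheduler after every n runs?
  • Would increasing the max_threads value (to start multiple scheduler threads) help? I believe the default is 2.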

Thanks for your help.