How do Spark scheduler pools work when running on YARN?

Question

Nic*_*mas 4 hadoop scheduling hadoop-yarn apache-spark

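I have a mix of Spark versions (1.6, 2.0, 2.1), all deployed on YARN (Hadoop 2.6.0 / CDH 5.5). I am trying to guarantee that a certain application is never starved of resources on our YARN cluster, regardless of what else may be running there.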

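I have enabled the shuffle service and set up some Fair Scheduler Pools as described in the Spark docs. I created a separate pool for the high-priority application that must never be starved of resources, and gave it a minShare of resources: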

<?xml version="1.0"?>
<allocations>
  <pool name="default">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>0</minShare>
  </pool>
  <pool name="high_priority">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>24</minShare>
  </pool>
</allocations>

When I run a Spark application on the YARN cluster, I can see that the pools I configured are recognized:

17/04/04 11:38:20 INFO scheduler.FairSchedulableBuilder: Created pool default, schedulingMode: FAIR, minShare: 0, weight: 1
17/04/04 11:38:20 INFO scheduler.FairSchedulableBuilder: Created pool high_priority, schedulingMode: FAIR, minShare: 24, weight: 1

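However, I don't see my application using the new pool, even though I set spark.scheduler.pool to high_priority in my call to spark-submit. The task set is added to the default pool instead, which means that when the cluster is saturated with regular activity, my high-priority application does not get the resources it needs: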

17/04/04 11:39:49 INFO cluster.YarnScheduler: Adding task set 0.0 with 1 tasks
17/04/04 11:39:50 INFO scheduler.FairSchedulableBuilder: Added task set TaskSet_0 tasks to pool default
17/04/04 11:39:50 INFO spark.ExecutorAllocationManager: Requesting 1 new executor because tasks are backlogged (new desired total will be 1)
17/04/04 11:40:05 WARN cluster.YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

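What am I missing here? My coworkers and I tried enabling preemption in YARN, but that did nothing. Then we realized that YARN has a concept very similar to Spark scheduler pools, called YARN queues. So now we are not sure whether the two concepts conflict in some way.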

How can we get our high-priority pool to work as expected? Is there some kind of conflict between Spark scheduler pools and YARN queues?

Answer 1

Nic*_*mas 6

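Someone on the spark-users list clarified something that explains why I was not getting what I expected: Spark scheduler pools manage resources within an application, while YARN queues manage resources across applications. I need the latter and was mistakenly using the former.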
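To make the cross-application case concrete, here is a minimal sketch of how a submission could target a YARN queue instead of a Spark pool. It only assembles the spark-submit argument list; it does not run anything. The `--queue` option is spark-submit's YARN queue flag, but the queue name `high_priority`, the script name `my_app.py`, and the helper `build_submit_command` are assumptions for illustration, and the queue itself would have to be defined on the YARN side by whoever administers the cluster scheduler.

```python
import shlex


def build_submit_command(app, queue, extra_conf=None):
    """Build a spark-submit argv list that targets a specific YARN queue.

    `queue` must name a queue that already exists in the YARN scheduler
    configuration; this helper (a hypothetical name) does not create it.
    """
    cmd = ["spark-submit", "--master", "yarn", "--queue", queue]
    # Sort the extra settings so the resulting command is deterministic.
    for key, value in sorted((extra_conf or {}).items()):
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app)
    return cmd


# Hypothetical application and queue names, matching the question's setup
# (dynamic allocation with the external shuffle service enabled).
cmd = build_submit_command(
    "my_app.py",
    "high_priority",
    {
        "spark.dynamicAllocation.enabled": "true",
        "spark.shuffle.service.enabled": "true",
    },
)
print(" ".join(shlex.quote(part) for part in cmd))
```

Whether the application is then actually protected from starvation depends on how that queue is configured in YARN (minimum resources, weights, preemption), not on anything in the Spark pool file.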

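The Spark docs explain this under Job Scheduling. I simply got bitten by careless reading, combined with confusing "job" in the technical Spark sense (i.e. an action within a Spark application) with "job" as my coworkers and I commonly use it, meaning an application submitted to the cluster.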
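For contrast, this is roughly what scheduler pools are for: choosing which pool the jobs (actions) launched from one thread of a single application land in. The Job Scheduling page describes doing this with `setLocalProperty` on the SparkContext and clearing the assignment by setting the property back to null. The sketch below assumes a PySpark `SparkContext` passed in as `sc`, started with `spark.scheduler.mode=FAIR` and an allocation file like the one in the question; the helper name `run_in_pool` is hypothetical.

```python
def run_in_pool(sc, pool_name, job):
    """Run job() with this thread's Spark jobs assigned to pool_name.

    `sc` is assumed to be a SparkContext (anything exposing
    setLocalProperty works); `job` is a zero-argument callable that
    triggers one or more Spark actions.
    """
    # The pool assignment is a per-thread local property, not a global one.
    sc.setLocalProperty("spark.scheduler.pool", pool_name)
    try:
        return job()
    finally:
        # Clear the assignment so later jobs from this thread fall back
        # to the default pool.
        sc.setLocalProperty("spark.scheduler.pool", None)
```

With a real context this might be called as `run_in_pool(sc, "high_priority", lambda: rdd.count())`. This only changes how tasks share the executors the application already holds; it does not help the application obtain executors from YARN in the first place, which is the problem shown in the logs above.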

So how do you use fair scheduling with YARN queues when submitting with spark-submit? What is the right configuration? (2 upvotes)
