Cluster Autoscaler does not trigger a scale-up for a DaemonSet

Question

I deployed the Datadog agent DaemonSet to Kubernetes using the Datadog Helm chart. However, when I check the DaemonSet's status, I see that it has not created all of its pods:

NAME                    DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
datadog-agent-datadog   5         2         2       2            2           <none>          1h

Describing the DaemonSet to find out what is wrong shows that the nodes do not have enough CPU left:

Events:
  Type     Reason            Age                From                  Message
  ----     ------            ----               ----                  -------
  Warning  FailedPlacement   42s (x6 over 42s)  daemonset-controller  failed to place pod on "ip-10-0-1-124.eu-west-1.compute.internal": Node didn't have enough resource: cpu, requested: 200, used: 1810, capacity: 2000
  Warning  FailedPlacement   42s (x6 over 42s)  daemonset-controller  failed to place pod on "<ip>": Node didn't have enough resource: cpu, requested: 200, used: 1810, capacity: 2000
  Warning  FailedPlacement   42s (x5 over 42s)  daemonset-controller  failed to place pod on "<ip>": Node didn't have enough resource: cpu, requested: 200, used: 1860, capacity: 2000
  Warning  FailedPlacement   42s (x7 over 42s)  daemonset-controller  failed to place pod on "<ip>": Node didn't have enough resource: cpu, requested: 200, used: 1860, capacity: 2000
  Normal   SuccessfulCreate  42s                daemonset-controller  Created pod: datadog-agent-7b2kp
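
The arithmetic behind these events can be checked with a minimal sketch. It assumes the placement check is simply "used + requested must not exceed capacity", with all values in millicores as printed in the events; the names `NodeCpu` and `cpu_shortfall` and the node labels are made up for illustration and are not part of any Kubernetes API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeCpu:
    name: str         # placeholder label, not a real node name
    used_m: int       # CPU already requested on the node, in millicores
    capacity_m: int   # CPU capacity of the node, in millicores


def cpu_shortfall(node: NodeCpu, requested_m: int) -> int:
    """Return how many millicores are missing for the pod to fit (0 if it fits)."""
    return max(0, node.used_m + requested_m - node.capacity_m)


# "used" and "capacity" are copied from the FailedPlacement events above.
nodes = [
    NodeCpu("node-a", used_m=1810, capacity_m=2000),
    NodeCpu("node-b", used_m=1860, capacity_m=2000),
]

for node in nodes:
    # The agent requests 200m of CPU.
    print(node.name, "is short by", cpu_shortfall(node, 200), "millicores")
```

Under that assumption the nodes are short by only 10 and 60 millicores, so the agent misses its slot by a small margin on every full node.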

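However, the Cluster Autoscaler pod is installed in the cluster and configured correctly (it does trigger for regular Deployments that do not have enough resources to be scheduled), but it does not seem to trigger for the DaemonSet: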

I0424 14:14:48.545689       1 static_autoscaler.go:273] No schedulable pods
I0424 14:14:48.545700       1 static_autoscaler.go:280] No unschedulable pods

The AutoScalingGroup still has enough nodes left.

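Am I missing something in the Cluster Autoscaler configuration? What can I do to make sure it also triggers for the DaemonSet when resources are short?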

Edit: description of the DaemonSet

Name:           datadog-agent
Selector:       app=datadog-agent
Node-Selector:  <none>
Labels:         app=datadog-agent
                chart=datadog-1.27.2
                heritage=Tiller
                release=datadog-agent
Annotations:    deprecated.daemonset.template.generation: 1
Desired Number of Nodes Scheduled: 5
Current Number of Nodes Scheduled: 2
Number of Nodes Scheduled with Up-to-date Pods: 2
Number of Nodes Scheduled with Available Pods: 2
Number of Nodes Misscheduled: 0
Pods Status:  2 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           app=datadog-agent
  Annotations:      checksum/autoconf-config: 38e0b9de817f645c4bec37c0d4a3e58baecccb040f5718dc069a72c7385a0bed
                    checksum/checksd-config: 38e0b9de817f645c4bec37c0d4a3e58baecccb040f5718dc069a72c7385a0bed
                    checksum/confd-config: 38e0b9de817f645c4bec37c0d4a3e58baecccb040f5718dc069a72c7385a0bed
  Service Account:  datadog-agent
  Containers:
   datadog:
    Image:      datadog/agent:6.10.1
    Port:       8125/UDP
    Host Port:  0/UDP
    Limits:
      cpu:     200m
      memory:  256Mi
    Requests:
      cpu:     200m
      memory:  256Mi
    Liveness:  http-get http://:5555/health delay=15s timeout=5s period=15s #success=1 #failure=6
    Environment:
      DD_API_KEY:                  <set to the key 'api-key' in secret 'datadog-secret'>  Optional: false
      DD_LOG_LEVEL:                INFO
      KUBERNETES:                  yes
      DD_KUBERNETES_KUBELET_HOST:   (v1:status.hostIP)
      DD_HEALTH_PORT:              5555
    Mounts:
      /host/proc from procdir (ro)
      /host/sys/fs/cgroup from cgroups (ro)
      /var/run/docker.sock from runtimesocket (ro)
      /var/run/s6 from s6-run (rw)
  Volumes:
   runtimesocket:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/docker.sock
    HostPathType:  
   procdir:
    Type:          HostPath (bare host directory volume)
    Path:          /proc
    HostPathType:  
   cgroups:
    Type:          HostPath (bare host directory volume)
    Path:          /sys/fs/cgroup
    HostPathType:  
   s6-run:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
Events:
  Type     Reason            Age                 From                  Message
  ----     ------            ----                ----                  -------
  Warning  FailedPlacement   33m (x6 over 33m)   daemonset-controller  failed to place pod on "ip-10-0-2-144.eu-west-1.compute.internal": Node didn't have enough resource: cpu, requested: 200, used: 1810, capacity: 2000
  Normal   SuccessfulCreate  33m                 daemonset-controller  Created pod: datadog-agent-7b2kp
  Warning  FailedPlacement   16m (x25 over 33m)  daemonset-controller  failed to place pod on "ip-10-0-1-124.eu-west-1.compute.internal": Node didn't have enough resource: cpu, requested: 200, used: 1810, capacity: 2000
  Warning  FailedPlacement   16m (x25 over 33m)  daemonset-controller  failed to place pod on "ip-10-0-2-174.eu-west-1.compute.internal": Node didn't have enough resource: cpu, requested: 200, used: 1860, capacity: 2000
  Warning  FailedPlacement   16m (x25 over 33m)  daemonset-controller  failed to place pod on "ip-10-0-3-250.eu-west-1.compute.internal": Node didn't have enough resource: cpu, requested: 200, used: 1860, capacity: 2000

Answer 1

Ger*_*erg 5

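You can add a priorityClassName to the DaemonSet that points to a high-priority PriorityClass. Kubernetes will then evict other pods to make room for the DaemonSet's pods. If that leaves the evicted pods unschedulable, cluster-autoscaler should add a node to schedule them.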

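See the documentation (most of the examples here are based on it). On some versions before 1.14, the apiVersion may be the beta one (scheduling.k8s.io/v1beta1, 1.11-1.13) or the alpha one (scheduling.k8s.io/v1alpha1, 1.8-1.10).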

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "High priority class for essential pods"

Apply it to your workload:

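---
# apps/v1 is the stable DaemonSet API; extensions/v1beta1 is no longer served as of Kubernetes 1.16.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: datadog-agent
spec:
  selector:                              # required by apps/v1; must match the template labels
    matchLabels:
      app: datadog-agent
  template:
    metadata:
      labels:
        app: datadog-agent
      name: datadog-agent
    spec:
      priorityClassName: high-priority   # the PriorityClass defined above
      serviceAccountName: datadog-agent
      containers:
      - name: datadog
        image: datadog/agent:latest
        # ... rest of the template goes here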
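
The chain of events described in this answer (the high-priority agent pod preempts a lower-priority pod, which then becomes pending and gives cluster-autoscaler a reason to add a node) can be illustrated with a toy model. This is only a sketch under simplifying assumptions: it looks at CPU alone and picks victims by lowest priority first, whereas the real scheduler also weighs things such as PodDisruptionBudgets, affinity rules and graceful termination. The pod names `web-1` and `web-2` and their requests are hypothetical, chosen so that they add up to the 1810m "used" value from the question.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Pod:
    name: str
    cpu_m: int      # CPU request in millicores
    priority: int   # higher wins, like the PriorityClass value


def preempt_for(new_pod: Pod, running: List[Pod], capacity_m: int) -> Optional[List[Pod]]:
    """Return the pods to evict so that new_pod fits, or None if it cannot fit."""
    free_m = capacity_m - sum(p.cpu_m for p in running)
    # Only strictly lower-priority pods may be evicted, lowest priority first.
    candidates = sorted(
        (p for p in running if p.priority < new_pod.priority),
        key=lambda p: p.priority,
    )
    victims: List[Pod] = []
    for pod in candidates:
        if free_m >= new_pod.cpu_m:
            break
        victims.append(pod)
        free_m += pod.cpu_m
    return victims if free_m >= new_pod.cpu_m else None


# Hypothetical workload on one 2000m node: 900m + 910m = 1810m used.
running = [Pod("web-1", 900, 0), Pod("web-2", 910, 0)]
agent = Pod("datadog-agent", 200, 1_000_000)

victims = preempt_for(agent, running, capacity_m=2000)
print("evicted:", [p.name for p in victims])
```

In this model one regular pod is evicted to make room for the agent. That evicted pod is an ordinary pending pod, which is the kind of pod the question says the autoscaler already reacts to. Without a priority higher than the running pods, the function returns `None` and nothing changes, which mirrors the situation in the question.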

