Horizontal Pod Autoscaler (HPA) on Google Kubernetes Engine (GKE) using backend latency from an Ingress LoadBalancer via Stackdriver external metrics

Bjo*_*son 5 google-cloud-platform kubernetes google-kubernetes-engine

I am trying to configure a Horizontal Pod Autoscaler (HPA) on Google Kubernetes Engine (GKE) using external metrics from an Ingress LoadBalancer, basing the configuration on the instructions at

https://cloud.google.com/kubernetes-engine/docs/tutorials/external-metrics-autoscaling
https://blog.doit-intl.com/autoscaling-k8s-hpa-with-google-http-s-load-balancer-rps-stackdriver-metric-92db0a28e1ea

with an HPA like this:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: my-api
  namespace: production
spec:
  minReplicas: 1
  maxReplicas: 20
  metrics:
  - external:
      metricName: loadbalancing.googleapis.com|https|request_count
      metricSelector:
        matchLabels:
          resource.labels.forwarding_rule_name: k8s-fws-production-lb-my-api--63e2a8ddaae70
      targetAverageValue: "1"
    type: External
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-api


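The autoscaler does kick in when the request count rises, but putting the service under heavy load (for example, 100 concurrent requests per second) does not push the external request_count metric much beyond 6 RPS, whereas the backend_latencies metric observed in Stackdriver does increase significantly. So I would like to use that metric by adding the following to the HPA configuration: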

  - external:
      metricName: loadbalancing.googleapis.com|https|backend_latencies
      metricSelector:
        matchLabels:
          resource.labels.forwarding_rule_name: k8s-fws-production-lb-my-api--63e2a8ddaae70
      targetValue: "3000"
    type: External

But this results in an error:

...unable to fetch metrics from external metrics API: googleapi: Error 400: Field aggregation.perSeriesAligner had an invalid value of "ALIGN_RATE": The aligner cannot be applied to metrics with kind DELTA and value type DISTRIBUTION., badRequest

which can be observed with the command

$ kubectl describe hpa -n production

or by visiting

http://localhost:8080/apis/external.metrics.k8s.io/v1beta1/namespaces/default/loadbalancing.googleapis.com%7Chttps%7Cbackend_latencies

after setting up a proxy with

$ kubectl proxy --port=8080
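The `%7C` in the URL above is simply the percent-encoded `|` separator of the metric name. As a minimal sketch (the helper name `external_metric_path` is made up for illustration and is not part of any Kubernetes client), the path can be built like this:

```python
from urllib.parse import quote


def external_metric_path(namespace: str, metric_name: str) -> str:
    """Build the external metrics API path for a given metric name.

    Hypothetical helper for illustration only.
    """
    # safe="" makes quote() percent-encode "|" (as %7C) and "/" as well.
    encoded = quote(metric_name, safe="")
    return f"/apis/external.metrics.k8s.io/v1beta1/namespaces/{namespace}/{encoded}"


path = external_metric_path(
    "default", "loadbalancing.googleapis.com|https|backend_latencies"
)
# Assumes `kubectl proxy --port=8080` is running, as in the question.
print("http://localhost:8080" + path)
```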

Are https/backend_latencies and https/total_latencies supported as external Stackdriver metrics in an HPA configuration on GKE?

car*_*ray 2

Maybe someone will find this helpful, even though the question is quite old.

My working configuration looks like this:

  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 95
  - type: External
    external:
      metric:
        name: loadbalancing.googleapis.com|https|backend_latencies
        selector:
          matchLabels:
            resource.labels.backend_name: frontend
            metric.labels.proxy_continent: Europe
            reducer: REDUCE_PERCENTILE_95
      target:
        type: Value
        value: "79.5"

type: Value is used because it is the only way to keep the metric value from being divided by the number of replicas.
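To see why this matters for a latency metric, here is a rough sketch of the two target types, based on the scaling formula in the Kubernetes HPA documentation. It is a simplification: it ignores the tolerance band, stabilization windows, and the minReplicas/maxReplicas clamp, and the function names are made up for illustration.

```python
import math


def desired_replicas_value(current_replicas: int, metric_value: float,
                           target_value: float) -> int:
    # type: Value -> the raw metric is compared with the target directly,
    # and the replica count is scaled by that ratio.
    return math.ceil(current_replicas * metric_value / target_value)


def desired_replicas_average_value(metric_value: float,
                                   target_average_value: float) -> int:
    # type: AverageValue -> the metric is treated as a total shared across
    # all pods, so the result does not depend on the current replica count.
    return math.ceil(metric_value / target_average_value)


# Example numbers (assumed): 4 replicas, p95 latency of 159 ms, target 79.5 ms.
print(desired_replicas_value(4, 159.0, 79.5))        # scales up to 8
print(desired_replicas_average_value(159.0, 79.5))   # would settle at 2
```

With AverageValue, a latency of 159 ms spread over 4 pods counts as 39.75 ms per pod, which is below the target, so the autoscaler would scale down even though latency is twice the target.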

reducer: REDUCE_PERCENTILE_95 is used to reduce the distribution to a single value.
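For intuition about what reducing a distribution to a single percentile means, here is a minimal sketch that estimates a percentile from histogram bucket counts by linear interpolation. This is an illustration only and not necessarily how Cloud Monitoring computes REDUCE_PERCENTILE_95; the bucket bounds and counts are invented.

```python
def percentile_from_buckets(bounds, counts, pct):
    """Estimate a percentile from histogram buckets.

    bounds: ascending upper bounds of the buckets (first bucket starts at 0).
    counts: number of samples in each bucket, same length as bounds.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("empty distribution")
    rank = pct * total / 100.0  # how many samples lie at or below the percentile
    cumulative = 0
    lower = 0.0
    for upper, count in zip(bounds, counts):
        if count > 0 and cumulative + count >= rank:
            # Interpolate linearly inside the bucket that contains the rank.
            fraction = (rank - cumulative) / count
            return lower + fraction * (upper - lower)
        cumulative += count
        lower = upper
    return float(bounds[-1])


# Invented latency histogram in milliseconds: 100 samples in total.
bounds = [50, 100, 200, 400]
counts = [60, 30, 8, 2]
p95 = percentile_from_buckets(bounds, counts, 95)
print(p95)
```

The HPA needs a single number to compare with `value: "79.5"`, which is why a reducer of this kind is required for a distribution-valued metric.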

In addition, I edited the custom-metrics-stackdriver-adapter deployment to look like this:

  - image: gcr.io/gke-release/custom-metrics-stackdriver-adapter:v0.12.2-gke.0
    imagePullPolicy: Always
    name: pod-custom-metrics-stackdriver-adapter
    command:
    - /adapter
    - --use-new-resource-model=true
    - --fallback-for-container-metrics=true
    - --enable-distribution-support=true

The key part is the enable-distribution-support=true flag, which makes it possible to use metrics of the distribution type.