Registered targets disappear

bkn*_*hts 3 amazon-eks

I have a running EKS cluster. It uses an ALB for ingress.

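When I apply a service and then its ingress, most of them work as expected. However, some target groups end up with no registered targets. If I get the service's endpoint IP addresses with kubectl describe svc my-service-name and manually register those endpoints in the target group, the pods are reachable again, but this is not a sustainable process.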
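For reference, the manual workaround described above could be scripted roughly as follows. This is only a sketch: the helper name endpoints_to_targets and the TG_ARN variable are hypothetical, and it assumes the Endpoints field has the comma-separated ip:port form that kubectl describe svc prints.

```shell
#!/bin/bash
# Convert "10.0.1.5:3000,10.0.2.7:3000" (the Endpoints field shown by
# `kubectl describe svc`) into the Id=...,Port=... form accepted by
# `aws elbv2 register-targets --targets`. One target is printed per line.
endpoints_to_targets() {
  local endpoints="$1" pair
  local -a pairs
  IFS=',' read -ra pairs <<< "$endpoints"
  for pair in "${pairs[@]}"; do
    # Everything before the first ':' is the IP, everything after the last ':' is the port.
    printf 'Id=%s,Port=%s\n' "${pair%%:*}" "${pair##*:}"
  done
}

# Example call (TG_ARN is a placeholder for the target group's ARN):
# aws elbv2 register-targets --target-group-arn "$TG_ARN" \
#   --targets $(endpoints_to_targets "10.0.1.5:3000,10.0.2.7:3000")
```

Targets registered this way are not managed by the controller, so they go stale as soon as the pods are replaced, which is why this is not a real fix.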

Any thoughts on what might be happening? Why doesn't EKS find the target groups as pods cycle?

Each service consists of a set of .yaml files (secrets, deployment, service, and ingress), applied as follows:

deploy.sh

#!/bin/bash
set -e

kubectl apply -f ./secretsMap.yaml
kubectl apply -f ./configMap.yaml
kubectl apply -f ./deployment.yaml
kubectl apply -f ./service.yaml
kubectl apply -f ./ingress.yaml

service.yaml

apiVersion: v1
kind: Service
metadata:
  name: "site-bob"
  namespace: "next-sites"
spec:
  ports:
    - port: 80
      targetPort: 3000
      protocol: TCP
  type: NodePort
  selector:
    app: "site-bob"

ingress.yaml

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: "site-bob"
  namespace: "next-sites"
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/tags: Environment=Production,Group=api
    alb.ingress.kubernetes.io/backend-protocol: HTTP
    alb.ingress.kubernetes.io/ip-address-type: ipv4
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP":80},{"HTTPS":443}]'
    alb.ingress.kubernetes.io/load-balancer-name: eks-ingress-1
    alb.ingress.kubernetes.io/group.name: eks-ingress-1
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-2:402995436123:certificate/9db9dce3-055d-4655-842e-xxxxx
    alb.ingress.kubernetes.io/healthcheck-port: traffic-port
    alb.ingress.kubernetes.io/healthcheck-path: /
    alb.ingress.kubernetes.io/healthcheck-interval-seconds: '30'
    alb.ingress.kubernetes.io/healthcheck-timeout-seconds: '16'
    alb.ingress.kubernetes.io/success-codes: 200,201
    alb.ingress.kubernetes.io/healthy-threshold-count: '2'
    alb.ingress.kubernetes.io/unhealthy-threshold-count: '2'
    alb.ingress.kubernetes.io/load-balancer-attributes: idle_timeout.timeout_seconds=60
    alb.ingress.kubernetes.io/target-group-attributes: deregistration_delay.timeout_seconds=30
    alb.ingress.kubernetes.io/actions.ssl-redirect: >
      {
        "type": "redirect",
        "redirectConfig": { "Protocol": "HTTPS", "Port": "443", "StatusCode": "HTTP_301" }
      }
    alb.ingress.kubernetes.io/actions.svc-host: >
      {
        "type": "forward",
        "forwardConfig": {
          "targetGroups": [
            { "serviceName": "site-bob", "servicePort": 80, "weight": 20 }
          ],
          "targetGroupStickinessConfig": { "enabled": true, "durationSeconds": 200 }
        }
      }
  labels:
    app: site-bob
spec:
  rules:
    - host: "staging-bob.imgeinc.net"
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ssl-redirect
                port: 
                  name: use-annotation
          - backend:
              service:
                name: svc-host
                port:
                  name: use-annotation
            pathType: ImplementationSpecific

bkn*_*hts 9

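Something had been added to my configuration that tagged two security groups as owned by the cluster. When I checked the load balancer controller logs: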

kubectl logs -n kube-system aws-load-balancer-controller-677c7998bb-l7mwb

I saw many lines like:

{"level":"error","ts":1641996465.6707578,"logger":"controller-runtime.manager.controller.targetGroupBinding","msg":"Reconciler error","reconciler group":"elbv2.k8s.aws","reconciler kind":"TargetGroupBinding","name":"k8s-nextsite-sitefest-89a6f0ff0a","namespace":"next-sites","error":"expect exactly one securityGroup tagged with kubernetes.io/cluster/imageinc-next-eks-4KN4v6EX for eni eni-0c5555fb9a87e93ad, got: [sg-04b2754f1c85ac8b9 sg-07b026b037dd4d6a4]"}
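To see which security groups are in conflict, the IDs can be pulled out of these error lines. The following is a minimal sketch; the function name extract_conflicting_sgs is made up for illustration, and the deployment name in the usage comment is an assumption based on the pod name above.

```shell
#!/bin/bash
# Read controller log lines on stdin and print each security group ID that
# appears inside a "got: [...]" list, one per line, without duplicates.
extract_conflicting_sgs() {
  grep -o 'got: \[[^]]*\]' | grep -oE 'sg-[0-9a-f]+' | sort -u
}

# Example call (assumes the controller runs as a deployment with this name):
# kubectl logs -n kube-system deployment/aws-load-balancer-controller \
#   | extract_conflicting_sgs
```

Each ID it prints carries the kubernetes.io/cluster/... tag on the same ENI, and the controller expects exactly one such group per ENI.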

sg-07b026b037dd4d6a4 has the description: EKS created security group applied to ENI that is attached to EKS Control Plane master nodes, as well as any managed workloads.

sg-04b2754f1c85ac8b9 has the description: Security group for all nodes in the cluster.

I removed the tag:

{
    Key: 'kubernetes.io/cluster/_cluster name_',
    Value: 'owned'
}

from sg-04b2754f1c85ac8b9.

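The target groups started to populate, and everything now works. Both groups were created and tagged by Terraform. I suspect my worker group configuration is off.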
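For anyone who wants to try the same tag removal from the command line before fixing it in Terraform, here is a hedged sketch using aws ec2 delete-tags. The wrapper function untag_cluster_sg and the DRY_RUN switch are my own additions for illustration; by default it only prints the command so it can be reviewed first.

```shell
#!/bin/bash
# Remove the kubernetes.io/cluster/<cluster-name> tag from one security group.
# With DRY_RUN=1 (the default here) the command is printed instead of executed.
untag_cluster_sg() {
  local sg_id="$1" cluster_name="$2"
  # Giving only Key= (no Value=) deletes the tag whatever its value is.
  local -a cmd=(aws ec2 delete-tags --resources "$sg_id"
                --tags "Key=kubernetes.io/cluster/${cluster_name}")
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Example call (prints the command; set DRY_RUN=0 to actually run it):
# untag_cluster_sg sg-04b2754f1c85ac8b9 imageinc-next-eks-4KN4v6EX
```

If Terraform still manages the tag, it will likely be re-added on the next apply, so the Terraform configuration needs the same change.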