Nginx-ingress-controller fails to start after AKS upgrade to v1.22

Ryc*_*chu · 9 · nginx, kubernetes, kubernetes-helm, kubernetes-ingress, azure-aks

We upgraded our Kubernetes cluster from v1.21 to v1.22. After doing so, we noticed that the nginx-ingress-controller deployment's pods fail to start with the following error message:

```
pkg/mod/k8s.io/client-go@v0.18.5/tools/cache/reflector.go:125: Failed to list *v1beta1.Ingress: the server could not find the requested resource
```

We found this issue tracked here: https://github.com/bitnami/charts/issues/7264


Since Azure does not allow downgrading a cluster back to v1.21, could you help us fix the nginx-ingress-controller deployment? We are not very familiar with Helm.


This is the current YAML of our deployment:

```yaml
kind: Deployment
apiVersion: apps/v1
metadata:
  name: nginx-ingress-controller
  namespace: ingress
  uid: 575c7699-1fd5-413e-a81d-b183f8822324
  resourceVersion: '166482672'
  generation: 16
  creationTimestamp: '2020-10-10T10:20:07Z'
  labels:
    app: nginx-ingress
    app.kubernetes.io/component: controller
    app.kubernetes.io/managed-by: Helm
    chart: nginx-ingress-1.41.1
    heritage: Helm
    release: nginx-ingress
  annotations:
    deployment.kubernetes.io/revision: '2'
    meta.helm.sh/release-name: nginx-ingress
    meta.helm.sh/release-namespace: ingress
  managedFields:
    - manager: kube-controller-manager
      operation: Update
      apiVersion: apps/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          f:replicas: {}
      subresource: scale
    - manager: Go-http-client
      operation: Update
      apiVersion: apps/v1
      time: '2020-10-10T10:20:07Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:meta.helm.sh/release-name: {}
            f:meta.helm.sh/release-namespace: {}
          f:labels:
            .: {}
            f:app: {}
            f:app.kubernetes.io/component: {}
            f:app.kubernetes.io/managed-by: {}
            f:chart: {}
            f:heritage: {}
            f:release: {}
        f:spec:
          f:progressDeadlineSeconds: {}
          f:revisionHistoryLimit: {}
          f:selector: {}
          f:strategy:
            f:rollingUpdate:
              .: {}
              f:maxSurge: {}
              f:maxUnavailable: {}
            f:type: {}
          f:template:
            f:metadata:
              f:labels:
                .: {}
                f:app: {}
                f:app.kubernetes.io/component: {}
                f:component: {}
                f:release: {}
            f:spec:
              f:containers:
                k:{"name":"nginx-ingress-controller"}:
                  .: {}
                  f:args: {}
                  f:env:
                    .: {}
                    k:{"name":"POD_NAME"}:
                      .: {}
                      f:name: {}
                      f:valueFrom:
                        .: {}
                        f:fieldRef: {}
                    k:{"name":"POD_NAMESPACE"}:
                      .: {}
                      f:name: {}
                      f:valueFrom:
                        .: {}
                        f:fieldRef: {}
                  f:image: {}
                  f:imagePullPolicy: {}
                  f:livenessProbe:
                    .: {}
                    f:failureThreshold: {}
                    f:httpGet:
                      .: {}
                      f:path: {}
                      f:port: {}
                      f:scheme: {}
                    f:initialDelaySeconds: {}
                    f:periodSeconds: {}
                    f:successThreshold: {}
                    f:timeoutSeconds: {}
                  f:name: {}
                  f:ports:
                    .: {}
                    k:{"containerPort":80,"protocol":"TCP"}:
                      .: {}
                      f:containerPort: {}
                      f:name: {}
                      f:protocol: {}
                    k:{"containerPort":443,"protocol":"TCP"}:
                      .: {}
                      f:containerPort: {}
                      f:name: {}
                      f:protocol: {}
                  f:readinessProbe:
                    .: {}
                    f:failureThreshold: {}
                    f:httpGet:
                      .: {}
                      f:path: {}
                      f:port: {}
                      f:scheme: {}
                    f:initialDelaySeconds: {}
                    f:periodSeconds: {}
                    f:successThreshold: {}
                    f:timeoutSeconds: {}
                  f:resources:
                    .: {}
                    f:limits: {}
                    f:requests: {}
                  f:securityContext:
                    .: {}
                    f:allowPrivilegeEscalation: {}
                    f:capabilities:
                      .: {}
                      f:add: {}
                      f:drop: {}
                    f:runAsUser: {}
                  f:terminationMessagePath: {}
                  f:terminationMessagePolicy: {}
              f:dnsPolicy: {}
              f:restartPolicy: {}
              f:schedulerName: {}
              f:securityContext: {}
              f:serviceAccount: {}
              f:serviceAccountName: {}
              f:terminationGracePeriodSeconds: {}
    - manager: kube-controller-manager
      operation: Update
      apiVersion: apps/v1
      time: '2022-01-24T01:23:22Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          f:conditions:
            .: {}
            k:{"type":"Available"}:
              .: {}
              f:type: {}
            k:{"type":"Progressing"}:
              .: {}
              f:type: {}
    - manager: Mozilla
      operation: Update
      apiVersion: apps/v1
      time: '2022-01-28T23:18:41Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          f:template:
            f:spec:
              f:containers:
                k:{"name":"nginx-ingress-controller"}:
                  f:resources:
                    f:limits:
                      f:cpu: {}
                      f:memory: {}
                    f:requests:
                      f:cpu: {}
                      f:memory: {}
    - manager: kube-controller-manager
      operation: Update
      apiVersion: apps/v1
      time: '2022-01-28T23:29:49Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            f:deployment.kubernetes.io/revision: {}
        f:status:
          f:conditions:
            k:{"type":"Available"}:
              f:lastTransitionTime: {}
              f:lastUpdateTime: {}
              f:message: {}
              f:reason: {}
              f:status: {}
            k:{"type":"Progressing"}:
              f:lastTransitionTime: {}
              f:lastUpdateTime: {}
              f:message: {}
              f:reason: {}
              f:status: {}
          f:observedGeneration: {}
          f:replicas: {}
          f:unavailableReplicas: {}
          f:updatedReplicas: {}
      subresource: status
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx-ingress
      app.kubernetes.io/component: controller
      release: nginx-ingress
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: nginx-ingress
        app.kubernetes.io/component: controller
        component: controller
        release: nginx-ingress
    spec:
      containers:
        - name: nginx-ingress-controller
          image: us.gcr.io/k8s-artifacts-prod/ingress-nginx/controller:v0.34.1
          args:
            - /nginx-ingress-controller
            - '--default-backend-service=ingress/nginx-ingress-default-backend'
            - '--election-id=ingress-controller-leader'
            - '--ingress-class=nginx'
            - '--configmap=ingress/nginx-ingress-controller'
          ports:
            - name: http
              containerPort: 80
              protocol: TCP
            - name: https
              containerPort: 443
              protocol: TCP
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.namespace
          resources:
            limits:
              cpu: 300m
              memory: 512Mi
            requests:
              cpu: 200m
              memory: 256Mi
          livenessProbe:
            httpGet:
              path: /healthz
              port: 10254
              scheme: HTTP
            initialDelaySeconds: 10
            timeoutSeconds: 1
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /healthz
              port: 10254
              scheme: HTTP
            initialDelaySeconds: 10
            timeoutSeconds: 1
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 3
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent
          securityContext:
            capabilities:
              add:
                - NET_BIND_SERVICE
              drop:
                - ALL
            runAsUser: 101
            allowPrivilegeEscalation: true
      restartPolicy: Always
      terminationGracePeriodSeconds: 60
      dnsPolicy: ClusterFirst
      serviceAccountName: nginx-ingress
      serviceAccount: nginx-ingress
      securityContext: {}
      schedulerName: default-scheduler
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%
      maxSurge: 25%
  revisionHistoryLimit: 10
  progressDeadlineSeconds: 600
status:
  observedGeneration: 16
  replicas: 3
  updatedReplicas: 2
  unavailableReplicas: 3
  conditions:
    - type: Available
      status: 'False'
      lastUpdateTime: '2022-01-28T22:58:07Z'
      lastTransitionTime: '2022-01-28T22:58:07Z'
      reason: MinimumReplicasUnavailable
      message: Deployment does not have minimum availability.
    - type: Progressing
      status: 'False'
      lastUpdateTime: '2022-01-28T23:29:49Z'
      lastTransitionTime: '2022-01-28T23:29:49Z'
      reason: ProgressDeadlineExceeded
      message: >-
        ReplicaSet "nginx-ingress-controller-59d9f94677" has timed out
        progressing.
```

Ryc*_*chu · 15

@Philip Welz's answer is of course correct. The v1beta1 Ingress API was removed in Kubernetes v1.22, so the ingress controller had to be upgraded. But that was not the only problem we faced, so I decided to write a "very, very short" guide on how we finally ended up with a healthy, running cluster (5 days later), in the hope that it helps someone else out of trouble.

1. Upgrade the nginx-ingress-controller version in the YAML file.

Here we simply changed the image version in the YAML file from:

```yaml
image: us.gcr.io/k8s-artifacts-prod/ingress-nginx/controller:v0.34.1
```

to:

```yaml
image: us.gcr.io/k8s-artifacts-prod/ingress-nginx/controller:v1.1.1
```

After this change a new pod running v1.1.1 was created. It started fine and ran healthily. Unfortunately, that alone did not bring our microservices back online. I now know this was because the existing ingress YAML files needed some changes to be compatible with the new controller version. So you may want to go straight to step 2 (two headings below).

Skip this step for now, and only perform it if step 2 fails: reinstall the nginx-ingress-controller

We decided that, in this situation, we would reinstall the controller from scratch following the official Microsoft documentation: https://learn.microsoft.com/en-us/azure/aks/ingress-basic?tabs=azure-cli. Note that this may change the external IP address of your ingress controller. In our case, the easiest approach was to delete the entire ingress namespace:

```shell
kubectl delete namespace ingress
```

Unfortunately, this did not delete the ingress class, so an additional command was needed:

```shell
kubectl delete ingressclass nginx --all-namespaces
```

Then install the new controller:

```shell
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install ingress-nginx ingress-nginx/ingress-nginx --create-namespace --namespace ingress
```

If you reinstalled the nginx-ingress-controller, or the IP address changed after the upgrade in step 1: update your network security group, load balancer, and domain DNS

In your AKS resource group there should be a resource of type Network security group. It contains inbound and outbound security rules (as I understand it, it acts as a firewall). There should be a default network security group that is managed automatically by Kubernetes, and the IP addresses there should be refreshed automatically.

Unfortunately, we also had a second, custom one, where we had to update the rules manually.

In the same resource group there should be a resource of type Load balancer. In its Frontend IP configuration tab, double-check that the IP address reflects your new IP address. As a bonus, in the Backend pools tab you can verify that the addresses there match your internal node IPs.

Finally, don't forget to adjust your domain's DNS records.

2. Upgrade your ingress YAML files to match the syntax changes
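For context on what actually changed: the error in the question comes from the v1beta1 Ingress API, which Kubernetes v1.22 removed. A rough sketch of the same kind of rule in the old, removed syntax (hypothetical resource names; note the flat serviceName/servicePort backend instead of the nested service: block used in the examples below):

```yaml
# Pre-v1.22 ingress syntax, removed in Kubernetes v1.22 (sketch only).
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: hello-world-ingress
  namespace: services
spec:
  rules:
    - http:
        paths:
          - path: /hello-world-one
            backend:
              serviceName: aks-helloworld-one   # flat fields; no pathType
              servicePort: 80
```

Manifests in this shape have to be rewritten to networking.k8s.io/v1, as in the examples that follow.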

It took us some time to figure out a working template, but installing the helloworld application from the Microsoft tutorial mentioned above actually helped us a lot. We started from here:

```yaml
kind: Ingress
apiVersion: networking.k8s.io/v1
metadata:
  name: hello-world-ingress
  namespace: services
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/rewrite-target: /$1
    nginx.ingress.kubernetes.io/ssl-redirect: 'false'
    nginx.ingress.kubernetes.io/use-regex: 'true'
spec:
  rules:
    - http:
        paths:
          - path: /hello-world-one(/|$)(.*)
            pathType: Prefix
            backend:
              service:
                name: aks-helloworld-one
                port:
                  number: 80
```
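As a side note on how these regex paths work: the use-regex annotation makes the path a regular expression, and rewrite-target can refer to its capture groups. Here is a rough offline simulation with `sed`, using a hypothetical rewrite target of `/$2` (the second capture group, i.e. everything after the prefix); the real rewriting happens inside the NGINX configuration that the controller generates:

```shell
# Simulate what a rewrite-target of /$2 would do to a request path when the
# ingress path is the regex /hello-world-one(/|$)(.*). Sketch only.
echo "/hello-world-one/api/users" \
  | sed -E 's~^/hello-world-one(/|$)(.*)~/\2~'
# → /api/users
```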

After introducing changes step by step, we finally arrived at the configuration below. I'm fairly sure the problem was that we had been missing the nginx.ingress.kubernetes.io/use-regex: 'true' entry:

```yaml
kind: Ingress
apiVersion: networking.k8s.io/v1
metadata:
  name: example-api
  namespace: services
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/configuration-snippet: |
      more_set_headers "X-Forwarded-By: example-api";
    nginx.ingress.kubernetes.io/rewrite-target: /example-api
    nginx.ingress.kubernetes.io/ssl-redirect: 'true'
    nginx.ingress.kubernetes.io/use-regex: 'true'
spec:
  tls:
    - hosts:
        - services.example.com
      secretName: tls-secret
  rules:
    - host: services.example.com
      http:
        paths:
          - path: /example-api
            pathType: ImplementationSpecific
            backend:
              service:
                name: example-api
                port:
                  number: 80
```

Just in case someone wants to install the helloworld application for testing purposes, the YAML looks like this:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: aks-helloworld-one
spec:
  replicas: 1
  selector:
    matchLabels:
      app: aks-helloworld-one
  template:
    metadata:
      labels:
        app: aks-helloworld-one
    spec:
      containers:
      - name: aks-helloworld-one
        image: mcr.microsoft.com/azuredocs/aks-helloworld:v1
        ports:
        - containerPort: 80
        env:
        - name: TITLE
          value: "Welcome to Azure Kubernetes Service (AKS)"
---
apiVersion: v1
kind: Service
metadata:
  name: aks-helloworld-one
spec:
  type: ClusterIP
  ports:
  - port: 80
  selector:
    app: aks-helloworld-one
```

3. Take care of other crashed applications...

Another application that crashed in our cluster was cert-manager. It was at version 1.0.1, so first we upgraded it to version 1.1.1:

```shell
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm upgrade --namespace cert-manager --version 1.1 cert-manager jetstack/cert-manager
```

This created a fresh, healthy pod. We were happy and decided to stay on v1.1, because we were a bit afraid of the extra steps required when upgrading to higher versions (see the bottom of https://cert-manager.io/docs/installation/upgrading/).

Now the cluster was finally fixed. Right?

4. ...but be sure to check the compatibility chart!

Well... now we know that cert-manager is only compatible with Kubernetes v1.22 from version 1.5 onwards. We were extremely unlucky: that very night our SSL certificate crossed the 30-days-to-expiry mark, so cert-manager decided to renew it! The operation failed and cert-manager crashed. Kubernetes fell back to the "Kubernetes fake certificate". Since the certificate was invalid, browsers killed the traffic and the web pages went down again. The fix was to upgrade to 1.5 and upgrade the CRDs:

```shell
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.5.4/cert-manager.crds.yaml
helm upgrade --namespace cert-manager --version 1.5 cert-manager jetstack/cert-manager
```

After that, the new cert-manager instance successfully refreshed our certificates. The cluster was saved once again.

If you need to force a renewal, have a look at this issue: https://github.com/jetstack/cert-manager/issues/2641

@ajcann suggested adding a renewBefore property to the certificates:

```shell
kubectl get certs --no-headers=true | awk '{print $1}' | xargs -n 1 kubectl patch certificate --patch '
- op: replace
  path: /spec/renewBefore
  value: 1440h
' --type=json
```
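For orientation, the field that the patch above sets lives directly under spec of each Certificate resource. A sketch of where it ends up (hypothetical certificate and issuer names, cert-manager.io/v1 API):

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: services-example-com      # hypothetical
  namespace: services
spec:
  secretName: tls-secret
  renewBefore: 1440h              # renew 60 days before expiry, triggering a prompt renewal
  dnsNames:
    - services.example.com
  issuerRef:
    name: letsencrypt-prod        # hypothetical issuer
    kind: ClusterIssuer
```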

Then, once the certificates have renewed, remove the property again:

```shell
kubectl get certs --no-headers=true | awk '{print $1}' | xargs -n 1 kubectl patch certificate --patch '
- op: remove
  path: /spec/renewBefore
' --type=json
```


Phi*_*elz · 12

Kubernetes 1.22 is supported only by NGINX Ingress Controller 1.0.0 and later; see https://github.com/kubernetes/ingress-nginx#supported-versions-table

You need to bump the nginx-ingress-controller Bitnami Helm chart to version 9.0.0 in your Chart.yaml, then run helm upgrade nginx-ingress-controller bitnami/nginx-ingress-controller.
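If the chart is pinned as a dependency of a parent chart, the version bump would look roughly like this in Chart.yaml (a sketch; the repository URL is assumed to be the standard Bitnami one):

```yaml
# Chart.yaml of the parent chart (sketch)
dependencies:
  - name: nginx-ingress-controller
    version: 9.0.0
    repository: https://charts.bitnami.com/bitnami
```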

You should also update your ingress controller regularly, since v0.34.1 is very old and the ingress is usually the only entry point into the cluster from outside.