Kubectl rollout restart for statefulsets

liv*_*ton 5 kubernetes kubectl kubernetes-statefulset

According to the kubectl docs, kubectl rollout restart works for deployments, daemonsets and statefulsets. It works as expected for deployments. But for statefulsets, it restarts only one of the 2 pods.

? k rollout restart statefulset alertmanager-main                       (playground-fdp/monitoring)
statefulset.apps/alertmanager-main restarted

? k rollout status statefulset alertmanager-main                        (playground-fdp/monitoring)
Waiting for 1 pods to be ready...
Waiting for 1 pods to be ready...
statefulset rolling update complete 2 pods at revision alertmanager-main-59d7ccf598...

? kgp -l app=alertmanager                                               (playground-fdp/monitoring)
NAME                  READY   STATUS    RESTARTS   AGE
alertmanager-main-0   2/2     Running   0          21h
alertmanager-main-1   2/2     Running   0          20s

As you can see, pod alertmanager-main-1 was restarted and its age is 20s, while the other pod in the statefulset, alertmanager-main-0, has not been restarted and its age is 21h. Any idea how to restart the statefulset after some config maps it uses have been updated?

[Update 1] Here is the statefulset configuration. As you can see, .spec.updateStrategy.rollingUpdate.partition is not set.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"monitoring.coreos.com/v1","kind":"Alertmanager","metadata":{"annotations":{},"labels":{"alertmanager":"main"},"name":"main","namespace":"monitoring"},"spec":{"baseImage":"10.47.2.76:80/alm/alertmanager","nodeSelector":{"kubernetes.io/os":"linux"},"replicas":2,"securityContext":{"fsGroup":2000,"runAsNonRoot":true,"runAsUser":1000},"serviceAccountName":"alertmanager-main","version":"v0.19.0"}}
  creationTimestamp: "2019-12-02T07:17:49Z"
  generation: 4
  labels:
    alertmanager: main
  name: alertmanager-main
  namespace: monitoring
  ownerReferences:
  - apiVersion: monitoring.coreos.com/v1
    blockOwnerDeletion: true
    controller: true
    kind: Alertmanager
    name: main
    uid: 3e3bd062-6077-468e-ac51-909b0bce1c32
  resourceVersion: "521307"
  selfLink: /apis/apps/v1/namespaces/monitoring/statefulsets/alertmanager-main
  uid: ed4765bf-395f-4d91-8ec0-4ae23c812a42
spec:
  podManagementPolicy: Parallel
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      alertmanager: main
      app: alertmanager
  serviceName: alertmanager-operated
  template:
    metadata:
      creationTimestamp: null
      labels:
        alertmanager: main
        app: alertmanager
    spec:
      containers:
      - args:
        - --config.file=/etc/alertmanager/config/alertmanager.yaml
        - --cluster.listen-address=[$(POD_IP)]:9094
        - --storage.path=/alertmanager
        - --data.retention=120h
        - --web.listen-address=:9093
        - --web.external-url=http://10.47.0.234
        - --web.route-prefix=/
        - --cluster.peer=alertmanager-main-0.alertmanager-operated.monitoring.svc:9094
        - --cluster.peer=alertmanager-main-1.alertmanager-operated.monitoring.svc:9094
        env:
        - name: POD_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        image: 10.47.2.76:80/alm/alertmanager:v0.19.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 10
          httpGet:
            path: /-/healthy
            port: web
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 3
        name: alertmanager
        ports:
        - containerPort: 9093
          name: web
          protocol: TCP
        - containerPort: 9094
          name: mesh-tcp
          protocol: TCP
        - containerPort: 9094
          name: mesh-udp
          protocol: UDP
        readinessProbe:
          failureThreshold: 10
          httpGet:
            path: /-/ready
            port: web
            scheme: HTTP
          initialDelaySeconds: 3
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 3
        resources:
          requests:
            memory: 200Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/alertmanager/config
          name: config-volume
        - mountPath: /alertmanager
          name: alertmanager-main-db
      - args:
        - -webhook-url=http://localhost:9093/-/reload
        - -volume-dir=/etc/alertmanager/config
        image: 10.47.2.76:80/alm/configmap-reload:v0.0.1
        imagePullPolicy: IfNotPresent
        name: config-reloader
        resources:
          limits:
            cpu: 100m
            memory: 25Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/alertmanager/config
          name: config-volume
          readOnly: true
      dnsPolicy: ClusterFirst
      nodeSelector:
        kubernetes.io/os: linux
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 2000
        runAsNonRoot: true
        runAsUser: 1000
      serviceAccount: alertmanager-main
      serviceAccountName: alertmanager-main
      terminationGracePeriodSeconds: 120
      volumes:
      - name: config-volume
        secret:
          defaultMode: 420
          secretName: alertmanager-main
      - emptyDir: {}
        name: alertmanager-main-db
  updateStrategy:
    type: RollingUpdate
status:
  collisionCount: 0
  currentReplicas: 2
  currentRevision: alertmanager-main-59d7ccf598
  observedGeneration: 4
  readyReplicas: 2
  replicas: 2
  updateRevision: alertmanager-main-59d7ccf598
  updatedReplicas: 2

Pjo*_*erS 12

You didn't provide the whole scenario. It might depend on the Readiness Probe or the Update Strategy.

A StatefulSet restarts its Pods in order, from ordinal n-1 down to 0. Details can be found here.

Reason 1

A StatefulSet has 4 update strategies:

• On Delete
• Rolling Updates
• Partitions
• Forced Rollback

Under Partitions you can find the following information:

If a partition is specified, all Pods with an ordinal that is greater than or equal to the partition will be updated when the StatefulSet's .spec.template is updated. All Pods with an ordinal that is less than the partition will not be updated and, even if they are deleted, they will be recreated at the previous version. If a StatefulSet's .spec.updateStrategy.rollingUpdate.partition is greater than its .spec.replicas, updates to its .spec.template will not be propagated to its Pods. In most cases you will not need to use a partition, but they are useful if you want to stage an update, roll out a canary, or perform a phased roll out.

So if somewhere in the StatefulSet you have set updateStrategy.rollingUpdate.partition: 1, it will restart only the pods with an ordinal of 1 or higher.
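As a minimal sketch of how such a partition could be declared (only the relevant fragment of the StatefulSet spec; the value 3 is illustrative, chosen to match the example below, and is not taken from the manifest in the question):

spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 3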

An example with partition: 3:

NAME    READY   STATUS    RESTARTS   AGE
web-0   1/1     Running   0          30m
web-1   1/1     Running   0          30m
web-2   1/1     Running   0          31m
web-3   1/1     Running   0          2m45s
web-4   1/1     Running   0          3m
web-5   1/1     Running   0          3m13s

Reason 2

The configuration of the Readiness probe.

If the values of initialDelaySeconds and periodSeconds are high, it may take a while before the next Pod gets restarted. Details about these parameters can be found here.

In the example below, the readiness probe waits 10 seconds before the first check and then probes every 2 seconds. Depending on the values, this could be the cause of the behavior you are seeing.

    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /
        port: 80
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 2
      successThreshold: 1
      timeoutSeconds: 1

Reason 3

I see you have 2 containers in each Pod.

NAME                  READY   STATUS    RESTARTS   AGE
alertmanager-main-0   2/2     Running   0          21h
alertmanager-main-1   2/2     Running   0          20s

As described in the docs:

Running - The Pod has been bound to a node, and all of the Containers have been created. At least one Container is still running, or is in the process of starting or restarting.

It would be good to check whether everything is fine with both containers (readinessProbe/livenessProbe, restarts, etc.).
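As a rough check (using the monitoring namespace and pod names from the question), something like this would show the state and restart count of each container:

kubectl describe pod alertmanager-main-0 -n monitoring
kubectl get pod alertmanager-main-0 -n monitoring \
  -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.ready}{"\t"}{.restartCount}{"\n"}{end}'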



fg7*_*8nc 6

You need to delete the pods. Stateful set pods are removed following their ordinal index, with the highest ordinal index first.
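A minimal sketch of what that could look like for the statefulset in the question (pod names taken from the output above; the StatefulSet controller recreates each deleted Pod automatically):

kubectl delete pod alertmanager-main-1 -n monitoring
kubectl delete pod alertmanager-main-0 -n monitoring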

Also, you don't need to restart the pod to re-read an updated config map. That happens automatically (after some time).
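For context, that automatic refresh applies to ConfigMaps mounted as volumes, which the kubelet syncs periodically; ConfigMaps consumed as environment variables are not refreshed this way. A minimal, hypothetical fragment illustrating the volume case (names are made up and not from the question's manifest, which mounts a Secret):

spec:
  containers:
  - name: app                      # hypothetical container
    image: example/app:latest      # hypothetical image
    volumeMounts:
    - name: config-volume
      mountPath: /etc/config
  volumes:
  - name: config-volume
    configMap:
      name: my-config              # hypothetical ConfigMap name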