How to ensure a Kubernetes CronJob does not restart on failure

Dou*_*oug 15 kubernetes

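I have a CronJob that sends emails to customers. It occasionally fails for various reasons. I do not want it to restart, but it still does.

I am running Kubernetes on GKE. To get it to stop, I have to delete the CronJob and then manually kill all the pods it created.

This is bad, for obvious reasons.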

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  creationTimestamp: 2018-06-21T14:48:46Z
  name: dailytasks
  namespace: default
  resourceVersion: "20390223"
  selfLink: [redacted]
  uid: [redacted]
spec:
  concurrencyPolicy: Forbid
  failedJobsHistoryLimit: 1
  jobTemplate:
    metadata:
      creationTimestamp: null
    spec:
      template:
        metadata:
          creationTimestamp: null
        spec:
          containers:
          - command:
            - kubernetes/daily_tasks.sh
            env:
            - name: DB_HOST
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: status.hostIP
            envFrom:
            - secretRef:
                name: my-secrets
            image: [redacted]
            imagePullPolicy: IfNotPresent
            name: dailytasks
            resources: {}
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
          dnsPolicy: ClusterFirst
          restartPolicy: Never
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30
  schedule: 0 14 * * *
  successfulJobsHistoryLimit: 3
  suspend: true
status:
  active:
  - apiVersion: batch
    kind: Job
    name: dailytasks-1533218400
    namespace: default
    resourceVersion: "20383182"
    uid: [redacted]
  lastScheduleTime: 2018-08-02T14:00:00Z

Dou*_*oug 19

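It turns out that you have to set backoffLimit: 0 in combination with restartPolicy: Never and concurrencyPolicy: Forbid.

backoffLimit is the number of times the Job will be retried before it is considered failed. The default is 6.

concurrencyPolicy set to Forbid means it will run 0 or 1 times, but not more.

restartPolicy set to Never means it won't restart on failure.

You need to do all 3 of these things, or your CronJob may run more than once.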

spec:
  concurrencyPolicy: Forbid
  failedJobsHistoryLimit: 1
  jobTemplate:
    metadata:
      creationTimestamp: null
    spec:
      backoffLimit: 0  # <-- ADD THIS
      template:
      # ... MORE STUFF ...
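
For reference, here is a minimal sketch of a complete manifest with all three settings in place. It reuses the names from the question; the image reference is a hypothetical placeholder, and `batch/v1` is an assumption for newer clusters (the question used `batch/v1beta1`).

```yaml
# Sketch only: the image below is a placeholder, not taken from the original post.
apiVersion: batch/v1            # assumption; the question's manifest used batch/v1beta1
kind: CronJob
metadata:
  name: dailytasks
  namespace: default
spec:
  schedule: "0 14 * * *"
  concurrencyPolicy: Forbid     # skip a new run while the previous Job is still active
  failedJobsHistoryLimit: 1
  successfulJobsHistoryLimit: 3
  jobTemplate:
    spec:
      backoffLimit: 0           # no retries: the first pod failure marks the Job failed
      template:
        spec:
          restartPolicy: Never  # the failed container is not restarted in place
          containers:
          - name: dailytasks
            image: example.com/dailytasks:latest   # hypothetical image
            command:
            - kubernetes/daily_tasks.sh
```

Note that `backoffLimit` sits under `jobTemplate.spec` (the Job spec), while `restartPolicy` sits under the pod spec and `concurrencyPolicy` sits on the CronJob spec itself.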

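  • `concurrencyPolicy` has nothing to do with the number of retries or failures. It determines whether a long-running job that runs past its next scheduled interval causes another job to start. (3 upvotes)
  • Minor note: `backoffLimit` is the number of "retries", not the number of "tries": https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.13/#jobspec-v1-batch (2 upvotes)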
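
To illustrate the retries-versus-tries distinction from the comment above, here is a small fragment of a Job spec. The value 2 is chosen purely for illustration, and the attempt counts in the comments assume `restartPolicy: Never`, where each failure produces a new pod.

```yaml
# Fragment of jobTemplate.spec; values are illustrative only.
backoffLimit: 2   # 2 retries after the first try, so up to 3 pod attempts in total
# backoffLimit: 0 # 0 retries: exactly 1 attempt, then the Job is marked failed
# (field omitted) # default of 6 retries, so up to 7 attempts in total
```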