Redis Pod fails

Cha*_*kar 7 redis kubernetes minikube

I have a Redis DB setup running on my minikube cluster. I shut minikube down and started it again 3 days later, and now my Redis pod fails to come up with the following error in the pod logs:

Bad file format reading the append only file: make a backup of your AOF file, then use ./redis-check-aof --fix <filename>.

Below is the StatefulSet YAML for the Redis master, which I deployed via a Helm chart:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  annotations:
    meta.helm.sh/release-name: test-redis
    meta.helm.sh/release-namespace: test
  generation: 1
  labels:
    app.kubernetes.io/component: master
    app.kubernetes.io/instance: test-redis
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: redis
    helm.sh/chart: redis-14.8.11
  name: test-redis-master
  namespace: test
  resourceVersion: "191902"
  uid: 3a4e541f-154f-4c54-a379-63974d90089e
spec:
  podManagementPolicy: OrderedReady
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/component: master
      app.kubernetes.io/instance: test-redis
      app.kubernetes.io/name: redis
  serviceName: test-redis-headless
  template:
    metadata:
      annotations:
        checksum/configmap: dd1f90e0231e5f9ebd1f3f687d534d9ec53df571cba9c23274b749c01e5bc2bb
        checksum/health: xxxxx
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: master
        app.kubernetes.io/instance: test-redis
        app.kubernetes.io/managed-by: Helm
        app.kubernetes.io/name: redis
        helm.sh/chart: redis-14.8.11
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchLabels:
                  app.kubernetes.io/component: master
                  app.kubernetes.io/instance: test-redis
                  app.kubernetes.io/name: redis
              namespaces:
              - tyk
              topologyKey: kubernetes.io/hostname
            weight: 1
      containers:
      - args:
        - -c
        - /opt/bitnami/scripts/start-scripts/start-master.sh
        command:
        - /bin/bash
        env:
        - name: BITNAMI_DEBUG
          value: "false"
        - name: REDIS_REPLICATION_MODE
          value: master
        - name: ALLOW_EMPTY_PASSWORD
          value: "no"
        - name: REDIS_PASSWORD
          valueFrom:
            secretKeyRef:
              key: redis-password
              name: test-redis
        - name: REDIS_TLS_ENABLED
          value: "no"
        - name: REDIS_PORT
          value: "6379"
        image: docker.io/bitnami/redis:6.2.5-debian-10-r11
        imagePullPolicy: IfNotPresent
        livenessProbe:
          exec:
            command:
            - sh
            - -c
            - /health/ping_liveness_local.sh 5
          failureThreshold: 5
          initialDelaySeconds: 20
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 6
        name: redis
        ports:
        - containerPort: 6379
          name: redis
          protocol: TCP
        readinessProbe:
          exec:
            command:
            - sh
            - -c
            - /health/ping_readiness_local.sh 1
          failureThreshold: 5
          initialDelaySeconds: 20
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 2
        resources: {}
        securityContext:
          runAsUser: 1001
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /opt/bitnami/scripts/start-scripts
          name: start-scripts
        - mountPath: /health
          name: health
        - mountPath: /data
          name: redis-data
        - mountPath: /opt/bitnami/redis/mounted-etc
          name: config
        - mountPath: /opt/bitnami/redis/etc/
          name: redis-tmp-conf
        - mountPath: /tmp
          name: tmp
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 1001
      serviceAccount: test-redis
      serviceAccountName: test-redis
      terminationGracePeriodSeconds: 30
      volumes:
      - configMap:
          defaultMode: 493
          name: test-redis-scripts
        name: start-scripts
      - configMap:
          defaultMode: 493
          name: test-redis-health
        name: health
      - configMap:
          defaultMode: 420
          name: test-redis-configuration
        name: config
      - emptyDir: {}
        name: redis-tmp-conf
      - emptyDir: {}
        name: tmp
  updateStrategy:
    rollingUpdate:
      partition: 0
    type: RollingUpdate
  volumeClaimTemplates:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: master
        app.kubernetes.io/instance: test-redis
        app.kubernetes.io/name: redis
      name: redis-data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 8Gi
      volumeMode: Filesystem
    status:
      phase: Pending

Please let me know your suggestions on how to fix this.

Mar*_*ark 12

I'm not a Redis expert, but from what I can see:

kubectl describe pod red3-redis-master-0
...
Bad file format reading the append only file: make a backup of your AOF file, then use ./redis-check-aof --fix <filename>
...

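means that your appendonly.aof file is corrupted: it contains an invalid byte sequence somewhere in the middle.

What can we do when redis-master is not working?

  • Verify which PVC is attached to the redis-master pod: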
kubectl get pvc

NAME                               STATUS   VOLUME                                    
redis-data-red3-redis-master-0     Bound    pvc-cf59a0b2-a3ee-4f7f-9f07-8f4922518359  
  • Create a new redis-client pod with the same PVC, redis-data-red3-redis-master-0:
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: redis-client
spec:
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: redis-data-red3-redis-master-0
  containers:
    - name: redis
      image: docker.io/bitnami/redis:6.2.3-debian-10-r0
      command: ["/bin/bash"]
      args: ["-c", "sleep infinity"]
      volumeMounts:
        - mountPath: "/tmp"
          name: data
EOF
  • Back up your files:
kubectl cp redis-client:/tmp .
  • Repair the appendonly.aof file:
kubectl exec -it redis-client -- /bin/bash

cd /tmp

# make copy of appendonly.aof file:
cp appendonly.aof appendonly.aofbackup

# verify appendonly.aof file:
redis-check-aof appendonly.aof

...
0x              38: Expected prefix '*', got: '"'
AOF analyzed: size=62, ok_up_to=56, ok_up_to_line=13, diff=6
AOF is not valid. Use the --fix option to try fixing it.
...

# repair appendonly.aof file:
redis-check-aof --fix appendonly.aof

# compare files using diff:
diff appendonly.aof appendonly.aofbackup
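
The backup step above can be made a little safer by checking that the copy really matches the original before anything is repaired. A minimal sketch, assuming a POSIX shell with cp and cmp available inside the redis-client pod; the function name backup_aof and the timestamped file name are illustrative choices, not part of Redis or the chart:

```shell
#!/bin/sh
# backup_aof FILE: copy FILE next to itself with a timestamp suffix and
# verify that the copy is byte-identical. Prints the backup path on success.
# (backup_aof is a hypothetical helper name used for illustration.)
backup_aof() {
    src=$1
    [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }
    dst="$src.$(date +%Y%m%d%H%M%S).bak"
    cp -p "$src" "$dst" || return 1
    # cmp -s exits 0 only when both files have identical contents
    cmp -s "$src" "$dst" || { echo "backup differs from source" >&2; return 1; }
    echo "$dst"
}
```

It could be used as `backup=$(backup_aof /tmp/appendonly.aof)` before running redis-check-aof, so that a failed copy stops the procedure early.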

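Note:

According to the documentation:

The best thing to do is to run the redis-check-aof utility, initially without the --fix option, then understand the problem, jump to the given offset in the file, and see if it is possible to manually repair the file: the AOF uses the same format as the Redis protocol and is quite simple to fix manually. Otherwise it is possible to let the utility fix the file for us, but in that case all the AOF portion from the invalid part to the end of the file may be discarded, leading to a massive amount of data loss if the corruption happened to be in the initial part of the file.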
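
To make the quoted advice concrete: the sample output above reports size=62 and ok_up_to=56, meaning the first 56 bytes parse and the last 6 do not. The sketch below builds a synthetic 62-byte file with that shape (its contents are an assumption for illustration, not the asker's data), looks at the bytes starting at the reported offset, and writes a truncated copy with head -c. As far as I understand, cutting the file at the last valid offset is roughly what --fix does, but treat that as an assumption and always work on a copy:

```shell
#!/bin/sh
work=$(mktemp -d)
aof="$work/appendonly.aof"

# Two well-formed commands in Redis protocol format (23 + 33 = 56 bytes) ...
printf '*2\r\n$6\r\nSELECT\r\n$1\r\n0\r\n' > "$aof"
printf '*3\r\n$3\r\nSET\r\n$3\r\nfoo\r\n$5\r\nhello\r\n' >> "$aof"
# ... followed by 6 bytes that do not start with '*'
printf '"oops\n' >> "$aof"

ok_up_to=56   # offset reported by redis-check-aof for this synthetic file shape

# Show what sits at the reported offset (tail -c +N counts from 1)
tail -c +"$((ok_up_to + 1))" "$aof" | od -c | head -n 5

# Keep only the valid prefix, written to a copy rather than in place
head -c "$ok_up_to" "$aof" > "$aof.truncated"

# Print both sizes for comparison
wc -c < "$aof"
wc -c < "$aof.truncated"
```

If the bytes at the offset turn out to be a partially written command, a manual fix may preserve more data than truncation; if they are plain garbage at the end of the file, truncating there loses little.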

In addition, as mentioned in the comments by @Miffa Young, you can check where your data is stored when using the k8s.io/minikube-hostpath provisioner:

kubectl get pv 
...
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                      
pvc-cf59a0b2-a3ee-4f7f-9f07-8f4922518359   8Gi        RWO            Delete           Bound    default/redis-data-red3-redis-master-0     
...

kubectl describe pv pvc-cf59a0b2-a3ee-4f7f-9f07-8f4922518359
...
Source:
    Type:          HostPath (bare host directory volume)
    Path:          /tmp/hostpath-provisioner/default/redis-data-red3-redis-master-0
...

Your Redis instance is failing because its appendonly.aof is malformed and is stored persistently under this location.

You can ssh into your VM:

minikube -p redis ssh 
cd /tmp/hostpath-provisioner/default/redis-data-red3-redis-master-0
# from there you can backup/repair/remove your files:

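Another solution is to install this chart under a new release name. In that case a new set of PVs and PVCs is created for the Redis StatefulSets, so Redis starts on an empty volume without the old data.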


Mif*_*ung 1

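  • I think your Redis did not exit gracefully, so the AOF file is left in a bad format (see "What is AOF").

  • You should repair the AOF file with an initContainer that runs the command (redis-check-aof --fix <filename>):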

apiVersion: apps/v1
kind: StatefulSet
metadata:
  annotations:
    meta.helm.sh/release-name: test-redis
    meta.helm.sh/release-namespace: test
  generation: 1
  labels:
    app.kubernetes.io/component: master
    app.kubernetes.io/instance: test-redis
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: redis
    helm.sh/chart: redis-14.8.11
  name: test-redis-master
  namespace: test
  resourceVersion: "191902"
  uid: 3a4e541f-154f-4c54-a379-63974d90089e
spec:
  podManagementPolicy: OrderedReady
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/component: master
      app.kubernetes.io/instance: test-redis
      app.kubernetes.io/name: redis
  serviceName: test-redis-headless
  template:
    metadata:
      annotations:
        checksum/configmap: dd1f90e0231e5f9ebd1f3f687d534d9ec53df571cba9c23274b749c01e5bc2bb
        checksum/health: xxxxx
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: master
        app.kubernetes.io/instance: test-redis
        app.kubernetes.io/managed-by: Helm
        app.kubernetes.io/name: redis
        helm.sh/chart: redis-14.8.11
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchLabels:
                  app.kubernetes.io/component: master
                  app.kubernetes.io/instance: test-redis
                  app.kubernetes.io/name: redis
              namespaces:
              - tyk
              topologyKey: kubernetes.io/hostname
            weight: 1
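      initContainers:
      - name: repair-redis
        image: docker.io/bitnami/redis:6.2.5-debian-10-r11
        # --fix asks for confirmation on stdin, so answer "y"; skip when no AOF exists yet
        command: ['sh', '-c', "[ ! -f /data/appendonly.aof ] || yes | redis-check-aof --fix /data/appendonly.aof"]
        volumeMounts:  # mount the data volume, otherwise the init container cannot see the AOF file
        - mountPath: /data
          name: redis-data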
      containers:
      - args:
        - -c
        - /opt/bitnami/scripts/start-scripts/start-master.sh
        command:
        - /bin/bash
        env:
        - name: BITNAMI_DEBUG
          value: "false"
        - name: REDIS_REPLICATION_MODE
          value: master
        - name: ALLOW_EMPTY_PASSWORD
          value: "no"
        - name: REDIS_PASSWORD
          valueFrom:
            secretKeyRef:
              key: redis-password
              name: test-redis
        - name: REDIS_TLS_ENABLED
          value: "no"
        - name: REDIS_PORT
          value: "6379"
        image: docker.io/bitnami/redis:6.2.5-debian-10-r11
        imagePullPolicy: IfNotPresent
        livenessProbe:
          exec:
            command:
            - sh
            - -c
            - /health/ping_liveness_local.sh 5
          failureThreshold: 5
          initialDelaySeconds: 20
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 6
        name: redis
        ports:
        - containerPort: 6379
          name: redis
          protocol: TCP
        readinessProbe:
          exec:
            command:
            - sh
            - -c
            - /health/ping_readiness_local.sh 1
          failureThreshold: 5
          initialDelaySeconds: 20
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 2
        resources: {}
        securityContext:
          runAsUser: 1001
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /opt/bitnami/scripts/start-scripts
          name: start-scripts
        - mountPath: /health
          name: health
        - mountPath: /data
          name: redis-data
        - mountPath: /opt/bitnami/redis/mounted-etc
          name: config
        - mountPath: /opt/bitnami/redis/etc/
          name: redis-tmp-conf
        - mountPath: /tmp
          name: tmp
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 1001
      serviceAccount: test-redis
      serviceAccountName: test-redis
      terminationGracePeriodSeconds: 30
      volumes:
      - configMap:
          defaultMode: 493
          name: test-redis-scripts
        name: start-scripts
      - configMap:
          defaultMode: 493
          name: test-redis-health
        name: health
      - configMap:
          defaultMode: 420
          name: test-redis-configuration
        name: config
      - emptyDir: {}
        name: redis-tmp-conf
      - emptyDir: {}
        name: tmp
  updateStrategy:
    rollingUpdate:
      partition: 0
    type: RollingUpdate
  volumeClaimTemplates:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: master
        app.kubernetes.io/instance: test-redis
        app.kubernetes.io/name: redis
      name: redis-data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 8Gi
      volumeMode: Filesystem

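
An init container runs without a terminal, and redis-check-aof --fix asks for confirmation before it truncates the file, so a repair step has to answer that prompt and should not fail when no AOF file exists yet. A sketch of that logic as a standalone shell function, assuming the data volume is mounted at /data in the init container; the names repair_aof and CHECK_AOF are hypothetical, introduced so the logic can be tried without Redis installed:

```shell
#!/bin/sh
# CHECK_AOF may be overridden to point at another binary (hypothetical knob).
CHECK_AOF=${CHECK_AOF:-redis-check-aof}

# repair_aof FILE: run "redis-check-aof --fix" on FILE non-interactively.
repair_aof() {
    aof=$1
    # Fresh volume on first start: nothing to repair, let Redis start normally
    if [ ! -f "$aof" ]; then
        echo "no AOF at $aof, skipping"
        return 0
    fi
    # Keep one copy of the file as it was before the first repair attempt;
    # --fix may discard everything after the first invalid offset
    [ -e "$aof.before-fix" ] || cp -p "$aof" "$aof.before-fix" || return 1
    # --fix asks for confirmation on stdin; feed it a single "y"
    echo y | "$CHECK_AOF" --fix "$aof"
}
```

In the init container this would be called as `repair_aof /data/appendonly.aof`. Note that an automatic repair on every start silently accepts whatever data loss the truncation causes, which may be fine for a cache but is worth a conscious decision for anything else.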