Kubernetes has a huge number of pods in Error state that I can't seem to clear

xam*_*mox 6 error-handling kubernetes kubectl

I was originally trying to run a Job that seemed to get stuck in a CrashLoopBackOff. Here is the Job manifest:

apiVersion: batch/v1
kind: Job
metadata:
  name: es-setup-indexes
  namespace: elk-test
spec:
  template:
    metadata:
      name: es-setup-indexes
    spec:
      containers:
      - name: es-setup-indexes
        image: appropriate/curl
        command: ['curl -H  "Content-Type: application/json" -XPUT http://elasticsearch.elk-test.svc.cluster.local:9200/_template/filebeat -d@/etc/filebeat/filebeat.template.json']
        volumeMounts:
        - name: configmap-volume
          mountPath: /etc/filebeat/filebeat.template.json
          subPath: filebeat.template.json
      restartPolicy: Never

      volumes:
        - name: configmap-volume
          configMap:
            name: elasticsearch-configmap-indexes
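The pods most likely end in Error because of the `command` line: Kubernetes passes each list element as one argv entry and does not run it through a shell, so the runtime looks for an executable literally named `curl -H "Content-Type: ..." -XPUT ...`, which does not exist. With `restartPolicy: Never` a failed pod is kept and the Job controller starts a replacement, and the 1.6 cluster here predates `spec.backoffLimit` (added in 1.8), so nothing caps those retries. Below is a sketch of the same Job with the executable and its arguments split apart, wrapped in a hypothetical helper `apply_es_setup_job`; it assumes `curl` is on the image's PATH and reuses the URL, ConfigMap and paths from the question.

```shell
# Hypothetical helper: pipe a corrected Job manifest into `kubectl apply`.
apply_es_setup_job() {
  kubectl apply -f - <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: es-setup-indexes
  namespace: elk-test
spec:
  # backoffLimit: 4   # caps retries, but only on Kubernetes 1.8+ (not on 1.6)
  template:
    metadata:
      name: es-setup-indexes
    spec:
      containers:
      - name: es-setup-indexes
        image: appropriate/curl
        # Executable and arguments as separate list items; no shell is involved.
        command: ["curl"]
        args:
        - "-H"
        - "Content-Type: application/json"
        - "-XPUT"
        - "http://elasticsearch.elk-test.svc.cluster.local:9200/_template/filebeat"
        - "-d@/etc/filebeat/filebeat.template.json"
        volumeMounts:
        - name: configmap-volume
          mountPath: /etc/filebeat/filebeat.template.json
          subPath: filebeat.template.json
      restartPolicy: Never
      volumes:
      - name: configmap-volume
        configMap:
          name: elasticsearch-configmap-indexes
EOF
}
```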

I tried deleting the job, but it only worked when I ran the following command:

kubectl delete job es-setup-indexes --cascade=false
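`--cascade=false` deletes only the Job object and deliberately leaves its pods behind (orphans them), which is why they are still listed afterwards; a plain `kubectl delete job` would normally remove them too. Pods created by the Job controller should carry a `job-name=<job name>` label, so a sketch of removing the orphans by label instead of by name could look like this (`delete_job_pods` is a hypothetical helper; the namespace and job name come from the manifest above):

```shell
# Hypothetical helper: delete every pod the Job controller created for a given Job.
# Usage: delete_job_pods <namespace> <job-name>
delete_job_pods() {
  local ns="$1" job="$2"
  # Show what matches the label first, then delete the same set.
  kubectl get pods -n "$ns" -l "job-name=${job}"
  kubectl delete pods -n "$ns" -l "job-name=${job}"
}

# Example with the names from the question:
# delete_job_pods elk-test es-setup-indexes
```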

Afterwards, I noticed that when running:

kubectl get pods -w

I get a TON of pods in an Error state, and I see no way to clean them up. Here is just a small sample of the output when I run get pods:

es-setup-indexes-zvx9c   0/1       Error     0         20h
es-setup-indexes-zw23w   0/1       Error     0         15h
es-setup-indexes-zw57h   0/1       Error     0         21h
es-setup-indexes-zw6l9   0/1       Error     0         16h
es-setup-indexes-zw7fc   0/1       Error     0         22h
es-setup-indexes-zw9bw   0/1       Error     0         12h
es-setup-indexes-zw9ck   0/1       Error     0         1d
es-setup-indexes-zwf54   0/1       Error     0         18h
es-setup-indexes-zwlmg   0/1       Error     0         16h
es-setup-indexes-zwmsm   0/1       Error     0         21h
es-setup-indexes-zwp37   0/1       Error     0         22h
es-setup-indexes-zwzln   0/1       Error     0         22h
es-setup-indexes-zx4g3   0/1       Error     0         11h
es-setup-indexes-zx4hd   0/1       Error     0         21h
es-setup-indexes-zx512   0/1       Error     0         1d
es-setup-indexes-zx638   0/1       Error     0         17h
es-setup-indexes-zx64c   0/1       Error     0         21h
es-setup-indexes-zxczt   0/1       Error     0         15h
es-setup-indexes-zxdzf   0/1       Error     0         14h
es-setup-indexes-zxf56   0/1       Error     0         1d
es-setup-indexes-zxf9r   0/1       Error     0         16h
es-setup-indexes-zxg0m   0/1       Error     0         14h
es-setup-indexes-zxg71   0/1       Error     0         1d
es-setup-indexes-zxgwz   0/1       Error     0         19h
es-setup-indexes-zxkpm   0/1       Error     0         23h
es-setup-indexes-zxkvb   0/1       Error     0         15h
es-setup-indexes-zxpgg   0/1       Error     0         20h
es-setup-indexes-zxqh3   0/1       Error     0         1d
es-setup-indexes-zxr7f   0/1       Error     0         22h
es-setup-indexes-zxxbs   0/1       Error     0         13h
es-setup-indexes-zz7xr   0/1       Error     0         12h
es-setup-indexes-zzbjq   0/1       Error     0         13h
es-setup-indexes-zzc0z   0/1       Error     0         16h
es-setup-indexes-zzdb6   0/1       Error     0         1d
es-setup-indexes-zzjh2   0/1       Error     0         21h
es-setup-indexes-zzm77   0/1       Error     0         1d
es-setup-indexes-zzqt5   0/1       Error     0         12h
es-setup-indexes-zzr79   0/1       Error     0         16h
es-setup-indexes-zzsfx   0/1       Error     0         1d
es-setup-indexes-zzx1r   0/1       Error     0         21h
es-setup-indexes-zzx6j   0/1       Error     0         1d
kibana-kq51v   1/1       Running   0         10h

However, if I look at the jobs, none of them are related to these pods anymore:

$ kubectl get jobs --all-namespaces                                                                              
NAMESPACE     NAME               DESIRED   SUCCESSFUL   AGE
kube-system   configure-calico   1         1            46d

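I've also noticed that kubectl seems to respond very slowly. I don't know whether the pods keep trying to restart or are stuck in some broken state, but it would be great if someone could tell me how to troubleshoot this, as I haven't run into a problem like this in Kubernetes before.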
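One way to troubleshoot this is to count the leftover pods and look closely at one of them: `kubectl describe pod` shows the container's exit code, termination reason and recent events, and `kubectl logs` shows what the process printed before it exited. A sketch with a hypothetical helper `inspect_job_pods`, assuming the pods still carry the `job-name` label the Job controller adds:

```shell
# Hypothetical helper: count the pods a Job left behind and inspect the first one.
# Usage: inspect_job_pods <namespace> <job-name>
inspect_job_pods() {
  local ns="$1" job="$2" names first
  # `-o name` prints one "<type>/<name>" line per pod.
  names=$(kubectl get pods -n "$ns" -l "job-name=${job}" -o name)
  echo "pods for job ${job}: $(printf '%s' "$names" | grep -c .)"
  first=$(printf '%s\n' "$names" | head -n 1)
  first=${first#*/}                   # drop the "pod/" prefix
  [ -n "$first" ] || return 0         # nothing to inspect
  kubectl describe pod -n "$ns" "$first"   # exit code, termination reason, events
  kubectl logs -n "$ns" "$first"           # output of the failed container
}
```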

Kubernetes version info:

$ kubectl version 
Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.1", GitCommit:"b0b7a323cc5a4a2019b2e9520c21c7830b7f708e", GitTreeState:"clean", BuildDate:"2017-04-03T20:44:38Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.1", GitCommit:"b0b7a323cc5a4a2019b2e9520c21c7830b7f708e", GitTreeState:"clean", BuildDate:"2017-04-03T20:33:27Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}

mar*_*tvz 24

Here's a quick fix :)

kubectl get pods | grep Error | cut -d' ' -f 1 | xargs kubectl delete pod

  • Keep in mind that this also deletes any pod that happens to have "Error" in its name. The answer below is more reliable. (2 upvotes)
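A variant that looks only at the STATUS column (the third column of `kubectl get pods`), so a pod with "Error" in its name is not caught. `error_pod_names` is a hypothetical helper; `--no-headers` drops the header row, and `xargs -r` (GNU xargs) skips running `kubectl delete` when nothing matches.

```shell
# Print the NAME of every pod whose STATUS column is exactly "Error".
# Expects `kubectl get pods` output on stdin.
error_pod_names() {
  awk '$3 == "Error" { print $1 }'
}

# Hypothetical usage against one namespace:
# kubectl get pods -n elk-test --no-headers | error_pod_names | xargs -r kubectl delete pod -n elk-test
```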

Anonymous 15

kubectl delete pods --field-selector status.phase=Failed -n <your-namespace>

...to clean up all failed pods in your namespace.

  • Try using `--field-selector=status.phase=Failed` (2 upvotes)
  • `kubectl get pods -o name -n <your-namespace> --field-selector status.phase=Failed | xargs kubectl delete -n <your-namespace>` (2 upvotes)
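Pods listed as `Error` in the question are Job pods whose container exited non-zero with `restartPolicy: Never`, so their `status.phase` is `Failed`, which is why this selector matches them. A sketch that previews before deleting, using a hypothetical helper `delete_failed_pods`; the `--field-selector` flag may not exist in older kubectl clients such as the 1.6 one in the question, so check `kubectl delete --help` first.

```shell
# Hypothetical helper: list, then delete, all pods in phase Failed in one namespace.
# Usage: delete_failed_pods <namespace>
delete_failed_pods() {
  local ns="$1"
  kubectl get pods -n "$ns" --field-selector=status.phase=Failed
  kubectl delete pods -n "$ns" --field-selector=status.phase=Failed
}
```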

Sai*_*ish 10

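I had a lot of pods in the following states:

  • ContainerCannotRun
  • Error
  • ImagePullBackOff

These pods ended up in those states for legitimate reasons, but even after the underlying problems were fixed, they were not cleaned up automatically.

To clean them up, manually running any of the following did not work: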

# Doesn't work
kubectl get pods --field-selector status.phase=Error 

# Doesn't work
kubectl get pods \
    --field-selector=status.phase=Error

# Doesn't work
kubectl get pods \
    --field-selector="status.phase=Error"

# Doesn't work
kubectl get pods \
    --field-selector="status.phase==Error"


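Those selectors match nothing because `status.phase` only takes the pod phases `Pending`, `Running`, `Succeeded`, `Failed` and `Unknown`. `Error`, `ContainerCannotRun` and `ImagePullBackOff` are container state reasons that `kubectl get pods` prints in its STATUS column: a pod stuck in `ImagePullBackOff` is still `Pending`, and a non-restarted pod shown as `Error` is typically `Failed`. A sketch of a hypothetical helper `show_pod_phases` that prints both side by side, assuming single-container pods (it only reads `containerStatuses[0]`):

```shell
# Hypothetical helper: show each pod's phase next to its container state reasons.
# Usage: show_pod_phases <namespace>
show_pod_phases() {
  kubectl get pods -n "$1" -o custom-columns='NAME:.metadata.name,PHASE:.status.phase,WAITING:.status.containerStatuses[0].state.waiting.reason,TERMINATED:.status.containerStatuses[0].state.terminated.reason'
}
```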

The following works well and keeps the pods in the phases we want to keep:

# Validate list of pods.
# Add further status.phase!=... conditions for any phases you want to keep
kubectl get pods \
    --field-selector="status.phase!=Succeeded,status.phase!=Running"

# Delete pods that matches the filter
kubectl delete pods \
    --field-selector="status.phase!=Succeeded,status.phase!=Running"

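Note that `status.phase!=Succeeded,status.phase!=Running` also matches every `Pending` pod (which covers the `ImagePullBackOff` ones, but also pods that are simply still starting) and every `Unknown` one. A sketch of a hypothetical helper `phase_summary` that counts pods per phase, to check what the selector would hit before running the delete:

```shell
# Hypothetical helper: count pods per status.phase in a namespace.
# Usage: phase_summary <namespace>
phase_summary() {
  kubectl get pods -n "$1" --no-headers -o custom-columns=PHASE:.status.phase | sort | uniq -c
}
```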


Ahm*_*sny 5

I usually delete all pods in `Error` state with this command:

kubectl delete pod `kubectl get pods --namespace <yournamespace> | awk '$3 == "Error" {print $1}'` --namespace <yournamespace>


xam*_*mox 1

The solution was what @johnharris85 mentioned in the comments: I had to delete all the pods manually. To do that, I ran the following:

kubectl get pods | tee all-pods.txt

This dumps all my pods to a file; I then filtered it and deleted only the ones I wanted:

kubectl delete pod $(awk '/es-setup-index/ {print $1}' all-pods.txt)

Note: I had about 9292 pods, and it took roughly 1-2 hours to delete them all.
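With thousands of pods, one option that may shorten this is to stream the names through `xargs` in batches, so no single command line grows too large and several deletions run at once. A sketch with a hypothetical helper `delete_pods_in_batches`, assuming an `xargs` that supports `-n` (batch size) and `-P` (parallel processes); the values 100 and 4 are arbitrary, not measured.

```shell
# Hypothetical helper: delete the pods named on stdin, 100 per kubectl call,
# with up to 4 kubectl processes running in parallel.
# Usage: awk '/es-setup-index/ {print $1}' all-pods.txt | delete_pods_in_batches elk-test
delete_pods_in_batches() {
  local ns="$1"
  xargs -n 100 -P 4 kubectl delete pod -n "$ns"
}
```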