My Kubernetes pods keep crashing with "CrashLoopBackOff" but I can't find any logs

Question

This is what I keep getting:

[root@centos-master ~]# kubectl get pods
NAME               READY     STATUS             RESTARTS   AGE
nfs-server-h6nw8   1/1       Running            0          1h
nfs-web-07rxz      0/1       CrashLoopBackOff   8          16m
nfs-web-fdr9h      0/1       CrashLoopBackOff   8          16m

Below is the output of "kubectl describe pods":

Events:
  FirstSeen LastSeen    Count   From                SubobjectPath       Type        Reason      Message
  --------- --------    -----   ----                -------------       --------    ------      -------
  16m       16m     1   {default-scheduler }                    Normal      Scheduled   Successfully assigned nfs-web-fdr9h to centos-minion-2
  16m       16m     1   {kubelet centos-minion-2}   spec.containers{web}    Normal      Created     Created container with docker id 495fcbb06836
  16m       16m     1   {kubelet centos-minion-2}   spec.containers{web}    Normal      Started     Started container with docker id 495fcbb06836
  16m       16m     1   {kubelet centos-minion-2}   spec.containers{web}    Normal      Started     Started container with docker id d56f34ae4e8f
  16m       16m     1   {kubelet centos-minion-2}   spec.containers{web}    Normal      Created     Created container with docker id d56f34ae4e8f
  16m       16m     2   {kubelet centos-minion-2}               Warning     FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "web" with CrashLoopBackOff: "Back-off 10s restarting failed container=web pod=nfs-web-fdr9h_default(461c937d-d870-11e6-98de-005056040cc2)"

I have two pods, nfs-web-07rxz and nfs-web-fdr9h, but if I run "kubectl logs nfs-web-07rxz", with or without the "-p" option, I don't see any logs for either pod.

[root@centos-master ~]# kubectl logs nfs-web-07rxz -p
[root@centos-master ~]# kubectl logs nfs-web-07rxz

This is my ReplicationController YAML file:

apiVersion: v1
kind: ReplicationController
metadata:
  name: nfs-web
spec:
  replicas: 2
  selector:
    role: web-frontend
  template:
    metadata:
      labels:
        role: web-frontend
    spec:
      containers:
      - name: web
        image: eso-cmbu-docker.artifactory.eng.vmware.com/demo-container:demo-version3.0
        ports:
          - name: web
            containerPort: 80
        securityContext:
          privileged: true

My Docker image was built from this simple Dockerfile:

FROM ubuntu
RUN apt-get update
RUN apt-get install -y nginx
RUN apt-get install -y nfs-common

I am running my Kubernetes cluster on CentOS-1611, with this kube version:

[root@centos-master ~]# kubectl version
Client Version: version.Info{Major:"1", Minor:"3", GitVersion:"v1.3.0", GitCommit:"86dc49aa137175378ac7fba7751c3d3e7f18e5fc", GitTreeState:"clean", BuildDate:"2016-12-15T16:57:18Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"3", GitVersion:"v1.3.0", GitCommit:"86dc49aa137175378ac7fba7751c3d3e7f18e5fc", GitTreeState:"clean", BuildDate:"2016-12-15T16:57:18Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}

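If I run the Docker image directly with "docker run", it runs without any problem; it crashes only under Kubernetes.

Can someone help me figure out how to debug this without seeing any logs?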

Answer 1

Ste*_*oka 51

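As @Sukumar commented, you need to have your Dockerfile specify a command to run, or have your ReplicationController specify a command.

The pod is crashing because it starts up and then immediately exits, so Kubernetes restarts it and the cycle continues.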
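
The restart cycle is throttled rather than immediate: the "Back-off 10s" message in the question's events is the first step of an increasing delay. Below is a minimal Python sketch of that schedule, assuming the commonly documented behavior (the delay starts at 10 seconds, doubles after each crash, and is capped at 5 minutes). The function name and default values are illustrative assumptions, not taken from the Kubernetes source code.

```python
def crashloop_backoff_delays(restarts, base_seconds=10, cap_seconds=300):
    """Return the assumed wait (in seconds) before each of the first `restarts` restarts."""
    delays = []
    delay = base_seconds
    for _ in range(restarts):
        # Each wait is the current delay, never longer than the cap.
        delays.append(min(delay, cap_seconds))
        delay *= 2
    return delays


if __name__ == "__main__":
    # The pods in the question had restarted 8 times.
    print(crashloop_backoff_delays(8))
```

Under these assumptions, a pod that has restarted 8 times is already waiting the full 5 minutes between attempts, which is why a fix can appear to take a while to show up in "kubectl get pods".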

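What could be the reason if we have added a proper Dockerfile and still get the error? Even when I add the command correctly, I get the same error. When I test the standalone Docker image without using a Kubernetes deployment, I get the output, so the Dockerfile is not the problem. Is it something related to the deployment? I have described the whole issue I am facing here: /sf/ask/3920094671/?noredirect=1#comment98718941_56001352. Could you have a look? (2 upvotes)
There is a very good blog post that goes in depth on what CrashLoopBackOff means and the various cases in which it can happen: https://managementkube.com/kubernetes/pod/failure/crashloopbackoff/k8sbot/troubleshooting/2019/02/12/pod-failure-crashloopbackoff.html (2 upvotes)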

Answer 2

use*_*364 25

kubectl -n <namespace-name> describe pod <pod name>

kubectl -n <namespace-name> logs -p  <pod name>

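While these commands may (or may not) solve the problem, a good answer should always include an explanation of how the problem is solved. (26 upvotes)
The first command, kubectl -n <namespace-name> describe pod <pod name>, describes your pod; it can be used to see any errors in creating and running the pod, such as a lack of resources. The second command, kubectl -n <namespace-name> logs -p <pod name>, shows the logs of the application from the previous instance of the pod's container. (4 upvotes)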

Answer 3

Mar*_*les 16

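If your application is slow to start, the problem may be related to the initial values of the readiness/liveness probes. I solved my problem by increasing the value of initialDelaySeconds to 120s, because my SpringBoot application does a lot of initialization. The documentation does not mention the default of 0 ( https://kubernetes.io/docs/api-reference/v1.9/#probe-v1-core )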

service:
  livenessProbe:
    httpGet:
      path: /health/local
      scheme: HTTP
      port: 8888
    initialDelaySeconds: 120
    periodSeconds: 5
    timeoutSeconds: 5
    failureThreshold: 10
  readinessProbe:
    httpGet:
      path: /admin/health
      scheme: HTTP
      port: 8642
    initialDelaySeconds: 150
    periodSeconds: 5
    timeoutSeconds: 5
    failureThreshold: 10

A very good explanation of these values is given in the question "What is the default value of initialDelaySeconds".

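The health or readiness check algorithm works as follows:

1. Wait for initialDelaySeconds.
2. Perform the check and wait up to timeoutSeconds for a timeout.
3. If the number of consecutive successes reaches successThreshold, return success.
4. If the number of consecutive failures reaches failureThreshold, return failure; otherwise wait periodSeconds and start a new check.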
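
To make the effect of these settings concrete, here is a minimal Python sketch that estimates how long a container has before a probe that never succeeds is declared failed. It is a simplified model, not the kubelet's implementation: it assumes the first check runs right after initialDelaySeconds, that later checks are exactly periodSeconds apart, and it ignores timeoutSeconds and scheduling jitter. The function name is hypothetical.

```python
def earliest_probe_failure(initial_delay_seconds, period_seconds, failure_threshold):
    """Seconds after container start at which a probe that never succeeds
    reaches failure_threshold consecutive failures (simplified model)."""
    # The first check happens after the initial delay; each further check
    # is one period later, so the Nth failure is (N - 1) periods after the first.
    return initial_delay_seconds + (failure_threshold - 1) * period_seconds


if __name__ == "__main__":
    # Liveness probe values from the manifest in this answer.
    print(earliest_probe_failure(120, 5, 10))
    # Assumed defaults: no initial delay, 10 s period, 3 failures.
    print(earliest_probe_failure(0, 10, 3))
```

With the values from this answer the application gets roughly 165 seconds to come up, compared with about 20 seconds under the assumed defaults, which is consistent with a slow-starting application being restarted before it finishes initializing.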

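In my case, my application can now bootstrap in a clearly defined way, so I know I will not get periodic CrashLoopBackOff; previously it was sometimes right at the limit of those rates.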

Answer 4

Jul*_*bal 11

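According to this page, the container dies after running everything correctly, and it is reported as crashing because all of its commands have ended. Either make your service run in the foreground or create a keep-alive script. That way, Kubernetes will show your application as running. Note that this problem does not show up when the image is run with Docker alone; only Kubernetes expects a continuously running application.

Update (an example):

Here is how to avoid CrashLoopBackOff when launching a Netshoot container: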

kubectl run netshoot --image nicolaka/netshoot -- sleep infinity

Answer 5

mat*_*yas 10

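My pod kept crashing and I was unable to find the cause. Luckily, Kubernetes keeps a record of all the events that occurred before my pod crashed.
(#List Events sorted by timestamp)

To see these events, run the following command: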

kubectl get events --sort-by=.metadata.creationTimestamp

Make sure to add a --namespace mynamespace argument to the command if needed.

The events shown in the command's output revealed why my pod kept crashing.

Answer 6

hma*_*ias 9

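I needed to keep a pod running for subsequent kubectl exec calls and, as the comments above pointed out, my pod was being killed by my k8s cluster because it had finished running all of its tasks. I managed to keep the pod running by simply starting it with a command that does not stop on its own: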

kubectl run YOUR_POD_NAME -n YOUR_NAMESPACE --image SOME_PUBLIC_IMAGE:latest --command tailf /dev/null

"tailf" did not work for me, but this did (on Alpine Linux): --command /usr/bin/tail -- -f /dev/null (4 upvotes)

Answer 7

ami*_*omp 6

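I observed the same issue and fixed it by adding the command and args block to the YAML file. I am copying a sample of my YAML file here for reference: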

apiVersion: v1
kind: Pod
metadata:
  labels:
    run: ubuntu
  name: ubuntu
  namespace: default
spec:
  containers:
  - image: gcr.io/ow/hellokubernetes/ubuntu
    imagePullPolicy: Never
    name: ubuntu
    resources:
      requests:
        cpu: 100m
    command: ["/bin/sh"]
    args: ["-c", "while true; do echo hello; sleep 10;done"]
  dnsPolicy: ClusterFirst
  enableServiceLinks: true

Answer 8

小智 5

In your YAML file, add the command and args lines:

...
containers:
      - name: api
        image: localhost:5000/image-name 
        command: [ "sleep" ]
        args: [ "infinity" ]
...

Worked for me.
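
If the manifest is generated by a script rather than written by hand, the same override can be expressed as plain data. The following is a minimal Python sketch, assuming a single-container Pod; the helper name pod_with_command is hypothetical, and the pod name, image, command and args simply mirror the fragment above.

```python
import json


def pod_with_command(name, image, command, args):
    """Build a minimal Pod manifest whose single container runs `command` with `args`."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                # `command` overrides the image's entrypoint and `args` its arguments,
                # so the container keeps running instead of exiting immediately.
                {"name": name, "image": image, "command": command, "args": args}
            ]
        },
    }


if __name__ == "__main__":
    manifest = pod_with_command("api", "localhost:5000/image-name", ["sleep"], ["infinity"])
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and passed to kubectl apply -f, which accepts JSON manifests as well as YAML.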

Answer 9

Win*_*ing 5

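As mentioned in the posts above, the container exits as soon as it is created.

If you want to test this without using a YAML file, you can pass the sleep command to the kubectl create deployment statement. The double hyphen -- marks the start of the command, which is equivalent to command: in a Pod or Deployment YAML file.

The command below creates a deployment for debian that loops over sleep 1234, so it does not exit immediately.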

kubectl create deployment deb --image=debian:buster-slim -- "sh" "-c" "while true; do sleep 1234; done"

You can then create a service and so on, or, to test the container, you can run kubectl exec -it <pod-name> -- sh (or -- bash) to get a shell in the container you just created.

Archived:	8 years, 11 months ago
Views:	89779
Last activity:	6 years, 1 month ago