Kubernetes reaches 100% CPU on a node, but the Pods are not at 100% CPU usage

Question

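My Kubernetes cluster (running 1.18) hits the same problem every day: CPU utilization on one of the nodes climbs past 100%, and Kubernetes can no longer connect outside visitors to my pods. (Basically, the website goes down.)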

The strange thing is that the pods always sit at a comfortable 30% CPU (or less!), so the application itself appears to be fine.

When I describe the problem node, I see timeouts reported by node-problem-detector.

Events:
  Type     Reason                  Age                      From                                     Message
  ----     ------                  ---                      ----                                     -------
  Normal   NodeNotSchedulable      10m                      kubelet                                  Node nodepoo1-vmss000007 status is now: NodeNotSchedulable
  Warning  KubeletIsDown           9m44s (x63 over 5h21m)   kubelet-custom-plugin-monitor            Timeout when running plugin "/etc/node-problem-detector.d/plugin/check_kubelet.s"
  Warning  ContainerRuntimeIsDown  9m41s (x238 over 5h25m)  container-runtime-custom-plugin-monitor  Timeout when running plugin "/etc/node-problem-detector.d/plugin/check_runtime.s"

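My current approach is to run three nodes in the node pool and effectively babysit Kubernetes: when monitoring reports an outage, I cordon the problem node and move all of its pods to one of the other nodes. After 15 minutes, once everything is back to normal, I uncordon the affected node and the cycle starts again.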
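For readers unfamiliar with the cordon, move, and uncordon cycle described above, here is a minimal Python sketch of it. It assumes kubectl is on the PATH and already pointed at the cluster. The function names (maintenance_commands, cycle_node) are hypothetical, and the use of kubectl drain to move the pods is an assumption, since the question does not say how the pods are moved.

```python
import subprocess
import time


def maintenance_commands(node):
    """Return the kubectl invocations for each phase of the manual cycle."""
    return {
        # Mark the node unschedulable so no new pods land on it.
        "cordon": ["kubectl", "cordon", node],
        # Evict the pods so they are rescheduled elsewhere. DaemonSet pods
        # are skipped because they are bound to the node by design.
        "drain": ["kubectl", "drain", node, "--ignore-daemonsets"],
        # Make the node schedulable again.
        "uncordon": ["kubectl", "uncordon", node],
    }


def cycle_node(node, wait_seconds=15 * 60, run=subprocess.run, sleep=time.sleep):
    """Cordon and drain a node, wait, then uncordon it.

    `run` and `sleep` are injectable so the sequence can be exercised
    without touching a real cluster.
    """
    cmds = maintenance_commands(node)
    run(cmds["cordon"], check=True)
    run(cmds["drain"], check=True)
    sleep(wait_seconds)  # 15 minutes by default, as in the description above
    run(cmds["uncordon"], check=True)
```

Calling cycle_node("nodepoo1-vmss000007") would run the three commands against the node named in the events above. Note that kubectl drain may refuse to evict some pods (for example, pods with emptyDir volumes or pods not managed by a controller) unless extra flags are passed, and this sketch only automates the workaround; it does not address the cause of the CPU spikes.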

I was especially unlucky this weekend, with 3 CPU spikes within 24 hours.

How can I fix this? I can't seem to find any information about the Timeout when running plugin "/etc/node-problem-detector.d/plugin/check_kubelet.s" message I'm seeing.

Answer 1

fja*_*mes 1

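You can try opening an SSH connection to the node and then using top to check which processes are consuming CPU. If the process is running in a pod and you have crictl installed on the node, you can use https://github.com/k8s-school/pid2pod to find the pod that is running the process.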
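If pid2pod or crictl is not available, a rough alternative is to read the process's cgroup membership directly. The sketch below is not how pid2pod works. It assumes the kubelet places pod processes under a "kubepods" cgroup whose path contains "pod<UID>", which is the usual layout with both the cgroupfs and systemd cgroup drivers but may differ on your nodes. The function names are hypothetical.

```python
import re
from pathlib import Path

# Matches "pod<uid>" in a kubepods cgroup path. The systemd cgroup driver
# writes the UID with underscores, the cgroupfs driver with hyphens.
_POD_UID = re.compile(
    r"pod([0-9a-fA-F]{8}(?:[-_][0-9a-fA-F]{4}){3}[-_][0-9a-fA-F]{12})"
)


def pod_uid_from_cgroup(cgroup_text):
    """Return the pod UID found in /proc/<pid>/cgroup content, or None."""
    for line in cgroup_text.splitlines():
        if "kubepods" not in line:
            continue  # not a Kubernetes-managed cgroup
        match = _POD_UID.search(line)
        if match:
            # Normalize the systemd-style underscores back to hyphens.
            return match.group(1).replace("_", "-")
    return None


def pod_uid_for_pid(pid):
    """Look up the pod UID for a PID taken from top (run this on the node)."""
    return pod_uid_from_cgroup(Path(f"/proc/{pid}/cgroup").read_text())
```

A returned UID can be compared against metadata.uid in the output of kubectl get pods --all-namespaces -o json to identify the pod. A result of None means the process is not in a pod cgroup at all, which suggests it is a host-level process on the node rather than part of a workload.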
