Kubernetes metrics-server unable to fetch pod/node metrics

Ben*_*n D 5 kubernetes

I have installed metrics-server on Kubernetes v1.11.2.

I am running a bare-metal cluster with 3 worker nodes and 1 master.

The metrics-server logs show the following errors:

E0907 14:29:51.774592       1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:vps01: unable to fetch metrics from Kubelet vps01 (vps01): Get https://vps01:10250/stats/summary/: dial tcp: lookup vps01 on 10.96.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:vps04: unable to fetch metrics from Kubelet vps04 (vps04): Get https://vps04:10250/stats/summary/: dial tcp: lookup vps04 on 10.96.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:vps03: unable to fetch metrics from Kubelet vps03 (vps03): Get https://vps03:10250/stats/summary/: dial tcp: lookup vps03 on 10.96.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:vps02: unable to fetch metrics from Kubelet vps02 (vps02): Get https://vps02:10250/stats/summary/: dial tcp: lookup vps02 on 10.96.0.10:53: no such host]
E0907 14:30:01.694794       1 reststorage.go:98] unable to fetch pod metrics for pod boxweb/boxweb-deployment-7756c49688-fz625: no metrics known for pod "boxweb/boxweb-deployment-7756c49688-fz625"
E0907 14:30:10.517886       1 reststorage.go:112] unable to fetch node metrics for node "vps01": no metrics known for node "vps01"

I also cannot get any metrics with kubectl top node vps01.

The same goes for autoscaling, which does not work either:

unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server could not find the requested resource (get pods.metrics.k8s.io)

小智 10

I found the following solution:

Edit the metrics-server-deployment.yaml file and add the following to the metrics-server container:

command:
    - /metrics-server 
    - --kubelet-preferred-address-types=InternalIP
    - --kubelet-insecure-tls
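
Since `--kubelet-preferred-address-types=InternalIP` makes metrics-server contact each kubelet by IP instead of by host name, it only helps if every node actually reports an InternalIP address. A rough sketch for checking that before re-applying the manifest is shown here. It assumes kubectl is already configured for the cluster, and the function names `list_internal_ips` and `all_have_ip` are made up for illustration:

```shell
#!/bin/sh
# Illustrative helpers; the function names are not part of kubectl or metrics-server.

# Print "<node name><TAB><InternalIP>" for every node in the cluster.
list_internal_ips() {
    kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.addresses[?(@.type=="InternalIP")].address}{"\n"}{end}'
}

# Read that listing on stdin; report nodes with an empty IP column
# and exit non-zero if any were found.
all_have_ip() {
    awk -F '\t' '
        NF == 0   { next }                                  # skip blank lines
        $2 == ""  { print "no InternalIP: " $1; bad = 1 }
        END       { exit bad + 0 }
    '
}

# Possible usage (not run automatically):
#   list_internal_ips | all_have_ip && kubectl apply -f metrics-server-deployment.yaml
#   kubectl top node
```

After the new pod has completed a scrape or two, kubectl top node should start returning values if the kubelets are reachable by IP.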

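  • I don't think using insecure-tls should be considered a valid answer for any environment other than dev/test. (2 upvotes)
  • To make it work, I had to add one more argument: --v=2 (2 upvotes)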

Ric*_*ico 1

It looks like your metrics-server pod has a DNS problem. You can open a shell in the pod and try to reach a node by name:

kubectl exec -it metrics-server-xxxxxxxxxx-xxxxx -n kube-system sh
/ # ping vps01

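If the ping fails, the pod cannot resolve your node names.

core-dns or kube-dns also relies on the /etc/resolv.conf of each of your nodes, so I would check whether the nodes can resolve one another. For example, can you ping vps01 from vps02, vps03, and so on?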
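
The node-to-node check can be scripted instead of pinging each name by hand. The following is a minimal sketch, assuming it is run on one of the nodes and that getent is available there (it is on most glibc-based Linux systems). `check_hosts` is a hypothetical helper name, and it tests name resolution only, not reachability:

```shell
#!/bin/sh
# Illustrative helper: report which host names resolve from the machine this runs on.
# Uses the system resolver (/etc/hosts, /etc/resolv.conf) via getent.
check_hosts() {
    failed=0
    for host in "$@"; do
        if getent hosts "$host" >/dev/null 2>&1; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    # Non-zero exit status if at least one name did not resolve.
    return "$failed"
}

# Possible usage on each node (not run automatically):
#   check_hosts vps01 vps02 vps03 vps04
```

If names fail on the nodes themselves, the cluster DNS has nothing upstream to forward to either, which would be consistent with the "no such host" errors in the metrics-server log.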