Kubernetes: Pods cannot resolve hostnames

azu*_*ake 7 dns kubernetes

I am having an issue on Kubernetes where my pods cannot resolve hostnames (for example google.com or kubernetes.default).

I currently have 1 master and 1 node running on two CentOS7 instances in OpenStack. I deployed the cluster using kubeadm.

Here are the versions installed:

kubeadm-1.7.3-1.x86_64
kubectl-1.7.3-1.x86_64
kubelet-1.7.3-1.x86_64
kubernetes-cni-0.5.1-0.x86_64

The verification steps below should give some insight into my problem.

I define a busybox pod:

apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
  - image: busybox
    command:
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
    name: busybox
  restartPolicy: Always

And create the pod:

$ kubectl create -f busybox.yaml

Try to perform a DNS lookup of the name google.com:

$ kubectl exec -ti busybox -- nslookup google.com
Server:    10.96.0.10
Address 1: 10.96.0.10
nslookup: can't resolve 'google.com'

Try to perform a DNS lookup of the name kubernetes.default:

$ kubectl exec -ti busybox -- nslookup kubernetes.default
Server:    10.96.0.10
Address 1: 10.96.0.10
nslookup: can't resolve 'kubernetes.default'

Check that my DNS pod is running:

$ kubectl get pods --namespace=kube-system -l k8s-app=kube-dns
NAME                        READY     STATUS    RESTARTS   AGE
kube-dns-2425271678-k1nft   3/3       Running   9          5d

Check that my DNS service is up:

$ kubectl get svc --namespace=kube-system
NAME       CLUSTER-IP   EXTERNAL-IP   PORT(S)         AGE
kube-dns   10.96.0.10   <none>        53/UDP,53/TCP   5d

Check that the DNS endpoints are exposed:

$ kubectl get ep kube-dns --namespace=kube-system
NAME       ENDPOINTS                     AGE
kube-dns   10.244.0.5:53,10.244.0.5:53   5d
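
One further check that may help narrow this down is to run the same lookup against the endpoint (pod) IP as well as the service IP. If the pod IP answers and the service IP does not, the DNS server itself is probably fine and the fault more likely sits in service routing or the pod network. The sketch below is only an illustration: `compare_dns_paths` is a hypothetical helper name, it assumes the busybox pod defined above, and it relies on busybox `nslookup` accepting the server as its second argument.

```shell
# Hypothetical helper: run one lookup against each listed DNS server from
# inside a pod and print the raw nslookup output for comparison.
# Usage: compare_dns_paths POD NAME SERVER...
compare_dns_paths() {
  pod=$1; name=$2; shift 2
  for server in "$@"; do
    echo "== $name via $server =="
    # busybox nslookup takes the DNS server as an optional second argument.
    kubectl exec "$pod" -- nslookup "$name" "$server" 2>&1
  done
}

# Example call with the service IP and endpoint IP shown above:
# compare_dns_paths busybox kubernetes.default.svc.cluster.local 10.96.0.10 10.244.0.5
```

The function prints the output instead of interpreting exit codes, since how `nslookup` reports failure can differ between busybox versions.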

Check the contents of /etc/resolv.conf in my container:

$ kubectl exec -ti busybox -- cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
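
To make the effect of `options ndots:5` and the search line concrete, here is a small sketch of the order in which a glibc-style resolver would try candidate names. It is an illustration under that assumption, not a description of what busybox's resolver necessarily does, and `expand_candidates` is a hypothetical helper name.

```shell
# Hypothetical helper: list the names a glibc-style resolver would try,
# in order, for NAME given an ndots value and a search list.
# Usage: expand_candidates NAME NDOTS SEARCH_DOMAIN...
expand_candidates() {
  name=$1; ndots=$2; shift 2
  case $name in
    *.) printf '%s\n' "$name"; return 0 ;;  # trailing dot: already absolute
  esac
  # Count the dots in the name.
  dots=$(( $(printf '%s' "$name" | tr -cd '.' | wc -c) ))
  # Enough dots: the name is tried as-is before the search list.
  [ "$dots" -ge "$ndots" ] && printf '%s\n' "$name"
  for domain in "$@"; do
    printf '%s.%s\n' "$name" "$domain"
  done
  # Too few dots: the name is tried as-is only after the search list.
  [ "$dots" -lt "$ndots" ] && printf '%s\n' "$name"
  return 0
}

expand_candidates google.com 5 default.svc.cluster.local svc.cluster.local cluster.local
```

With the values from this resolv.conf, google.com has fewer than five dots, so the three cluster-suffixed names are listed first and the bare google.com comes last.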

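If I understand correctly, the Kubernetes documentation states that my pods should inherit the DNS configuration of the node (or the master?). However, even with just one line in it (nameserver 10.92.128.40), I receive the following warning when spinning up a pod: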

Search Line limits were exceeded, some dns names have been omitted, the applied search line is: default.svc.cluster.local svc.cluster.local cluster.local mydomain.net anotherdomain.net yetanotherdomain.net

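I understand there is a known issue where only so many items can be listed in /etc/resolv.conf. However, where would the above search line and the nameserver in my container be generated from?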
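
As a rough sketch of where such a search line can come from: with the default ClusterFirst DNS policy, the kubelet writes the pod's resolv.conf itself, taking the nameserver from its `--cluster-dns` setting and building the search line from the pod's namespace, the `--cluster-domain` value, and the search domains found on the node. The resolver's traditional limit is six search domains, which matches the six entries in the applied search line of the warning. The function below only imitates that composition; `pod_search_line` is a hypothetical name and the limit of 6 is hard-coded as an assumption.

```shell
# Hypothetical helper: imitate how a pod search line is composed from the
# namespace, the cluster domain, and the node's own search domains,
# truncated to an assumed limit of 6 entries.
# Usage: pod_search_line NAMESPACE CLUSTER_DOMAIN HOST_SEARCH_DOMAIN...
pod_search_line() {
  namespace=$1; cluster_domain=$2; shift 2
  max=6
  line="$namespace.svc.$cluster_domain svc.$cluster_domain $cluster_domain"
  count=3
  for domain in "$@"; do
    # Stop once the limit is reached; remaining domains are omitted.
    [ "$count" -ge "$max" ] && break
    line="$line $domain"
    count=$((count + 1))
  done
  printf '%s\n' "$line"
}

# A fourth host domain (here the made-up "extra.example") would be dropped:
pod_search_line default cluster.local mydomain.net anotherdomain.net yetanotherdomain.net extra.example
```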

Finally, here are the logs from the kube-dns container:

$ kubectl logs --namespace=kube-system $(kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o name) -c kubedns
I0817 20:54:58.445280       1 dns.go:48] version: 1.14.3-4-gee838f6
I0817 20:54:58.452551       1 server.go:70] Using configuration read from directory: /kube-dns-config with period 10s
I0817 20:54:58.452616       1 server.go:113] FLAG: --alsologtostderr="false"
I0817 20:54:58.452628       1 server.go:113] FLAG: --config-dir="/kube-dns-config"
I0817 20:54:58.452638       1 server.go:113] FLAG: --config-map=""
I0817 20:54:58.452643       1 server.go:113] FLAG: --config-map-namespace="kube-system"
I0817 20:54:58.452650       1 server.go:113] FLAG: --config-period="10s"
I0817 20:54:58.452659       1 server.go:113] FLAG: --dns-bind-address="0.0.0.0"
I0817 20:54:58.452665       1 server.go:113] FLAG: --dns-port="10053"
I0817 20:54:58.452674       1 server.go:113] FLAG: --domain="cluster.local."
I0817 20:54:58.452683       1 server.go:113] FLAG: --federations=""
I0817 20:54:58.452692       1 server.go:113] FLAG: --healthz-port="8081"
I0817 20:54:58.452698       1 server.go:113] FLAG: --initial-sync-timeout="1m0s"
I0817 20:54:58.452704       1 server.go:113] FLAG: --kube-master-url=""
I0817 20:54:58.452713       1 server.go:113] FLAG: --kubecfg-file=""
I0817 20:54:58.452718       1 server.go:113] FLAG: --log-backtrace-at=":0"
I0817 20:54:58.452727       1 server.go:113] FLAG: --log-dir=""
I0817 20:54:58.452734       1 server.go:113] FLAG: --log-flush-frequency="5s"
I0817 20:54:58.452741       1 server.go:113] FLAG: --logtostderr="true"
I0817 20:54:58.452746       1 server.go:113] FLAG: --nameservers=""
I0817 20:54:58.452752       1 server.go:113] FLAG: --stderrthreshold="2"
I0817 20:54:58.452759       1 server.go:113] FLAG: --v="2"
I0817 20:54:58.452765       1 server.go:113] FLAG: --version="false"
I0817 20:54:58.452775       1 server.go:113] FLAG: --vmodule=""
I0817 20:54:58.452856       1 server.go:176] Starting SkyDNS server (0.0.0.0:10053)
I0817 20:54:58.453680       1 server.go:198] Skydns metrics enabled (/metrics:10055)
I0817 20:54:58.453692       1 dns.go:147] Starting endpointsController
I0817 20:54:58.453699       1 dns.go:150] Starting serviceController
I0817 20:54:58.453841       1 logs.go:41] skydns: ready for queries on cluster.local. for tcp://0.0.0.0:10053 [rcache 0]
I0817 20:54:58.453852       1 logs.go:41] skydns: ready for queries on cluster.local. for udp://0.0.0.0:10053 [rcache 0]
I0817 20:54:58.964468       1 dns.go:171] Initialized services and endpoints from apiserver
I0817 20:54:58.964523       1 server.go:129] Setting up Healthz Handler (/readiness)
I0817 20:54:58.964536       1 server.go:134] Setting up cache handler (/cache)
I0817 20:54:58.964545       1 server.go:120] Status HTTP port 8081

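The dnsmasq container. Disregard the fact that it found several nameservers rather than just the one I said was in my resolv.conf; I originally had more in there and removed the extras in an attempt to simplify things: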

$ kubectl logs --namespace=kube-system $(kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o name) -c dnsmasq
I0817 20:55:03.295826       1 main.go:76] opts: {{/usr/sbin/dnsmasq [-k --cache-size=1000 --log-facility=- --server=/cluster.local/127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053] true} /etc/k8s/dns/dnsmasq-nanny 10000000000}
I0817 20:55:03.298134       1 nanny.go:86] Starting dnsmasq [-k --cache-size=1000 --log-facility=- --server=/cluster.local/127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053]
I0817 20:55:03.731577       1 nanny.go:111] 
W0817 20:55:03.731609       1 nanny.go:112] Got EOF from stdout
I0817 20:55:03.731642       1 nanny.go:108] dnsmasq[9]: started, version 2.76 cachesize 1000
I0817 20:55:03.731656       1 nanny.go:108] dnsmasq[9]: compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify
I0817 20:55:03.731681       1 nanny.go:108] dnsmasq[9]: using nameserver 127.0.0.1#10053 for domain ip6.arpa 
I0817 20:55:03.731689       1 nanny.go:108] dnsmasq[9]: using nameserver 127.0.0.1#10053 for domain in-addr.arpa 
I0817 20:55:03.731695       1 nanny.go:108] dnsmasq[9]: using nameserver 127.0.0.1#10053 for domain cluster.local 
I0817 20:55:03.731704       1 nanny.go:108] dnsmasq[9]: reading /etc/resolv.conf
I0817 20:55:03.731710       1 nanny.go:108] dnsmasq[9]: using nameserver 127.0.0.1#10053 for domain ip6.arpa 
I0817 20:55:03.731717       1 nanny.go:108] dnsmasq[9]: using nameserver 127.0.0.1#10053 for domain in-addr.arpa 
I0817 20:55:03.731723       1 nanny.go:108] dnsmasq[9]: using nameserver 127.0.0.1#10053 for domain cluster.local 
I0817 20:55:03.731729       1 nanny.go:108] dnsmasq[9]: using nameserver 10.92.128.40#53
I0817 20:55:03.731735       1 nanny.go:108] dnsmasq[9]: using nameserver 10.92.128.41#53
I0817 20:55:03.731741       1 nanny.go:108] dnsmasq[9]: using nameserver 10.95.207.66#53
I0817 20:55:03.731747       1 nanny.go:108] dnsmasq[9]: read /etc/hosts - 7 addresses
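
The log above shows dnsmasq sending cluster.local and the reverse zones to 127.0.0.1#10053 and everything else to the nameservers read from /etc/resolv.conf. To pull just those upstream servers out of a log like this, something along these lines might do; `upstream_nameservers` is a hypothetical helper name and it assumes the log format shown above.

```shell
# Hypothetical helper: read dnsmasq log lines on stdin and print the
# upstream nameservers, i.e. the ones not tied to a specific domain.
upstream_nameservers() {
  grep 'using nameserver' |
    grep -v ' for domain ' |
    sed 's/.*using nameserver \([^#]*\)#.*/\1/' |
    sort -u
}

# Example (pod name left as a placeholder):
# kubectl logs --namespace=kube-system <kube-dns-pod> -c dnsmasq | upstream_nameservers
```

Each address printed is a server that lookups for external names such as google.com would be forwarded to, so it may be worth confirming from the node that those servers are reachable.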

The sidecar container:

$ kubectl logs --namespace=kube-system $(kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o name) -c sidecar
ERROR: logging before flag.Parse: I0817 20:55:04.488391       1 main.go:48] Version v1.14.3-4-gee838f6
ERROR: logging before flag.Parse: I0817 20:55:04.488612       1 server.go:45] Starting server (options {DnsMasqPort:53 DnsMasqAddr:127.0.0.1 DnsMasqPollIntervalMs:5000 Probes:[{Label:kubedns Server:127.0.0.1:10053 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:1} {Label:dnsmasq Server:127.0.0.1:53 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:1}] PrometheusAddr:0.0.0.0 PrometheusPort:10054 PrometheusPath:/metrics PrometheusNamespace:kubedns})
ERROR: logging before flag.Parse: I0817 20:55:04.488667       1 dnsprobe.go:75] Starting dnsProbe {Label:kubedns Server:127.0.0.1:10053 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:1}
ERROR: logging before flag.Parse: I0817 20:55:04.488766       1 dnsprobe.go:75] Starting dnsProbe {Label:dnsmasq Server:127.0.0.1:53 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:1}

I have mostly been reading the documentation provided here. Any direction, insight, or things to try would be much appreciated.

小智 29

I had a similar problem. Restarting the coredns deployment solved it for me:

kubectl -n kube-system rollout restart deployment coredns

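  • If Kubernetes does not detect this and restart it on its own, doesn't that make it rather useless? If we cannot find out about DNS failures and recover from them, perhaps there is not even a liveness or health check. (4 upvotes)
  • coredns was running, so I assumed it must be fine, but restarting it worked like a charm. Thanks. (2 upvotes)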
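
A minimal sketch of the restart with a follow-up check, assuming a kubeadm-style cluster where the DNS pods carry the label k8s-app=kube-dns (the label used in the question). `restart_cluster_dns` is a hypothetical wrapper name and the 120s timeout is an arbitrary choice.

```shell
# Hypothetical wrapper: restart the DNS deployment, wait for the rollout
# to finish, then list the DNS pods.
# Usage: restart_cluster_dns [DEPLOYMENT]   (defaults to coredns)
restart_cluster_dns() {
  deployment=${1:-coredns}
  kubectl -n kube-system rollout restart deployment "$deployment" &&
    kubectl -n kube-system rollout status deployment "$deployment" --timeout=120s &&
    kubectl -n kube-system get pods -l k8s-app=kube-dns
}

# Example:
# restart_cluster_dns
```

Repeating an nslookup from a test pod afterwards would confirm whether the restart actually changed anything.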

ate*_*lxt 10

Check the coredns pod logs. If you see errors like the following:

# kubectl logs --namespace=kube-system coredns-XXX
  ...
  [ERROR] plugin/errors ... HINFO: read udp ... read: no route to host

then make sure firewalld masquerade is enabled on the host:

# firewall-cmd --list-all
  ... 
  masquerade: yes

Enable if it's "no":
# firewall-cmd --add-masquerade --permanent
# firewall-cmd --reload

*You may need to restart/reboot after this.
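
The check and the fix above can be folded into one step, roughly as sketched here. `ensure_masquerade` is a hypothetical helper name; it assumes it is run as root on the host and uses `firewall-cmd --query-masquerade`, which reports through its exit status whether masquerading is enabled in the default zone. The `FIREWALL_CMD` variable is only there so the command can be swapped out.

```shell
# Hypothetical helper: enable firewalld masquerading only if it is not
# already enabled. Run as root on the host.
ensure_masquerade() {
  fw=${FIREWALL_CMD:-firewall-cmd}
  if "$fw" --query-masquerade >/dev/null 2>&1; then
    echo "masquerade already enabled"
  else
    # Same two commands as above: make the change permanent, then reload.
    "$fw" --add-masquerade --permanent &&
      "$fw" --reload &&
      echo "masquerade enabled"
  fi
}

# Example:
# ensure_masquerade
```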

  • @cryptoparty Restart the network, e.g. "systemctl restart network" (3 upvotes)

gzc*_*gzc 7

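I ran into the same problem. I followed the dns-debugging-resolution documentation and checked the DNS-related pods, services, and endpoints; all of them were running with no error messages. In the end I found that my calico service was dead. After I started the calico service and waited a few minutes, it worked.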
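
If the network plugin runs as pods in kube-system (an assumption; a plugin installed as a host service would need `systemctl status` instead), a quick filter over `kubectl get pods` output can surface a dead component even when the DNS pods themselves look healthy. `unhealthy_pods` is a hypothetical helper name.

```shell
# Hypothetical helper: read `kubectl get pods` output (single namespace,
# with header row) on stdin and print pods that are not Running or whose
# READY count is incomplete.
unhealthy_pods() {
  awk 'NR > 1 {
    split($2, ready, "/")
    if ($3 != "Running" || ready[1] != ready[2]) print $1, $2, $3
  }'
}

# Example:
# kubectl get pods --namespace=kube-system | unhealthy_pods
```

Pods of finished jobs show up as well, since their status is not Running, so the output is a starting point for inspection and not a verdict.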


Jav*_*ron 5

A few ideas come to mind:

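  • So I restarted the docker and kubelet services on both my master and my node, and what do you know... everything started working. I don't know what I did or why it wasn't working, but a full refresh of those two services somehow fixed the problem. I wish I had done that in the first place! Thanks a bunch for lending me a hand here. I definitely need to learn more about how Kubernetes works :) (2 upvotes)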