Debugging DNS resolution in Kubernetes


I initialized a Kubernetes v1.13.1 cluster on Ubuntu 16.04 with the following command:

sudo kubeadm init --token-ttl=0 --apiserver-advertise-address=192.168.88.142

Weave was installed with:

kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"

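I have 10 Raspberry Pis acting as worker nodes, all joined to the cluster, and deployments run fine on all of them. These nodes run pods that try to connect to the IoT hub visdwk.azure-devices.net and publish some data. Out of the 10 nodes, only a few can connect; the others throw the error "unable to connect to iot hub". I ran a ping test and found that they could not ping Google when pinging Google's public IP address.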

This made me suspect a problem with the coredns pods. Following this document, I ran the tests below.

The pod's /etc/resolv.conf contains the following:

nameserver 10.96.0.10
search visdwk.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
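The ndots:5 option matters here. A name with fewer than five dots, such as visdwk.azure-devices.net (two dots), is first tried with each search domain appended, and only then as an absolute name. The sketch below prints that order for the resolv.conf above. It is a simplified model of the glibc behavior (no timeouts, retries, or A/AAAA ordering), expand_search is a hypothetical helper, and other resolvers, for example musl in Alpine-based images, differ in details.

```shell
#!/bin/sh
# Hypothetical helper: print the names a glibc-style stub resolver would try,
# in order, for NAME given the "ndots" option and the "search" list.
# Simplified sketch: ignores timeouts, attempts, and A/AAAA query ordering.
# Usage: expand_search NAME NDOTS DOMAIN...
expand_search() {
  name=$1
  ndots=$2
  shift 2

  # A trailing dot marks a fully qualified name: no search expansion.
  case $name in
    *.) printf '%s\n' "$name"; return 0 ;;
  esac

  # Count the dots: length of NAME minus length of NAME without dots.
  stripped=$(printf '%s' "$name" | tr -d '.')
  dots=$(( ${#name} - ${#stripped} ))

  # Enough dots: the name is tried as-is before the search list.
  if [ "$dots" -ge "$ndots" ]; then
    printf '%s.\n' "$name"
  fi

  for domain in "$@"; do
    printf '%s.%s.\n' "$name" "$domain"
  done

  # Too few dots: the absolute name is tried only after every search domain.
  if [ "$dots" -lt "$ndots" ]; then
    printf '%s.\n' "$name"
  fi
}

# Search list and ndots copied from the pod's /etc/resolv.conf above.
expand_search visdwk.azure-devices.net 5 \
  visdwk.svc.cluster.local svc.cluster.local cluster.local
```

For this resolv.conf it prints the three names ending in .cluster.local, which CoreDNS answers with NXDOMAIN, followed by visdwk.azure-devices.net. itself, so each external lookup goes through several queries to 10.96.0.10 before the real name is asked for.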

This looks normal to me. All the coredns pods are running fine:

coredns-86c58d9df4-42xqc               1/1     Running   8         1d11h
coredns-86c58d9df4-p6d98               1/1     Running   7         1d6h

I also ran nslookup kubernetes.default from a busybox container and got the correct response. These are the logs of coredns-86c58d9df4-42xqc:

.:53
2019-02-08T08:40:10.038Z [INFO] CoreDNS-1.2.6
2019-02-08T08:40:10.039Z [INFO] linux/amd64, go1.11.2, 756749c
CoreDNS-1.2.6
linux/amd64, go1.11.2, 756749c
[INFO] plugin/reload: Running configuration MD5 = f65c4821c8a9b7b5eb30fa4fbc167769

These logs also look normal.

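I also don't think the pods fail to resolve the IoT hub because of some error in Weave: if Weave were throwing errors, I believe the pods would never start and would stay in a failed state, yet they are still running. Please correct me if I'm wrong here.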
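A pod's Running phase only means its containers started; it does not by itself show that the overlay network between every pair of nodes works. One way to check Weave directly is to ask each weave-net pod for its local status and compare working and failing nodes. This is a sketch under assumptions taken from the Weave Net Kubernetes setup: the DaemonSet pods carry the label name=weave-net, the container is named weave, and the CLI lives at /home/weave/weave. weave_status is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: print Weave Net's local status from each weave-net pod.
# Assumptions: pods labelled name=weave-net with a container named "weave"
# that ships the CLI at /home/weave/weave (standard Weave Net manifest).
# KUBECTL can be overridden (e.g. KUBECTL=echo) to print commands instead.
KUBECTL=${KUBECTL:-kubectl}

# weave_status is hypothetical. Arguments: weave-net pod names in kube-system.
weave_status() {
  for pod in "$@"; do
    echo "== $pod"
    $KUBECTL exec -n kube-system "$pod" -c weave -- /home/weave/weave --local status
  done
}

# Usage against a live cluster:
#   weave_status $(kubectl -n kube-system get pods -l name=weave-net \
#     -o jsonpath='{.items[*].metadata.name}')
```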

The DNS service also appears to be up:

NAME                   TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)         AGE
kube-dns               ClusterIP   10.96.0.10     <none>        53/UDP,53/TCP   1d6h

Still, I can't figure out why a few nodes in the cluster cannot resolve the IoT hub. Can anyone give me some pointers? Please help. Thanks.
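Since only some nodes fail, it can help to repeat the lookup from every node rather than from one busybox pod. A minimal sketch, assuming busybox:1.28 runs on the nodes' architecture and that node names are valid pod-name suffixes: it uses kubectl run's --overrides flag to set spec.nodeName, which places the throwaway pod on a given node without going through the scheduler. dns_check_nodes is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: run nslookup from a throwaway busybox pod pinned to each node, to see
# which nodes fail to resolve the IoT hub. dns_check_nodes is hypothetical.
# KUBECTL can be overridden (e.g. KUBECTL=echo) to print commands instead.
KUBECTL=${KUBECTL:-kubectl}
TARGET=${TARGET:-visdwk.azure-devices.net}

dns_check_nodes() {
  for node in "$@"; do
    echo "== $node"
    # spec.nodeName bypasses the scheduler and places the pod on $node;
    # --rm -i attaches to the pod and deletes it when nslookup exits.
    $KUBECTL run "dnstest-$node" --image=busybox:1.28 --restart=Never --rm -i \
      --overrides="{\"apiVersion\":\"v1\",\"spec\":{\"nodeName\":\"$node\"}}" \
      -- nslookup "$TARGET"
  done
}

# Usage against a live cluster:
#   dns_check_nodes $(kubectl get nodes -o jsonpath='{.items[*].metadata.name}')
```

Grouping the results by node, and comparing them with the nodes where the mosquitto bridge fails, should show whether the failures follow the node or something else.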

Logs from a failing pod:

1550138544: New connection from 127.0.0.1 on port 1883.
1550138544: New client connected from 127.0.0.1 as 6f1e2c4f-c44d-4c27-b9a9-0fb91f816504 (c1, k60).
1550138544: Sending CONNACK to 6f1e2c4f-c44d-4c27-b9a9-0fb91f816504 (0, 0)
1550138544: Received PUBLISH from 6f1e2c4f-c44d-4c27-b9a9-0fb91f816504 (d0, q0, r0, m0, 'devices/machine6/messages/events/', ... (1211 bytes))
1550138544: Received DISCONNECT from 6f1e2c4f-c44d-4c27-b9a9-0fb91f816504
1550138544: Client 6f1e2c4f-c44d-4c27-b9a9-0fb91f816504 disconnected.
1550138547: Saving in-memory database to /mqtt/data/mosquitto.db.
1550138547: Bridge local.machine6 doing local SUBSCRIBE on topic devices/machine6/messages/events/#
1550138547: Connecting bridge iothub-bridge (visdwk.azure-devices.net:8883)
1550138552: Error creating bridge: Try again.
1550138566: New connection from 127.0.0.1 on port 1883.
1550138566: New client connected from 127.0.0.1 as afb6cc2a-ee78-482e-aff0-fc595e06f86a (c1, k60).
1550138566: Sending CONNACK to afb6cc2a-ee78-482e-aff0-fc595e06f86a (0, 0)
1550138566: Received PUBLISH from afb6cc2a-ee78-482e-aff0-fc595e06f86a (d0, q0, r0, m0, 'devices/machine6/messages/events/', ... (1211 bytes))
1550138566: Received DISCONNECT from afb6cc2a-ee78-482e-aff0-fc595e06f86a
1550138566: Client afb6cc2a-ee78-482e-aff0-fc595e06f86a disconnected.
1550138567: New connection from 127.0.0.1 on port 1883.
1550138567: New client connected from 127.0.0.1 as 01b9e135-fbc8-4d67-9962-356e8cf9f080 (c1, k60).
1550138567: Sending CONNACK to 01b9e135-fbc8-4d67-9962-356e8cf9f080 (0, 0)
1550138567: Received PUBLISH from 01b9e135-fbc8-4d67-9962-356e8cf9f080 (d0, q0, r0, m0, 'devices/machine6/messages/events/', ... (755 bytes))
1550138567: Received DISCONNECT from 01b9e135-fbc8-4d67-9962-356e8cf9f080
1550138567: Client 01b9e135-fbc8-4d67-9962-356e8cf9f080 disconnected.
1550138578: Saving in-memory database to /mqtt/data/mosquitto.db.
1550138583: Bridge local.machine6 doing local SUBSCRIBE on topic devices/machine6/messages/events/#
1550138583: Connecting bridge iothub-bridge (visdwk.azure-devices.net:8883)
1550138588: Error creating bridge: Try again.

The pod runs a mosquitto container that tries to connect to visdwk.azure-devices.net and fails with this error:

Connecting bridge iothub-bridge (visdwk.azure-devices.net:8883)
Error creating bridge: Try again.

Answer:

It looks like one of your DNS pods is not serving DNS.

The evidence is in your statement: "only a few nodes are able to connect; the others throw the unable to connect to iot hub error".

That is the classic symptom of load balancing across a pool that contains a failed node.

Try this:

  1. Delete the DNS server pod that emits this message: visdwk.azure-devices.net.visdwknamespace.svc.cluster.local. udp 82 false 512" NXDOMAIN qr,aa,rd,ra 175 0.000651078s where visdwk.azure-devices.net
  2. Wait for the change to propagate through the cluster.
  3. Test the connections.
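To find which CoreDNS pod is misbehaving, each pod's logs can be searched for error answers to the hub's name. This assumes the CoreDNS log plugin is enabled in the Corefile (query logging is not on in every default configuration) and that its default line format is used, as in the message quoted in step 1. Because of the ndots:5 search list, NXDOMAIN answers for names ending in .svc.cluster.local are expected even from a healthy server, so the sketch keeps only non-NOERROR answers for the absolute name. dns_errors is a hypothetical helper; k8s-app=kube-dns is the label kubeadm puts on its CoreDNS pods.

```shell
#!/bin/sh
# Sketch: filter CoreDNS query-log lines (from the "log" plugin, assumed to be
# enabled in the Corefile) down to non-NOERROR answers for one absolute name.
# dns_errors is a hypothetical helper. Reads log lines on stdin.
# Usage: kubectl -n kube-system logs POD | dns_errors visdwk.azure-devices.net
dns_errors() {
  # The default log format quotes the question as "TYPE CLASS NAME. PROTO ...",
  # so " IN NAME. " matches the absolute name but not search-expanded variants
  # such as NAME.svc.cluster.local.; the rcode follows the closing quote.
  grep -F " IN $1. " | grep -v '" NOERROR '
}

# Loop over the CoreDNS pods:
# for pod in $(kubectl -n kube-system get pods -l k8s-app=kube-dns -o name); do
#   echo "== $pod"
#   kubectl -n kube-system logs "$pod" | dns_errors visdwk.azure-devices.net
# done
```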

If this is right, they should all connect.

To confirm, add that pod back and delete a different one, then retest: none of them should be able to connect.
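A similar check can be approximated without deleting pods, by sending the query straight to each CoreDNS pod IP instead of through the 10.96.0.10 service IP, so a single bad replica fails every time instead of only some of the time. A minimal sketch, assuming it runs inside a pod that can reach the CoreDNS pod IPs over the pod network, and that the nslookup there accepts a server argument and exits non-zero on a failed lookup. query_each_server is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: query NAME against each DNS server IP separately, so one bad
# CoreDNS replica stands out. query_each_server is hypothetical; run it
# inside a pod that has nslookup. NSLOOKUP can be overridden for testing.
# Usage: query_each_server NAME IP...
NSLOOKUP=${NSLOOKUP:-nslookup}

query_each_server() {
  name=$1
  shift
  for server in "$@"; do
    # Trailing dot: ask for the absolute name, skipping the search list.
    if $NSLOOKUP "$name." "$server" >/dev/null 2>&1; then
      echo "ok   $server"
    else
      echo "FAIL $server"
    fi
  done
}

# The CoreDNS pod IPs can be listed from outside the pod with:
#   kubectl -n kube-system get pods -l k8s-app=kube-dns \
#     -o jsonpath='{.items[*].status.podIP}'
```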