I am trying to set up a Kubernetes master by issuing:
kubeadm init --pod-network-cidr=192.168.0.0/16
Problem: the coredns pods are in CrashLoopBackOff or Error state:
# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
calico-node-lflwx 2/2 Running 0 2d
coredns-576cbf47c7-nm7gc 0/1 CrashLoopBackOff 69 2d
coredns-576cbf47c7-nwcnx 0/1 CrashLoopBackOff 69 2d
etcd-suey.nknwn.local 1/1 Running 0 2d
kube-apiserver-suey.nknwn.local 1/1 Running 0 2d
kube-controller-manager-suey.nknwn.local 1/1 Running 0 2d
kube-proxy-xkgdr 1/1 Running 0 2d
kube-scheduler-suey.nknwn.local 1/1 Running 0 2d
#
I tried Troubleshooting kubeadm - Kubernetes, but my node is not running SELinux and my Docker is up to date.
# docker --version …

I have a Kubernetes cluster running on Ubuntu 16.04. When I run nslookup kubernetes.default on the master, it shows the following:
Server: 192.168.88.21
Address: 192.168.88.21#53
** server can't find kubernetes.default: NXDOMAIN
Below is the content of /etc/resolv.conf:
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
# DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 192.168.88.21
nameserver 127.0.1.1
search VISDWK.local
Kubernetes version in use:
kubeadm version: &version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.1", GitCommit:"eec55b9ba98609a46fee712359c7b5b365bdd920", GitTreeState:"clean", BuildDate:"2018-12-13T10:36:44Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"linux/amd64"}
Weave is used for networking and was installed with the following command:
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')" …

I have been trying to set up k8s on a single node, and everything installed fine. But when I check the status of my kube-system pods:
CNI -> the flannel pod has crashed, reason -> Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: xxxx xxxx xxxx
The CoreDNS pods are in ContainerCreating status.
In my office, the current server is configured with a static IP. When I check /etc/resolv.conf, this is the output:
# Generated by NetworkManager
search ORGDOMAIN.BIZ
nameserver 192.168.1.12
nameserver 192.168.2.137
nameserver 192.168.2.136
# NOTE: the libc resolver may not support more than 3 nameservers.
# The nameservers listed below may not be recognized.
nameserver 192.168.1.10
nameserver 192.168.1.11
I can't find the root cause. What should I look at?
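The resolv.conf output above lists five nameserver entries, while its own NOTE comment says the libc resolver may not support more than 3. A minimal sketch for checking a file against that limit follows. The function names and the MAX_NAMESERVERS variable are made up for illustration, and the check only reports the count; it does not by itself explain why a pod crashed.

```shell
#!/bin/sh
# Sketch: count active nameserver entries in a resolv.conf-style file.
# Function and variable names are hypothetical, not from the question.

# The limit stated in the file's own NOTE comment (3 nameservers).
MAX_NAMESERVERS=3

# Print the number of non-comment "nameserver" lines.
# $1: path to the file; defaults to /etc/resolv.conf
count_nameservers() {
    grep -c -E '^[[:space:]]*nameserver[[:space:]]' "${1:-/etc/resolv.conf}"
}

# Report whether the file exceeds the limit; returns 1 if it does.
check_nameservers() {
    file="${1:-/etc/resolv.conf}"
    count=$(count_nameservers "$file")
    if [ "$count" -gt "$MAX_NAMESERVERS" ]; then
        echo "over limit: $count nameservers in $file (max $MAX_NAMESERVERS)"
        return 1
    fi
    echo "ok: $count nameservers in $file"
}
```

Calling `check_nameservers` with no argument inspects /etc/resolv.conf on the node; passing a path checks a copy of the file instead.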

I have the following problem with CoreDNS on the master (note also that READY is 0/1 on the master):
E0321 22:54:45.590231 1 reflector.go:126] pkg/mod/k8s.io/client-go@v11.0.0+incompatible/tools/cache/reflector.go:94: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: connection refused
E0321 22:54:46.528164 1 reflector.go:126] pkg/mod/k8s.io/client-go@v11.0.0+incompatible/tools/cache/reflector.go:94: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: connection refused
E0321 22:54:46.528164 1 reflector.go:126] pkg/mod/k8s.io/client-go@v11.0.0+incompatible/tools/cache/reflector.go:94: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: connection refused
E0321 22:54:46.528164 1 reflector.go:126] pkg/mod/k8s.io/client-go@v11.0.0+incompatible/tools/cache/reflector.go:94: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: connection refused
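E0321 22:54:46.528164 …

I think the title is pretty much self-explanatory. I have done many experiments, and the sad truth is that coredns does add 20 ms of overhead to all requests inside the cluster. At first we thought that adding more replicas and balancing the resolution requests across more instances might improve response time, but it did not help at all. (We scaled up from 2 pods to 4 pods.)
After scaling out to 4 instances, the fluctuation in resolution times improved somewhat. But it was not what we expected, and the 20 ms overhead was still there.
We have some web services whose actual response time is < 30 ms, and with coredns we are doubling the response time, which is not cool!
After coming to this conclusion, we ran an experiment to double-check that this is not an OS-level overhead. The results were no different from what we expected.
We thought that maybe we could implement/deploy a solution based on putting the list of hostname mappings each pod needs into /etc/hosts inside that pod. So my final questions are as follows:
coredns?
An alternative solution to coredns that works in a k8s environment?
Any thoughts or insights are appreciated. Thanks in advance.

I have a k8s service/deployment in a minikube cluster (named amq, in the default namespace):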
D20181472:argo-k8s gms$ kubectl get svc --all-namespaces
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
argo argo-ui ClusterIP 10.97.242.57 <none> 80/TCP 5h19m
default amq LoadBalancer 10.102.205.126 <pending> 61616:32514/TCP 4m4s
default kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 5h23m
kube-system kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP 5h23m
I spun up infoblox/dnstools and tried nslookup, dig, and ping on amq.default, with the following results:
dnstools# nslookup amq.default
Server: 10.96.0.10
Address: 10.96.0.10#53
Name: amq.default.svc.cluster.local
Address: 10.102.205.126
dnstools# ping amq.default
PING amq.default (10.102.205.126): 56 data bytes
^C
--- amq.default ping …

I have customized the coredns image and pushed it to my Azure Container Registry (ACR).
Now, in the default coredns pod that comes up after installing k3s, I want to use the my_azure_acr_repo/proj/customize-coredns:latest image instead of rancher/coredns-coredns:1.8.3. So I edited the coredns deployment with kubectl edit deploy coredns -n kube-system and replaced the rancher image with my ACR image. But now the coredns pod cannot pull my ACR image, and the pod description shows this error:
Failed to pull image "my_azure_acr_repo/proj/customize-coredns:latest": rpc error:
code = Unknown desc = failed to pull and unpack image "my_azure_acr_repo/proj/customize-coredns:latest":
failed to resolve reference "my_azure_acr_repo/proj/customize-coredns:latest": failed to
authorize: failed to fetch anonymous token: unexpected status: 401 Unauthorized
How can I authenticate against ACR so that the pod can pull the image?
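One common way to let a pod pull from a private registry is an image pull secret referenced from the pod template. The sketch below assumes that approach and is not confirmed as the fix for this setup. The secret name acr-pull-secret and both function names are hypothetical, and the registry host, username, and password arguments are credentials you would have to supply yourself (none of them appear in the question). Note also that k3s may re-apply its bundled coredns manifest and undo manual changes to the Deployment.

```shell
#!/bin/sh
# Sketch only: attach a registry pull secret to the coredns Deployment.
# SECRET_NAME and the function names are hypothetical.
SECRET_NAME="acr-pull-secret"

# Print a patch that adds an imagePullSecrets entry to a Deployment's
# pod template. $1: name of the pull secret
pull_secret_patch() {
    printf '{"spec":{"template":{"spec":{"imagePullSecrets":[{"name":"%s"}]}}}}\n' "$1"
}

# Create the secret in kube-system, then patch the coredns Deployment.
# $1: registry host, $2: username, $3: password (all supplied by you)
attach_acr_secret() {
    kubectl create secret docker-registry "$SECRET_NAME" \
        --namespace kube-system \
        --docker-server="$1" \
        --docker-username="$2" \
        --docker-password="$3" &&
    kubectl patch deployment coredns --namespace kube-system \
        --patch "$(pull_secret_patch "$SECRET_NAME")"
}
```

Nothing runs until `attach_acr_secret` is called with the three arguments; `pull_secret_patch acr-pull-secret` on its own just prints the JSON so it can be reviewed first.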