When no network plugin is installed, every node normally reports NotReady; that was exactly the behavior I saw back in v1.18.0. This time, though, something odd turned up:
[root@k8s-master ~]# kubectl get nodes
NAME        STATUS     ROLES                  AGE     VERSION
k8s-node1   Ready      <none>                 110m    v1.21.0
k8s-node2   Ready      <none>                 110m    v1.21.0
node        NotReady   control-plane,master   3h46m   v1.21.0   # this is the master node
Running kubectl describe nodes node against the master shows this condition:
container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Checking the corresponding Pods, the coredns Pods are stuck and never start successfully. So why are the other nodes still Ready?
[root@k8s-master ~]# kubectl get po -n kube-system -o wide
NAME                           READY   STATUS              RESTARTS   AGE     IP              NODE        NOMINATED NODE   READINESS GATES
coredns-558bd4d5db-gsq4m       0/1     ContainerCreating   0          70m     <none>          k8s-node1   <none>           <none>
coredns-558bd4d5db-wjmc4       0/1     ContainerCreating   0          33m     <none>          k8s-node1   <none>           <none>
etcd-node                      1/1     Running             0          5h37m   192.168.18.71   node        <none>           <none>
kube-apiserver-node            1/1     Running             0          5h37m   192.168.18.71   node        <none>           <none>
kube-controller-manager-node   1/1     Running             0          5h37m   192.168.18.71   node        <none>           <none>
kube-flannel-ds-clv8p          1/1     Running             0          11m     192.168.18.73   k8s-node2   <none>           <none>
kube-flannel-ds-rdphh          1/1     Running             0          11m     192.168.18.72   k8s-node1   <none>           <none>
kube-flannel-ds-zlg5w          1/1     Running             0          11m     192.168.18.71   node        <none>           <none>
kube-proxy-9hr59               1/1     Running             0          5h36m   192.168.18.71   node        <none>           <none>
kube-proxy-pbsht               1/1     Running             0          3h41m   192.168.18.72   k8s-node1   <none>           <none>
kube-proxy-t6kvr               1/1     Running             0          3h41m   192.168.18.73   k8s-node2   <none>           <none>
kube-scheduler-node            1/1     Running             0          5h37m   192.168.18.71   node        <none>           <none>
Describing the coredns Pod surfaces the actual error:
Warning FailedCreatePodSandBox 2m35s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "efb1f4977ea3d506b79e6e2cc22b4e03e30a087068f19f54b5b414f400b0bf7e" network for pod "coredns-558bd4d5db-8dtmp": networkPlugin cni failed to set up pod "coredns-558bd4d5db-8dtmp_kube-system" network: error getting ClusterInformation: Get https://127.0.0.1:6443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 127.0.0.1:6443: connect: connection refused, failed to clean up sandbox container "efb1f4977ea3d506b79e6e2cc22b4e03e30a087068f19f54b5b414f400b0bf7e" network for pod "coredns-558bd4d5db-8dtmp": networkPlugin cni failed to teardown pod "coredns-558bd4d5db-8dtmp_kube-system" network: error getting ClusterInformation: Get https://127.0.0.1:6443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 127.0.0.1:6443: connect: connection refused]
Logging directly into k8s-node1 and checking the kubelet logs, I could see the coredns sandbox being created and torn down over and over. The strange part: I am using the flannel network, so why do the logs mention calico?
Apr 22 17:12:23 k8s-node1 kubelet[17383]: I0422 17:12:23.198399 17383 cni.go:333] "CNI failed to retrieve network namespace path" err="cannot find network namespace for the terminated container \"764830fc792a18935f6e984c91defa1916433f8b98e763857fc64d62ce8e04ea\""
Apr 22 17:12:23 k8s-node1 kubelet[17383]: E0422 17:12:23.261949 17383 cni.go:380] "Error deleting pod from network" err="error getting ClusterInformation: Get https://127.0.0.1:6443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 127.0.0.1:6443: connect: connection refused" pod="kube-system/coredns-558bd4d5db-wjmc4" podSandboxID={Type:docker ID:764830fc792a18935f6e984c91defa1916433f8b98e763857fc64d62ce8e04ea} podNetnsPath="" networkType="calico" networkName="k8s-pod-network"
Apr 22 17:12:23 k8s-node1 kubelet[17383]: E0422 17:12:23.263391 17383 remote_runtime.go:144] "StopPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = networkPlugin cni failed to teardown pod \"coredns-558bd4d5db-wjmc4_kube-system\" network: error getting ClusterInformation: Get https://127.0.0.1:6443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 127.0.0.1:6443: connect: connection refused" podSandboxID="764830fc792a18935f6e984c91defa1916433f8b98e763857fc64d62ce8e04ea"
Apr 22 17:12:23 k8s-node1 kubelet[17383]: E0422 17:12:23.263501 17383 kuberuntime_manager.go:958] "Failed to stop sandbox" podSandboxID={Type:docker ID:764830fc792a18935f6e984c91defa1916433f8b98e763857fc64d62ce8e04ea}
Apr 22 17:12:23 k8s-node1 kubelet[17383]: E0422 17:12:23.263623 17383 kuberuntime_manager.go:729] "killPodWithSyncResult failed" err="failed to \"KillPodSandbox\" for \"983d5974-dc76-4a40-8370-f098671175c9\" with KillPodSandboxError: \"rpc error: code = Unknown desc = networkPlugin cni failed to teardown pod \\\"coredns-558bd4d5db-wjmc4_kube-system\\\" network: error getting ClusterInformation: Get https://127.0.0.1:6443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 127.0.0.1:6443: connect: connection refused\""
Apr 22 17:12:23 k8s-node1 kubelet[17383]: E0422 17:12:23.263718 17383 pod_workers.go:190] "Error syncing pod, skipping" err="failed to \"KillPodSandbox\" for \"983d5974-dc76-4a40-8370-f098671175c9\" with KillPodSandboxError: \"rpc error: code = Unknown desc = networkPlugin cni failed to teardown pod \\\"coredns-558bd4d5db-wjmc4_kube-system\\\" network: error getting ClusterInformation: Get https://127.0.0.1:6443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 127.0.0.1:6443: connect: connection refused\"" pod="kube-system/coredns-558bd4d5db-wjmc4" podUID=983d5974-dc76-4a40-8370-f098671175c9
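For reference, log lines like the ones above can be pulled from a systemd-managed node and filtered down to the CNI-related errors. This is a sketch; the unit name and the grep pattern are assumptions, adjust them for your distro:

```shell
# Tail recent kubelet logs and keep only CNI-related lines.
# Assumes kubelet runs as a systemd unit named "kubelet".
journalctl -u kubelet --since "30 min ago" --no-pager \
  | grep -E 'cni|calico|NetworkPlugin'
```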
Following that lead, I compared the CNI plugin directory /etc/cni/net.d/ across the nodes and found a discrepancy:
# the master has only one file, while each worker node has three; strange
[root@k8s-master ~]# ll /etc/cni/net.d/
total 4
-rw-r--r-- 1 root root 292 Apr 22 17:01 10-flannel.conflist
[root@k8s-master ~]#
# node1
[root@k8s-node1 ~]# ll /etc/cni/net.d/
total 12
-rw-rw-r-- 1 root root 656 Apr 12 16:56 10-canal.conflist
-rw-r--r-- 1 root root 292 Apr 22 17:01 10-flannel.conflist
-rw------- 1 root root 2587 Apr 12 16:56 calico-kubeconfig
[root@k8s-node1 ~]#
# node2
[root@k8s-node2 ~]# ll /etc/cni/net.d/
total 12
-rw-rw-r-- 1 root root 656 Apr 12 16:56 10-canal.conflist
-rw-r--r-- 1 root root 292 Apr 22 17:01 10-flannel.conflist
-rw------- 1 root root 2587 Apr 12 16:56 calico-kubeconfig
[root@k8s-node2 ~]#
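Those extra files matter because kubelet's CNI driver loads the lexicographically first config file it finds in /etc/cni/net.d, and 10-canal.conflist sorts before 10-flannel.conflist. That explains why a flannel cluster was logging calico errors. A quick local sketch of the ordering (a temp directory stands in for the real /etc/cni/net.d):

```shell
# Recreate the worker's CNI config listing in a scratch directory and
# show which file a lexicographic scan picks up first.
tmpdir=$(mktemp -d)
touch "$tmpdir/10-canal.conflist" "$tmpdir/10-flannel.conflist" "$tmpdir/calico-kubeconfig"
first=$(ls "$tmpdir" | sort | head -n1)
echo "$first"   # → 10-canal.conflist
rm -rf "$tmpdir"
```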
Under normal conditions every node's CNI directory should carry the same config as the master's (here, just the flannel file), so the extra 10-canal.conflist and calico-kubeconfig on the workers are leftovers from an earlier canal/calico install. I deleted those two files on both worker nodes, went back to the master, and the Pods were running normally again:
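The cleanup itself is two rm commands per worker node, run as root. This is a sketch of what was done on k8s-node1 and k8s-node2; the CNI_DIR variable is my addition so the commands can be rehearsed against a scratch directory first:

```shell
# Remove the stale canal/calico files so only the flannel config remains.
# CNI_DIR defaults to the real CNI config path; override it for a dry run.
CNI_DIR="${CNI_DIR:-/etc/cni/net.d}"
rm -f "$CNI_DIR/10-canal.conflist" "$CNI_DIR/calico-kubeconfig"
ls "$CNI_DIR"   # only 10-flannel.conflist should remain
```

In this case the Pods recovered on their own after the deletion; depending on the runtime, a kubelet restart (systemctl restart kubelet) may be needed before the change is picked up.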
[root@k8s-master ~]# kubectl get po -n kube-system
NAME                           READY   STATUS    RESTARTS   AGE
coredns-558bd4d5db-gsq4m       1/1     Running   0          80m
coredns-558bd4d5db-wjmc4       1/1     Running   0          43m
etcd-node                      1/1     Running   0          5h46m
kube-apiserver-node            1/1     Running   0          5h46m
kube-controller-manager-node   1/1     Running   0          5h46m
kube-flannel-ds-clv8p          1/1     Running   0          20m
kube-flannel-ds-rdphh          1/1     Running   0          20m
kube-flannel-ds-zlg5w          1/1     Running   0          20m
kube-proxy-9hr59               1/1     Running   0          5h46m
kube-proxy-pbsht               1/1     Running   0          3h50m
kube-proxy-t6kvr               1/1     Running   0          3h50m
kube-scheduler-node            1/1     Running   0          5h46m
[root@k8s-master ~]#