
An issue encountered while installing Kubernetes v1.21.0

Before the network plugin is installed, nodes are expected to be NotReady. In v1.18.0, with no network plugin installed, every node showed NotReady; this time, however, something odd turned up:

[root@k8s-master ~]# kubectl get nodes
NAME        STATUS     ROLES                  AGE     VERSION
k8s-node1   Ready      <none>                 110m    v1.21.0
k8s-node2   Ready      <none>                 110m    v1.21.0
node        NotReady   control-plane,master   3h46m   v1.21.0       # this is the master node

Running kubectl describe nodes node against that node (the master's hostname is simply node) showed this message:

container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
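
For a quicker look at just that condition, a jsonpath query works as well (this command is my own addition, not part of the original session):

# print only the Ready condition's message for the master node
kubectl get node node -o jsonpath='{.status.conditions[?(@.type=="Ready")].message}'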

Checking the corresponding Pods, the coredns Pods were the only ones that had not come up successfully; so why were the other nodes in the Ready state?

[root@k8s-master ~]# kubectl get po -n kube-system  -o wide
NAME                           READY   STATUS              RESTARTS   AGE     IP              NODE        NOMINATED NODE   READINESS GATES
coredns-558bd4d5db-gsq4m       0/1     ContainerCreating   0          70m     <none>          k8s-node1   <none>           <none>
coredns-558bd4d5db-wjmc4       0/1     ContainerCreating   0          33m     <none>          k8s-node1   <none>           <none>
etcd-node                      1/1     Running             0          5h37m   192.168.18.71   node        <none>           <none>
kube-apiserver-node            1/1     Running             0          5h37m   192.168.18.71   node        <none>           <none>
kube-controller-manager-node   1/1     Running             0          5h37m   192.168.18.71   node        <none>           <none>
kube-flannel-ds-clv8p          1/1     Running             0          11m     192.168.18.73   k8s-node2   <none>           <none>
kube-flannel-ds-rdphh          1/1     Running             0          11m     192.168.18.72   k8s-node1   <none>           <none>
kube-flannel-ds-zlg5w          1/1     Running             0          11m     192.168.18.71   node        <none>           <none>
kube-proxy-9hr59               1/1     Running             0          5h36m   192.168.18.71   node        <none>           <none>
kube-proxy-pbsht               1/1     Running             0          3h41m   192.168.18.72   k8s-node1   <none>           <none>
kube-proxy-t6kvr               1/1     Running             0          3h41m   192.168.18.73   k8s-node2   <none>           <none>
kube-scheduler-node            1/1     Running             0          5h37m   192.168.18.71   node        <none>           <none>
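
Since only the two coredns Pods are stuck, a label-filtered watch keeps the output focused while trying fixes (k8s-app=kube-dns is the standard label on the coredns Pods; this command is my own addition):

kubectl -n kube-system get po -l k8s-app=kube-dns -o wide -w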

Describing one of the coredns Pods finally surfaced the error.
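
The describe command itself isn't shown in the post; it would have been along these lines (the pod name is the one that appears in the event below, a newer replica than those in the earlier listing):

kubectl -n kube-system describe pod coredns-558bd4d5db-8dtmp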

  Warning  FailedCreatePodSandBox  2m35s                kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "efb1f4977ea3d506b79e6e2cc22b4e03e30a087068f19f54b5b414f400b0bf7e" network for pod "coredns-558bd4d5db-8dtmp": networkPlugin cni failed to set up pod "coredns-558bd4d5db-8dtmp_kube-system" network: error getting ClusterInformation: Get https://127.0.0.1:6443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 127.0.0.1:6443: connect: connection refused, failed to clean up sandbox container "efb1f4977ea3d506b79e6e2cc22b4e03e30a087068f19f54b5b414f400b0bf7e" network for pod "coredns-558bd4d5db-8dtmp": networkPlugin cni failed to teardown pod "coredns-558bd4d5db-8dtmp_kube-system" network: error getting ClusterInformation: Get https://127.0.0.1:6443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 127.0.0.1:6443: connect: connection refused]

I then logged directly into k8s-node1 and checked the kubelet logs, which showed the coredns sandbox being created and torn down over and over. The odd part: I deployed flannel as the network, so why does calico appear in the logs?
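
The log lines that follow were most likely captured with journalctl; the exact command isn't shown in the post, but on a systemd host it would look like this:

# follow the kubelet unit log on the worker node (assumed command, not from the original post)
journalctl -u kubelet -f --no-pager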

Apr 22 17:12:23 k8s-node1 kubelet[17383]: I0422 17:12:23.198399   17383 cni.go:333] "CNI failed to retrieve network namespace path" err="cannot find network namespace for the terminated container \"764830fc792a18935f6e984c91defa1916433f8b98e763857fc64d62ce8e04ea\""
Apr 22 17:12:23 k8s-node1 kubelet[17383]: E0422 17:12:23.261949   17383 cni.go:380] "Error deleting pod from network" err="error getting ClusterInformation: Get https://127.0.0.1:6443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 127.0.0.1:6443: connect: connection refused" pod="kube-system/coredns-558bd4d5db-wjmc4" podSandboxID={Type:docker ID:764830fc792a18935f6e984c91defa1916433f8b98e763857fc64d62ce8e04ea} podNetnsPath="" networkType="calico" networkName="k8s-pod-network"
Apr 22 17:12:23 k8s-node1 kubelet[17383]: E0422 17:12:23.263391   17383 remote_runtime.go:144] "StopPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = networkPlugin cni failed to teardown pod \"coredns-558bd4d5db-wjmc4_kube-system\" network: error getting ClusterInformation: Get https://127.0.0.1:6443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 127.0.0.1:6443: connect: connection refused" podSandboxID="764830fc792a18935f6e984c91defa1916433f8b98e763857fc64d62ce8e04ea"
Apr 22 17:12:23 k8s-node1 kubelet[17383]: E0422 17:12:23.263501   17383 kuberuntime_manager.go:958] "Failed to stop sandbox" podSandboxID={Type:docker ID:764830fc792a18935f6e984c91defa1916433f8b98e763857fc64d62ce8e04ea}
Apr 22 17:12:23 k8s-node1 kubelet[17383]: E0422 17:12:23.263623   17383 kuberuntime_manager.go:729] "killPodWithSyncResult failed" err="failed to \"KillPodSandbox\" for \"983d5974-dc76-4a40-8370-f098671175c9\" with KillPodSandboxError: \"rpc error: code = Unknown desc = networkPlugin cni failed to teardown pod \\\"coredns-558bd4d5db-wjmc4_kube-system\\\" network: error getting ClusterInformation: Get https://127.0.0.1:6443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 127.0.0.1:6443: connect: connection refused\""
Apr 22 17:12:23 k8s-node1 kubelet[17383]: E0422 17:12:23.263718   17383 pod_workers.go:190] "Error syncing pod, skipping" err="failed to \"KillPodSandbox\" for \"983d5974-dc76-4a40-8370-f098671175c9\" with KillPodSandboxError: \"rpc error: code = Unknown desc = networkPlugin cni failed to teardown pod \\\"coredns-558bd4d5db-wjmc4_kube-system\\\" network: error getting ClusterInformation: Get https://127.0.0.1:6443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 127.0.0.1:6443: connect: connection refused\"" pod="kube-system/coredns-558bd4d5db-wjmc4" podUID=983d5974-dc76-4a40-8370-f098671175c9

 

Based on the above, I checked the CNI plugin directory on each node and found that the contents of /etc/cni/net.d/ differed:

# The master has only one file, while each worker node has three, which is odd
[root@k8s-master ~]# ll /etc/cni/net.d/
total 4
-rw-r--r-- 1 root root 292 Apr 22 17:01 10-flannel.conflist
[root@k8s-master ~]# 

# node1
[root@k8s-node1 ~]# ll /etc/cni/net.d/
total 12
-rw-rw-r-- 1 root root  656 Apr 12 16:56 10-canal.conflist
-rw-r--r-- 1 root root  292 Apr 22 17:01 10-flannel.conflist
-rw------- 1 root root 2587 Apr 12 16:56 calico-kubeconfig
[root@k8s-node1 ~]# 

# node2
[root@k8s-node2 ~]# ll /etc/cni/net.d/
total 12
-rw-rw-r-- 1 root root  656 Apr 12 16:56 10-canal.conflist
-rw-r--r-- 1 root root  292 Apr 22 17:01 10-flannel.conflist
-rw------- 1 root root 2587 Apr 12 16:56 calico-kubeconfig
[root@k8s-node2 ~]# 

Normally the worker nodes should end up with the same CNI config as the master (the flannel DaemonSet writes only 10-flannel.conflist); the extra 10-canal.conflist and calico-kubeconfig, dated ten days earlier, are leftovers from a previous Canal/Calico deployment. Because kubelet uses the first config file in /etc/cni/net.d in lexical order, 10-canal.conflist was being picked ahead of 10-flannel.conflist, which is exactly why calico kept showing up in the logs. So I deleted 10-canal.conflist and calico-kubeconfig on the worker nodes, and when I went back to the master to check, the Pods were running normally again.
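
A minimal sketch of that cleanup, using the paths from the listings above (the post only says the files were deleted; the kubelet restart is an extra, optional step I'm adding, not something the post mentions):

# run on k8s-node1 and k8s-node2
rm -f /etc/cni/net.d/10-canal.conflist /etc/cni/net.d/calico-kubeconfig
systemctl restart kubelet    # optional: makes kubelet re-read the CNI config directory right away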

[root@k8s-master ~]# kubectl get po -n kube-system 
NAME                           READY   STATUS    RESTARTS   AGE
coredns-558bd4d5db-gsq4m       1/1     Running   0          80m
coredns-558bd4d5db-wjmc4       1/1     Running   0          43m
etcd-node                      1/1     Running   0          5h46m
kube-apiserver-node            1/1     Running   0          5h46m
kube-controller-manager-node   1/1     Running   0          5h46m
kube-flannel-ds-clv8p          1/1     Running   0          20m
kube-flannel-ds-rdphh          1/1     Running   0          20m
kube-flannel-ds-zlg5w          1/1     Running   0          20m
kube-proxy-9hr59               1/1     Running   0          5h46m
kube-proxy-pbsht               1/1     Running   0          3h50m
kube-proxy-t6kvr               1/1     Running   0          3h50m
kube-scheduler-node            1/1     Running   0          5h46m
[root@k8s-master ~]# 
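
As a last check, not shown in the post, one would re-run the node listing to confirm the original NotReady symptom has cleared:

kubectl get nodes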

 
