Issue
We are deploying Jenkins on a K8s environment with 1 master and 4 worker nodes, using the calico network plugin. Agent pods are created at Jenkins job run time, but the issue is that hostnames don't resolve, and there are no error logs in Jenkins. On checking the pods, the calico pod on the master node is down (not ready); I am not sure if this is the cause of the above problem.
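A quick way to confirm the in-cluster DNS failure is an ad-hoc lookup from a throwaway pod (a sketch; the pod name is arbitrary and busybox:1.28 is simply a commonly used image whose nslookup works for this kind of test):
kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -- nslookup kubernetes.default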
[root@kmaster-1 ~]# kubectl get pod calico-node-lvvx4 -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-node-lvvx4 0/1 Running 9 9d x0.x1.x5.x6 kmaster-1.b.x.x.com <none> <none>
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 107s (x34333 over 3d23h) kubelet, kmaster-1.b.x.x.com (combined from similar events): Readiness probe failed: calico/node is not ready: BIRD is not ready: BGP not established with 10.x1.2x.x23,10.x1.x7.x53,10.x1.1x.1x5,10.x1.2x.1x22020-04-12 08:40:48.567 [INFO][27813] health.go 156: Number of node(s) with BGP peering established = 0
10.x1.2x.x23, 10.x1.x7.x53, 10.x1.1x.1x5, 10.x1.2x.1x2 are the IPs of the workers. They are connected among themselves (netstat shows BGP established), but not with the master. Port 179 is open on the master, so I am not sure why BGP peering doesn't establish. Kindly advise.
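(For reference, the per-node check here is simply looking for established TCP sessions on port 179, the port BIRD uses for BGP; a sketch, assuming netstat/ss and nc are available on the hosts, with <master-ip> as a placeholder:)
netstat -tnp | grep ':179'        # established BGP sessions on this node
ss -tnp | grep ':179'             # same check with ss
nc -vz <master-ip> 179            # from a worker: is the master reachable on the BGP port?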
Solution
What Sanjay M. P. shared worked for me; however, I want to clarify what caused the problem and why the solution works, in some more detail.
First of all, I am running an Ubuntu environment, so what Piknik shared does not apply: firewalld only exists on CentOS / RHEL systems. Even so, ufw was disabled on all nodes.
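(For reference, the quick firewall checks on each node are below; ufw is the Ubuntu front end, firewalld the CentOS / RHEL one:)
sudo ufw status                     # Ubuntu
sudo systemctl status firewalld     # CentOS / RHEL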
I was able to narrow down the exact error causing this problem by doing a kubectl describe pod calico-node-*****. What I found was that the calico BIRD service could not connect to its peers. The output also showed the IP addresses the calico-node was trying to use for its BGP peers. It was using the wrong interface, and thereby the wrong IPs.
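For completeness, the commands looked roughly like this (a sketch; the pod name suffix is a placeholder, and calico-node pods live in the kube-system namespace, so -n kube-system may be needed depending on your context):
kubectl get pods -n kube-system -o wide | grep calico-node
kubectl describe pod calico-node-***** -n kube-system
The readiness probe failure shows up under Events, as in the output quoted in the question.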
To define the problem for myself: all of my node host VMs have multiple interfaces. If you don't explicitly specify which interface to use, calico "automatically" picks one, whether you want that interface or not.
The solution was to specify the exact interface when you build your calico overlay network in the calico.yaml file. Sanjay M. P. uses a regex, which MAY work if your interfaces have distinct names; however, as I am running Ubuntu Server, every interface name starts with "ens", so the same problem happens.
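To see which interfaces (and addresses) calico could have picked from, something like this is enough (a sketch; ens192 here is just an illustrative second interface, ens224 is the one used in the config below):
ip -br addr show
which prints one line per interface, e.g. (illustrative):
ens192    UP    10.x1.x.x/24     <- management network, the wrong one for BGP here
ens224    UP    x0.x1.x.x/24     <- network the nodes should actually peer over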
I have stripped out most of the calico.yaml file to show the exact location where this setting should be (~line 675). Add the setting there. I also left CALICO_IPV4POOL_CIDR in, as this setting needs to be set to the same subnet range specified on kubeadm initialization:
spec:
  template:
    spec:
      containers:
        - name: calico-node
          image: calico/node:v3.14.2
          env:
            - name: CALICO_IPV4POOL_CIDR
              value: "192.168.0.0/22"
            - name: IP_AUTODETECTION_METHOD
              value: "interface=ens224"
Unfortunately I did not find a way to roll back older configurations, so I just rebuilt the whole cluster, and redeployed the calico overlay (Thank god for VM snapshots).
kubeadm init your cluster.
Then run kubectl create -f calico.yaml with the setting added to build out the overlay network.
Confirm the overlay network is working:
- Run watch -n1 kubectl get pods -n kube-system -o wide, and then add your nodes. Make sure all calico-node pods being built on newly added kube nodes come up as "1/1 Running".
- Download and install calicoctl, and run calicoctl node status; make sure the correct network is being used for BGP (sample output below).
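For reference, a healthy mesh in calicoctl node status looks roughly like this (illustrative output; the peer addresses are placeholders, but every peer should show up/Established):
Calico process is running.

IPv4 BGP status
+--------------+-------------------+-------+----------+-------------+
| PEER ADDRESS |     PEER TYPE     | STATE |  SINCE   |    INFO     |
+--------------+-------------------+-------+----------+-------------+
| 10.x1.2x.x23 | node-to-node mesh | up    | 08:40:48 | Established |
| 10.x1.x7.x53 | node-to-node mesh | up    | 08:40:48 | Established |
+--------------+-------------------+-------+----------+-------------+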
You can read more about IP_AUTODETECTION_METHOD in the Calico documentation.
Answered By - Dave