GCE and self-hosted k8s 1.6: no route created
There is an annoying bug in Kubernetes 1.6 running on GCE with Calico/Flannel networking operating via the CNI plugin interface.
You may experience the same issue if the kubelet on your nodes runs with the options
--network-plugin=cni --cloud-provider=gce
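One quick way to confirm which flags your kubelet is actually running with is to inspect its process arguments on the node (a minimal check, assuming the kubelet runs as a regular process visible to ps):

# Show the network-plugin and cloud-provider flags of the running kubelet
ps -ef | grep '[k]ubelet' | tr ' ' '\n' | grep -E 'network-plugin|cloud-provider'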
When a new node is added to the k8s cluster it is recognized as 'Ready', but no pods except the Calico/Flannel pod are scheduled there.
It looks like Calico/Flannel creates all the needed routes in GCE but fails to properly report this back to the k8s master, so the node's network condition is never cleared.
Here are related bugs:
If you check the status of the problem node, you'll see the NetworkUnavailable condition set to True:
kubectl get node <node_name> -o yaml
...
status:
  conditions:
  ...
  - type: NetworkUnavailable
    reason: NoRouteCreated
    status: "True"
...
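To spot all affected nodes at once, a JSONPath query along these lines can print each node's name together with its NetworkUnavailable status (a sketch; quoting of the filter may need adjusting for your kubectl version):

kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="NetworkUnavailable")].status}{"\n"}{end}'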
One possible workaround is to manually override the problem node's status, as suggested here.
You may run the following script from the k8s master node (it queries the k8s API on localhost). The array of hostnames has to be updated with the actual names of the problem nodes, of course. Review the code carefully and make sure you understand what it does before applying it to production!
#!/bin/bash
# Names of the problem nodes; update with your actual node names
hostnames=(minion-0 minion-1 minion-2)
for i in "${hostnames[@]}"; do
  # Dump the current node status to a temporary file
  curl "http://localhost:8080/api/v1/nodes/$i/status" > a.json
  # Flatten the JSON and replace the whole NetworkUnavailable condition object
  tr -d '\n' < a.json | sed 's/{[^}]\+NetworkUnavailable[^}]\+}/{"type": "NetworkUnavailable","status": "False","reason": "RouteCreated","message": "Manually set through k8s api"}/g' > b.json
  # Push the patched status back to the API server
  curl -X PUT "http://localhost:8080/api/v1/nodes/$i/status" -H "Content-Type: application/json" -d @b.json
done
It basically dumps the node status into a file, overrides the "NetworkUnavailable" condition, and pushes the node status back. Quite a dirty hack, I have to admit, but a working one.
Once the workaround is applied, pods should start being scheduled on the node properly.
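To verify that the workaround took effect, you can read the condition back through kubectl (minion-0 is just an example node name from the script above):

# Should now print: False
kubectl get node minion-0 -o jsonpath='{.status.conditions[?(@.type=="NetworkUnavailable")].status}'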