Kubernetes the Not So Hard Way With Ansible at Scaleway Part 9 - Upgrading Kubernetes

Upgrading Kubernetes services: kube-apiserver, kube-controller-manager, kube-scheduler, kube-proxy and kubelet

November 5, 2017

If you followed my Kubernetes the Not So Hard Way With Ansible blog posts so far and have a Kubernetes cluster running, you'll sooner or later want to upgrade to the next version. With this setup it's pretty easy.

Prerequisite

The first thing you should do is read the CHANGELOG of the version you want to upgrade to. E.g. if you upgrade from v1.8.0 to v1.8.1 you only need to read CHANGELOG-1.8. Watch out for Action Required headlines: between v1.8.0 and v1.8.1, for example, there was a change that required action. That shouldn't happen for patch releases, but sometimes it can't be avoided. If you upgrade to a new minor release, e.g. from v1.7.x to v1.8.0, read CHANGELOG-1.8 as well. The same advice as above applies, of course.

As the whole Kubernetes cluster state is stored in etcd, you should also consider creating a backup of the etcd data. Have a look at the etcd Admin Guide on how to do this. This is especially true if you're upgrading to a new minor release. Heptio's Ark could also be an option: Heptio Ark is a utility for managing disaster recovery, specifically for your Kubernetes cluster resources and persistent volumes.
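If your etcd runs the v3 API, a snapshot can be taken with etcdctl. The endpoint, snapshot path and certificate paths below are assumptions; adjust them to wherever your setup stores the etcd client certificates:

ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/etcd/ca.pem \
  --cert=/etc/etcd/etcd-client.pem \
  --key=/etc/etcd/etcd-client-key.pem \
  snapshot save /var/backups/etcd-snapshot.db

# verify the snapshot afterwards
ETCDCTL_API=3 etcdctl snapshot status /var/backups/etcd-snapshot.db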

Upgrading

If you have considered the above prerequisites we're ready to go. Whether you do a patch release update (e.g. v1.8.0 -> v1.8.1) or a minor release update (v1.7.x -> v1.8.0), the steps are basically the same. First we update the controller nodes one by one and afterwards the worker nodes.

One additional hint: skipping a release during an upgrade is a bad idea and calls for trouble ;-) So if you want to upgrade from v1.6.x to v1.8.0 your upgrade path should be v1.6.x -> v1.7.x -> v1.8.0.

Also please upgrade/use the roles ansible-role-kubernetes-controller and ansible-role-kubernetes-worker with version/tag v1.0.0_v1.8.2 or above.
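If you fetch the roles via Ansible Galaxy, pinning that tag in a requirements.yml could look roughly like this (the src URLs and role names are assumptions; use whatever source you originally installed the roles from):

# requirements.yml
- src: https://github.com/githubixx/ansible-role-kubernetes-controller.git
  name: kubernetes-controller
  version: v1.0.0_v1.8.2
- src: https://github.com/githubixx/ansible-role-kubernetes-worker.git
  name: kubernetes-worker
  version: v1.0.0_v1.8.2

# update the installed roles
ansible-galaxy install -r requirements.yml --force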

Controller nodes

First update your inventory cache with ansible -m setup all.

The next thing to do is to set k8s_release. Let's assume we currently have k8s_release: "1.8.0" set and want to upgrade to 1.8.2, so we set k8s_release: "1.8.2" in group_vars/k8s.yml.
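The relevant part of group_vars/k8s.yml then simply looks like this:

# group_vars/k8s.yml (excerpt)
k8s_release: "1.8.2"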

Next we deploy the controller role to every controller node, one by one, e.g.:

ansible-playbook --tags=role-kubernetes-controller --limit=controller1.your.tld k8s.yml

Of course replace controller1.your.tld with the hostname of your first controller node. This will download the Kubernetes binaries, replace the old ones and finally restart kube-apiserver, kube-controller-manager and kube-scheduler. As in our current setup all worker services communicate only with controller1 (we have no load balancer for the kube-apiserver yet), the API server will be unavailable for a short time. But that only affects deployments/updates that would take place during this short window. All pods running on the workers keep working as usual.
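If you want to watch that short outage (purely optional), a simple loop against the standard kube-apiserver health endpoint from your workstation does the job:

# poll the API server health endpoint every 2 seconds
while true; do kubectl get --raw=/healthz; echo " $(date +%T)"; sleep 2; done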

After the role is deployed you should have a look at the log files (e.g. with journalctl) on controller1 to verify everything worked well. Also check if the services are still listening on the ports they usually do (e.g. netstat -tlpn). You could also do a small Kubernetes test deployment via kubectl to see if that still works.
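A quick sanity check on controller1 could look like this (the systemd unit names should match this setup, and the grep pattern is just an assumption about how the processes show up; adjust if yours differ):

# recent logs of the control plane services
sudo journalctl -u kube-apiserver --since "10 minutes ago"
sudo journalctl -u kube-controller-manager --since "10 minutes ago"
sudo journalctl -u kube-scheduler --since "10 minutes ago"

# are the services listening again?
sudo netstat -tlpn | grep -E "kube-apiserver|kube-controller|kube-scheduler"

# does the control plane answer?
kubectl get componentstatuses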

If everything is ok go ahead and update controller2 and controller3 e.g.:

ansible-playbook --tags=role-kubernetes-controller --limit=controller2.your.tld k8s.yml

# Wait until controller role is deployed on controller2...

ansible-playbook --tags=role-kubernetes-controller --limit=controller3.your.tld k8s.yml

Now your controller nodes should be up to date!
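You can confirm that the API server now reports the new version from wherever your kubeconfig points at the cluster:

kubectl version
# the Server Version line should now show GitVersion:"v1.8.2"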

Worker nodes

For the worker nodes it's basically the same as with the controller nodes. We start with worker1:

ansible-playbook --tags=role-kubernetes-worker --limit=worker1.your.tld k8s.yml

Of course replace worker1.your.tld with the hostname of your first worker node. This will download the Kubernetes binaries, replace the old ones and finally restart kube-proxy and kubelet. While the two services are being updated they won't be able to start new pods or change network settings. But that's only true while the services are restarting, which takes only a few seconds, and they will catch up on the changes afterwards. Shouldn't be a big deal as long as you don't have a few thousand pods running ;-)
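Once worker1 is back, you can check from your workstation that its kubelet re-registered with the new version:

kubectl get nodes -o wide
# worker1 should report the new version in the VERSION column, e.g. v1.8.2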

Again check the logs and if everything is ok continue with the other nodes:

ansible-playbook --tags=role-kubernetes-worker --limit=worker2.your.tld k8s.yml

# Wait until worker role is deployed on worker2...

ansible-playbook --tags=role-kubernetes-worker --limit=worker3.your.tld k8s.yml

Once the worker role is deployed to all worker nodes we're basically done with the Kubernetes upgrade!
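As a final check you can verify that every node reports the new version and that scheduling still works; the deployment name and image below are arbitrary examples:

kubectl get nodes
# every node should show VERSION v1.8.2

# optional smoke test: start and remove a throwaway deployment
kubectl run upgrade-smoke-test --image=nginx:alpine --replicas=2
kubectl get pods -l run=upgrade-smoke-test
kubectl delete deployment upgrade-smoke-test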