Kubernetes CSI upgrade notes

A while ago I installed the Hetzner CSI driver for Kubernetes together with all the needed CSI components and blogged about it: Kubernetes the not so hard way with Ansible - Persistent storage - Part 1. At that time I used Kubernetes v1.14 and meanwhile my K8s cluster is running v1.17. During that time I didn’t upgrade any of the CSI components (the K8s internal CSI components, of course, changed quite a bit). That worked without problems. The Kubernetes and CSI Sidecar Compatibility matrix shows the minimum and maximum K8s version each sidecar container supports and also specifies a recommended version. For the CSI external-provisioner I used v1.3.0, and the maximum K8s version for that release is v1.19.x. So it’s a good time to upgrade now.

I’m using Ansible to manage all CSI components as described in Kubernetes the not so hard way with Ansible - Persistent storage - Part 1. Also see my playbook here: Install Hetzner CSI driver. The procedure is basically the same if you use plain YAML files, Helm or something similar. So the starting point regarding the sidecar container versions in use was:

# DaemonSet:
hcloud_csi_node_driver_registrar: "1.1.0"

# StatefulSet:
hcloud_csi_attacher: "1.2.0"
hcloud_csi_provisioner: "1.3.0"
hcloud_csi_cluster_driver_registrar: "1.0.1"

# Hetzner CSI driver
hcloud_csi_driver: "1.1.4"

The CSI node-driver-registrar is a sidecar container that runs as a DaemonSet on every node. It fetches driver information (using NodeGetInfo) from a CSI endpoint and registers it with the kubelet on that node:

hcloud_csi_node_driver_registrar: "1.1.0"
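
For context, the registrar sidecar in the DaemonSet pod looks roughly like this. This is a sketch along the lines of the upstream node-driver-registrar documentation; the image tag and socket paths are illustrative assumptions, not a copy of my actual template:

# Sketch of the registrar sidecar in the DaemonSet pod.
- name: csi-node-driver-registrar
  image: quay.io/k8scsi/csi-node-driver-registrar:v1.1.0
  args:
    - "--csi-address=/csi/csi.sock"                # driver socket inside the pod
    - "--kubelet-registration-path=/var/lib/kubelet/plugins/csi.hetzner.cloud/csi.sock"  # socket path on the host
  volumeMounts:
    - name: plugin-dir        # hostPath to the driver's kubelet plugin directory
      mountPath: /csi
    - name: registration-dir  # hostPath to /var/lib/kubelet/plugins_registry
      mountPath: /registration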

The CSI external-attacher (which watches the Kubernetes API server for VolumeAttachment objects and triggers ControllerPublishVolume/ControllerUnpublishVolume operations against a CSI endpoint), the CSI external-provisioner (which watches the Kubernetes API server for PersistentVolumeClaim objects and calls CreateVolume against the specified CSI endpoint to provision a new volume) and the CSI cluster-driver-registrar run as containers of a single pod managed by a StatefulSet:

hcloud_csi_attacher: "1.2.1"
hcloud_csi_provisioner: "1.3.0"
hcloud_csi_cluster_driver_registrar: "1.0.1"
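
Roughly, all these sidecars talk to the Hetzner CSI driver in the same pod over a shared unix socket. Here is a sketch of the wiring; the socket path and image references are assumptions for illustration, not my exact template:

# Sketch: sidecars and the driver share one unix socket via an emptyDir.
containers:
  - name: csi-attacher
    image: quay.io/k8scsi/csi-attacher:v1.2.0
    args:
      - "--csi-address=/var/lib/csi/sockets/pluginproxy/csi.sock"
    volumeMounts:
      - name: socket-dir
        mountPath: /var/lib/csi/sockets/pluginproxy/
  # csi-provisioner and csi-cluster-driver-registrar follow the same pattern
  - name: hcloud-csi-driver
    image: hetznercloud/hcloud-csi-driver:1.1.4
    env:
      - name: CSI_ENDPOINT   # the driver listens on the shared socket
        value: unix:///var/lib/csi/sockets/pluginproxy/csi.sock
      # HCLOUD_TOKEN (the Hetzner API token) omitted here
volumes:
  - name: socket-dir
    emptyDir: {}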

One bigger change here: the CSI cluster-driver-registrar is now deprecated. It was replaced by the CSIDriver object. More on that later.

And finally the Hetzner CSI driver:

hcloud_csi_driver: "1.1.4"

The first thing I did was changing the RBAC rules of the ClusterRole: I removed the permissions for the deprecated CSI cluster-driver-registrar and added a few new ones. Then I rolled out the RBAC changes right away (you may need to specify the --become-user=... argument):

ansible-playbook --tags=install-clusterrole hetzner-csi.yml
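
Most of the added permissions are for the upcoming CSI external-resizer. The extra ClusterRole rules look roughly like this (a sketch along the lines of the upstream external-resizer RBAC example, trimmed for brevity):

# Sketch: additional ClusterRole rules the external-resizer sidecar needs.
- apiGroups: [""]
  resources: ["persistentvolumes"]
  verbs: ["get", "list", "watch", "update", "patch"]
- apiGroups: [""]
  resources: ["persistentvolumeclaims"]
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources: ["persistentvolumeclaims/status"]
  verbs: ["update", "patch"]
- apiGroups: [""]
  resources: ["events"]
  verbs: ["list", "watch", "create", "update", "patch"]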

Now I removed the hcloud_csi_cluster_driver_registrar variable as it’s no longer used. But I introduced a new one:

hcloud_csi_resizer: "0.3.0"

The StatefulSet now contains an additional container: the CSI external-resizer. This sidecar container watches the Kubernetes API server for edits to PersistentVolumeClaim objects and triggers ControllerExpandVolume operations against a CSI endpoint if the user requested more storage on a PersistentVolumeClaim. So it allows you to grow a persistent volume.
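
With the resizer in place, growing a volume later boils down to raising the storage request on the PVC, e.g. (the PVC name my-data is made up for illustration):

1> kubectl patch pvc my-data -n default -p '{"spec":{"resources":{"requests":{"storage":"20Gi"}}}}'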

Since upgrading or changing a StatefulSet isn’t that straightforward, I decided to re-create it (also see the StatefulSet upgrade strategy). That means deleting it and creating it again afterwards. I didn’t configure HA support for the CSI external-attacher, so I only have one pod running (e.g. hcloud-csi-controller-0). But even with HA support configured a rolling upgrade may not work, as a look at the CHANGELOG for version 2.0 suggests.

In general it’s a good idea to read the CHANGELOGs of all Kubernetes CSI Sidecar Containers. That page links to all the containers, which in turn link to their CHANGELOGs.

So now I use my Ansible playbook to delete the StatefulSet:

ansible-playbook -e delete=true --tags=delete-statefulset hetzner-csi.yml
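
Without Ansible the same step boils down to a single kubectl call:

1> kubectl delete statefulset hcloud-csi-controller -n kube-system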

Check with kubectl get pods -n kube-system that the StatefulSet’s pod is gone (it should be called hcloud-csi-controller-0). I’ll install the StatefulSet again later. But for now I install the CSIDriver object with Ansible:

ansible-playbook --tags=install-csidriver-crd hetzner-csi.yml
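
The CSIDriver object itself is small. On K8s v1.17 it’s still served via the storage.k8s.io/v1beta1 API; the spec values below follow the upstream Hetzner deployment manifests of that time, so treat this as a sketch:

apiVersion: storage.k8s.io/v1beta1
kind: CSIDriver
metadata:
  name: csi.hetzner.cloud
spec:
  attachRequired: true   # volumes must be attached to a node before mounting
  podInfoOnMount: true   # kubelet passes pod metadata to the driver on mount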

Then I change all CSI-related variable values used by Ansible (e.g. in group_vars/all.yml):

# DaemonSet
hcloud_csi_node_driver_registrar: "1.2.0"

# StatefulSet
hcloud_csi_attacher: "2.1.0"
hcloud_csi_provisioner: "1.5.0"
hcloud_csi_resizer: "0.3.0"

# Hetzner CSI driver
hcloud_csi_driver: "1.2.3"
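
In the templates these variables end up as image tags. Rendered, the new container images look roughly like this (the image repositories are assumptions based on the upstream quay.io/k8scsi and Docker Hub registries of that time):

# Sketch: rendered image references after the version bump.
quay.io/k8scsi/csi-node-driver-registrar:v1.2.0
quay.io/k8scsi/csi-attacher:v2.1.0
quay.io/k8scsi/csi-provisioner:v1.5.0
quay.io/k8scsi/csi-resizer:v0.3.0
hetznercloud/hcloud-csi-driver:1.2.3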

Next I install the StatefulSet again:

ansible-playbook --tags=install-statefulset hetzner-csi.yml

By running

1> kubectl get statefulset -n kube-system

NAME                    READY   AGE
hcloud-csi-controller   1/1     4d22h

and

1> kubectl get pods hcloud-csi-controller-0 -n kube-system

NAME                      READY   STATUS    RESTARTS   AGE
hcloud-csi-controller-0   4/4     Running   0          2m

I can verify that it’s up and running.

The next thing to upgrade is the DaemonSet:

ansible-playbook --tags=install-daemonset hetzner-csi.yml

After a while everything should be up and running, e.g.:

1> kubectl get daemonset hcloud-csi-node -n kube-system

NAME              DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
hcloud-csi-node   3         3         3       3            3           <none>          313d

and

1> kubectl get pods -l app=hcloud-csi -n kube-system

NAME                    READY   STATUS    RESTARTS   AGE
hcloud-csi-node-7rs8x   2/2     Running   0          2m
hcloud-csi-node-c8ll9   2/2     Running   0          2m
hcloud-csi-node-wr9s4   2/2     Running   2          2m

Finally the StorageClass can be updated:

ansible-playbook --tags=install-storageclass hetzner-csi.yml
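
The updated StorageClass is roughly equivalent to this manifest (a sketch reconstructed from the describe output below):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hcloud-volumes
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: csi.hetzner.cloud
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true   # new: allows growing PVCs backed by this class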

Running

1> kubectl describe storageclass hcloud-volumes

Name:                  hcloud-volumes
IsDefaultClass:        Yes
Annotations:           storageclass.kubernetes.io/is-default-class=true
Provisioner:           csi.hetzner.cloud
Parameters:            <none>
AllowVolumeExpansion:  True
MountOptions:          <none>
ReclaimPolicy:         Delete
VolumeBindingMode:     WaitForFirstConsumer
Events:                <none>

I can now see that AllowVolumeExpansion is set to True; it was <unset> before.

That’s basically it. Happy upgrading! ;-)