Kubernetes the not so hard way with Ansible - Persistent storage - Part 1 - (K8s v1.18)

2019-07-11 (updated 2020-08-12)

CHANGELOG

2020-08-12

updated for Kubernetes v1.18
update hcloud_csi_node_driver_registrar to v1.3.0
update hcloud_csi_attacher to v2.2.0
update hcloud_csi_provisioner to v1.6.0
update hcloud_csi_driver to 1.4.0
added hcloud_csi_resizer: "0.5.0"
added hcloud_csi_livenessprobe: "1.1.0"
removed hcloud_csi_cluster_driver_registrar variable (no longer needed, see CSI cluster-driver-registrar)
added allowVolumeExpansion: true to StorageClass

While stateless workloads on Kubernetes are quite common now, stateful workloads like databases are becoming more common since the Container Storage Interface (CSI) was introduced. CSI defines a standard interface for container orchestration systems (like Kubernetes) to expose arbitrary storage systems to their container workloads. CSI support was introduced as alpha in Kubernetes v1.9, moved to beta in Kubernetes v1.10, and is GA since Kubernetes v1.13.

Once a CSI compatible volume driver is deployed on a Kubernetes cluster, users may use the CSI volume type to attach, mount, etc. the volumes exposed by the CSI driver. See the Drivers section of the Kubernetes CSI documentation for a list of available CSI drivers.

I currently have four workloads that need block storage and that I want to migrate from host storage (which basically means mounting a Kubernetes worker node’s local storage into the pod via hostPath) to CSI based storage: PostgreSQL, an old CMS that has no object storage support, Redis and my Postfix mail server.

If you can’t find a storage driver on the CSI Drivers list or if you are on-premise you can also use storage solutions like Rook or OpenEBS among others. Rook is basically an operator for Ceph which not only provides block storage but also object and file storage. I’ll cover this in the next part. OpenEBS is an open-source project for container-attached and container-native storage on Kubernetes. OpenEBS adopts the Container Attached Storage (CAS) approach, where each workload is provided with a dedicated storage controller.

Luckily a CSI driver exists for Hetzner Cloud. As I’m currently running Kubernetes 1.18, all needed feature gates for kube-apiserver and kubelet are at least in beta, which means that they’re enabled by default. The driver needs at least Kubernetes 1.13.

I’ve created an Ansible playbook to install all resources needed for the Hetzner CSI driver.

The README of the playbook contains all information on how to install and use the playbook. It basically should just work and there should be no need for further changes. If you don’t care about further details you can basically stop reading here and just run the playbook and enjoy K8s persistence ;-)

Personally I often try to get to the bottom of things. So if you’re interested in further details read on ;-) I was curious how everything fits together, what all the pods are responsible for and so on. So this is what I found out and the information I collected about CSI and the Hetzner CSI driver in general so far. If you find any errors please let me know.

For the following K8s resources we assume that you’ve set these variable values in group_vars/all.yml (for more information see further down the text and the README of my Ansible Hetzner CSI playbook):

hcloud_namespace: "kube-system"
hcloud_resource_prefix: "hcloud"
hcloud_is_default_class: "true"
hcloud_volume_binding_mode: "WaitForFirstConsumer"
k8s_worker_kubelet_conf_dir: "/var/lib/kubelet"

# DaemonSet:
hcloud_csi_node_driver_registrar: "1.3.0"

# StatefulSet:
hcloud_csi_attacher: "2.2.0"
hcloud_csi_provisioner: "1.6.0"
hcloud_csi_resizer: "0.5.0"
hcloud_csi_livenessprobe: "1.1.0"

# Hetzner CSI driver
hcloud_csi_driver: "1.4.0"

So let’s see what resources the Ansible playbook will install:

---
apiVersion: v1
kind: Secret
metadata:
  name: {{ hcloud_resource_prefix }}-csi
  namespace: {{ hcloud_namespace }}
stringData:
  token: {{ hcloud_csi_token }}

The first resource is a Secret. The secret contains the token you created in the Hetzner Cloud Console. It’s needed so that the driver has the authority to interact with the Hetzner API. The secret is called hcloud-csi by default (depends on how you set the hcloud_resource_prefix variable of course) and will be placed into the kube-system namespace by default. All {{ ... }} placeholders are of course variables that Ansible will replace during execution. So make sure to set them accordingly as mentioned in the Ansible playbook for the CSI driver.
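To make the templating a bit more tangible, here is a sketch of what the rendered Secret should roughly look like with the default variable values from above. The token value is only a placeholder that I made up for illustration, it is not a real token format:

```yaml
---
apiVersion: v1
kind: Secret
metadata:
  # "{{ hcloud_resource_prefix }}-csi" with hcloud_resource_prefix: "hcloud"
  name: hcloud-csi
  # "{{ hcloud_namespace }}" with hcloud_namespace: "kube-system"
  namespace: kube-system
stringData:
  # Placeholder only. Ansible inserts the value of hcloud_csi_token here.
  token: "REPLACE_WITH_YOUR_HETZNER_API_TOKEN"
```

With stringData the token is written as plain text in the manifest and Kubernetes stores it base64 encoded in the data field of the Secret.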

---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  namespace: {{ hcloud_namespace }}
  name: {{ hcloud_resource_prefix }}-volumes
  annotations:
    storageclass.kubernetes.io/is-default-class: "{{ hcloud_is_default_class }}"
provisioner: csi.hetzner.cloud
volumeBindingMode: {{ hcloud_volume_binding_mode }}
allowVolumeExpansion: true

A StorageClass provides a way for administrators to describe the classes of storage they offer. The name and the parameters are significant as they can’t be changed later. The name should reflect what the user can expect from this storage class. As we’ve only one storage type at Hetzner a generic name like hcloud-volumes can be used. Storage classes for AWS EBS or Google Cloud persistent disks offer additional parameters. E.g. type: pd-ssd for a GCP persistent disk would allocate a fast SSD disk instead of a standard disk. The storage class name will be used later when you create a PersistentVolumeClaim, where you need to provide a storageClassName.

The annotation storageclass.kubernetes.io/is-default-class: "true" makes this storage class the default storage class, which is used if a PersistentVolumeClaim doesn’t define a storage class. allowVolumeExpansion: true permits growing volumes of this class later by raising the size requested in the PersistentVolumeClaim.

The volumeBindingMode field controls when volume binding and dynamic provisioning should occur. The default value is Immediate. The Immediate mode indicates that volume binding and dynamic provisioning occur once the PersistentVolumeClaim is created. The WaitForFirstConsumer mode delays the binding and provisioning of a PersistentVolume until a Pod using the PersistentVolumeClaim is created. PersistentVolumes will be selected or provisioned conforming to the topology that is specified by the Pod’s scheduling constraints. These include, but are not limited to, resource requirements, node selectors, pod affinity and anti-affinity, and taints and tolerations.

You can also add a few parameters to the StorageClass manifest that are later handled by the external-provisioner (see further down). E.g. you can additionally specify

parameters:
  csi.storage.k8s.io/fstype: ext4

If the PVC VolumeMode is set to Filesystem, and the value of csi.storage.k8s.io/fstype is specified, it is used to populate the FsType in CreateVolumeRequest.VolumeCapabilities[x].AccessType and the AccessType is set to Mount.
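To show where the parameters block belongs, here is a minimal sketch of a complete StorageClass that sets the file system type explicitly. The name hcloud-volumes-ext4 is hypothetical and only used for this illustration. It is not marked as default so it doesn’t compete with the class the playbook installs:

```yaml
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  # Hypothetical name, chosen only for this example
  name: hcloud-volumes-ext4
  annotations:
    # Not the default class, so PVCs have to request it explicitly
    storageclass.kubernetes.io/is-default-class: "false"
provisioner: csi.hetzner.cloud
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
parameters:
  # Picked up by the external-provisioner and passed on as FsType
  csi.storage.k8s.io/fstype: ext4
```

A PersistentVolumeClaim would then select this class via storageClassName: hcloud-volumes-ext4.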

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: {{ hcloud_resource_prefix }}-csi
  namespace: {{ hcloud_namespace }}

A ServiceAccount provides an identity for processes that run in a Pod. As you’ll see below there will be a few pods running to make CSI work. These pods contain one or more containers which run the various CSI processes needed. As some of the processes need to be able to retrieve various information from the Kubernetes API server they’ll use the service account we defined above.

But the service account also needs permissions that define which resources it is allowed to access at the API server. For this we need a ClusterRole:

---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: {{ hcloud_resource_prefix }}-csi
rules:
  # attacher
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "update", "patch"]
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["csi.storage.k8s.io"]
    resources: ["csinodeinfos"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["csinodes"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["volumeattachments"]
    verbs: ["get", "list", "watch", "update", "patch"]
  # provisioner
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get", "list"]
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete", "patch"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims", "persistentvolumeclaims/status"]
    verbs: ["get", "list", "watch", "update", "patch"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["list", "watch", "create", "update", "patch"]
  - apiGroups: ["snapshot.storage.k8s.io"]
    resources: ["volumesnapshots"]
    verbs: ["get", "list"]
  - apiGroups: ["snapshot.storage.k8s.io"]
    resources: ["volumesnapshotcontents"]
    verbs: ["get", "list"]
  # node
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]

Here you can see what permissions (the verbs) the various processes need to be able to retrieve the needed information from the APIs. The permissions (verbs) should be self-explanatory and the resources define what information can be accessed. Since it is a ClusterRole, which is not namespaced, it also allows getting information about cluster-wide resources like nodes. apiGroups: [""] indicates the core API group (also see API groups).

---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: {{ hcloud_resource_prefix }}-csi
subjects:
  - kind: ServiceAccount
    name: {{ hcloud_resource_prefix }}-csi
    namespace: "{{ hcloud_namespace }}"
roleRef:
  kind: ClusterRole
  name: {{ hcloud_resource_prefix }}-csi
  apiGroup: rbac.authorization.k8s.io

The ClusterRoleBinding is the “glue” between the ServiceAccount and the ClusterRole. Here we map the ClusterRole (and therefore its permissions) to the ServiceAccount we created above.

---
kind: StatefulSet
apiVersion: apps/v1
metadata:
  name: {{ hcloud_resource_prefix }}-csi-controller
  namespace: {{ hcloud_namespace }}
spec:
  selector:
    matchLabels:
      app: {{ hcloud_resource_prefix }}-controller
  serviceName: {{ hcloud_resource_prefix }}-controller
  replicas: 1
  template:
    metadata:
      labels:
        app: {{ hcloud_resource_prefix }}-controller
    spec:
      serviceAccount: {{ hcloud_resource_prefix }}-csi
      containers:
        - name: csi-attacher
          image: quay.io/k8scsi/csi-attacher:v{{ hcloud_csi_attacher }}
          args:
            - --csi-address=/var/lib/csi/sockets/pluginproxy/csi.sock
            - --v=5
          volumeMounts:
            - name: socket-dir
              mountPath: /var/lib/csi/sockets/pluginproxy/
          securityContext:
            privileged: true
            capabilities:
              add: ["SYS_ADMIN"]
            allowPrivilegeEscalation: true
        - name: csi-resizer
          image: quay.io/k8scsi/csi-resizer:v{{ hcloud_csi_resizer }}
          args:
            - --csi-address=/var/lib/csi/sockets/pluginproxy/csi.sock
            - --v=5
          volumeMounts:
            - name: socket-dir
              mountPath: /var/lib/csi/sockets/pluginproxy/
          securityContext:
            privileged: true
            capabilities:
              add: ["SYS_ADMIN"]
            allowPrivilegeEscalation: true
        - name: csi-provisioner
          image: quay.io/k8scsi/csi-provisioner:v{{ hcloud_csi_provisioner }}
          args:
            - --provisioner=csi.hetzner.cloud
            - --csi-address=/var/lib/csi/sockets/pluginproxy/csi.sock
            - --feature-gates=Topology=true
            - --v=5
          volumeMounts:
            - name: socket-dir
              mountPath: /var/lib/csi/sockets/pluginproxy/
          securityContext:
            privileged: true
            capabilities:
              add: ["SYS_ADMIN"]
            allowPrivilegeEscalation: true
        - name: hcloud-csi-driver
          image: hetznercloud/hcloud-csi-driver:{{ hcloud_csi_driver }}
          imagePullPolicy: Always
          env:
            - name: CSI_ENDPOINT
              value: unix:///var/lib/csi/sockets/pluginproxy/csi.sock
            - name: METRICS_ENDPOINT
              value: 0.0.0.0:9189
            - name: HCLOUD_TOKEN
              valueFrom:
                secretKeyRef:
                  name: {{ hcloud_resource_prefix }}-csi
                  key: token
          volumeMounts:
            - name: socket-dir
              mountPath: /var/lib/csi/sockets/pluginproxy/
          ports:
            - containerPort: 9189
              name: metrics
            - name: healthz
              containerPort: 9808
              protocol: TCP
          livenessProbe:
            failureThreshold: 5
            httpGet:
              path: /healthz
              port: healthz
            initialDelaySeconds: 10
            timeoutSeconds: 3
            periodSeconds: 2
          securityContext:
            privileged: true
            capabilities:
              add: ["SYS_ADMIN"]
            allowPrivilegeEscalation: true
        - name: liveness-probe
          imagePullPolicy: Always
          image: quay.io/k8scsi/livenessprobe:v{{ hcloud_csi_livenessprobe }}
          args:
            - --csi-address=/var/lib/csi/sockets/pluginproxy/csi.sock
          volumeMounts:
            - mountPath: /var/lib/csi/sockets/pluginproxy/
              name: socket-dir
      volumes:
        - name: socket-dir
          emptyDir: {}

Next we have a StatefulSet called hcloud-csi-controller. Like a Deployment, a StatefulSet manages Pods that are based on an identical container spec. Unlike a Deployment, a StatefulSet maintains a sticky identity for each of its Pods. These pods are created from the same spec, but are not interchangeable: each has a persistent identifier that it maintains across any rescheduling. In the specification above the pod consists of five containers and we’ve only one replica, which means that the pod will be scheduled on only one of the worker nodes. This is normally sufficient for smaller K8s clusters.

So if you deploy the manifest above you’ll later see something like this (I deployed everything to the kube-system namespace so you may change to that namespace first or specify one via the -n flag):

kubectl get statefulsets -o wide

NAME                    READY   AGE    CONTAINERS                                                                  IMAGES
hcloud-csi-controller   1/1     2m4s   csi-attacher,csi-resizer,csi-provisioner,hcloud-csi-driver,liveness-probe   quay.io/k8scsi/csi-attacher:v2.2.0,quay.io/k8scsi/csi-resizer:v0.5.0,quay.io/k8scsi/csi-provisioner:v1.6.0,hetznercloud/hcloud-csi-driver:1.4.0,quay.io/k8scsi/livenessprobe:v1.1.0

As you can see the StatefulSet consists of a pod with five containers: csi-attacher, csi-resizer, csi-provisioner, hcloud-csi-driver and liveness-probe.

As we specified that we want one replica for the StatefulSet we should now see at least one pod with the five containers running (in fact you’ll see other pods that start with hcloud prefix but we don’t care about them yet):

kubectl get pods -o wide | grep hcloud-csi-controller

NAME                               READY   STATUS    RESTARTS   AGE     IP             NODE       NOMINATED NODE   READINESS GATES
hcloud-csi-controller-0            5/5     Running   0          4m27s   10.200.0.100   worker02   <none>           <none>

As expected there is now a pod called hcloud-csi-controller-0 running on worker02. StatefulSet pods have a unique identity that is comprised of an ordinal, a stable network identity, and stable storage. The identity sticks to the Pod, regardless of which node it’s (re)scheduled on. So in our case the pod’s name consists of the StatefulSet’s name plus an integer ordinal which starts at 0 for the first pod. If we have a look inside the pod we’ll again see that it consists of the five containers that were specified:

kubectl get pod hcloud-csi-controller-0 -o custom-columns='CONTAINER:.spec.containers[*].name'

CONTAINER
csi-attacher,csi-resizer,csi-provisioner,hcloud-csi-driver,liveness-probe

So let’s have some fun and delete pod hcloud-csi-controller-0:

kubectl delete pod hcloud-csi-controller-0

pod "hcloud-csi-controller-0" deleted

And after a while the pod gets recreated:

kubectl get pods -o wide | grep hcloud-csi-controller

hcloud-csi-controller-0            5/5     Running   0          1m27s   10.200.0.100   worker02   <none>           <none>

Unlike normal pods the name doesn’t change even if you delete the pod. For some applications, e.g. databases, that’s quite useful as this means that the DNS entry won’t change either.
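The stable DNS entry only exists if the StatefulSet is paired with a headless Service. The following is a minimal sketch of that combination for a database, assuming a made-up name demo-db, the postgres:12 image and a dummy password. None of this is part of the Hetzner CSI playbook:

```yaml
---
apiVersion: v1
kind: Service
metadata:
  name: demo-db              # hypothetical name
spec:
  clusterIP: None            # headless: DNS resolves to the individual pods
  selector:
    app: demo-db
  ports:
    - name: postgres
      port: 5432
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: demo-db
spec:
  serviceName: demo-db       # must match the headless Service above
  replicas: 1
  selector:
    matchLabels:
      app: demo-db
  template:
    metadata:
      labels:
        app: demo-db
    spec:
      containers:
        - name: postgres
          image: postgres:12
          env:
            - name: POSTGRES_PASSWORD
              value: "change-me"   # dummy value for the example only
          ports:
            - name: postgres
              containerPort: 5432
```

The single pod is called demo-db-0 and stays reachable as demo-db-0.demo-db.<namespace>.svc.cluster.local, even after the pod was deleted and recreated.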

Before we figure out what the containers are doing let’s have a look at the last part of the whole CSI deployment and that’s a DaemonSet (the Ansible playbook installed this one too of course). A DaemonSet ensures that all (or some) Nodes run a copy of a Pod. As nodes are added to the cluster, Pods are added to them. As nodes are removed from the cluster, those Pods are garbage collected. Deleting a DaemonSet will clean up the Pods it created. So here is the specification:

---
kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: {{ hcloud_resource_prefix }}-csi-node
  namespace: {{ hcloud_namespace }}
  labels:
    app: {{ hcloud_resource_prefix }}-csi
spec:
  selector:
    matchLabels:
      app: {{ hcloud_resource_prefix }}-csi
  template:
    metadata:
      labels:
        app: {{ hcloud_resource_prefix }}-csi
    spec:
      tolerations:
        - effect: NoExecute
          operator: Exists
        - effect: NoSchedule
          operator: Exists
        - key: CriticalAddonsOnly
          operator: Exists
      serviceAccount: {{ hcloud_resource_prefix }}-csi
      containers:
        - name: csi-node-driver-registrar
          image: quay.io/k8scsi/csi-node-driver-registrar:v{{ hcloud_csi_node_driver_registrar }}
          args:
            - --v=5
            - --csi-address=/csi/csi.sock
            - --kubelet-registration-path={{ k8s_worker_kubelet_conf_dir }}/plugins/csi.hetzner.cloud/csi.sock
          env:
            - name: KUBE_NODE_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: spec.nodeName
          volumeMounts:
            - name: plugin-dir
              mountPath: /csi
            - name: registration-dir
              mountPath: /registration
          securityContext:
            privileged: true
        - name: hcloud-csi-driver
          image: hetznercloud/hcloud-csi-driver:{{ hcloud_csi_driver }}
          imagePullPolicy: Always
          env:
            - name: CSI_ENDPOINT
              value: unix:///csi/csi.sock
            - name: METRICS_ENDPOINT
              value: 0.0.0.0:9189
            - name: HCLOUD_TOKEN
              valueFrom:
                secretKeyRef:
                  name: {{ hcloud_resource_prefix }}-csi
                  key: token
          volumeMounts:
            - name: kubelet-dir
              mountPath: {{ k8s_worker_kubelet_conf_dir }}
              mountPropagation: "Bidirectional"
            - name: plugin-dir
              mountPath: /csi
            - name: device-dir
              mountPath: /dev
          securityContext:
            privileged: true
          ports:
            - containerPort: 9189
              name: metrics
            - name: healthz
              containerPort: 9808
              protocol: TCP
          livenessProbe:
            failureThreshold: 5
            httpGet:
              path: /healthz
              port: healthz
            initialDelaySeconds: 10
            timeoutSeconds: 3
            periodSeconds: 2
        - name: liveness-probe
          imagePullPolicy: Always
          image: quay.io/k8scsi/livenessprobe:v{{ hcloud_csi_livenessprobe }}
          args:
            - --csi-address=/csi/csi.sock
          volumeMounts:
            - mountPath: /csi
              name: plugin-dir
      volumes:
        - name: kubelet-dir
          hostPath:
            path: {{ k8s_worker_kubelet_conf_dir }}
            type: Directory
        - name: plugin-dir
          hostPath:
            path: {{ k8s_worker_kubelet_conf_dir }}/plugins/csi.hetzner.cloud/
            type: DirectoryOrCreate
        - name: registration-dir
          hostPath:
            path: {{ k8s_worker_kubelet_conf_dir }}/plugins_registry/
            type: Directory
        - name: device-dir
          hostPath:
            path: /dev
            type: Directory

So let’s see what this looks like when the DaemonSet is deployed:

kubectl get daemonset -o wide | grep hcloud

NAME                         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE    CONTAINERS                                    IMAGES                                                                                 SELECTOR
hcloud-csi-node              1         1         1       1            1           kubernetes.io/hostname=worker02   7m38s   csi-node-driver-registrar,hcloud-csi-driver,liveness-probe   quay.io/k8scsi/csi-node-driver-registrar:v1.3.0,hetznercloud/hcloud-csi-driver:1.4.0,quay.io/k8scsi/livenessprobe:v1.1.0   app=hcloud-csi

Let’s see what pods we have:

kubectl get pods -o wide | grep hcloud-csi-node

hcloud-csi-node-zl6z7      3/3     Running   0          8m58s   10.200.0.249   worker02   <none>           <none>

As you can see there is one pod with three containers running on every node now.

We can also get a list of CSI drivers we’ve installed (which is of course the Hetzner CSI driver at the moment if you don’t have other CSI drivers installed):

kubectl get CSIDriver

NAME                ATTACHREQUIRED   PODINFOONMOUNT   MODES        AGE
csi.hetzner.cloud   true             true             Persistent   12m

CSI drivers generate node specific information. Instead of storing this in the Kubernetes Node API object (which we can query with kubectl describe node), a new CSI specific Kubernetes CSINode object was created. With kubectl describe csinodes (or fully qualified kubectl describe csinodes.storage.k8s.io) we can get a little bit more information about our CSINodes:

kubectl describe csinodes

Name:               worker02
Labels:             <none>
Annotations:        <none>
CreationTimestamp:  Sat, 08 Aug 2020 23:58:45 +0200
Spec:
  Drivers:
    csi.hetzner.cloud:
      Allocatables:
        Count:        16
      Node ID:        6956103
      Topology Keys:  [csi.hetzner.cloud/location]
Events:               <none>

Now let’s try creating a PersistentVolumeClaim. A PersistentVolumeClaim (PVC) is a request for storage by a user. It is similar to a pod. Pods consume node resources and PVCs consume PV (persistent volume) resources. Pods can request specific levels of resources (CPU and Memory). Claims can request specific size and access modes (e.g., can be mounted once read/write or many times read-only).

While PersistentVolumeClaims allow a user to consume abstract storage resources, it is common that users need PersistentVolumes with varying properties, such as performance, for different problems. Cluster administrators need to be able to offer a variety of PersistentVolumes that differ in more ways than just size and access modes, without exposing users to the details of how those volumes are implemented. For these needs there is the StorageClass resource. But again at Hetzner currently only one StorageClass exists.

Let’s have a look at the log output of the containers in the StatefulSet if we create a PersistentVolumeClaim. You can either watch all logs at once or each container individually:

# Watch all container logs
kubectl logs -f --since=1s hcloud-csi-controller-0 --all-containers

# OR every container log separately:
kubectl logs -f --since=1s hcloud-csi-controller-0 csi-attacher
kubectl logs -f --since=1s hcloud-csi-controller-0 csi-provisioner
kubectl logs -f --since=1s hcloud-csi-controller-0 csi-resizer
kubectl logs -f --since=1s hcloud-csi-controller-0 hcloud-csi-driver

Here is the manifest for the PersistentVolumeClaim we want to create (10Gi is the smallest size possible):

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: csi-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: hcloud-volumes

Save the content above to a file called e.g. pvc.yml. The name of the PVC will be csi-pvc. You need this name in the next step when we create a pod that consumes the volume. Make sure that the storageClassName matches the name of the StorageClass defined above. You could also remove it here, as we specified the annotation storageclass.kubernetes.io/is-default-class: "true" above and therefore the hcloud-volumes StorageClass would be used anyway. The access mode ReadWriteOnce means the volume can be mounted as read-write by a single node.
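As a sketch of the variant without storageClassName: the following claim relies on hcloud-volumes being marked as the default StorageClass. The name csi-pvc-default is hypothetical and only used here so it doesn’t clash with csi-pvc:

```yaml
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  # Hypothetical name for this example
  name: csi-pvc-default
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  # storageClassName is intentionally omitted. The default StorageClass
  # (the one annotated with is-default-class: "true") is filled in
  # when the claim is created.
```

Note that omitting the field is different from setting storageClassName: "". An empty string explicitly requests a volume without a class, so no dynamic provisioning takes place.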

Now create the PVC with kubectl create -f pvc.yml. If you now have a look at the logs you will see that very little output was created. The only lines that mention the PVC say No need to resize PVC, so they most likely come from the csi-resizer:

kubectl logs -f --since=50s hcloud-csi-controller-0 --all-containers

I0812 21:57:35.068229       1 reflector.go:432] k8s.io/client-go/informers/factory.go:135: Watch close - *v1.PersistentVolume total 0 items received
I0812 21:57:54.111686       1 reflector.go:432] k8s.io/client-go/informers/factory.go:135: Watch close - *v1beta1.CSINode total 0 items received
I0812 21:57:37.131925       1 controller.go:225] Started PVC processing "kube-system/csi-pvc"
I0812 21:57:37.131968       1 controller.go:244] No need to resize PVC "kube-system/csi-pvc"

So not really much happened ;-) You’ll see in your Hetzner Cloud console that no volume was created yet. The reason for that is that we specified volumeBindingMode: WaitForFirstConsumer for the StorageClass. So as long as no pod requests the volume it won’t be created. So if you want your pods to start up more quickly you may reconsider this setting or simply create an additional StorageClass with a different setting.
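Such an additional StorageClass could look like the following sketch. The name hcloud-volumes-immediate is hypothetical. The only relevant difference to the class installed by the playbook is the binding mode:

```yaml
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  # Hypothetical name for this example
  name: hcloud-volumes-immediate
  annotations:
    # Keep hcloud-volumes as the only default class
    storageclass.kubernetes.io/is-default-class: "false"
provisioner: csi.hetzner.cloud
# Provision and bind the volume as soon as the PVC is created,
# without waiting for a pod that consumes it
volumeBindingMode: Immediate
allowVolumeExpansion: true
```

A PVC that sets storageClassName: hcloud-volumes-immediate should then get its volume right away. The trade-off is that the volume is created before the scheduler has picked a node for the pod, so the pod’s scheduling constraints are not taken into account.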

To finally create the volume we need a consumer which means a pod in our case. So let’s create one:

---
kind: Pod
apiVersion: v1
metadata:
  name: my-csi-app
spec:
  containers:
    - name: my-frontend
      image: busybox
      volumeMounts:
      - mountPath: "/data"
        name: my-csi-volume
      command: [ "sleep", "1000000" ]
  volumes:
    - name: my-csi-volume
      persistentVolumeClaim:
        claimName: csi-pvc

Save the content above in a file called e.g. pod.yml. This specification will create a pod called my-csi-app. It will launch a container called my-frontend using the busybox container image. This container image is quite small so the pod should become ready quite quickly. You also see that we want that container to have a volumeMount at /data. So our PVC will be available at /data later. The name of the volume is my-csi-volume. So far we haven’t specified the PVC anywhere. For this we need volumes (you can of course mount different volumes into a container). The name in the container’s volumeMounts and the name in the volumes specification need to match. And finally we specify the persistentVolumeClaim whose claimName: csi-pvc has to match the name of the PVC we created above.
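To illustrate how several volumes are mounted into one container, here is a sketch of a pod with two claims. The pod name my-csi-app-two-volumes, the second volume name and the second PVC csi-pvc-backup are all hypothetical. The second PVC would have to be created first, just like csi-pvc:

```yaml
---
kind: Pod
apiVersion: v1
metadata:
  name: my-csi-app-two-volumes     # hypothetical name
spec:
  containers:
    - name: my-frontend
      image: busybox
      command: ["sleep", "1000000"]
      volumeMounts:
        # Each entry refers to a volume below by its name
        - name: my-csi-volume
          mountPath: /data
        - name: my-second-volume
          mountPath: /backup
  volumes:
    - name: my-csi-volume
      persistentVolumeClaim:
        claimName: csi-pvc
    - name: my-second-volume
      persistentVolumeClaim:
        claimName: csi-pvc-backup  # hypothetical second PVC
```

Every name under volumeMounts has to have a counterpart under volumes, otherwise the API server rejects the pod.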

Now we can create the pod:

kubectl create -f pod.yml

A few seconds later we should see that the pod is ready. This is normally bad practice but since we don’t care about that let’s log into the new container:

kubectl exec -it my-csi-app -- sh

df -h /data/

Filesystem                Size      Used Available Use% Mounted on
/dev/disk/by-id/scsi-0HC_Volume_2816975
                          9.8G     36.0M      9.7G   0% /data

And there it is, the new volume! :-) You can now create a file on that new volume, log out, delete the pod and re-create it. If you now log in again you’ll see that the file that you just created is still there. You now also see the volume in the Hetzner Cloud console. In the console you’ll also see that the volume was attached to the node where the pod was scheduled.
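Since the StorageClass sets allowVolumeExpansion: true and the csi-resizer container is part of the controller pod, growing the volume later should only require raising the requested size in the PVC. A sketch, assuming 20Gi as an arbitrary new size:

```yaml
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: csi-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      # Raised from 10Gi. Only growing is possible, Kubernetes
      # rejects a request that is smaller than the current size.
      storage: 20Gi
  storageClassName: hcloud-volumes
```

You could apply the change with kubectl apply -f pvc.yml. Whether the file system grows while the pod is running or only after the pod was restarted depends on the driver and the Kubernetes version, so I’d check df -h /data/ afterwards.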

You can even see at the node systemd journal that the volume was attached to the node e.g.:

ansible -m command -a 'journalctl --since=-1h | grep "kernel: s"' worker02

Jun 30 23:00:38 worker01 kernel: scsi 2:0:0:1: Direct-Access     HC       Volume           2.5+ PQ: 0 ANSI: 5
Jun 30 23:00:38 worker01 kernel: sd 2:0:0:1: Power-on or device reset occurred
Jun 30 23:00:38 worker01 kernel: sd 2:0:0:1: Attached scsi generic sg2 type 0
Jun 30 23:00:38 worker01 kernel: sd 2:0:0:1: [sdb] 20971520 512-byte logical blocks: (10.7 GB/10.0 GiB)
Jun 30 23:00:38 worker01 kernel: sd 2:0:0:1: [sdb] Write Protect is off
Jun 30 23:00:38 worker01 kernel: sd 2:0:0:1: [sdb] Mode Sense: 63 00 00 08
Jun 30 23:00:38 worker01 kernel: sd 2:0:0:1: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jun 30 23:00:38 worker01 kernel: sd 2:0:0:1: [sdb] Attached SCSI disk

Next we have the logs for the csi-attacher:

I0630 21:00:35.702906       1 controller.go:175] Started VA processing "csi-c5ad589c45b49b4a688fdb2235f92497f853d14ff1de408ff3cdc27fc505b6ed"
I0630 21:00:35.702998       1 csi_handler.go:87] CSIHandler: processing VA "csi-c5ad589c45b49b4a688fdb2235f92497f853d14ff1de408ff3cdc27fc505b6ed"
I0630 21:00:35.703021       1 csi_handler.go:114] Attaching "csi-c5ad589c45b49b4a688fdb2235f92497f853d14ff1de408ff3cdc27fc505b6ed"
I0630 21:00:35.703040       1 csi_handler.go:253] Starting attach operation for "csi-c5ad589c45b49b4a688fdb2235f92497f853d14ff1de408ff3cdc27fc505b6ed"
I0630 21:00:35.703266       1 csi_handler.go:214] Adding finalizer to PV "pvc-0cf5cabc-9b7a-11e9-b3f4-9600000d4189"
I0630 21:00:35.716703       1 csi_handler.go:222] PV finalizer added to "pvc-0cf5cabc-9b7a-11e9-b3f4-9600000d4189"
I0630 21:00:35.716776       1 csi_handler.go:509] Found NodeID 123456 in CSINode worker01
I0630 21:00:35.716957       1 csi_handler.go:175] VA finalizer added to "csi-c5ad589c45b49b4a688fdb2235f92497f853d14ff1de408ff3cdc27fc505b6ed"
I0630 21:00:35.716980       1 csi_handler.go:189] NodeID annotation added to "csi-c5ad589c45b49b4a688fdb2235f92497f853d14ff1de408ff3cdc27fc505b6ed"
I0630 21:00:35.718444       1 controller.go:205] Started PV processing "pvc-0cf5cabc-9b7a-11e9-b3f4-9600000d4189"
I0630 21:00:35.718493       1 csi_handler.go:412] CSIHandler: processing PV "pvc-0cf5cabc-9b7a-11e9-b3f4-9600000d4189"
I0630 21:00:35.718511       1 csi_handler.go:416] CSIHandler: processing PV "pvc-0cf5cabc-9b7a-11e9-b3f4-9600000d4189": no deletion timestamp, ignoring
I0630 21:00:35.725351       1 csi_handler.go:199] VolumeAttachment "csi-c5ad589c45b49b4a688fdb2235f92497f853d14ff1de408ff3cdc27fc505b6ed" updated with finalizer and/or NodeID annotation
I0630 21:00:35.725409       1 connection.go:180] GRPC call: /csi.v1.Controller/ControllerPublishVolume
I0630 21:00:35.725417       1 connection.go:181] GRPC request: {"node_id":"123456","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_context":{"storage.kubernetes.io/csiProvisionerIdentity":"1561841327391-8081-csi.hetzner.cloud"},"volume_id":"2816975"}
I0630 21:00:37.666022       1 reflector.go:370] k8s.io/client-go/informers/factory.go:133: Watch close - *v1beta1.VolumeAttachment total 2 items received
I0630 21:00:39.121391       1 connection.go:183] GRPC response: {}
I0630 21:00:39.122173       1 connection.go:184] GRPC error: <nil>
I0630 21:00:39.122183       1 csi_handler.go:127] Attached "csi-c5ad589c45b49b4a688fdb2235f92497f853d14ff1de408ff3cdc27fc505b6ed"
I0630 21:00:39.122193       1 util.go:32] Marking as attached "csi-c5ad589c45b49b4a688fdb2235f92497f853d14ff1de408ff3cdc27fc505b6ed"
I0630 21:00:39.152060       1 util.go:42] Marked as attached "csi-c5ad589c45b49b4a688fdb2235f92497f853d14ff1de408ff3cdc27fc505b6ed"
I0630 21:00:39.152098       1 csi_handler.go:133] Fully attached "csi-c5ad589c45b49b4a688fdb2235f92497f853d14ff1de408ff3cdc27fc505b6ed"
I0630 21:00:39.152140       1 csi_handler.go:103] CSIHandler: finished processing "csi-c5ad589c45b49b4a688fdb2235f92497f853d14ff1de408ff3cdc27fc505b6ed"
I0630 21:00:39.152186       1 controller.go:175] Started VA processing "csi-c5ad589c45b49b4a688fdb2235f92497f853d14ff1de408ff3cdc27fc505b6ed"
I0630 21:00:39.152225       1 csi_handler.go:87] CSIHandler: processing VA "csi-c5ad589c45b49b4a688fdb2235f92497f853d14ff1de408ff3cdc27fc505b6ed"
I0630 21:00:39.152252       1 csi_handler.go:109] "csi-c5ad589c45b49b4a688fdb2235f92497f853d14ff1de408ff3cdc27fc505b6ed" is already attached
I0630 21:00:39.152265       1 csi_handler.go:103] CSIHandler: finished processing "csi-c5ad589c45b49b4a688fdb2235f92497f853d14ff1de408ff3cdc27fc505b6ed"

The external-attacher is an external controller that monitors VolumeAttachment objects created by controller-manager and attaches/detaches volumes to/from nodes (i.e. calls ControllerPublish/ControllerUnpublish). And indeed we see messages like Started VA processing ... (VA -> VolumeAttachment) or GRPC call: /csi.v1.Controller/ControllerPublishVolume. If you take a closer look you also see {"fs_type":"ext4"}. So the volume will be formatted with an ext4 filesystem by default.

If you’re wondering why this is called external-attacher: The external-attacher is a sidecar container that attaches volumes to nodes by calling the ControllerPublish and ControllerUnpublish functions of CSI drivers. It is necessary because the internal Attach/Detach controller running in the Kubernetes controller-manager does not have any direct interfaces to CSI drivers.
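
To make the log lines a bit more tangible: the object the external-attacher reacts to looks roughly like the following. This is only a sketch reconstructed from the names in the logs above (VolumeAttachment name, node worker01, PV name and driver name), not a dump from a real cluster. The apiVersion storage.k8s.io/v1 is an assumption for a current cluster; the 2019 logs above still show v1beta1.

```yaml
# Sketch of the VolumeAttachment created by controller-manager,
# reconstructed from the log lines above (not dumped from a real cluster).
apiVersion: storage.k8s.io/v1        # assumption; the logs above still show v1beta1
kind: VolumeAttachment
metadata:
  name: csi-c5ad589c45b49b4a688fdb2235f92497f853d14ff1de408ff3cdc27fc505b6ed
spec:
  attacher: csi.hetzner.cloud        # name of the CSI driver
  nodeName: worker01                 # node the volume should be attached to
  source:
    persistentVolumeName: pvc-0cf5cabc-9b7a-11e9-b3f4-9600000d4189
status:
  attached: true                     # set by the external-attacher ("Marked as attached")
```

kubectl get volumeattachment lists these objects, which is handy if a volume seems to be stuck during attach or detach.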

BTW: There is a compatibility matrix that shows which version of the container is recommended for which Kubernetes version. It can be found here.

Next the logs for csi-provisioner:

I0630 21:00:32.655824       1 controller.go:1196] provision "kube-system/csi-pvc" class "hcloud-volumes": started
I0630 21:00:32.682110       1 event.go:209] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"kube-system", Name:"csi-pvc", UID:"0cf5cabc-9b7a-11e9-b3f4-9600000d4189", APIVersion:"v1", ResourceVersion:"42222536", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "kube-system/csi-pvc"
I0630 21:00:32.705535       1 controller.go:442] CreateVolumeRequest {Name:pvc-0cf5cabc-9b7a-11e9-b3f4-9600000d4189 CapacityRange:required_bytes:10737418240  VolumeCapabilities:[mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > ] Parameters:map[] Secrets:map[] VolumeContentSource:<nil> AccessibilityRequirements:requisite:<segments:<key:"csi.hetzner.cloud/location" value:"fsn1" > > preferred:<segments:<key:"csi.hetzner.cloud/location" value:"fsn1" > >  XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0630 21:00:32.705801       1 connection.go:180] GRPC call: /csi.v1.Controller/CreateVolume
I0630 21:00:32.705817       1 connection.go:181] GRPC request: {"accessibility_requirements":{"preferred":[{"segments":{"csi.hetzner.cloud/location":"fsn1"}}],"requisite":[{"segments":{"csi.hetzner.cloud/location":"fsn1"}}]},"capacity_range":{"required_bytes":10737418240},"name":"pvc-0cf5cabc-9b7a-11e9-b3f4-9600000d4189","volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}}]}
I0630 21:00:35.405281       1 connection.go:183] GRPC response: {"volume":{"accessible_topology":[{"segments":{"csi.hetzner.cloud/location":"fsn1"}}],"capacity_bytes":10737418240,"volume_id":"2816975"}}
I0630 21:00:35.416118       1 connection.go:184] GRPC error: <nil>
I0630 21:00:35.416148       1 controller.go:486] create volume rep: {CapacityBytes:10737418240 VolumeId:2816975 VolumeContext:map[] ContentSource:<nil> AccessibleTopology:[segments:<key:"csi.hetzner.cloud/location" value:"fsn1" > ] XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0630 21:00:35.416238       1 controller.go:558] successfully created PV {GCEPersistentDisk:nil AWSElasticBlockStore:nil HostPath:nil Glusterfs:nil NFS:nil RBD:nil ISCSI:nil Cinder:nil CephFS:nil FC:nil Flocker:nil FlexVolume:nil AzureFile:nil VsphereVolume:nil Quobyte:nil AzureDisk:nil PhotonPersistentDisk:nil PortworxVolume:nil ScaleIO:nil Local:nil StorageOS:nil CSI:&CSIPersistentVolumeSource{Driver:csi.hetzner.cloud,VolumeHandle:2816975,ReadOnly:false,FSType:ext4,VolumeAttributes:map[string]string{storage.kubernetes.io/csiProvisionerIdentity: 1561841327391-8081-csi.hetzner.cloud,},ControllerPublishSecretRef:nil,NodeStageSecretRef:nil,NodePublishSecretRef:nil,}}
I0630 21:00:35.416349       1 controller.go:1278] provision "kube-system/csi-pvc" class "hcloud-volumes": volume "pvc-0cf5cabc-9b7a-11e9-b3f4-9600000d4189" provisioned
I0630 21:00:35.416379       1 controller.go:1295] provision "kube-system/csi-pvc" class "hcloud-volumes": succeeded
I0630 21:00:35.416389       1 volume_store.go:147] Saving volume pvc-0cf5cabc-9b7a-11e9-b3f4-9600000d4189
I0630 21:00:35.437775       1 volume_store.go:150] Volume pvc-0cf5cabc-9b7a-11e9-b3f4-9600000d4189 saved

The external-provisioner is an external controller that monitors PersistentVolumeClaim objects created by users and creates/deletes volumes for them. The external-provisioner is a sidecar container that dynamically provisions volumes by calling the CreateVolume and DeleteVolume functions of the CSI driver’s Controller service.

At the beginning of the logs we see an Event (Event(v1.ObjectReference{Kind:"PersistentVolumeClaim"...) which triggers a CreateVolumeRequest, which in turn causes GRPC call: /csi.v1.Controller/CreateVolume to be called. You’ll also see parameters like "csi.hetzner.cloud/location":"fsn1". This is needed for topology awareness. The value fsn1 in this case means the Hetzner data center located at Falkenstein. Since Hetzner has several data centers it of course makes sense to have the storage in the same data center as the pods ;-).
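
What happens with that topology information? As far as can be told from the CreateVolume response, the accessible_topology ends up as node affinity on the PersistentVolume, so the scheduler only places pods that use the volume on nodes in the matching location. A sketch of the relevant part of the PV, reconstructed from the log values above (the exact layout in your cluster may differ):

```yaml
# Excerpt of the provisioned PersistentVolume (sketch, values taken from the logs above)
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-0cf5cabc-9b7a-11e9-b3f4-9600000d4189
spec:
  capacity:
    storage: 10Gi                    # 10737418240 bytes in the request
  accessModes:
  - ReadWriteOnce                    # SINGLE_NODE_WRITER in the CSI request
  storageClassName: hcloud-volumes
  csi:
    driver: csi.hetzner.cloud
    volumeHandle: "2816975"          # volume_id returned by the driver
    fsType: ext4
  nodeAffinity:                      # derived from accessible_topology
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: csi.hetzner.cloud/location
          operator: In
          values:
          - fsn1
```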

And finally the logs of the last container in the StatefulSet, which is hcloud-csi-driver:

level=debug ts=2019-06-30T21:00:32.720427572Z component=grpc-server msg="handling request" req="name:\"pvc-0cf5cabc-9b7a-11e9-b3f4-9600000d4189\" capacity_range:<required_bytes:10737418240 > volume_capabilities:<mount:<fs_type:\"ext4\" > access_mode:<mode:SINGLE_NODE_WRITER > > accessibility_requirements:<requisite:<segments:<key:\"csi.hetzner.cloud/location\" value:\"fsn1\" > > preferred:<segments:<key:\"csi.hetzner.cloud/location\" value:\"fsn1\" > > > "
level=info ts=2019-06-30T21:00:32.72228972Z component=idempotent-volume-service msg="creating volume" name=pvc-0cf5cabc-9b7a-11e9-b3f4-9600000d4189 min-size=10 max-size=0 location=fsn1
level=info ts=2019-06-30T21:00:32.722340541Z component=api-volume-service msg="creating volume" volume-name=pvc-0cf5cabc-9b7a-11e9-b3f4-9600000d4189 volume-size=10 volume-location=fsn1
level=info ts=2019-06-30T21:00:35.401583756Z component=idempotent-volume-service msg="volume created" volume-id=2816975
level=info ts=2019-06-30T21:00:35.402348851Z component=driver-controller-service msg="created volume" volume-id=2816975 volume-name=pvc-0cf5cabc-9b7a-11e9-b3f4-9600000d4189
level=debug ts=2019-06-30T21:00:35.403048521Z component=grpc-server msg="finished handling request"
level=debug ts=2019-06-30T21:00:35.728688237Z component=grpc-server msg="handling request" req="volume_id:\"2816975\" node_id:\"898689\" volume_capability:<mount:<fs_type:\"ext4\" > access_mode:<mode:SINGLE_NODE_WRITER > > volume_context:<key:\"storage.kubernetes.io/csiProvisionerIdentity\" value:\"1561841327391-8081-csi.hetzner.cloud\" > "
level=info ts=2019-06-30T21:00:35.728781078Z component=api-volume-service msg="attaching volume" volume-id=2816975 server-id=898689
level=debug ts=2019-06-30T21:00:39.120803804Z component=grpc-server msg="finished handling request"

That one is basically coordinating the volume creation process: as the logs show, it handles the gRPC requests, creates the volume and attaches it to the server. Those were the containers that are part of the StatefulSet.

Now let’s see what the DaemonSet has to offer. In our case the DaemonSet deploys a pod with two containers on every worker node.

We can fetch the logs like

kubectl logs hcloud-csi-node-h824m csi-node-driver-registrar

and

kubectl logs hcloud-csi-node-h824m hcloud-csi-driver

So let’s have a look at the csi-node-driver-registrar logs:

I0630 20:57:40.840045       1 main.go:110] Version: v1.1.0-0-g80a94421
I0630 20:57:40.840124       1 main.go:120] Attempting to open a gRPC connection with: "/csi/csi.sock"
I0630 20:57:40.840143       1 connection.go:151] Connecting to unix:///csi/csi.sock
I0630 20:57:45.125805       1 main.go:127] Calling CSI driver to discover driver name
I0630 20:57:45.125841       1 connection.go:180] GRPC call: /csi.v1.Identity/GetPluginInfo
I0630 20:57:45.125850       1 connection.go:181] GRPC request: {}
I0630 20:57:45.129031       1 connection.go:183] GRPC response: {"name":"csi.hetzner.cloud","vendor_version":"1.1.4"}
I0630 20:57:45.130010       1 connection.go:184] GRPC error: <nil>
I0630 20:57:45.130023       1 main.go:137] CSI driver name: "csi.hetzner.cloud"
I0630 20:57:45.130410       1 node_register.go:54] Starting Registration Server at: /registration/csi.hetzner.cloud-reg.sock
I0630 20:57:45.130521       1 node_register.go:61] Registration Server started at: /registration/csi.hetzner.cloud-reg.sock
I0630 20:57:45.132094       1 main.go:77] Received GetInfo call: &InfoRequest{}
I0630 20:57:45.169465       1 main.go:87] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:true,Error:,}

The CSI node-driver-registrar is a sidecar container that fetches driver information from a CSI endpoint (in the logs above via GetPluginInfo) and registers it with the kubelet on that node using the kubelet plugin registration mechanism. This is necessary because kubelet is responsible for issuing the CSI NodeGetInfo, NodeStageVolume and NodePublishVolume calls. The node-driver-registrar registers your CSI driver with kubelet so that it knows which Unix domain socket to issue the CSI calls on.

And finally we have the logs from hcloud-csi-driver:

level=debug ts=2019-06-30T20:57:44.383529379Z msg="getting instance id from metadata service"
level=debug ts=2019-06-30T20:57:44.389617131Z msg="fetching server"
level=info ts=2019-06-30T20:57:44.762089045Z msg="fetched server" server-name=worker01
level=debug ts=2019-06-30T20:57:45.128497624Z component=grpc-server msg="handling request" req=
level=debug ts=2019-06-30T20:57:45.128599355Z component=grpc-server msg="finished handling request"
level=debug ts=2019-06-30T20:57:45.137641968Z component=grpc-server msg="handling request" req=
level=debug ts=2019-06-30T20:57:45.137719407Z component=grpc-server msg="finished handling request"
level=debug ts=2019-06-30T21:00:43.431530684Z component=grpc-server msg="handling request" req=
level=debug ts=2019-06-30T21:00:43.431721975Z component=grpc-server msg="finished handling request"
level=debug ts=2019-06-30T21:00:43.449264791Z component=grpc-server msg="handling request" req="volume_id:\"2816975\" staging_target_path:\"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-0cf5cabc-9b7a-11e9-b3f4-9600000d4189/globalmount\" volume_capability:<mount:<fs_type:\"ext4\" > access_mode:<mode:SINGLE_NODE_WRITER > > volume_context:<key:\"storage.kubernetes.io/csiProvisionerIdentity\" value:\"1561841327391-8081-csi.hetzner.cloud\" > "
level=debug ts=2019-06-30T21:00:43.536347081Z component=linux-mount-service msg="staging volume" volume-name=pvc-0cf5cabc-9b7a-11e9-b3f4-9600000d4189 staging-target-path=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-0cf5cabc-9b7a-11e9-b3f4-9600000d4189/globalmount fs-type=ext4
E0630 21:00:43.540339       1 mount_linux.go:151] Mount failed: exit status 255
Mounting command: mount
Mounting arguments: -t ext4 -o defaults /dev/disk/by-id/scsi-0HC_Volume_2816975 /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-0cf5cabc-9b7a-11e9-b3f4-9600000d4189/globalmount
Output: mount: mounting /dev/disk/by-id/scsi-0HC_Volume_2816975 on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-0cf5cabc-9b7a-11e9-b3f4-9600000d4189/globalmount failed: Invalid argument
level=debug ts=2019-06-30T21:00:44.344371915Z component=grpc-server msg="finished handling request"
level=debug ts=2019-06-30T21:00:44.349042447Z component=grpc-server msg="handling request" req=
level=debug ts=2019-06-30T21:00:44.349115338Z component=grpc-server msg="finished handling request"
level=debug ts=2019-06-30T21:00:44.372982401Z component=grpc-server msg="handling request" req="volume_id:\"2816975\" staging_target_path:\"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-0cf5cabc-9b7a-11e9-b3f4-9600000d4189/globalmount\" target_path:\"/var/lib/kubelet/pods/196815b9-9b7a-11e9-b3f4-9600000d4189/volumes/kubernetes.io~csi/pvc-0cf5cabc-9b7a-11e9-b3f4-9600000d4189/mount\" volume_capability:<mount:<fs_type:\"ext4\" > access_mode:<mode:SINGLE_NODE_WRITER > > volume_context:<key:\"storage.kubernetes.io/csiProvisionerIdentity\" value:\"1561841327391-8081-csi.hetzner.cloud\" > "
level=debug ts=2019-06-30T21:00:44.431227055Z component=linux-mount-service msg="publishing volume" volume-name=pvc-0cf5cabc-9b7a-11e9-b3f4-9600000d4189 target-path=/var/lib/kubelet/pods/196815b9-9b7a-11e9-b3f4-9600000d4189/volumes/kubernetes.io~csi/pvc-0cf5cabc-9b7a-11e9-b3f4-9600000d4189/mount staging-target-path=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-0cf5cabc-9b7a-11e9-b3f4-9600000d4189/globalmount fs-type=ext4 readonly=false additional-mount-options="unsupported value type"
level=debug ts=2019-06-30T21:00:44.43677855Z component=grpc-server msg="finished handling request"

We already used the hcloud-csi-driver container image in the StatefulSet and now it’s used again in the DaemonSet. But here different gRPC endpoints of the driver are used. The Kubernetes kubelet runs on every node and is responsible for making the CSI Node service calls. These calls mount and unmount the storage volume from the storage system, making it available to the pod to consume. kubelet makes calls to the CSI driver through a UNIX domain socket shared on the host via a hostPath volume. There is also a second UNIX domain socket that the node-driver-registrar uses to register the CSI driver with kubelet.

If you have a look at your worker nodes you’ll find /var/lib/kubelet/plugins/csi.hetzner.cloud/csi.sock, which is the socket mentioned above that kubelet and the CSI driver share (provided that your kubelet directory is /var/lib/kubelet/, of course). csi.hetzner.cloud is the CSI plugin name in this case.
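
How the two sockets end up in the containers can be sketched with a DaemonSet excerpt like the one below. The container-side paths (/csi and /registration) and the plugin directory are taken from the logs and the path above. The volume names, the driver’s mount of /csi and the registration directory /var/lib/kubelet/plugins_registry/ are assumptions based on the usual node-driver-registrar setup, so check them against the manifest you actually deployed.

```yaml
# DaemonSet excerpt (sketch): how the two UNIX domain sockets are shared
spec:
  template:
    spec:
      containers:
      - name: csi-node-driver-registrar
        args:
        # host path of the driver socket that kubelet should talk to
        - --kubelet-registration-path=/var/lib/kubelet/plugins/csi.hetzner.cloud/csi.sock
        volumeMounts:
        - name: plugin-dir             # assumed volume name
          mountPath: /csi              # registrar connects to /csi/csi.sock
        - name: registration-dir       # assumed volume name
          mountPath: /registration     # csi.hetzner.cloud-reg.sock is created here
      - name: hcloud-csi-driver
        volumeMounts:
        - name: plugin-dir
          mountPath: /csi              # ASSUMPTION: the driver creates csi.sock here
      volumes:
      - name: plugin-dir
        hostPath:
          path: /var/lib/kubelet/plugins/csi.hetzner.cloud/
          type: DirectoryOrCreate
      - name: registration-dir
        hostPath:
          path: /var/lib/kubelet/plugins_registry/   # assumed kubelet default
          type: Directory
```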

Now finally to an actual use case I had. I run a blog server called Apache Roller. It is packaged as a .war file and runs on good old Apache Tomcat. As I needed persistent storage where Roller can store assets such as uploaded images, I used hostPath. A hostPath volume mounts a file or directory from the host node’s filesystem into your Pod. That’s of course nothing you should normally do, but it was the only option for me at that time. So the pod spec in the Deployment looked like this:

apiVersion: apps/v1
kind: Deployment
metadata:
...
spec:
  replicas: 1
  selector:
      ...
  template:
    metadata:
      ...
    spec:
      volumes:
      - hostPath:
          path: /opt/roller/logs
        name: logs
      - hostPath:
          path: /opt/roller/rolleridx
        name: rolleridx
      - hostPath:
          path: /opt/roller/docs/resources
        name: resources
      containers:
      - name: roller
        image: your-docker-registry:5000/roller:5.2.3
        ...
        volumeMounts:
        - mountPath: /usr/local/tomcat/logs
          name: logs
        - mountPath: /usr/local/tomcat/rolleridx
          name: rolleridx
        - mountPath: /usr/local/tomcat/resources
          name: resources

As you can see this config will mount three host directories (volumes) “into” the pod (volumeMounts). Now I needed to migrate the data over to the new CSI volume. There are a few ways to do this but I decided to do it this way: First create a PersistentVolume with the help of CSI and mount it temporarily to /data in the pod. Then log into the container, create the needed directories manually once and copy the data into those directories.
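
The claim for that volume could look like the following minimal sketch. The name pvc matches the claimName used in the Deployment and hcloud-volumes is the StorageClass seen in the provisioner logs. The size of 10Gi is only an assumption, so pick whatever your data needs.

```yaml
# PersistentVolumeClaim for the migration (sketch)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc                        # referenced via claimName in the Deployment
spec:
  accessModes:
  - ReadWriteOnce                  # block storage, attached to a single node
  storageClassName: hcloud-volumes
  resources:
    requests:
      storage: 10Gi                # ASSUMPTION: the size is not given in the text
```

The claim has to live in the same namespace as the Deployment that uses it.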

So for the Deployment I needed to adjust spec.template.spec. I added this configuration:

volumes:
- name: data
  persistentVolumeClaim:
    claimName: pvc

Also in spec.template.spec.containers[].volumeMounts I needed to mount the new volume:

volumeMounts:
- mountPath: /data
  name: data

The change needs to be applied with kubectl apply -f deployment.yml.

Luckily, I used a Tomcat container image that is based on Debian. So I had at least a few common commands like cp, mkdir, chown and bash already installed. So I logged into the pod via

kubectl exec -it tomcat-xxxxxxxx-xxxxx -- bash

Then I realized that my shiny new /data mount was owned by the user root and the group root:

ls -al /
...
drwxr-xr-x   6 root root 4096 Jul 17 20:27 data
...

That was bad as Tomcat was running with the permissions of user www-data (and the group of the same name). The www-data group had the id 33. What’s needed in this case is a securityContext in the pod template of the Deployment. It looks like this:

apiVersion: apps/v1
kind: Deployment
metadata:
...
spec:
  replicas: 1
  selector:
      ...
  template:
    metadata:
      ...
    spec:
      securityContext:
        fsGroup: 33
...

As already mentioned 33 is the group id of the group www-data which is also used by the Tomcat process. Now if I deploy the new setting it looks like this:

ls -al /
...
drwxrwsr-x   6 root www-data 4096 Jul 17 20:27 data
...

Much better :-) The s in the group permissions is the setgid bit, so new files created on the volume belong to the www-data group too. Now I was able to create directories like /data/resources and copy the data from the old to the new mount, e.g. cd /data; cp --archive /usr/local/tomcat/resources . (the trailing dot is the target directory). The --archive option implies --preserve=all, which preserves mode, ownership and timestamps and, if possible, additional attributes such as context, links and xattr (a little bit like rsync -av ...).

Now that everything was copied to the new directories I changed the Deployment manifest accordingly so that the CSI volumes were used and removed the hostPath entries. Now the pods are no longer “tied” to a specific Kubernetes worker node as was the case with hostPath.
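
One way to express that final state is to mount the directories created on the volume via subPath. The following excerpt is only a sketch of that idea and not necessarily the exact manifest used here. It assumes the directories were created as /data/logs, /data/rolleridx and /data/resources during the migration.

```yaml
# Deployment excerpt after the migration (sketch, hostPath entries removed)
spec:
  template:
    spec:
      securityContext:
        fsGroup: 33                  # group id of www-data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: pvc
      containers:
      - name: roller
        image: your-docker-registry:5000/roller:5.2.3
        volumeMounts:
        - name: data
          mountPath: /usr/local/tomcat/logs
          subPath: logs              # assumes /data/logs was created on the volume
        - name: data
          mountPath: /usr/local/tomcat/rolleridx
          subPath: rolleridx         # assumes /data/rolleridx exists
        - name: data
          mountPath: /usr/local/tomcat/resources
          subPath: resources         # assumes /data/resources exists
```

With subPath a single CSI volume can back all three mount points, so only one claim is needed.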

So this blog post is quite long and if you’re still reading: Congratulations! ;-)