Kubernetes upgrade notes: 1.31.x to 1.32.x

If you used my Kubernetes the Not So Hard Way With Ansible blog posts to set up a Kubernetes (K8s) cluster, these notes might be helpful for you (and maybe also for others who manage a K8s cluster on their own). I’ll only mention changes that are either interesting for most K8s administrators anyway (even if they run a fully managed Kubernetes deployment) or relevant if you manage your own bare-metal/VM-based on-prem Kubernetes deployment. I normally skip changes that are only relevant for GKE, AWS EKS, Azure or other cloud providers.

I have a general upgrade guide, Kubernetes the Not So Hard Way With Ansible - Upgrading Kubernetes, that has worked quite well for me for the past K8s upgrades. So please read that guide if you want to know HOW the components are updated. This post is specifically about the 1.31.x to 1.32.x upgrade and WHAT was interesting for me.

As usual, I don’t update a production system before the .2 release of a new minor version is out. In my experience the .0 and .1 releases are just too buggy (well, it has gotten a lot better over time, but you don’t want to do experiments in production, right? 😉). Nevertheless it’s important to test new releases (and even betas or release candidates if possible) early in development environments and to report bugs!

I only upgrade from the latest version of the previous minor release. At the time of writing this blog post, 1.31.11 was the latest 1.31.x release. After reading the 1.31 CHANGELOG to figure out whether any important changes were made between my current 1.31.x version and the latest 1.31.11 release, I didn’t see anything that prevented me from updating, and I didn’t need to change anything.

So I did the 1.31.11 update first. If you use my Ansible roles, that basically only means changing the k8s_ctl_release variable from 1.31.x to 1.31.11 (for the controller nodes) and doing the same for k8s_worker_release (for the worker nodes). Then deploy the changes for the control plane and worker nodes as described in my upgrade guide.
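
The change itself boils down to two variables in the Ansible inventory. A minimal sketch (the file location is just an example; use whatever group_vars/host_vars layout you already have):

```yaml
# group_vars/all.yml (example location - adjust to your inventory layout)
k8s_ctl_release: "1.31.11"     # controller nodes
k8s_worker_release: "1.31.11"  # worker nodes
```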

Hint: To save some time, IMHO it should be good enough to only update the controller nodes to the latest 1.31.x release, as it’s mostly the kube-apiserver that stores the state of the Kubernetes cluster in etcd, and that state is quite important. That’s what I normally do. Upgrading to the next minor release can then be done for all nodes as usual. But if you want to be absolutely sure, just upgrade the whole cluster to the latest 1.31.x release first.

After that, everything still worked as expected, so I continued with the next step.

As it’s normally no problem (and actually within the supported version skew) to have a kubectl utility that is one minor version ahead of the server version, I updated kubectl from 1.31.x to the latest 1.32.x using my kubectl Ansible role.
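
A quick sanity check after the role has run is to ask kubectl itself; it prints both the client and the server version, and the client should be at most one minor version ahead:

```bash
# Shows the kubectl (client) version and the kube-apiserver (server) version
kubectl version
```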

This time I updated quite a few components together with the K8s 1.32.x release. As this changes quite a lot on the worker nodes, it makes sense to Safely Drain a Node so that the workload gets migrated to other nodes. Then the software can be safely upgraded on the drained node. Maybe also have a look at my blog post Upgrading Kubernetes for further information.
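
For reference, draining and later uncordoning a node boils down to this (the node name is of course just a placeholder):

```bash
# Evict the workload from the node that is about to be upgraded
kubectl drain worker01 --ignore-daemonsets --delete-emptydir-data

# ... upgrade containerd, runc, CNI plugins and the K8s worker binaries ...

# Make the node schedulable again once everything is back up
kubectl uncordon worker01
```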

While my roles don’t use kubeadm to manage the K8s cluster, it’s recommended to have at least etcd 3.5.16 running for Kubernetes 1.32. I updated my etcd role to the current 3.5.22 release and updated my etcd deployment accordingly. See Upgrading Kubernetes - etcd for more information on how to upgrade etcd. Also see Kubernetes v1.31: Accelerating Cluster Performance with Consistent Reads from Cache.
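
After rolling out the new etcd version member by member, it’s worth checking that all members are healthy and report the expected version. A sketch with etcdctl (endpoints and certificate paths are assumptions here; adjust them to your setup):

```bash
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/etcd/ca-etcd.pem \
  --cert=/etc/etcd/cert-etcd.pem \
  --key=/etc/etcd/cert-etcd-key.pem \
  endpoint status --cluster -w table
```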

Note: etcd 3.6.x is already available, but for K8s 1.32 I’ll stay with etcd 3.5 for now. etcd 3.6.x has some breaking changes that need to be addressed first.

containerd was updated from 2.0.2 to 2.1.3. I updated my containerd role accordingly. Please read the CHANGELOG for potential breaking changes. In my experience the upgrade “just works”.
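
On a drained node you can quickly verify the runtime version before uncordoning it again, e.g.:

```bash
containerd --version   # should report v2.1.3 after the upgrade
crictl version         # needs crictl pointed at containerd's CRI socket
```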

containerd >= 2.0 is also a prerequisite for user namespace isolation, which is enabled by default as of Kubernetes 1.33 (the next release).

runc was upgraded from 1.2.4 to 1.3.0. I’ve updated my runc role accordingly. The release notes are available on the runc releases page but shouldn’t be that interesting for this upgrade.

With runc >= 1.2 and containerd >= 2.0 (as mentioned above), user namespaces are finally supported. Together with the next Kubernetes release, 1.33, user namespace isolation arrives out of the box.
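
As a teaser for what this enables: a pod can opt into its own user namespace via hostUsers: false. A minimal sketch (name and image are just placeholders; in 1.32 this still requires the UserNamespacesSupport feature gate on top of the runtime versions above, plus kernel support for ID-mapped mounts):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: userns-demo
spec:
  hostUsers: false            # run this pod in its own user namespace
  containers:
  - name: shell
    image: busybox:1.36
    command: ["sleep", "3600"]
```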

And finally the CNI plugins were updated from 1.6.2 to 1.7.1. Again, I updated my CNI role accordingly. The release notes for CNI plugins 1.7.1 might be worth a read, but only if you want to go deeper 😉

Since K8s 1.14 there are also searchable release notes available. You can specify the K8s version and a K8s area/component (e.g. kubelet, apiserver, …) and immediately get an overview of what changed in that regard. Quite nice! 😉

I guess most users won’t be affected by any Urgent Upgrade Notes. It’s basically only:

  • ServiceAccount metadata.annotations[kubernetes.io/enforce-mountable-secrets]: deprecated since v1.32
  • Reverted the DisableNodeKubeProxyVersion feature gate to default-off

All important stuff is listed in the Kubernetes v1.32: Penelope release announcement. There was also a Kubernetes v1.32 sneak peek.

The following list of changes and features only contains stuff that I found useful and interesting. That means I normally don’t mention Kubernetes internals that have changed, but mostly stuff that is interesting for administrators and operations. This is mainly a reminder to myself of what changed 😉 See the full Kubernetes v1.32 Changelog for all changes.

  • Annotation batch.kubernetes.io/cronjob-scheduled-timestamp added to Job objects scheduled from CronJobs is promoted to stable
  • Field status.hostIPs added for Pod: during the dual-stack transition phase, applications that originally used IPv4 are migrated to IPv6. For a smooth migration, IP-related attributes should expose both IPv4 and IPv6 addresses. See KEP-2681: Field status.hostIPs added for Pod
  • Support to size memory backed volumes: Kubernetes supports emptyDir volumes whose backing storage is memory (i.e. tmpfs). The size of such a memory backed volume defaults to 50% of the memory of the Linux host. Coupling the default memory backed volume size to the host that runs the pod makes pod definitions less portable across node instance types and providers. This impacts workloads that make heavy use of /dev/shm or other use cases oriented around memory backed volumes (AI/ML, etc.). See KEP-1967: Sizable memory backed volumes and the small emptyDir sketch after this list.
  • Add support for a drop-in kubelet configuration directory. A common pattern for software configuration in Linux is support for a drop-in configuration directory. The location of this directory is often based on a corresponding configuration file. For instance, /etc/security/limits can be overridden by files in /etc/security/limits.d. This pattern is useful for a number of reasons, though a large motivation here is to allow files to be owned by a single owner. If multiple processes are vying to change the same file, they could stamp over each other’s changes and possibly race against each other, creating TOCTOU problems. See KEP-3983: Add support for a drop-in kubelet configuration directory and the kubelet drop-in sketch after this list.
  • Kubelet OpenTelemetry Tracing. This Kubernetes Enhancement Proposal (KEP) enhances the kubelet to allow tracing gRPC and HTTP API requests. The kubelet is the integration point between a node’s operating system and Kubernetes. Per the control plane-node communication documentation, a primary communication path from the control plane to the nodes is from the apiserver to the kubelet process running on each node. The kubelet communicates with the container runtime over gRPC, where the kubelet is the gRPC client and the Container Runtime Interface is the gRPC server. The CRI then sends the creation request to the container runtime installed on the node. Traces gathered from the kubelet provide critical information to monitor and troubleshoot interactions at the node level. This KEP proposes using OpenTelemetry libraries and exporting in the OpenTelemetry format. See KEP-2831: Kubelet Tracing
  • Custom Resource field selectors
  • Bound service account token improvement: the inclusion of the node name in the service account token claims allows users to use such information during authorization and admission (ValidatingAdmissionPolicy).
  • Structured authorization configuration
  • Always Honor PersistentVolume Reclaim Policy. The reclaim policy associated with a PersistentVolume is currently ignored under certain circumstances. For a bound PV-PVC pair, the ordering of PV-PVC deletion determines whether the PV delete reclaim policy is honored. The PV honors the reclaim policy if the PVC is deleted prior to deleting the PV; however, if the PV is deleted prior to deleting the PVC, then the reclaim policy is not exercised. As a result of this behavior, the associated storage asset in the external infrastructure is not removed. See KEP-2644: Honor Persistent Volume Reclaim Policy
  • KEP-3104: Introduce kuberc - This proposal introduces an optional kuberc file that is used to separate cluster credentials and server configuration from user preferences.
  • Pod-level resource specifications. Currently resource allocation on PodSpec is container-centric, allowing users to specify resource requests and limits for each container. The scheduler uses the aggregate of the requested resources by all containers in a pod to find a suitable node for scheduling the pod. The kubelet then enforces these resource constraints by translating the requests and limits into corresponding cgroup settings both for containers and for the pod (where pod level values are aggregates of container level values derived using the formula in KEP#753). The existing Pod API lacks the ability to set resource constraints at pod level, limiting the flexibility and ease of resource management for pods as a whole. This limitation is particularly problematic when users want to focus on controlling the overall resource consumption of a pod without the need to meticulously configure resource specifications for each individual container in it. See KEP-2837: Pod Level Resource Specifications
  • New statusz and flagz endpoints for core components: You can enable two new HTTP endpoints, /statusz and /flagz, for core components. These improve cluster debuggability by giving insight into which version a component is running (e.g. the Golang version it was built with), along with details about its uptime and which command line flags the component was started with; making it easier to diagnose both runtime and configuration issues. Also see Added a /flagz endpoint for kube-apiserver.
  • ACTION REQUIRED for custom scheduler plugin developers: PodEligibleToPreemptOthers in the preemption interface now includes ctx in the parameters. Please update your plugins’ implementation accordingly.
  • A new feature that allows unsafe deletion of corrupt resources has been added. It is disabled by default and can be enabled by setting the option --feature-gates=AllowUnsafeMalformedObjectDeletion=true. It comes with an API change: a new delete option ignoreStoreReadErrorWithClusterBreakingPotential has been introduced; it is not set by default, which maintains backward compatibility. WARNING: this may break the workload associated with the resource being unsafe-deleted if it relies on the normal deletion flow, so cluster breaking consequences apply.
  • Added singleProcessOOMKill flag to the kubelet configuration. Setting it to true enables single process OOM killing in cgroups v2. In this mode, if a single process is OOM killed within a container, the remaining processes will not be OOM killed. See the kubelet drop-in sketch after this list.
  • Revised the kubelet API Authorization with new subresources, that allow finer-grained authorization checks and access control for kubelet endpoints. Provided you enable the KubeletFineGrainedAuthz feature gate, you can access kubelet’s /healthz endpoint by granting the caller nodes/healthz permission in RBAC. Similarly you can also access kubelet’s /pods endpoint to fetch a list of Pods bound to that node by granting the caller nodes/pods permission in RBAC. Similarly you can also access kubelet’s /configz endpoint to fetch kubelet’s configuration by granting the caller nodes/configz permission in RBAC. You can still access kubelet’s /healthz, /pods and /configz by granting the caller nodes/proxy permission in RBAC but that also grants the caller permissions to exec, run and attach to containers on the nodes and doing so does not follow the least privilege principle. Granting callers more permissions than they need can give attackers an opportunity to escalate privileges.
  • Added --concurrent-daemonset-syncs command line flag to kube-controller-manager. This value sets the number of workers for the daemonset controller.
  • Added a /statusz endpoint for the kube-apiserver.
  • Added kubelet support for systemd watchdog integration. With this enabled, systemd can automatically recover a hung kubelet.
  • Changed OOM score adjustment calculation for sidecar containers: the OOM adjustment for these containers will match or fall below the OOM score adjustment of regular containers in the Pod.
  • Label apps.kubernetes.io/pod-index added to Pods from StatefulSets is promoted to stable. Label batch.kubernetes.io/job-completion-index added to Pods from Indexed Jobs is promoted to stable.
  • The Job controller now considers sidecar container restart counts when removing pods.
  • etcd: Updated to v3.5.16
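
Two of the items above are easy to show with small sketches. First, sizable memory backed volumes (KEP-1967): a sizeLimit on a memory backed emptyDir caps the tmpfs size instead of the default of 50% of the node’s memory. Names, image and size below are just examples:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shm-demo
spec:
  containers:
  - name: app
    image: busybox:1.36
    command: ["sleep", "3600"]
    volumeMounts:
    - name: shm
      mountPath: /dev/shm
  volumes:
  - name: shm
    emptyDir:
      medium: Memory   # tmpfs backed
      sizeLimit: 1Gi   # tmpfs is sized to this value instead of 50% of node memory
```

And second, the kubelet drop-in configuration directory (KEP-3983) combined with the new singleProcessOOMKill setting: if the kubelet is started with --config-dir, a small .conf file in that directory can override individual settings without touching the main configuration file. The path and file name are just examples:

```yaml
# /etc/kubernetes/kubelet.conf.d/90-oom.conf (example path and name)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
singleProcessOOMKill: true   # only the OOMing process gets killed (cgroups v2)
```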

If you use CSI, then also check the CSI Sidecar Containers documentation. Every sidecar container has a compatibility matrix that shows the minimum, maximum and recommended version to use with a given K8s version.
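
To see which sidecar versions are currently running in a cluster, listing the container images and grepping for the common sig-storage sidecar names is usually good enough (adjust the pattern to the CSI drivers you actually use):

```bash
kubectl get pods --all-namespaces \
  -o jsonpath='{range .items[*]}{range .spec.containers[*]}{.image}{"\n"}{end}{end}' \
  | grep -E 'csi-(provisioner|attacher|resizer|snapshotter|external-health-monitor)|node-driver-registrar|livenessprobe' \
  | sort -u
```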

Nevertheless, if your K8s update to v1.32 worked fine, I would recommend also updating the CSI sidecar containers sooner or later.

Now I finally upgraded the K8s controller and worker nodes to version 1.32.x as described in Kubernetes the Not So Hard Way With Ansible - Upgrading Kubernetes.

That’s it for today! Happy upgrading! 😉