Kubernetes the not so hard way with Ansible - The worker - (K8s v1.23)

2022-02-02

  • update k8s_release to 1.23.3

2022-01-09

2022-01-08

  • update k8s_release to 1.21.8
  • kubernetes-worker role no longer installs CNI plugins. So the variables k8s_cni_dir, k8s_cni_bin_dir, k8s_cni_conf_dir, k8s_cni_plugin_version and k8s_cni_plugin_checksum are no longer relevant and are ignored. Ansible role containerd is now used to install containerd, runc and CNI plugins. Also see Kubernetes: Replace dockershim with containerd and runc
  • Docker/dockershim is no longer used as it’s deprecated and will be removed in Kubernetes v1.24. Instead containerd is used.
  • Content of k8s_worker_kubelet_settings variable changed: The previous settings image-pull-progress-deadline, network-plugin, cni-conf-dir and cni-bin-dir will all be removed with the dockershim removal. cloud-provider will be removed in Kubernetes v1.23, in favor of removing cloud provider code from Kubelet. container-runtime has only two possible values and changed from docker to remote. And finally one new setting is needed which is container-runtime-endpoint which points to containerd's socket.

2021-09-12

  • update k8s_release to 1.21.4

2021-07-05

  • update k8s_release to 1.20.8

2020-12-07

  • update k8s_release to 1.19.4

This post is based on Kelsey Hightower’s Kubernetes The Hard Way - Bootstrapping Kubernetes Workers.

In general it makes sense to use a recent Linux kernel. Container runtimes like containerd and also Cilium (which comes later) benefit a lot from a recent kernel. I recommend using a kernel >= 4.9.17 if possible. Ubuntu 18.04 provides a linux-image-5.3.0-45-generic package with kernel 5.3 e.g., or install the Hardware Enablement Stack (HWE) (apt-get install linux-generic-hwe-18.04) which contains kernel 5.3 or even newer kernels. Ubuntu 20.04 already uses kernel 5.4 by default (which btw. includes the WireGuard module out of the box). As of writing this blog post kernel 5.13 is already available for Ubuntu 20.04.

Before containerd, a lot of Kubernetes installations most probably used Docker as the container runtime. But Docker/dockershim was deprecated with Kubernetes v1.21 and will be removed with Kubernetes v1.24. Behind the scenes Docker already used containerd, so in the end Docker was just an additional “layer” that is no longer needed for Kubernetes. containerd together with runc is therefore a replacement for Docker. I’ve written a blog post on how to migrate from Docker/dockershim to containerd: Kubernetes: Replace dockershim with containerd and runc.

A container runtime is needed to execute the workloads that you deploy to Kubernetes. A workload is normally a container image (which you build locally, on a Jenkins server or in whatever build pipeline you have in place) that runs a webserver or any other service listening on a port.

So the first thing I’m going to install is containerd, a modern replacement for Docker, with the help of my Ansible role for containerd. containerd is a container runtime which will be installed on each Kubernetes worker node in the cluster so that Pods (the actual workload, distributed as container images) can run there.

So first install the Ansible role for containerd:

ansible-galaxy install githubixx.containerd

By default this role only installs the containerd binaries, which isn’t enough for a Kubernetes worker node. So a few Ansible variables for this role need to be changed:

containerd_flavor: "k8s"
containerd_runc_binary_directory: "/usr/local/sbin"
containerd_crictl_config_file: "crictl.yaml"
containerd_crictl_config_directory: "/etc"
containerd_cni_binary_directory: "/opt/cni/bin"

With containerd_flavor: "k8s" set, the role not only installs a minimal set of containerd binaries but also runc and the CNI plugins. runc is a CLI tool for spawning and running containers on Linux according to the OCI specification. CNI, a Cloud Native Computing Foundation project, consists of a specification and libraries for writing plugins to configure network interfaces in Linux containers, along with a number of supported plugins. CNI concerns itself only with the network connectivity of containers and with removing allocated resources when a container is deleted.

In general the default variables of this role should be just fine. But containerd_runc_binary_directory needs to be defined as otherwise the runc binary won’t be installed. The same is true for the CNI plugins, which need containerd_cni_binary_directory to be defined. crictl is a small CLI tool, similar to the docker command line tool, to manage containers and images. It only provides very basic commands; on a production cluster there is normally no need to manually manage containers, but sometimes it might be handy for debugging purposes. For more information see Using ctr/nerdctl instead of docker CLI command. To install the crictl utility the variables containerd_crictl_config_file and containerd_crictl_config_directory need to be defined.
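With the two crictl variables set, the role deploys a configuration file for crictl. A minimal sketch of what /etc/crictl.yaml contains (the file the role actually generates may have more settings) could look like this:

```yaml
# /etc/crictl.yaml (sketch): point crictl at containerd's CRI socket
runtime-endpoint: "unix:///run/containerd/containerd.sock"
image-endpoint: "unix:///run/containerd/containerd.sock"
```

With this in place crictl ps lists the running containers and crictl images the pulled images on a worker node.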

For all variables the containerd role offers please see default.yml.

If you want to change variables, a common place for them is group_vars/all.yml. Also add the role to the playbook file k8s.yml e.g.:

-
  hosts: k8s_worker
  roles:
    -
      role: githubixx.containerd
      tags: role-containerd

If everything is in place the role can be deployed on all worker nodes:

ansible-playbook --tags=role-containerd k8s.yml

In Kubernetes control plane I installed the Kubernetes API server, Scheduler and Controller Manager on the controller nodes. For the workers I’ve also prepared an Ansible role which installs the Kubernetes worker components. The Kubernetes part of a worker node needs a kubelet and a kube-proxy daemon. The worker nodes do the “real” work: they run the Pods (containers deployed via container images). So in production, and if you do real work, it won’t hurt to choose bigger iron for the worker hosts ;-)

kubelet is responsible for creating a pod/container on a worker node if the scheduler has chosen that node to run a pod on. kube-proxy takes care of routing: e.g. if a pod or a service is added, kube-proxy updates the routing rules with iptables (by default) or with IPVS on newer Kubernetes installations (which is the default in my roles).

The worker nodes depend on the infrastructure that I installed in the control plane blog post. The role uses the following variables:

# The directory to store the K8s certificates and other configuration
k8s_conf_dir: "/var/lib/kubernetes"

# The directory to store the K8s binaries
k8s_bin_dir: "/usr/local/bin"

# K8s release
k8s_release: "1.23.3"

# The interface on which the K8s services should listen on. As all cluster
# communication should use a VPN interface the interface name is
# normally "wg0" (WireGuard),"peervpn0" (PeerVPN) or "tap0".
k8s_interface: "wg0"

# The directory from where to copy the K8s certificates. By default this
# will expand to the user's LOCAL $HOME (the user that runs
# "ansible-playbook ...") plus "/k8s/certs". That means if the user's
# $HOME directory is e.g. "/home/da_user" then "k8s_ca_conf_directory"
# will have a value of "/home/da_user/k8s/certs".
k8s_ca_conf_directory: "{{ '~/k8s/certs' | expanduser }}"

# Directory where kubeconfig for Kubernetes worker nodes and kube-proxy
# is stored among other configuration files. Same variable expansion
# rule applies as with "k8s_ca_conf_directory"
k8s_config_directory: "{{ '~/k8s/configs' | expanduser }}"

# K8s worker binaries to download
k8s_worker_binaries:
  - kube-proxy
  - kubelet
  - kubectl

# Certificate/CA files for API server and kube-proxy
k8s_worker_certificates:
  - ca-k8s-apiserver.pem
  - ca-k8s-apiserver-key.pem
  - cert-k8s-apiserver.pem
  - cert-k8s-apiserver-key.pem

# Download directory for archive files
k8s_worker_download_dir: "/opt/tmp"

# Directory to store kubelet configuration
k8s_worker_kubelet_conf_dir: "/var/lib/kubelet"

# kubelet settings
#
# If you want to enable the use of "RuntimeDefault" as the default seccomp
# profile for all workloads add these settings:
#
# "feature-gates": "SeccompDefault=true"
# "seccomp-default": ""
#
# These settings are Alpha but may be worth adding. Also see:
# https://kubernetes.io/docs/tutorials/security/seccomp/#enable-the-use-of-runtimedefault-as-the-default-seccomp-profile-for-all-workloads
#
k8s_worker_kubelet_settings:
  "config": "{{k8s_worker_kubelet_conf_dir}}/kubelet-config.yaml"
  "node-ip": "{{hostvars[inventory_hostname]['ansible_' + k8s_interface].ipv4.address}}"
  "container-runtime": "remote"
  "container-runtime-endpoint": "unix:///run/containerd/containerd.sock"
  "kubeconfig": "{{k8s_worker_kubelet_conf_dir}}/kubeconfig"
  "register-node": "true"

# kubelet kubeconfig
k8s_worker_kubelet_conf_yaml: |
  kind: KubeletConfiguration
  apiVersion: kubelet.config.k8s.io/v1beta1
  address: {{hostvars[inventory_hostname]['ansible_' + k8s_interface].ipv4.address}}
  authentication:
    anonymous:
      enabled: false
    webhook:
      enabled: true
    x509:
      clientCAFile: "{{k8s_conf_dir}}/ca-k8s-apiserver.pem"
  authorization:
    mode: Webhook
  clusterDomain: "cluster.local"
  clusterDNS:
    - "10.32.0.254"
  failSwapOn: true
  healthzBindAddress: "{{hostvars[inventory_hostname]['ansible_' + k8s_interface].ipv4.address}}"
  healthzPort: 10248
  runtimeRequestTimeout: "15m"
  serializeImagePulls: false
  tlsCertFile: "{{k8s_conf_dir}}/cert-{{inventory_hostname}}.pem"
  tlsPrivateKeyFile: "{{k8s_conf_dir}}/cert-{{inventory_hostname}}-key.pem"
  cgroupDriver: "systemd"  

# Directory to store kube-proxy configuration
k8s_worker_kubeproxy_conf_dir: "/var/lib/kube-proxy"

# kube-proxy settings
k8s_worker_kubeproxy_settings:
  "config": "{{k8s_worker_kubeproxy_conf_dir}}/kubeproxy-config.yaml"

k8s_worker_kubeproxy_conf_yaml: |
  kind: KubeProxyConfiguration
  apiVersion: kubeproxy.config.k8s.io/v1alpha1
  bindAddress: {{hostvars[inventory_hostname]['ansible_' + k8s_interface].ipv4.address}}
  clientConnection:
    kubeconfig: "{{k8s_worker_kubeproxy_conf_dir}}/kubeconfig"
  healthzBindAddress: {{hostvars[inventory_hostname]['ansible_' + k8s_interface].ipv4.address}}:10256
  mode: "ipvs"
  ipvs:
    minSyncPeriod: 0s
    scheduler: ""
    syncPeriod: 2s
  iptables:
    masqueradeAll: true
  clusterCIDR: "10.200.0.0/16"  
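The tilde in k8s_ca_conf_directory and k8s_config_directory above is resolved on the local machine. A quick shell sketch of what the expanduser filter does (using the example path from the comments):

```shell
# "~" resolves against $HOME of the user that runs "ansible-playbook ...",
# so "~/k8s/certs" becomes e.g. "/home/da_user/k8s/certs".
echo ~/k8s/certs
```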

The role will search for the certificates I created in the certificate authority blog post in the directory specified in k8s_ca_conf_directory on my local machine (which could be a network share of course). The files used here are listed in k8s_worker_certificates.

The Kubernetes worker binaries needed are listed in k8s_worker_binaries.

The kubelet can use CNI (the Container Network Interface) to manage machine-level networking requirements. The CNI plugins needed were already installed by the containerd role mentioned above.

If you created a different VPN interface (e.g. peervpn0) change k8s_interface accordingly. As I use WireGuard I’ll use wg0 as variable value.

Now I add an entry for the worker hosts into Ansible’s hosts file e.g.:

[k8s_worker]
worker0[1:3].i.domain.tld
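Ansible expands the [1:3] range pattern to worker01, worker02 and worker03. Bash brace expansion can illustrate the resulting host list (the domain is just the example from above):

```shell
# Same host list Ansible derives from "worker0[1:3].i.domain.tld"
# (requires bash for the brace expansion):
echo worker0{1..3}.i.domain.tld
# worker01.i.domain.tld worker02.i.domain.tld worker03.i.domain.tld
```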

Then I install the role via

ansible-galaxy install githubixx.kubernetes-worker

Next I add the role to k8s.yml file e.g.:

-
  hosts: k8s_worker
  roles:
    -
      role: githubixx.kubernetes-worker
      tags: role-kubernetes-worker

After that the role gets deployed on all worker nodes:

ansible-playbook --tags=role-kubernetes-worker k8s.yml

So by now it should already be possible to fetch the state of the worker nodes:

kubectl get nodes -o wide

NAME       STATUS      ROLES    AGE     VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
worker01   NotReady    <none>   2d21h   v1.23.3   10.8.0.203    <none>        Ubuntu 20.04.3 LTS   5.13.0-27-generic  containerd://1.5.9
worker02   NotReady    <none>   2d21h   v1.23.3   10.8.0.204    <none>        Ubuntu 20.04.3 LTS   5.13.0-27-generic  containerd://1.5.9

The STATUS column shows NotReady. Looking at the logs on the worker nodes there will be errors like this:

ansible -m command -a 'journalctl -t kubelet -n 50' k8s_worker

...
May 13 11:40:40 worker01 kubelet[12132]: E0513 11:40:40.646202   12132 kubelet.go:2183] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
May 13 11:40:44 worker01 kubelet[12132]: W0513 11:40:44.981728   12132 cni.go:237] Unable to update cni config: no networks found in /etc/cni/net.d
...

This will be fixed next.

What’s missing is the software that makes it possible for pods on different hosts to communicate. Previously I used flannel, a simple and easy way to configure a layer 3 network fabric designed for Kubernetes. But as time moves on other interesting projects pop up, and one of them is Cilium.

Cilium is basically a one-stop shop for everything needed for Kubernetes networking, so there is no need to e.g. install additional software for Network Policies. Cilium brings API-aware network security filtering to Linux container frameworks like Docker and Kubernetes. Using a Linux kernel technology called BPF, Cilium provides a simple and efficient way to define and enforce both network-layer and application-layer security policies based on container/pod identity. It really has everything: overlay networking, native routing, IPv4/v6 support, load balancing, direct server return (DSR), monitoring and troubleshooting, Hubble as an observability platform, network policies, CNI and libnetwork integration, and so on. The use of BPF and XDP also makes it very fast as most of the processing happens in the Linux kernel and not in userspace. The documentation is just great and of course there is also a blog.

Ok, enough Cilium praise ;-) Let’s install it. I prepared an Ansible Cilium role. Download it via

ansible-galaxy install githubixx.cilium_kubernetes

Everything you need to know is documented in the README, including all variables. The default variables are configured to use the already existing etcd server which is also used by the Kubernetes API daemon. The certificate files should also be ready to use as they were created in the certificate authority blog post.

Only one setting needs to be adjusted as I use WireGuard and etcd is listening on the WireGuard interface only. So cilium_etcd_interface: "wg0" needs to be set, or you can use cilium_etcd_interface: "{{ etcd_interface }}" as etcd_interface is already set and the two values stay in sync this way.

You also need to have the Helm 3 binary installed on the host where ansible-playbook runs. You can either use your favorite package manager if your distribution includes helm in its repository (on Archlinux e.g. Helm can be installed via sudo pacman -S helm), use one of the Ansible Helm roles (e.g. gantsign/helm), or download the binary directly from Helm releases and put it into the /usr/local/bin/ directory.
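Like the other roles, the Cilium role needs an entry in k8s.yml so that its tag can be addressed — a sketch following the same pattern as the containerd play above:

```yaml
-
  hosts: k8s_worker
  roles:
    -
      role: githubixx.cilium_kubernetes
      tags: role-cilium-kubernetes
```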

Now Cilium can be installed on the worker nodes:

ansible-playbook --tags=role-cilium-kubernetes -e cilium_install=true k8s.yml

After a while there should be some Cilium pods running:

kubectl -n cilium get pods -o wide

NAME                               READY   STATUS    RESTARTS   AGE     IP           NODE       NOMINATED NODE   READINESS GATES
cilium-2qdc9                       1/1     Running   0          2d10h   10.8.0.205   worker01   <none>           <none>
cilium-nfj6z                       1/1     Running   0          2d10h   10.8.0.203   worker02   <none>           <none>
cilium-operator-7f9745f9b6-jqczr   1/1     Running   0          2d21h   10.8.0.204   worker01   <none>           <none>
cilium-operator-7f9745f9b6-p7wnb   1/1     Running   0          2d21h   10.8.0.203   worker02   <none>           <none>

You can also check the logs of the pods with kubectl -n cilium logs --tail=500 cilium-... e.g.

To resolve Kubernetes cluster internal DNS entries (like *.cluster.local), which are also used for auto-discovery of services, CoreDNS can be used. And that’s also the one I cover here.

If you cloned the ansible-kubernetes-playbooks repository already you find a coredns directory in there with a playbook file called coredns.yml. I’ve added a detailed README to the playbook repository so please follow the instructions there to install CoreDNS.

Now that we’ve installed basically everything needed for running pods, deployments, services, and so on, we should be able to do a sample deployment. On your laptop run:

kubectl -n default apply -f https://k8s.io/examples/application/deployment.yaml
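For reference: at the time of writing the manifest behind that URL looked roughly like this (fetch the URL itself for the authoritative version):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2          # two nginx pods
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.14.2
          ports:
            - containerPort: 80
```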

This will deploy 2 pods running nginx. To get an overview of what’s running (pods, services, deployments, and so on) run:

kubectl -n default get all -o wide

NAME                                    READY   STATUS    RESTARTS   AGE     IP            NODE       NOMINATED NODE   READINESS GATES
pod/nginx-deployment-6b474476c4-jktcp   1/1     Running   0          3m23s   10.200.1.23   worker01   <none>           <none>
pod/nginx-deployment-6b474476c4-qdsvz   1/1     Running   0          3m22s   10.200.1.8    worker02   <none>           <none>

NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE     SELECTOR
service/kubernetes   ClusterIP   10.32.0.1    <none>        443/TCP   3d21h   <none>

NAME                               READY   UP-TO-DATE   AVAILABLE   AGE     CONTAINERS   IMAGES         SELECTOR
deployment.apps/nginx-deployment   2/2     2            2           3m23s   nginx        nginx:1.14.2   app=nginx

NAME                                          DESIRED   CURRENT   READY   AGE     CONTAINERS   IMAGES         SELECTOR
replicaset.apps/nginx-deployment-6b474476c4   2         2         2       3m23s   nginx        nginx:1.14.2   app=nginx,pod-template-hash=6b474476c4

Or kubectl -n default describe deployment nginx-deployment also does the job.

You should also be able to get the default nginx page on every worker node from one of the two nginx webservers. We can use Ansible’s get_url module here and you should see something similar to this (I truncated the output a bit):

ansible -m get_url -a "url=http://10.200.1.23 dest=/tmp/test.html" k8s_worker

worker01 | CHANGED => {
    "changed": true,
    "checksum_dest": null,
    "checksum_src": "7dd71afcfb14e105e80b0c0d7fce370a28a41f0a",
    "dest": "/tmp/test.html",
    "elapsed": 0,
    "gid": 0,
    "group": "root",
    "md5sum": "e3eb0a1df437f3f97a64aca5952c8ea0",
    "mode": "0600",
    "msg": "OK (612 bytes)",
    "owner": "root",
    "size": 612,
    "state": "file",
    "status_code": 200,
    "uid": 0,
    "url": "http://10.200.1.23"
}
worker02 | CHANGED => {
  ...
}

This should give a valid result no matter on which node the page is fetched. Cilium “knows” on which node the pod with the IP 10.200.1.23 is located and the request gets routed accordingly. If you’re done you can delete the nginx deployment again with kubectl -n default delete deployment nginx-deployment (but maybe wait a little bit as the deployment is convenient for further testing…).

You can output the worker nodes’ internal IPs and the pod CIDRs that were assigned to each host with:

kubectl get nodes --output=jsonpath='{range .items[*]}{.status.addresses[?(@.type=="InternalIP")].address} {.spec.podCIDR} {"\n"}{end}'

10.8.0.203 10.200.0.0/24
10.8.0.204 10.200.1.0/24

The IP addresses 10.8.0.203/204 are the addresses I assigned to the VPN interface (wg0 in my case) of worker01/02. That’s important since all communication should travel through the VPN interfaces.

If you just want to see if the worker nodes are ready use:

kubectl get nodes -o wide

NAME       STATUS   ROLES    AGE     VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
worker01   Ready    <none>   542d    v1.23.3   10.8.0.203    <none>        Ubuntu 20.04.3 LTS   5.13.0-27-generic  containerd://1.5.9
worker02   Ready    <none>   542d    v1.23.3   10.8.0.204    <none>        Ubuntu 20.04.3 LTS   5.13.0-27-generic  containerd://1.5.9

If you want to test network connectivity, DNS and things like that, we can deploy a kind of debug container which is just the slim version of a Debian Docker image e.g.:

kubectl -n default run debug-pod -it --rm --image=debian:stable-slim -- bash

This may take a little while until the container image has been downloaded. After entering the container a few utilities should be installed:

apt-get update && apt-get install -y iputils-ping iproute2 dnsutils

Now it should be possible to do something like this:

root@debug-pod:/# ping kubernetes
PING kubernetes.default.svc.cluster.local (10.32.0.1) 56(84) bytes of data.
64 bytes from kubernetes.default.svc.cluster.local (10.32.0.1): icmp_seq=1 ttl=63 time=0.174 ms
...

or

dig www.microsoft.com

; <<>> DiG 9.11.5-P4-5.1+deb10u1-Debian <<>> www.microsoft.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 31420
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;www.microsoft.com.             IN      A

;; ANSWER SECTION:
www.microsoft.com.      5       IN      CNAME   www.microsoft.com-c-3.edgekey.net.
www.microsoft.com-c-3.edgekey.net. 5 IN CNAME   www.microsoft.com-c-3.edgekey.net.globalredir.akadns.net.
www.microsoft.com-c-3.edgekey.net.globalredir.akadns.net. 5 IN CNAME e13678.dspb.akamaiedge.net.
e13678.dspb.akamaiedge.net. 5   IN      A       2.18.233.62

;; Query time: 1 msec
;; SERVER: 10.32.0.254#53(10.32.0.254)
;; WHEN: Tue Aug 11 20:56:06 UTC 2020
;; MSG SIZE  rcvd: 133

or resolve the IP address of a pod

root@debug-pod:/# dig 10-200-3-193.default.pod.cluster.local

; <<>> DiG 9.11.5-P4-5.1+deb10u1-Debian <<>> 10-200-3-193.default.pod.cluster.local
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62473
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: 89939a94c5c2785d (echoed)
;; QUESTION SECTION:
;10-200-3-193.default.pod.cluster.local.        IN A

;; ANSWER SECTION:
10-200-3-193.default.pod.cluster.local. 5 IN A  10.200.3.193

;; Query time: 1 msec
;; SERVER: 10.32.0.254#53(10.32.0.254)
;; WHEN: Tue Aug 11 20:56:06 UTC 2020
;; MSG SIZE  rcvd: 133

In both cases the DNS query was resolved by CoreDNS at 10.32.0.254. So resolving external and internal cluster.local DNS queries works as expected.

At this state the Kubernetes cluster is basically fully functional :-) But of course there is a lot more that could be done…

Running Sonobuoy could be a good next step. Sonobuoy is a diagnostic tool that makes it easier to understand the state of a Kubernetes cluster by running a set of Kubernetes conformance tests (ensuring CNCF conformance) in an accessible and non-destructive manner.

You may also have a look at Velero. It’s a utility for managing disaster recovery, specifically for your Kubernetes cluster resources and persistent volumes.

You may also want to have some monitoring, e.g. by using Prometheus and Alertmanager, and create some nice dashboards with Grafana. Also a nice Kubernetes dashboard like Lens might be helpful.

Having centralized logs from the containers and the Kubernetes nodes is also very useful. For this, Loki and again Grafana might be an option, but there are also various “logging stacks” like ELK (Elasticsearch, Logstash and Kibana) out there that could make life easier.

But I’ll do something completely different first ;-) Up until now nobody from the outside can access any service that runs on the Kubernetes cluster. For this something called Ingress is needed. So let’s continue with Kubernetes the Not So Hard Way With Ansible - Ingress with Traefik v2 and cert-manager (Part 1). In that blog post I’ll install the Traefik ingress controller and cert-manager to automatically fetch and renew TLS certificates from Let’s Encrypt.