Kubernetes the not so hard way with Ansible - The worker - (K8s v1.24)
2022-08-31

- update k8s_release to 1.24.4

2022-02-02

- update k8s_release to 1.23.3

2022-01-09

- update k8s_release to 1.22.5
- default the cgroupDriver value in the KubeletConfiguration to systemd as kubelet runs as a systemd service. See configure-cgroup-driver for more details. Before that the default was cgroupfs. Also see Migrating to the systemd driver

2022-01-08

- update k8s_release to 1.21.8
- kubernetes-worker role no longer installs CNI plugins. So the variables k8s_cni_dir, k8s_cni_bin_dir, k8s_cni_conf_dir, k8s_cni_plugin_version and k8s_cni_plugin_checksum are no longer relevant and are ignored. The Ansible role containerd is now used to install containerd, runc and the CNI plugins. Also see Kubernetes: Replace dockershim with containerd and runc. Docker/dockershim is no longer used as it's deprecated and will be removed in Kubernetes v1.24. Instead containerd is used.
- Content of the k8s_worker_kubelet_settings variable changed: The previous settings image-pull-progress-deadline, network-plugin, cni-conf-dir and cni-bin-dir will all be removed with the dockershim removal. cloud-provider will be removed in Kubernetes v1.23 in favor of removing cloud provider code from the kubelet. container-runtime has only two possible values and changed from docker to remote. And finally one new setting is needed, container-runtime-endpoint, which points to containerd's socket.

2021-09-12

- update k8s_release to 1.21.4

2021-07-05

- update k8s_release to 1.20.8

2020-12-07

- update k8s_release to 1.19.4
This post is based on Kelsey Hightower’s Kubernetes The Hard Way - Bootstrapping Kubernetes Workers.
General information
It makes sense to use a recent Linux kernel in general. Container runtimes like containerd and also Cilium (which comes later) benefit a lot from a recent kernel. I recommend using a kernel >= 4.9.17 if possible. Ubuntu 18.04 e.g. provides a linux-image-5.3.0-45-generic package with kernel 5.3, or you can install the Hardware Enablement Stack (HWE) (apt-get install linux-generic-hwe-18.04) which contains kernel 5.3 or even newer kernels. Ubuntu 20.04 already uses kernel 5.4 by default (which btw. contains the WireGuard module out of the box). As of writing this blog post kernel 5.15 is already available for Ubuntu 20.04.
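If in doubt, checking and upgrading the kernel on an Ubuntu node could look like this little sketch (the linux-generic-hwe-20.04 package name for Ubuntu 20.04 is an assumption, check the Ubuntu repositories; a reboot is needed to activate the new kernel):
# Check which kernel is currently running
uname -r
# Ubuntu 18.04: install the Hardware Enablement Stack (HWE) kernel
sudo apt-get update
sudo apt-get install linux-generic-hwe-18.04   # on Ubuntu 20.04: linux-generic-hwe-20.04 (assumption)
# Activate the new kernel
sudo reboot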
containerd, runc and CNI plugins
Before containerd, a lot of Kubernetes installations most probably used Docker as container runtime. But Docker/dockershim was deprecated with Kubernetes v1.21 and will be removed with Kubernetes v1.24. Behind the scenes Docker already used containerd. So in the end Docker was just an additional "layer" that is no longer needed for Kubernetes. containerd together with runc is therefore basically a replacement for Docker. I've written a blog post about how to migrate from Docker/dockershim to containerd: Kubernetes: Replace dockershim with containerd and runc.
A container runtime is needed to execute the workloads that you deploy to Kubernetes. A workload is normally a container image (which you build locally, on a Jenkins server or in whatever build pipeline you have in place) which runs a webserver or any other service that listens on a port.
So the first thing I'm going to install is containerd, a modern replacement for Docker, with the help of my Ansible role for containerd. containerd is a container runtime which will be installed on each Kubernetes worker node in the cluster so that Pods (the actual workload, distributed as container images) can run there.
So first install the Ansible role for containerd:
ansible-galaxy install githubixx.containerd
By default this role only installs the containerd binaries, which isn't enough for a Kubernetes worker node. So a few Ansible variables of this role need to be changed:
containerd_flavor: "k8s"
containerd_runc_binary_directory: "/usr/local/sbin"
containerd_crictl_config_file: "crictl.yaml"
containerd_crictl_config_directory: "/etc"
containerd_cni_binary_directory: "/opt/cni/bin"
With containerd_flavor: "k8s" set, the role not only installs a minimal set of containerd binaries but also runc and the CNI plugins. runc is a CLI tool for spawning and running containers on Linux according to the OCI specification. CNI, a Cloud Native Computing Foundation project, consists of a specification and libraries for writing plugins to configure network interfaces in Linux containers, along with a number of supported plugins. CNI concerns itself only with network connectivity of containers and removing allocated resources when the container is deleted.
In general the default variables of this role should be just fine. But containerd_runc_binary_directory needs to be defined as otherwise the runc binary won't be installed. The same is true for the CNI plugins, which need containerd_cni_binary_directory to be defined. ctr is a little CLI tool similar to the docker command line tool for managing container images, but it really only provides very basic commands. On a production cluster there is normally no need to manually manage containers, but sometimes it might be handy for debugging purposes. For more information see Using ctr/nerdctl instead of docker CLI command. And to get a configuration file for the crictl utility the variables containerd_crictl_config_file and containerd_crictl_config_directory need to be defined.
For all variables the containerd role offers please see default.yml.
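Just as an illustration and not needed for the installation itself: once containerd is up and the crictl configuration is in place, pods and containers on a worker node can be inspected like this (Kubernetes puts everything into containerd's k8s.io namespace):
# CRI level (uses the crictl.yaml generated by the role)
sudo crictl pods
sudo crictl ps
# containerd level
sudo ctr --namespace k8s.io containers list
sudo ctr --namespace k8s.io images list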
A common place for these variables is group_vars/all.yml if you want to change them. Also add the role to the playbook file k8s.yml e.g.:
-
  hosts: k8s_worker
  roles:
    -
      role: githubixx.containerd
      tags: role-containerd
If everything is in place the role can be deployed on all worker nodes:
ansible-playbook --tags=role-containerd k8s.yml
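To quickly verify that containerd is up and running on all worker nodes an Ansible ad-hoc command can be used, e.g. (just a sketch, any other way of checking the service works too):
ansible -m command -a "systemctl is-active containerd" k8s_worker
ansible -m command -a "containerd --version" k8s_worker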
Kubernetes worker
In the Kubernetes control plane blog post I installed the Kubernetes API server, Scheduler and Controller Manager on the controller nodes. For the worker I've also prepared an Ansible role which installs the Kubernetes worker components. The Kubernetes part of a worker node needs a kubelet and a kube-proxy daemon. The workers do the "real" work: they run the Pods (which are containers deployed via container images). So in production, and if you do real work, it won't hurt to choose bigger iron for the worker hosts ;-)
kubelet is responsible for creating a pod/container on a worker node if the scheduler has chosen that node to run a pod on. kube-proxy takes care of routing: e.g. if a pod or a service was added, kube-proxy updates the routing rules with iptables (by default) or IPVS on newer Kubernetes installations (which is the default in my roles).
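As a side note for debugging: with IPVS mode the rules kube-proxy maintains can be inspected directly on a worker node with the ipvsadm utility (installing that package is an assumption on my side, it's not part of the roles and not required for the setup):
sudo apt-get install ipvsadm
sudo ipvsadm -Ln   # list IPVS virtual services and their backends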
The workers depend on the infrastructure that I installed in the control plane blog post. The role uses the following variables:
# The directory to store the K8s certificates and other configuration
k8s_conf_dir: "/var/lib/kubernetes"
# The directory to store the K8s binaries
k8s_bin_dir: "/usr/local/bin"
# K8s release
k8s_release: "1.24.4"
# The interface on which the K8s services should listen on. As all cluster
# communication should use a VPN interface the interface name is
# normally "wg0" (WireGuard), "peervpn0" (PeerVPN) or "tap0".
k8s_interface: "wg0"
# The directory from where to copy the K8s certificates. By default this
# will expand to user's LOCAL $HOME (the user that runs "ansible-playbook ...")
# plus "/k8s/certs". That means if the user's $HOME directory is e.g.
# "/home/da_user" then "k8s_ca_conf_directory" will have a value of
# "/home/da_user/k8s/certs".
k8s_ca_conf_directory: "{{ '~/k8s/certs' | expanduser }}"
# Directory where kubeconfig for Kubernetes worker nodes and kube-proxy
# is stored among other configuration files. Same variable expansion
# rule applies as with "k8s_ca_conf_directory"
k8s_config_directory: "{{ '~/k8s/configs' | expanduser }}"
# K8s worker binaries to download
k8s_worker_binaries:
  - kube-proxy
  - kubelet
  - kubectl
# Certificate/CA files for API server and kube-proxy
k8s_worker_certificates:
  - ca-k8s-apiserver.pem
  - ca-k8s-apiserver-key.pem
  - cert-k8s-apiserver.pem
  - cert-k8s-apiserver-key.pem
# Download directory for archive files
k8s_worker_download_dir: "/opt/tmp"
# Directory to store kubelet configuration
k8s_worker_kubelet_conf_dir: "/var/lib/kubelet"
# kubelet settings
#
# If you want to enable the use of "RuntimeDefault" as the default seccomp
# profile for all workloads add these settings:
#
# "feature-gates": "SeccompDefault=true"
# "seccomp-default": ""
#
# These settings are Alpha but may be worth adding. Also see:
# https://kubernetes.io/docs/tutorials/security/seccomp/#enable-the-use-of-runtimedefault-as-the-default-seccomp-profile-for-all-workloads
#
k8s_worker_kubelet_settings:
  "config": "{{ k8s_worker_kubelet_conf_dir }}/kubelet-config.yaml"
  "node-ip": "{{ hostvars[inventory_hostname]['ansible_' + k8s_interface].ipv4.address }}"
  "container-runtime": "remote"
  "container-runtime-endpoint": "unix:///run/containerd/containerd.sock"
  "kubeconfig": "{{ k8s_worker_kubelet_conf_dir }}/kubeconfig"
  "register-node": "true"
# kubelet configuration (KubeletConfiguration)
k8s_worker_kubelet_conf_yaml: |
  kind: KubeletConfiguration
  apiVersion: kubelet.config.k8s.io/v1beta1
  address: {{ hostvars[inventory_hostname]['ansible_' + k8s_interface].ipv4.address }}
  authentication:
    anonymous:
      enabled: false
    webhook:
      enabled: true
    x509:
      clientCAFile: "{{ k8s_conf_dir }}/ca-k8s-apiserver.pem"
  authorization:
    mode: Webhook
  clusterDomain: "cluster.local"
  clusterDNS:
    - "10.32.0.254"
  failSwapOn: true
  healthzBindAddress: "{{ hostvars[inventory_hostname]['ansible_' + k8s_interface].ipv4.address }}"
  healthzPort: 10248
  runtimeRequestTimeout: "15m"
  serializeImagePulls: false
  tlsCertFile: "{{ k8s_conf_dir }}/cert-{{ inventory_hostname }}.pem"
  tlsPrivateKeyFile: "{{ k8s_conf_dir }}/cert-{{ inventory_hostname }}-key.pem"
  cgroupDriver: "systemd"
# Directory to store kube-proxy configuration
k8s_worker_kubeproxy_conf_dir: "/var/lib/kube-proxy"
# kube-proxy settings
k8s_worker_kubeproxy_settings:
  "config": "{{ k8s_worker_kubeproxy_conf_dir }}/kubeproxy-config.yaml"
# kube-proxy configuration (KubeProxyConfiguration)
k8s_worker_kubeproxy_conf_yaml: |
  kind: KubeProxyConfiguration
  apiVersion: kubeproxy.config.k8s.io/v1alpha1
  bindAddress: {{ hostvars[inventory_hostname]['ansible_' + k8s_interface].ipv4.address }}
  clientConnection:
    kubeconfig: "{{ k8s_worker_kubeproxy_conf_dir }}/kubeconfig"
  healthzBindAddress: {{ hostvars[inventory_hostname]['ansible_' + k8s_interface].ipv4.address }}:10256
  mode: "ipvs"
  ipvs:
    minSyncPeriod: 0s
    scheduler: ""
    syncPeriod: 2s
  iptables:
    masqueradeAll: true
  clusterCIDR: "10.200.0.0/16"
The role will search for the certificates I created in the certificate authority blog post in the directory specified in k8s_ca_conf_directory on my local machine (could be a network share of course). The certificate files used here are listed in k8s_worker_certificates.
The Kubernetes worker binaries needed are listed in k8s_worker_binaries
.
The Kubelet can use CNI
(the Container Network Interface) to manage machine level networking requirements. The CNI plugins
needed were installed with the containerd
role which was already mentioned above.
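If you want to double check that the CNI plugins are really in place, a quick look into the binary directory configured above (containerd_cni_binary_directory) does the job, e.g.:
ansible -m command -a "ls /opt/cni/bin" k8s_worker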
If you created a different VPN interface (e.g. peervpn0
) change k8s_interface
accordingly. As I use WireGuard
I’ll use wg0
as variable value.
Now I add an entry for the worker hosts into Ansible’s hosts
file e.g.:
[k8s_worker]
worker0[1:3].i.domain.tld
Then I install the role via
ansible-galaxy install githubixx.kubernetes-worker
Next I add the role to the k8s.yml file e.g.:
-
  hosts: k8s_worker
  roles:
    -
      role: githubixx.kubernetes-worker
      tags: role-kubernetes-worker
After that the role gets deployed on all worker nodes:
ansible-playbook --tags=role-kubernetes-worker k8s.yml
So by now it should already be possible to fetch the state of the worker nodes:
kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
worker01 NotReady <none> 2d21h v1.24.4 10.8.0.203 <none> Ubuntu 20.04.5 LTS 5.15.0-46-generic containerd://1.6.6
worker02 NotReady <none> 2d21h v1.24.4 10.8.0.204 <none> Ubuntu 20.04.5 LTS 5.15.0-46-generic containerd://1.6.6
The STATUS column shows NotReady. Looking at the logs on the worker nodes there will be some errors like this:
ansible -m command -a 'journalctl -t kubelet -n 50' k8s_worker
...
May 13 11:40:40 worker01 kubelet[12132]: E0513 11:40:40.646202 12132 kubelet.go:2183] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
May 13 11:40:44 worker01 kubelet[12132]: W0513 11:40:44.981728 12132 cni.go:237] Unable to update cni config: no networks found in /etc/cni/net.d
...
This will be fixed next.
Cilium
What's still missing is the software that enables pods on different hosts to communicate with each other. Previously I used flannel. Flannel is a simple and easy way to configure a layer 3 network fabric designed for Kubernetes. But as time moves on other interesting projects pop up, and one of them is Cilium.
That's basically a one-stop shop for everything needed for Kubernetes networking. So there is no need e.g. to install additional software for Network Policies. Cilium brings API-aware network security filtering to Linux container frameworks like Docker and Kubernetes. Using a new Linux kernel technology called BPF, Cilium provides a simple and efficient way to define and enforce both network-layer and application-layer security policies based on container/pod identity. It really has everything: overlay networking, native routing, IPv4/v6 support, load balancing, direct server return (DSR), monitoring and troubleshooting, Hubble as an observability platform, network policies, CNI and libnetwork integration, and so on. The use of BPF and XDP also makes it very fast, as most of the processing happens in the Linux kernel and not in userspace. The documentation is just great and of course there is also a blog.
Ok, enough Cilium praise ;-) Let's install it. I prepared an Ansible Cilium role. Download it via
ansible-galaxy install githubixx.cilium_kubernetes
Everything you need to know is documented in the README including all variables. The default variables are configured to use the already existing etcd server which is also used by the Kubernetes API server. The certificate files should also be ready to use as they were already created in the certificate authority blog post.
Only one setting needs to be adjusted: as I use WireGuard, etcd is listening on the WireGuard interface only. So cilium_etcd_interface: "wg0" needs to be set, or you can use something like cilium_etcd_interface: "{{ etcd_interface }}" as etcd_interface is already set and both values stay in sync that way.
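In group_vars/all.yml this could look like the following sketch (the second form assumes etcd_interface is defined there as described in the etcd blog post):
cilium_etcd_interface: "wg0"
# or, to keep it in sync with the etcd role:
# cilium_etcd_interface: "{{ etcd_interface }}"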
You also need to have the Helm 3 binary installed on the host where ansible-playbook runs. You can either use your favorite package manager if your distribution includes helm in its repository, use one of the Ansible Helm roles (e.g. gantsign/helm), or download the binary directly from Helm releases and put it into the /usr/local/bin/ directory. On Archlinux Helm can be installed via sudo pacman -S helm e.g.
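If you go for the manual download it could look roughly like this (the version number is just an example, check the Helm releases page for the current one):
curl -LO https://get.helm.sh/helm-v3.9.4-linux-amd64.tar.gz
tar xzf helm-v3.9.4-linux-amd64.tar.gz
sudo mv linux-amd64/helm /usr/local/bin/helm
helm version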
Now Cilium
can be installed on the worker nodes:
ansible-playbook --tags=role-cilium-kubernetes -e cilium_install=true k8s.yml
After a while there should be some Cilium
pods running:
kubectl -n cilium get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
cilium-2qdc9 1/1 Running 0 2d10h 10.8.0.205 worker01 <none> <none>
cilium-nfj6z 1/1 Running 0 2d10h 10.8.0.203 worker02 <none> <none>
cilium-operator-7f9745f9b6-jqczr 1/1 Running 0 2d21h 10.8.0.204 worker01 <none> <none>
cilium-operator-7f9745f9b6-p7wnb 1/1 Running 0 2d21h 10.8.0.203 worker02 <none> <none>
You can also check the logs of the pods with kubectl -n cilium logs --tail=500 cilium-... e.g.
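The Cilium agent also ships with a handy status command that can be executed inside one of the agent pods (the pod name below is just taken from the example output above):
kubectl -n cilium exec -ti cilium-2qdc9 -- cilium status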
CoreDNS
To resolve Kubernetes cluster-internal DNS entries (like *.cluster.local), which are also used for auto-discovery of services, CoreDNS can be used. And that's also the one I cover here.
If you already cloned the ansible-kubernetes-playbooks repository you'll find a coredns directory in there with a playbook file called coredns.yml. I've added a detailed README to the playbook repository, so please follow the instructions there to install CoreDNS.
Make a test deployment
Now that we've installed basically everything needed for running pods, deployments, services, and so on, we should be able to do a sample deployment. On your laptop run:
kubectl -n default apply -f https://k8s.io/examples/application/deployment.yaml
This will deploy 2 pods running nginx. To get an overview of what's running (e.g. pods, services, deployments, and so on) run:
kubectl -n default get all -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/nginx-deployment-6b474476c4-jktcp 1/1 Running 0 3m23s 10.200.1.23 worker01 <none> <none>
pod/nginx-deployment-6b474476c4-qdsvz 1/1 Running 0 3m22s 10.200.1.8 worker02 <none> <none>
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/kubernetes ClusterIP 10.32.0.1 <none> 443/TCP 3d21h <none>
NAME READY UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
deployment.apps/nginx-deployment 2/2 2 2 3m23s nginx nginx:1.14.2 app=nginx
NAME DESIRED CURRENT READY AGE CONTAINERS IMAGES SELECTOR
replicaset.apps/nginx-deployment-6b474476c4 2 2 2 3m23s nginx nginx:1.14.2 app=nginx,pod-template-hash=6b474476c4
Or kubectl -n default describe deployment nginx-deployment
also does the job.
You should also be able to fetch the default nginx page on every worker node from one of the two nginx webservers. We can use Ansible's get_url module here and you should see something similar to this (I truncated the output a bit):
ansible -m get_url -a "url=http://10.200.1.23 dest=/tmp/test.html" k8s_worker
worker01 | CHANGED => {
"changed": true,
"checksum_dest": null,
"checksum_src": "7dd71afcfb14e105e80b0c0d7fce370a28a41f0a",
"dest": "/tmp/test.html",
"elapsed": 0,
"gid": 0,
"group": "root",
"md5sum": "e3eb0a1df437f3f97a64aca5952c8ea0",
"mode": "0600",
"msg": "OK (612 bytes)",
"owner": "root",
"size": 612,
"state": "file",
"status_code": 200,
"uid": 0,
"url": "http://10.200.1.23"
}
worker02 | CHANGED => {
...
}
This should give a valid result no matter on which node the page is fetched. Cilium
“knows” on which node the pod with the IP 10.200.1.23
is located and the request gets routed accordingly. If you’re done you can delete the nginx deployment again with kubectl -n default delete deployment nginx-deployment
(but maybe wait a little bit as the deployment is convenient for further testing…).
You can output the workers' internal IPs and the pod CIDRs that were assigned to each host with:
kubectl get nodes --output=jsonpath='{range .items[*]}{.status.addresses[?(@.type=="InternalIP")].address} {.spec.podCIDR} {"\n"}{end}'
10.8.0.203 10.200.0.0/24
10.8.0.204 10.200.1.0/24
The IP addresses 10.8.0.203/204 are the addresses I assigned to the VPN interface (wg0 in my case) of worker01/02. That's important since all communication should travel through the VPN interfaces.
If you just want to see if the worker nodes are ready use:
kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
worker01 Ready <none> 2d21h v1.24.4 10.8.0.203 <none> Ubuntu 20.04.5 LTS 5.15.0-46-generic containerd://1.6.6
worker02 Ready <none> 2d21h v1.24.4 10.8.0.204 <none> Ubuntu 20.04.5 LTS 5.15.0-46-generic containerd://1.6.6
If you want to test network connectivity, DNS and stuff like that a little bit, you can deploy a kind of debug container which is just the slim version of a Debian container image e.g.:
kubectl -n default run debug-pod -it --rm --image=debian:stable-slim -- bash
This may take a little while until the container image has been downloaded. After entering the container a few utilities should be installed:
apt-get update && apt-get install iputils-ping iproute2 dnsutils
Now it should be possible to do something like this:
root@debug-pod:/# ping kubernetes
PING kubernetes.default.svc.cluster.local (10.32.0.1) 56(84) bytes of data.
64 bytes from kubernetes.default.svc.cluster.local (10.32.0.1): icmp_seq=1 ttl=63 time=0.174 ms
...
or
dig www.microsoft.com
; <<>> DiG 9.11.5-P4-5.1+deb10u1-Debian <<>> www.microsoft.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 31420
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;www.microsoft.com. IN A
;; ANSWER SECTION:
www.microsoft.com. 5 IN CNAME www.microsoft.com-c-3.edgekey.net.
www.microsoft.com-c-3.edgekey.net. 5 IN CNAME www.microsoft.com-c-3.edgekey.net.globalredir.akadns.net.
www.microsoft.com-c-3.edgekey.net.globalredir.akadns.net. 5 IN CNAME e13678.dspb.akamaiedge.net.
e13678.dspb.akamaiedge.net. 5 IN A 2.18.233.62
;; Query time: 1 msec
;; SERVER: 10.32.0.254#53(10.32.0.254)
;; WHEN: Tue Aug 11 20:56:06 UTC 2020
;; MSG SIZE rcvd: 133
or resolve the IP address of a pod
root@debug-pod:/# dig 10-200-3-193.default.pod.cluster.local
; <<>> DiG 9.11.5-P4-5.1+deb10u1-Debian <<>> 10-200-3-193.default.pod.cluster.local
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62473
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: 89939a94c5c2785d (echoed)
;; QUESTION SECTION:
;10-200-3-193.default.pod.cluster.local. IN A
;; ANSWER SECTION:
10-200-3-193.default.pod.cluster.local. 5 IN A 10.200.3.193
;; Query time: 1 msec
;; SERVER: 10.32.0.254#53(10.32.0.254)
;; WHEN: Tue Aug 11 20:56:06 UTC 2020
;; MSG SIZE rcvd: 133
In both cases the DNS query was resolved by CoreDNS
at 10.32.0.254
. So resolving external and internal cluster.local
DNS queries works as expected.
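Service names can be resolved the same way, e.g. the kubernetes API service in the default namespace, which according to the service list further above should return the ClusterIP 10.32.0.1:
# inside the debug pod
dig +short kubernetes.default.svc.cluster.local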
At this point the Kubernetes cluster is basically fully functional :-) But of course there is a lot more that could be done…
What’s next
There are a lot more things that could/should be done now, but running Sonobuoy could be a good next step. Sonobuoy is a diagnostic tool that makes it easier to understand the state of a Kubernetes cluster by running a set of Kubernetes conformance tests (ensuring CNCF conformance) in an accessible and non-destructive manner.
Also you may have a look at Velero. It’s a utility for managing disaster recovery, specifically for your Kubernetes cluster resources and persistent volumes.
You may also want to have some monitoring in place, e.g. by using Prometheus + Alertmanager, and create some nice dashboards with Grafana. Also having a nice Kubernetes dashboard like Lens might be helpful.
Having centralized logs from the containers and the Kubernetes nodes is also something very useful. For this, Loki and again Grafana might be an option, but there are also various "logging stacks" like ELK (Elasticsearch, Logstash and Kibana) out there that could make life easier.
But I'll do something completely different first ;-) Up until now nobody from the outside can access any service that runs on the Kubernetes cluster. For this something called Ingress is needed. So let's continue with Kubernetes the Not So Hard Way With Ansible - Ingress with Traefik v2 and cert-manager (Part 1). In that blog post I'll install the Traefik ingress controller and cert-manager to automatically fetch and renew TLS certificates from Let's Encrypt.