Kubernetes the not so hard way with Ansible - The worker - (K8s v1.18)

Installing Docker, kubelet, kube-proxy, CNI, Cilium and CoreDNS

August 8, 2020

This post is based on Kelsey Hightower’s Kubernetes The Hard Way - Bootstrapping Kubernetes Workers.

Docker

The first thing I'm going to install is Docker. Docker is needed at least on the Kubernetes worker nodes to execute the workload later, as all Kubernetes workloads are distributed as (Docker) container images which then get executed in an isolated environment.

At the time of writing this blog post Docker 18.09.x was the latest Docker version supported/recommended by Kubernetes. If you want to use my Docker role you can install it via

ansible-galaxy install githubixx.docker

The role has the following default variables:

# Directory to store downloaded Docker archive and unarchived binary files.
docker_download_dir: "/opt/tmp"

# Docker version to download and use.
docker_version: "18.09.9"
docker_user: "docker"
docker_group: "docker"
docker_uid: 666
docker_gid: 666
# Directory to store Docker binaries. Should be in your search PATH!
docker_bin_dir: "/usr/local/bin"

# Settings for "dockerd" daemon. Will be provided as paramter to "dockerd" in
# systemd service file for Docker. This settings are used to work out of the
# box with Kubernetes and Flannel network overlay. If you don't need this
# and just want to use "default" Docker networking see below (`dockerd_settings_user`
# variable):
dockerd_settings:
  "host": "unix:///run/docker.sock"
  "log-level": "error"
  "storage-driver": "overlay2"
  "iptables": "false"
  "ip-masq": "false"
  "bip": ""
  "mtu": "1472"

These settings for the dockerd daemon defined in dockerd_settings can be overridden by defining a variable called dockerd_settings_user. You can also add additional settings by using this variable. E.g. to override the mtu default value and add debug, add the following settings to group_vars/all.yml or wherever it fits best for you:

dockerd_settings_user:
  "mtu": "1450"
  "debug": ""

There should be no need to change any of these default values besides storage-driver maybe. If you don't use my Docker role, pay attention to set at least the last four default settings mentioned above correctly.
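
To illustrate what these key/value pairs end up as: the role renders them as command line flags for dockerd in the systemd unit, so the effective daemon invocation should boil down to roughly the following (just a sketch; settings with empty values like bip are left out here and the generated unit file may look a bit different):

/usr/local/bin/dockerd \
  --host=unix:///run/docker.sock \
  --log-level=error \
  --storage-driver=overlay2 \
  --iptables=false \
  --ip-masq=false \
  --mtu=1472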

Optional: If you run your own Docker registry it may make sense to distribute the certificate authority (CA) file to your worker nodes to make sure that they trust the SSL certificate the registry offers (e.g. if you created a self-signed certificate). The role allows you to distribute the CA file:

# The directory from where to copy the Docker CA certificates. By default this
# will expand to the user's LOCAL $HOME (the user that runs "ansible-playbook ...")
# plus "/docker-ca-certificates". That means if the user's $HOME directory is e.g.
# "/home/da_user" then "docker_ca_certificates_src_dir" will have a value of
# "/home/da_user/docker-ca-certificates".
docker_ca_certificates_src_dir: "{{ '~/docker-ca-certificates' | expanduser }}"

# The directory where the program "update-ca-certificates" searches for CA certificate
# files (besides other locations).
docker_ca_certificates_dst_dir: "/usr/local/share/ca-certificates"

# If you have a Docker registry with a self-signed certificate you can copy the
# certificate authority (CA) file to the CA certificate store on the remote host.
# This way Docker will trust the SSL certificate of your Docker registry.
# It's important to mention that the CA files need a ".crt" extension!
# "docker_ca_certificates" is a list so you can specify as many CA files as
# you want. The Ansible role will look up the files specified here in
# "docker_ca_certificates_src_dir" (see above). If "docker_ca_certificates"
# is not specified the task will be skipped.
docker_ca_certificates:
  - ca-docker.crt

As usual, place these variables in group_vars/all.yml if you want to change them. Then add the role to the playbook file k8s.yml, e.g. (in this case Docker will be installed on the Kubernetes worker AND controller nodes, so you might want to adjust the hosts group):

-
  hosts: k8s:children
  roles:
    -
      role: githubixx.docker
      tags: role-docker

A word about storage-driver: It makes sense to use a recent kernel for Docker in general (and also for Cilium which comes later). I recommend using a kernel >= 4.9.17 if possible. Ubuntu 18.04 provides e.g. the linux-image-5.3.0-45-generic package with kernel 5.3. Ubuntu 20.04 already uses kernel 5.4 by default (which btw. already contains the WireGuard module).
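
To quickly check which kernel your hosts are currently running you can use an Ansible ad-hoc command (read-only, nothing gets changed):

ansible -m command -a 'uname -r' k8s_worker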

I'll configure Docker to use overlay2 by default because it's one of the best choices (also see Supported storage drivers per Linux distribution). But you can change the storage driver via the storage-driver setting if you like.

Now I deploy the Docker role on all nodes using

ansible-playbook --tags=role-docker k8s.yml
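
As an optional sanity check you can verify that the Docker binaries are in place and that the expected storage driver is active (depending on the user you connect with you may need become/-b for the second command):

ansible -m command -a 'docker --version' k8s_worker
ansible -b -m shell -a "docker info 2>/dev/null | grep -i 'storage driver'" k8s_worker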

Kubernetes worker

In the Kubernetes control plane post I installed the Kubernetes API server, Scheduler and Controller Manager on the controller nodes. For the workers I've also prepared an Ansible role which installs the Kubernetes worker components. The Kubernetes part of a worker node needs a kubelet and a kube-proxy daemon. The workers do the "real" work: they run the pods and the Docker containers. So in production, and if you do real work, it won't hurt to choose bigger iron for the worker hosts ;-)

kubelet is responsible for creating a pod/container on a worker node if the scheduler has chosen that node to run a pod on. kube-proxy takes care of routing: e.g. if a pod or a service is added, kube-proxy updates the routing rules accordingly, using iptables (or IPVS on newer Kubernetes installations).
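
Since the kube-proxy configuration further below uses mode: "ipvs", you can later inspect the virtual servers kube-proxy programs with ipvsadm (assuming the ipvsadm package is installed on the worker node; this is purely for inspection):

ansible -b -m command -a 'ipvsadm -Ln' k8s_worker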

The workers depend on the infrastructure that I installed in the control plane blog post. The role uses the following variables:

# The directory to store the K8s certificates and other configuration
k8s_conf_dir: "/var/lib/kubernetes"

# The directory to store the K8s binaries
k8s_bin_dir: "/usr/local/bin"

# K8s release
k8s_release: "1.18.6"

# The interface on which the K8s services should listen. As all cluster
# communication should use a VPN interface the interface name is
# normally "wg0" (WireGuard), "peervpn0" (PeerVPN) or "tap0".
k8s_interface: "wg0"

# The directory from where to copy the K8s certificates. By default this
# will expand to the user's LOCAL $HOME (the user that runs "ansible-playbook ...")
# plus "/k8s/certs". That means if the user's $HOME directory is e.g.
# "/home/da_user" then "k8s_ca_conf_directory" will have a value of
# "/home/da_user/k8s/certs".
k8s_ca_conf_directory: "{{ '~/k8s/certs' | expanduser }}"

# Directory where kubeconfig for Kubernetes worker nodes and kube-proxy
# is stored among other configuration files. Same variable expansion
# rule applies as with "k8s_ca_conf_directory"
k8s_config_directory: "{{ '~/k8s/configs' | expanduser }}"

# K8s worker binaries to download
k8s_worker_binaries:
  - kube-proxy
  - kubelet
  - kubectl

# Certificate/CA files for API server and kube-proxy
k8s_worker_certificates:
  - ca-k8s-apiserver.pem
  - ca-k8s-apiserver-key.pem
  - cert-k8s-apiserver.pem
  - cert-k8s-apiserver-key.pem

# Download directory for archive files
k8s_worker_download_dir: "/opt/tmp"

# Directory to store kubelet configuration
k8s_worker_kubelet_conf_dir: "/var/lib/kubelet"

# kubelet settings
k8s_worker_kubelet_settings:
  "config": "{{k8s_worker_kubelet_conf_dir}}/kubelet-config.yaml"
  "node-ip": "{{hostvars[inventory_hostname]['ansible_' + k8s_interface].ipv4.address}}"
  "container-runtime": "docker"
  "image-pull-progress-deadline": "2m"
  "kubeconfig": "{{k8s_worker_kubelet_conf_dir}}/kubeconfig"
  "network-plugin": "cni"
  "cni-conf-dir": "{{k8s_cni_conf_dir}}"
  "cni-bin-dir": "{{k8s_cni_bin_dir}}"
  "cloud-provider": ""
  "register-node": "true"

# kubelet configuration
k8s_worker_kubelet_conf_yaml: |
  kind: KubeletConfiguration
  apiVersion: kubelet.config.k8s.io/v1beta1
  address: {{hostvars[inventory_hostname]['ansible_' + k8s_interface].ipv4.address}}
  authentication:
    anonymous:
      enabled: false
    webhook:
      enabled: true
    x509:
      clientCAFile: "{{k8s_conf_dir}}/ca-k8s-apiserver.pem"
  authorization:
    mode: Webhook
  clusterDomain: "cluster.local"
  clusterDNS:
    - "10.32.0.254"
  failSwapOn: true
  healthzBindAddress: "{{hostvars[inventory_hostname]['ansible_' + k8s_interface].ipv4.address}}"
  healthzPort: 10248
  runtimeRequestTimeout: "15m"
  serializeImagePulls: false
  tlsCertFile: "{{k8s_conf_dir}}/cert-{{inventory_hostname}}.pem"
  tlsPrivateKeyFile: "{{k8s_conf_dir}}/cert-{{inventory_hostname}}-key.pem"

# Directory to store kube-proxy configuration
k8s_worker_kubeproxy_conf_dir: "/var/lib/kube-proxy"

# kube-proxy settings
k8s_worker_kubeproxy_settings:
  "config": "{{k8s_worker_kubeproxy_conf_dir}}/kubeproxy-config.yaml"

k8s_worker_kubeproxy_conf_yaml: |
  kind: KubeProxyConfiguration
  apiVersion: kubeproxy.config.k8s.io/v1alpha1
  bindAddress: {{hostvars[inventory_hostname]['ansible_' + k8s_interface].ipv4.address}}
  clientConnection:
    kubeconfig: "{{k8s_worker_kubeproxy_conf_dir}}/kubeconfig"
  healthzBindAddress: {{hostvars[inventory_hostname]['ansible_' + k8s_interface].ipv4.address}}:10256
  mode: "ipvs"
  ipvs:
    minSyncPeriod: 0s
    scheduler: ""
    syncPeriod: 2s
  iptables:
    masqueradeAll: true
  clusterCIDR: "10.200.0.0/16"

# CNI network plugin settings
k8s_cni_dir: "/opt/cni"
k8s_cni_bin_dir: "{{k8s_cni_dir}}/bin"
k8s_cni_conf_dir: "/etc/cni/net.d"
k8s_cni_plugin_version: "0.8.6"
# SHA512 checksum (see https://github.com/containernetworking/plugins/releases)
k8s_cni_plugin_checksum: "76b29cc629449723fef45db6a6999b0617e6c9084678a4a3361caf3fc5e935084bc0644e47839b1891395e3cec984f7bfe581dd9455c4991ddeee1c78392e538"

The role will search for the certificates I created in the certificate authority blog post in the directory specified in k8s_ca_conf_directory on my local machine (could be a network share of course). The files used here are listed in k8s_worker_certificates.

The Kubernetes worker binaries needed are listed in k8s_worker_binaries.

The kubelet can use CNI (the Container Network Interface) to manage machine level networking requirements. The version of the CNI plugins archive to download is specified in k8s_cni_plugin_version and the plugins will be placed in k8s_cni_dir.
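
For reference: assuming the usual naming scheme of the containernetworking/plugins releases, k8s_cni_plugin_version 0.8.6 corresponds to an archive like the one below, and the SHA512 checksum can be verified manually if you want to double check the value of k8s_cni_plugin_checksum:

curl -LO https://github.com/containernetworking/plugins/releases/download/v0.8.6/cni-plugins-linux-amd64-v0.8.6.tgz
sha512sum cni-plugins-linux-amd64-v0.8.6.tgz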

If you created a different VPN interface (e.g. peervpn0) change k8s_interface accordingly.
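
As with the Docker role, overrides belong in group_vars/all.yml (or wherever it fits best for you). A minimal example that only changes the interface (all other defaults kept):

k8s_interface: "peervpn0"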

Now I add an entry for the worker hosts into Ansible’s hosts file e.g.:

[k8s_worker]
worker0[1:3].i.domain.tld

Then I install the role via

ansible-galaxy install githubixx.kubernetes-worker

Next I add the role to the k8s.yml file e.g.:

-
  hosts: k8s_worker
  roles:
    -
      role: githubixx.kubernetes-worker
      tags: role-kubernetes-worker

After that I run the playbook via

ansible-playbook --tags=role-kubernetes-worker k8s.yml
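
If everything went well, kubelet and kube-proxy should now be running as systemd services on all worker nodes. A quick check (assuming the role installs the units as kubelet and kube-proxy, which the journalctl example further down also assumes):

ansible -m command -a 'systemctl is-active kubelet kube-proxy' k8s_worker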

So now it should already be possible to fetch the state of the worker nodes:

kubectl get nodes -o wide

NAME       STATUS      ROLES    AGE     VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
worker01   NotReady    <none>   2d21h   v1.18.6   10.8.0.203    <none>        Ubuntu 20.04.1 LTS   5.4.0-42-generic   docker://18.9.9
worker02   NotReady    <none>   2d21h   v1.18.6   10.8.0.204    <none>        Ubuntu 20.04.1 LTS   5.4.0-42-generic   docker://18.9.9

In the STATUS column it shows NotReady. Looking at the logs on the worker nodes, there will be errors like this:

ansible -m command -a 'journalctl -t kubelet -n 50' k8s_worker

...
May 13 11:40:40 worker01 kubelet[12132]: E0513 11:40:40.646202   12132 kubelet.go:2183] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
May 13 11:40:44 worker01 kubelet[12132]: W0513 11:40:44.981728   12132 cni.go:237] Unable to update cni config: no networks found in /etc/cni/net.d
...

Cilium

What's missing is the software that makes it possible for pods on different hosts to communicate. Previously I used flannel. Flannel is a simple and easy way to configure a layer 3 network fabric designed for Kubernetes. But as time moves on other interesting projects pop up, and one of them is Cilium.

That's basically a one-stop shop for everything needed for Kubernetes networking. So there is no need e.g. to install additional software for Network Policies. Cilium brings API-aware network security filtering to Linux container frameworks like Docker and Kubernetes. Using a new Linux kernel technology called BPF, Cilium provides a simple and efficient way to define and enforce both network-layer and application-layer security policies based on container/pod identity. It really has everything: overlay networking, native routing, IPv4/v6 support, load balancing, direct server return (DSR), monitoring and troubleshooting, Hubble as an observability platform, network policies, CNI and libnetwork integration, and so on. The use of BPF and XDP also makes it very fast as most of the processing happens in the Linux kernel and not in userspace. The documentation is also great and of course there is a blog as well.

Ok, enough Cilium praise ;-) Let's install it. I prepared an Ansible Cilium role. Download it via

ansible-galaxy install githubixx.cilium_kubernetes

Everything you need to know is documented in the README including all variables. The default variables are configured to use the already existing etcd server which is also used by the Kubernetes API server. The certificate files should also be ready to use as they were created in the certificate authority blog post.

Only one setting needs to be adjusted as I use WireGuard and etcd is listening on the WireGuard interface only. So cilium_etcd_interface: "wg0" needs to be set, or you can do something like cilium_etcd_interface: "{{ etcd_interface }}" as etcd_interface is already set and the two stay in sync this way.
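
E.g. in group_vars/all.yml:

cilium_etcd_interface: "{{ etcd_interface }}"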

You also need to have the Helm 3 binary installed on the host where ansible-playbook runs. You can either use your favorite package manager if your distribution includes helm in its repository (on Archlinux it can be installed via sudo pacman -S helm e.g.), use one of the Ansible Helm roles (e.g. https://galaxy.ansible.com/gantsign/helm), or download the binary directly from https://github.com/helm/helm/releases and put it into the /usr/local/bin/ directory.
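
If you go the manual route, downloading the official release archive and putting the binary into /usr/local/bin/ could look like this (adjust version and architecture to your needs):

curl -LO https://get.helm.sh/helm-v3.2.4-linux-amd64.tar.gz
tar xzf helm-v3.2.4-linux-amd64.tar.gz
sudo mv linux-amd64/helm /usr/local/bin/helm
helm version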

Now Cilium can be installed on the worker nodes:

ansible-playbook --tags=role-cilium-kubernetes -e cilium_install=true k8s.yml

After a while there should be some Cilium pods running:

kubectl -n cilium get pods -o wide

NAME                               READY   STATUS    RESTARTS   AGE     IP           NODE       NOMINATED NODE   READINESS GATES
cilium-2qdc9                       1/1     Running   0          2d10h   10.8.0.205   worker01   <none>           <none>
cilium-nfj6z                       1/1     Running   0          2d10h   10.8.0.203   worker02   <none>           <none>
cilium-operator-69664fcff5-9xljg   1/1     Running   0          2d21h   10.8.0.205   worker01   <none>           <none>

You can also check the logs of the pods with kubectl -n cilium logs --tail=500 cilium-... e.g.
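
The Cilium agent also ships a handy status command that can be executed inside one of the agent pods (pick one of the pod names from the output above):

kubectl -n cilium exec -it cilium-2qdc9 -- cilium status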

CoreDNS

To resolve Kubernetes cluster internal DNS entries (like *.cluster.local), which are also used for auto-discovery of services, CoreDNS can be used. And that's also the one I cover here.

If you have already cloned the ansible-kubernetes-playbooks repository you'll find a coredns directory in there with a playbook file called coredns.yml. I've added a detailed README to the playbook repository so please follow the instructions there to install CoreDNS.
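
After CoreDNS is deployed it should be reachable at the clusterDNS address configured in the kubelet settings above (10.32.0.254). A quick way to check that a Service actually got that ClusterIP (the namespace depends on how you deployed CoreDNS):

kubectl get svc --all-namespaces | grep 10.32.0.254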

Make a test deployment

Now that we've installed basically everything needed for running pods, deployments, services, and so on, we should be able to do a sample deployment. On your laptop run:

kubectl -n default apply -f https://k8s.io/examples/application/deployment.yaml

This will deploy 2 pods running nginx. To get an overview of what's running (pods, services, deployments, and so on) run:

kubectl -n default get all -o wide

NAME                                    READY   STATUS    RESTARTS   AGE     IP            NODE       NOMINATED NODE   READINESS GATES
pod/nginx-deployment-6b474476c4-jktcp   1/1     Running   0          3m23s   10.200.1.23   worker01   <none>           <none>
pod/nginx-deployment-6b474476c4-qdsvz   1/1     Running   0          3m22s   10.200.1.8    worker02   <none>           <none>

NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE     SELECTOR
service/kubernetes   ClusterIP   10.32.0.1    <none>        443/TCP   3d21h   <none>

NAME                               READY   UP-TO-DATE   AVAILABLE   AGE     CONTAINERS   IMAGES         SELECTOR
deployment.apps/nginx-deployment   2/2     2            2           3m23s   nginx        nginx:1.14.2   app=nginx

NAME                                          DESIRED   CURRENT   READY   AGE     CONTAINERS   IMAGES         SELECTOR
replicaset.apps/nginx-deployment-6b474476c4   2         2         2       3m23s   nginx        nginx:1.14.2   app=nginx,pod-template-hash=6b474476c4

Or kubectl -n default describe deployment nginx-deployment also does the job.

You should also be able to fetch the default nginx page on every worker node from one of the two nginx webservers. We can use Ansible's get_url module here and you should see something similar to this (I truncated the output a bit):

ansible -m get_url -a "url=http://10.200.1.23 dest=/tmp/test.html" k8s_worker

worker01 | CHANGED => {
    "changed": true,
    "checksum_dest": null,
    "checksum_src": "7dd71afcfb14e105e80b0c0d7fce370a28a41f0a",
    "dest": "/tmp/test.html",
    "elapsed": 0,
    "gid": 0,
    "group": "root",
    "md5sum": "e3eb0a1df437f3f97a64aca5952c8ea0",
    "mode": "0600",
    "msg": "OK (612 bytes)",
    "owner": "root",
    "size": 612,
    "state": "file",
    "status_code": 200,
    "uid": 0,
    "url": "http://10.200.1.23"
}
worker02 | CHANGED => {
  ...
}

This should give a valid result no matter on which node the page is fetched. Cilium "knows" on which node the pod with the IP 10.200.1.23 is located and routes the request accordingly. If you're done you can delete the nginx deployment again with kubectl -n default delete deployment nginx-deployment (but maybe wait a little bit as the deployment is convenient for further testing…).
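
If you want to keep testing you can also put a ClusterIP Service in front of the deployment and fetch the page via the Service IP instead of a single pod IP, e.g.:

kubectl -n default expose deployment nginx-deployment --port=80 --target-port=80
kubectl -n default get svc nginx-deployment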

You can output the worker internal IPs and the pod CIDRs that were assigned to each host with:

kubectl get nodes --output=jsonpath='{range .items[*]}{.status.addresses[?(@.type=="InternalIP")].address} {.spec.podCIDR} {"\n"}{end}'

10.8.0.203 10.200.0.0/24
10.8.0.204 10.200.1.0/24

The IP addresses 10.8.0.203/204 are the addresses I assigned to the VPN interface (wg0 in my case) on worker01/02. That's important since all communication should travel through the VPN interfaces.

If you just want to see if the worker nodes are ready use:

kubectl get nodes -o wide

NAME       STATUS   ROLES    AGE     VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
worker01   Ready    <none>   2d22h   v1.18.6   10.8.0.203    <none>        Ubuntu 20.04.1 LTS   5.4.0-42-generic   docker://18.9.9
worker02   Ready    <none>   2d22h   v1.18.6   10.8.0.204    <none>        Ubuntu 20.04.1 LTS   5.4.0-42-generic   docker://18.9.9

If you want to test network connectivity, DNS and the like a little bit, we can deploy a kind of debug container which is just the slim version of a Debian Docker image, e.g.:

kubectl -n default run debug-pod -it --image=debian:stable-slim -- bash

This may take a little while until the container image has been downloaded. After entering the container we install a few utilities:

apt-get update && apt-get install -y iputils-ping iproute2 dnsutils

Now it should be possible to do something like this:

root@debug-pod:/# ping kubernetes
PING kubernetes.default.svc.cluster.local (10.32.0.1) 56(84) bytes of data.
64 bytes from kubernetes.default.svc.cluster.local (10.32.0.1): icmp_seq=1 ttl=63 time=0.174 ms
...

or

dig www.microsoft.com

; <<>> DiG 9.11.5-P4-5.1+deb10u1-Debian <<>> www.microsoft.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 31420
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;www.microsoft.com.             IN      A

;; ANSWER SECTION:
www.microsoft.com.      5       IN      CNAME   www.microsoft.com-c-3.edgekey.net.
www.microsoft.com-c-3.edgekey.net. 5 IN CNAME   www.microsoft.com-c-3.edgekey.net.globalredir.akadns.net.
www.microsoft.com-c-3.edgekey.net.globalredir.akadns.net. 5 IN CNAME e13678.dspb.akamaiedge.net.
e13678.dspb.akamaiedge.net. 5   IN      A       2.18.233.62

;; Query time: 1 msec
;; SERVER: 10.32.0.254#53(10.32.0.254)
;; WHEN: Tue Aug 11 20:56:06 UTC 2020
;; MSG SIZE  rcvd: 133

or resolve the IP address of a pod

root@debug-pod:/# dig 10-200-3-193.default.pod.cluster.local

; <<>> DiG 9.11.5-P4-5.1+deb10u1-Debian <<>> 10-200-3-193.default.pod.cluster.local
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62473
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: 89939a94c5c2785d (echoed)
;; QUESTION SECTION:
;10-200-3-193.default.pod.cluster.local.        IN A

;; ANSWER SECTION:
10-200-3-193.default.pod.cluster.local. 5 IN A  10.200.3.193

;; Query time: 1 msec
;; SERVER: 10.32.0.254#53(10.32.0.254)
;; WHEN: Tue Aug 11 20:56:06 UTC 2020
;; MSG SIZE  rcvd: 133

In both cases the DNS query was resolved by CoreDNS at 10.32.0.254. So resolving external and internal cluster.local DNS queries works as expected.
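
When you're done testing you can remove the debug pod again (and the nginx deployment if you haven't already):

kubectl -n default delete pod debug-pod
kubectl -n default delete deployment nginx-deployment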

At this point the Kubernetes cluster is basically fully functional :-) But of course there is lots more that could be done…

What’s next

There are a lot more things that could/should be done now, but running Sonobuoy could be a good next step. Sonobuoy is a diagnostic tool that makes it easier to understand the state of a Kubernetes cluster by running a set of Kubernetes conformance tests (ensuring CNCF conformance) in an accessible and non-destructive manner.

Also you may have a look at Velero. It’s a utility for managing disaster recovery, specifically for your Kubernetes cluster resources and persistent volumes.

You may also want to have some monitoring in place, e.g. by using Prometheus + Alertmanager and creating some nice dashboards with Grafana.

Having centralized logs from the containers and the Kubernetes nodes is also something very useful. For this, Loki and again Grafana might be an option, but there are also various "logging stacks" out there (e.g. ELK: Elasticsearch, Logstash and Kibana) that could make life easier.

But I'll do something completely different first ;-) Up until now nobody from the outside can access any service that runs on the Kubernetes cluster. That might be fine for some use cases, but otherwise we need to allow some ingress traffic. So let's continue with Kubernetes the Not So Hard Way With Ansible - Ingress with Traefik.