Kubernetes the not so hard way with Ansible - The worker (updated for K8s v1.11.x)

Installing Flannel, Docker, kube-apiserver, kube-controller-manager, kube-scheduler and kube-dns/CoreDNS

September 7, 2018

2019-10-03

  • Since Kubernetes v1.11 CoreDNS has reached General Availability (GA) for Kubernetes Cluster DNS as an alternative to the kube-dns addon. While you still can use kube-dns I added the possibility to use CoreDNS as a kube-dns replacement in the text.

2018-09-30

  • upgrade to k8s_release to 1.11.3 for Kubernetes v1.11.3
  • Docker role now allows to distribute CA file for Docker registry with self signed certificate
  • kube-proxy now uses IPVS (Linux Virtual Server) to manage Kubernetes service IPs via virtual server table in the Linux kernel. Use ipvsadm -Ln on the worker nodes to have a look which entries kube-proxy created.

This post is based on Kelsey Hightower’s Kubernetes The Hard Way - Bootstrapping Kubernetes Workers.

To allow easy communication between the hosts and their services (etcd, API server, kubelet, …) we installed WireGuard VPN. This gives us some kind of a unified and secure network for our Kubernetes hosts (like a AWS VPC or Google Cloud Engine VPC). Now we need something similar for our pods we want to run in our cluster (let’s call it the pod network - which doesn’t really exists as it is basically mostly routing and NAT’ing/IPVS stuff but it makes things easier to name ;-) ). One part of this puzzle is flannel. flannel is a network fabric for containers, designed for Kubernetes.

First we need a big IP range for that. The default value in my flannel role ansible-role-flanneld is 10.200.0.0/16 (flannel_ip_range). This range is stored in etcd which we already use for kube-apiserver. Flannel will use a /24 subnet for every host where flanneld runs on out of that big IP range we configured. Further every pod on a worker node will get a IP address out of the /24 subnet which flannel uses for a specific host. On the flannel site you can see a diagram that shows pretty good how this works.

As already mentioned I created a role for installing flannel: ansible-role-flanneld. Install the role via

ansible-galaxy install githubixx.flanneld

We use the following settings (most of them are defaults anyway):

# The interface on which the K8s services should listen on. As all cluster
# communication should use the VPN interface the interface name is
# normally "wg0", "tap0" or "peervpn0".
k8s_interface: "wg0"
# Directory where the K8s certificates and other configuration are stored
# on the Kubernetes hosts.
k8s_conf_dir: "/var/lib/kubernetes"
# CNI network plugin directory
k8s_cni_conf_dir: "/etc/cni/net.d"
# The directory from where to copy the K8s certificates. By default this
# will expand to user's LOCAL $HOME (the user that run's "ansible-playbook ..."
# plus "/k8s/certs". That means if the user's $HOME directory is e.g.
# "/home/da_user" then "k8s_ca_conf_directory" will have a value of
# "/home/da_user/k8s/certs".
k8s_ca_conf_directory: "{{ '~/k8s/certs' | expanduser }}"

etcd_bin_dir: "/usr/local/bin"
etcd_client_port: 2379
etcd_certificates:
  - ca-etcd.pem
  - ca-etcd-key.pem
  - cert-etcd.pem
  - cert-etcd-key.pem

flannel_version: "v0.10.0"
flannel_etcd_prefix: "/kubernetes-cluster/network"
flannel_ip_range: "10.200.0.0/16"
flannel_backend_type: "vxlan"
flannel_cni_interface: "cni0"
flannel_subnet_file_dir: "/run/flannel"
flannel_options_dir: "/etc/flannel"
flannel_bin_dir: "/usr/local/sbin"
flannel_cni_conf_file: "10-flannel"

flannel_systemd_restartsec: "5"
flannel_systemd_limitnofile: "40000"
flannel_systemd_limitnproc: "1048576"
# "ExecStartPre" directive in flannel's systemd service file. This command
# is executed before flannel service starts.
flannel_systemd_execstartpre: "/bin/mkdir -p {{flannel_subnet_file_dir}}"
# "ExecStartPost" directive in flannel's systemd service file. This command
# is execute after flannel service is started. If you run in Hetzner cloud
# this may be important. In this case it changes the TX checksumming offload
# parameter for the "flannel.1" interface. It seems that there is a 
# (kernel/driver) checksum offload bug with flannel vxlan encapsulation 
# (all inside UDP) inside WireGuard encapsulation.
# flannel_systemd_execstartpost: "/sbin/ethtool -K flannel.1 tx off"

flannel_settings:
  "etcd-cafile": "{{k8s_conf_dir}}/ca-etcd.pem"
  "etcd-certfile": "{{k8s_conf_dir}}/cert-etcd.pem"
  "etcd-keyfile": "{{k8s_conf_dir}}/cert-etcd-key.pem"
  "etcd-prefix": "{{flannel_etcd_prefix}}"
  "iface": "{{k8s_interface}}"
  "public-ip": "{{hostvars[inventory_hostname]['ansible_' + k8s_interface].ipv4.address}}"
  "subnet-file": "{{flannel_subnet_file_dir}}/subnet.env"
  "ip-masq": "true"
  "healthz-ip": "0.0.0.0"
  "healthz-port": "0" # 0 = disable

flannel_cni_conf: |
  {
    "name": "{{flannel_cni_interface}}",
    "plugins": [
      {
        "type": "flannel",
        "delegate": {
          "hairpinMode": true,
          "isDefaultGateway": true
        }
      },
      {
        "type": "portmap",
        "capabilities": {
          "portMappings": true
        }
      }
    ]
  }

The settings for flannel daemon defined in flannel_settings can be overriden by defining a variable called flannel_settings_user. You can also add additional settings by using this variable. E.g. to override healthz-ip default value and add kubeconfig-file setting add the following settings to group_vars/all.yml or where ever it fit’s best for you:

flannel_settings_user:
  "healthz-ip": "1.2.3.4"
  "kubeconfig-file": "/etc/k8s/k8s.cfg"

Basically there should be no need to change any of the settings if you used mostly the default settings of my other roles so far. Maybe there’re two settings which you may choose to change. etcd-prefix is the path in etcd where flannel will store it’s config object. So with the default above the whole path to the flannel config object in etcd would be /kubernetes-cluster/network/config. Next flannel_ip_range is the big IP range I mentioned above. Don’t make it too small! For every host flannel will choose a /24 subnet out of this range.

Next we extend our k8s.yml playbook file and add the role e.g.:

-
  hosts: k8s:children
  roles:
    -
      role: githubixx.flanneld
      tags: role-kubernetes-flanneld

As you can see flanneld will be installed on all nodes (group k8s:children includes controller, worker and etcd in my case). I decided to do so because I’ll have Docker running on every host so it makes sense to have one unified network setup for all hosts that run Docker. Be aware that flanneld needs to run BEFORE Docker (the systemd Docker file takes care for you but for now it’s important to install flannel before Docker)! Now you can apply the role to all specifed hosts:

ansible-playbook --tags=role-kubernetes-flanneld k8s.yml

Now we need to install Docker on all of our nodes (you don’t need Docker on the etcd hosts if you used separate nodes for etcd and controller). You can use whatever Ansible Docker playbook you want to install Docker (you should find quite a few out there ;-) ). I created my own because I wanted to use the official Docker binaries archive, overlay FS storage driver and a custom systemd unit file. Be aware that you need to set a few options to make Docker work with flannel overlay network. Also we use Docker 17.03.2-ce. At time of writing this is the latest Docker version which is supported by Kubernetes v1.11.x. If you want to use my Docker playbook you can install it via

ansible-galaxy install githubixx.docker

The playbook has the following default variables:

docker_download_dir: "/opt/tmp"

docker_version: "17.03.2-ce"
docker_user: "docker"
docker_group: "docker"
docker_uid: 666
docker_gid: 666
docker_bin_dir: "/usr/local/bin"

dockerd_settings:
  "host": "unix:///var/run/docker.sock"
  "log-level": "error"
  "storage-driver": "overlay"
  "iptables": "false"
  "ip-masq": "false"
  "bip": ""
  "mtu": "1472"

The settings for dockerd daemon defined in dockerd_settings can be overriden by defining a variable called dockerd_settings_user. You can also add additional settings by using this variable. E.g. to override mtu default value and add debug add the following settings to group_vars/all.yml or where ever it fit’s best for you:

dockerd_settings_user:
  "mtu": "1450"
  "debug": ""

There should be no need to change any of this default values besides maybe storage-driver. If you don’t use my Docker role pay attention to set at least the last four default settings mentioned above correct.

Optional: If you run your own Docker registry it my make sense to distribute the certificate authority file to your worker nodes to make sure that your worker nodes trust the SSL certificate that the registry offers (e.g. if you created a self signed certificate). The role allows you to distribute the CA file:

# The directory from where to copy the Docker CA certificates. By default this
# will expand to user's LOCAL $HOME (the user that run's "ansible-playbook ..."
# plus "/docker-ca-certificates". That means if the user's $HOME directory is e.g.
# "/home/da_user" then "docker_ca_certificates_src_dir" will have a value of
# "/home/da_user/docker-ca-certificates".
docker_ca_certificates_src_dir: "{{ '~/docker-ca-certificates' | expanduser }}"
# The directory where the program "update-ca-certificates" searches for CA certificate
# files (besides other locations).
docker_ca_certificates_dst_dir: "/usr/local/share/ca-certificates"

As usual place the variables in group_vars/all.yml if you want to change variables. Add the role to our playbook file k8s.yml e.g.:

-
  hosts: k8s:children
  roles:
    -
      role: githubixx.docker
      tags: role-docker

A word about storage-driver: It makes sense to use a recent kernel for Docker in general. I recommend to use a kernel >=4.14.x if possible. Verify that you have overlayfs filesystem available on your worker instances (execute cat /proc/filesystems | grep overlay. If you see an output you should be fine). If the kernel module isn’t compiled into the kernel you can normally load it via modprobe -v overlay (-v gives us a little bit more information). We’ll configure Docker to use overlayfs by default because it’s one of the best choises (Docker 1.13.x started to use overlayfs by default if available). But you can change the storage driver via the storage-driver setting if you like. Again: Use kernel >=4.14.x if possible!

Now you can roll out Docker role on all nodes using

ansible-playbook --tags=role-docker k8s.yml

In Kubernetes control plane we installed Kubernetes API server, Scheduler and Controller manager on the controller nodes. For the worker I’ve also prepared a Ansible role which installes Kubernetes worker. The Kubernetes part of a worker node needs a kubelet and a kube-proxy daemon. The worker do the “real” work. They run the pods and the Docker container. So in production and if you do real work it won’t hurt if you choose bigger iron for the worker hosts ;-) The kubelet is responsible to create a pod/container on a worker node if the scheduler had choosen that node to run a pod on. The kube-proxy cares about routes. E.g. if a pod or a service was added kube-proxy takes care to update routing rules in iptables (or maybe IPVS on newer Kubernetes installations) accordingly.

The worker depend on the infrastructure we installed in control plane. The role uses the following variables:

# The directory to store the K8s certificates and other configuration
k8s_conf_dir: "/var/lib/kubernetes"
# The directory to store the K8s binaries
k8s_bin_dir: "/usr/local/bin"
# K8s release
k8s_release: "1.11.3"
# The interface on which the K8s services should listen on. As all cluster
# communication should use a VPN interface the interface name is
# normally "wg0" (WireGuard),"peervpn0" (PeerVPN) or "tap0".
k8s_interface: "wg0"

# The directory from where to copy the K8s certificates. By default this
# will expand to user's LOCAL $HOME (the user that run's "ansible-playbook ..."
# plus "/k8s/certs". That means if the user's $HOME directory is e.g.
# "/home/da_user" then "k8s_ca_conf_directory" will have a value of
# "/home/da_user/k8s/certs".
k8s_ca_conf_directory: "{{ '~/k8s/certs' | expanduser }}"
# Directory where kubeconfig for Kubernetes worker nodes and kube-proxy
# is stored among other configuration files. Same variable expansion
# rule applies as with "k8s_ca_conf_directory"
k8s_config_directory: "{{ '~/k8s/configs' | expanduser }}"

# K8s worker binaries to download
k8s_worker_binaries:
  - kube-proxy
  - kubelet
  - kubectl

# Certificate/CA files for API server and kube-proxy
k8s_worker_certificates:
  - ca-k8s-apiserver.pem
  - ca-k8s-apiserver-key.pem
  - cert-k8s-apiserver.pem
  - cert-k8s-apiserver-key.pem

# Download directory for archive files
k8s_worker_download_dir: "/opt/tmp"

# Directory to store kubelet configuration
k8s_worker_kubelet_conf_dir: "/var/lib/kubelet"

# kubelet settings
k8s_worker_kubelet_settings:
  "allow-privileged": "true"
  "config": "{{k8s_worker_kubelet_conf_dir}}/kubelet-config.yaml"
  "node-ip": "{{hostvars[inventory_hostname]['ansible_' + k8s_interface].ipv4.address}}"
  "container-runtime": "docker"
  "image-pull-progress-deadline": "2m"
  "kubeconfig": "{{k8s_worker_kubelet_conf_dir}}/kubeconfig"
  "network-plugin": "cni"
  "cni-conf-dir": "{{k8s_cni_conf_dir}}"
  "cni-bin-dir": "{{k8s_cni_bin_dir}}"
  "cloud-provider": ""
  "register-node": "true"

# kublet kubeconfig
k8s_worker_kubelet_conf_yaml: |
  kind: KubeletConfiguration
  apiVersion: kubelet.config.k8s.io/v1beta1
  address: {{hostvars[inventory_hostname]['ansible_' + k8s_interface].ipv4.address}}
  authentication:
    anonymous:
      enabled: false
    webhook:
      enabled: true
    x509:
      clientCAFile: "{{k8s_conf_dir}}/ca-k8s-apiserver.pem"
  authorization:
    mode: Webhook
  clusterDomain: "cluster.local"
  clusterDNS:
    - "10.32.0.254"
  failSwapOn: true
  healthzBindAddress: "{{hostvars[inventory_hostname]['ansible_' + k8s_interface].ipv4.address}}"
  healthzPort: 10248
  runtimeRequestTimeout: "15m"
  serializeImagePulls: false
  tlsCertFile: "{{k8s_conf_dir}}/cert-k8s-apiserver.pem"
  tlsPrivateKeyFile: "{{k8s_conf_dir}}/cert-k8s-apiserver-key.pem"

# Directroy to store kube-proxy configuration
k8s_worker_kubeproxy_conf_dir: "/var/lib/kube-proxy"

# kube-proxy settings
k8s_worker_kubeproxy_settings:
  "config": "{{k8s_worker_kubeproxy_conf_dir}}/kubeproxy-config.yaml"

k8s_worker_kubeproxy_conf_yaml: |
  kind: KubeProxyConfiguration
  apiVersion: kubeproxy.config.k8s.io/v1alpha1
  bindAddress: {{hostvars[inventory_hostname]['ansible_' + k8s_interface].ipv4.address}}
  clientConnection:
    kubeconfig: "{{k8s_worker_kubeproxy_conf_dir}}/kubeconfig"
  healthzBindAddress: {{hostvars[inventory_hostname]['ansible_' + k8s_interface].ipv4.address}}:10256
  mode: "ipvs"
  ipvs:
    minSyncPeriod: 0s
    scheduler: ""
    syncPeriod: 2s
  iptables:
    masqueradeAll: true
  clusterCIDR: "10.200.0.0/16"

# CNI network plugin settings
k8s_cni_dir: "/opt/cni"
k8s_cni_bin_dir: "{{k8s_cni_dir}}/bin"
k8s_cni_conf_dir: "/etc/cni/net.d"
k8s_cni_plugin_version: "0.6.0"

The role will search for the certificates we created in certificate authority in the directory you specify in k8s_ca_conf_directory on the host you run Ansible. The files used here are listed in k8s_certificates. The Kubernetes worker binaries we need are listed in k8s_worker_binaries. The Kubelet can use CNI (the Container Network Interface) to manage machine level networking requirements. The CNI archive that we want do download is specified in k8s_cni_plugin_version and will be placed in k8s_cni_dir.

If you created a different VPN interface (e.g. peervpn0) change k8s_interface.

Now add an entry for your worker hosts into Ansible’s hosts file e.g.:

[k8s_worker]
worker0[1:3].i.domain.tld

Install the role via

ansible-galaxy install githubixx.kubernetes-worker

Next add the role to k8s.yml file e.g.:

  hosts: k8s_worker
  roles:
    -
      role: githubixx.kubernetes-worker
      tags: role-kubernetes-worker

Run the playbook via

ansible-playbook --tags=role-kubernetes-worker k8s.yml

Now that we’ve installed basically everything needed for running pods,deployments,services,… we should be able to do a sample deployment. On your laptop run:

kubectl run my-nginx --image=nginx --replicas=4 --port=80

This will deploy 4 pods running nginx. To get a overview what’s running e.g. pods,services,deployments,… run:

kubectl get all -o wide

NAME              DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE       CONTAINERS   IMAGES    SELECTOR
deploy/my-nginx   4         4         4            4           1m        my-nginx     nginx     run=my-nginx

NAME                    DESIRED   CURRENT   READY     AGE       CONTAINERS   IMAGES    SELECTOR
rs/my-nginx-5d69b5ff7   4         4         4         1m        my-nginx     nginx     pod-template-hash=182561993,run=my-nginx

NAME                          READY     STATUS    RESTARTS   AGE       IP            NODE
po/my-nginx-5d69b5ff7-66jgk   1/1       Running   0          1m        10.200.25.2   k8s-worker2
po/my-nginx-5d69b5ff7-kphsd   1/1       Running   0          1m        10.200.5.2    k8s-worker1
po/my-nginx-5d69b5ff7-mwcb6   1/1       Running   0          1m        10.200.5.3    k8s-worker1
po/my-nginx-5d69b5ff7-w888j   1/1       Running   0          1m        10.200.25.3   k8s-worker2

You should be also able to run curl on every master and controller node to get the default page from one of the four nginx webservers. In the case above curl http://10.200.25.2 should work on all nodes in our cluster (flanneld magic ;-) ).

You can output the worker internal IPs and the pod CIDR’s that was assigned to that host with:

kubectl get nodes --output=jsonpath='{range .items[*]}{.status.addresses[?(@.type=="InternalIP")].address} {.spec.podCIDR} {"\n"}{end}'

10.8.0.111 10.200.0.0/24 
10.8.0.112 10.200.1.0/24 

The IP addresses 10.8.0.111/112 are adresses I assigned to the VPN interface (wg0 in my case) to worker01/02. That’s important since all communication should travel though the VPN interfaces.

If you just want to see if the worker nodes are ready use:

kubectl get nodes -o wide

worker01   Ready     <none>    65d       v1.11.3   10.8.0.111    <none>        Ubuntu 18.04.1 LTS   4.15.0-34-generic   docker://17.3.2
worker02   Ready     <none>    62d       v1.11.3   10.8.0.112    <none>        Ubuntu 18.04.1 LTS   4.15.0-34-generic   docker://17.3.2

At the end we need one last important addon to enable service discovery or DNS resolution within the K8s cluster: kube-dns or CoreDNS. If you use K8s >= v1.11.x I would recommend to use CoreDNS as it is the default now.

CoreDNS

Skip to next section if you want to use kube-dns. If you cloned the ansible-kubernetes-playbooks repository already you find a coredns directory in there with a playbook file called coredns.yml. I’ve added a detailed README to the playbook repository so please follow the instructions there to install CoreDNS. Please be aware that this playbook uses Ansible’s k8s module which was included in Ansible 2.6. The kube-dns playbook (see next section) also works with Ansible < 2.6.

kube-dns

Skip this if you already have installed CoreDNS above. I’ll keep this for reference at the moment but you should go with CoreDNS whenever possible! Also if you can’t use Ansible >= 2.6 you can still use this playbook as the CoreDNS playbook above needs Ansible >= 2.6. If you cloned the ansible-kubernetes-playbooks repository already you find a kubedns directory in there with a playbook file called kubedns.yml. So change directory to kubedns and adjust templates/kubedns.yaml.j2. There’re basically only two settings you need to adjust if you don’t have used my default settings: Search for clusterIP and for cluster.local and change accordingly. You should find the following entries:

clusterIP: 10.32.0.254
--domain=cluster.local
--probe=kubedns,127.0.0.1:10053,kubernetes.default.svc.cluster.local,5,A
--probe=dnsmasq,127.0.0.1:53,kubernetes.default.svc.cluster.local,5,A

Now run

ansible-playbook kubedns.yml

to roll out kube-dns deployment. If you run

kubectl get pods -l k8s-app=kube-dns -n kube-system -o wide

you should see something like this:

NAME                        READY     STATUS    RESTARTS   AGE       IP           NODE
kube-dns-6c857864fb-bp2kx   3/3       Running   0          32m       10.200.7.9   k8s-worker2

What’s next

There’re a lot more things that could/should be done now but running Heptio Sonobuoy could be a good next step. Heptio Sonobuoy is a diagnostic tool that makes it easier to understand the state of a Kubernetes cluster by running a set of Kubernetes conformance tests in an accessible and non-destructive manner.

Also you may have a look at Ark. Heptio Ark is a utility for managing disaster recovery, specifically for your Kubernetes cluster resources and persistent volumes.

Next up: Kubernetes the Not So Hard Way With Ansible - Ingress with Traefik