Kubernetes the not so hard way with Ansible (at Scaleway) - Part 7 - The worker [updated for K8s v1.8]

February 20, 2017

CHANGELOG

2017-11-19

  • update to flannel 0.9.1

2017-10-10

  • update to flannel 0.9.0
  • flanneld config now uses VXLAN backend by default
  • add --healthz-ip and --healthz-port options to flanneld systemd service file
  • removed alsologtostderr option from systemd service file
  • use variable for flannel subnet directory
  • update CNI plugin to 0.6.0
  • variable local_cert_dir changed to k8s_ca_conf_directory / added k8s_ca_conf_directory
  • Docker update to 17.03.2-ce
  • added --masquerade-all to kube-proxy settings to avoid DNS problems
  • added healthz-bind-address and healthz-port option to kube-apiserver
  • added task to install several needed network packages
  • added missing default variable k8s_controller_manager_cluster_cidr
  • changed variable k8s_download_dir to k8s_worker_download_dir
  • a few fixes in the role
  • rename local_cert_dir -> k8s_ca_conf_directory
  • rename k8s_cni_plugins -> k8s_cni_plugin_version
  • removed k8s_kubelet_token as we now use RBAC (RBAC everywhere ;-) )

This post is based on Kelsey Hightower’s Kubernetes The Hard Way - Bootstrapping Kubernetes Workers.

To allow easy communication between the hosts and their services (etcd, API server, kubelet, …) we installed PeerVPN. This gives us a kind of unified and secure network for our Kubernetes hosts (like an AWS VPC or Google Compute Engine VPC). Now we need the same for the pods we want to run in our cluster - a.k.a. the pod network. For this we use flannel. flannel is a network fabric for containers, designed for Kubernetes. First we need a big IP range for that. The default value in my flannel role ansible-role-flanneld is 10.200.0.0/16. This range is stored in etcd. Out of that big range flannel allocates a /24 subnet for every host flanneld runs on. Every pod on a worker node then gets an IP address out of the /24 subnet assigned to that host. On the flannel site you can see a diagram that shows pretty well how this works.
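
To make this a bit more concrete: once flanneld runs on a host it leases one of those /24 subnets and writes it into a small environment file (by default /run/flannel/subnet.env). On a worker that e.g. got the 10.200.5.0/24 lease the file would look roughly like this (the values are just examples for this setup; the MTU depends on the backend and your underlying interface):

cat /run/flannel/subnet.env

FLANNEL_NETWORK=10.200.0.0/16
FLANNEL_SUBNET=10.200.5.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true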

As already mentioned I created a role for installing flannel: ansible-role-flanneld. Install the role via

ansible-galaxy install githubixx.flanneld

The role has the following default settings:

k8s_interface: "tap0"
k8s_conf_dir: "/var/lib/kubernetes"
k8s_cni_conf_dir: "/etc/cni/net.d"
k8s_ca_conf_directory: "/etc/k8s/certs"

etcd_conf_dir: "/etc/etcd"
etcd_bin_dir: "/usr/local/bin"
etcd_client_port: 2379
etcd_certificates:
  - ca-etcd.pem
  - ca-etcd-key.pem
  - cert-etcd.pem
  - cert-etcd-key.pem

flannel_version: "v0.9.1"
flannel_etcd_prefix: "/kubernetes-cluster/network"
flannel_ip_range: "10.200.0.0/16"
flannel_backend_type: "vxlan"
flannel_cni_name: "podnet"
flannel_subnet_file_dir: "/run/flannel"
flannel_options_dir: "/etc/flannel"
flannel_bin_dir: "/usr/local/sbin"
flannel_ip_masq: "true"
flannel_cni_conf_file: "10-flannel"
flannel_healthz_ip: "0.0.0.0"
flannel_healthz_port: "0" # 0 = disable

Basically there should be no need to change any of these settings if you mostly used the default settings of my other roles so far. There are maybe two settings you may want to change: flannel_etcd_prefix is the path in etcd where flannel stores its config object, so with the default above the full path to the flannel config object in etcd is /kubernetes-cluster/network/config. flannel_ip_range is the big IP range I mentioned above. Don't make it too small! For every host flannel will choose a /24 subnet out of this range.
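
For reference: the config object stored at /kubernetes-cluster/network/config in etcd is a small JSON document. With the defaults above it would look roughly like this (a sketch following flannel's documented config format; the exact JSON the role writes may differ slightly):

{
  "Network": "10.200.0.0/16",
  "Backend": {
    "Type": "vxlan"
  }
}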

Next we extend our k8s.yml playbook file and add the role e.g.:

-
  hosts: k8s:children
  roles:
    -
      role: githubixx.flanneld
      tags: role-kubernetes-flanneld

As you can see flanneld will be installed on all nodes (the group k8s:children includes controller, worker and etcd in my case). I decided to do so because I'll have Docker running on every host, so it makes sense to have one unified network setup for all Docker daemons. Be aware that flanneld needs to run BEFORE Docker! Now you can apply the role to all specified hosts:

ansible-playbook --tags=role-kubernetes-flanneld k8s.yml
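
Afterwards it doesn't hurt to quickly check on one of the nodes that flanneld is up and created its VXLAN device (I assume here that the role installs a systemd unit called flanneld; adjust the name if yours differs). flannel.1 is the VXLAN interface flannel creates when the vxlan backend is used:

systemctl status flanneld
ip -d link show flannel.1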

Now we need to install Docker on all of our nodes (you don't need Docker on the etcd hosts if you use separate nodes for etcd and controller). You can use whatever Ansible Docker playbook you want to install Docker (you should find quite a few out there ;-) ). I created my own because I wanted to use the official Docker binaries archive, the overlay FS storage driver and a custom systemd unit file. Be aware that you need to set a few options to make Docker work with the flannel overlay network. Also we use Docker 17.03.2-ce. At the time of writing this is the latest Docker version supported by Kubernetes v1.8. If you want to use my Docker playbook you can install it via

ansible-galaxy install githubixx.docker

The playbook has the following default variables:

docker_download_dir: /opt/tmp

docker_version: 17.03.2-ce
docker_user: docker
docker_group: docker
docker_uid: 666
docker_gid: 666
docker_bin_dir: /usr/local/bin
docker_storage_driver: overlay
docker_log_level: error
docker_iptables: false
docker_ip_masq: false
docker_bip: ""
docker_mtu: 1472

There should be no need to change any of these default values besides maybe docker_storage_driver. If you don't use my Docker role pay attention to set at least the last four settings mentioned above correctly (see the sketch after the playbook snippet below). As usual place the variables in group_vars/k8s.yml if you want to change them. Add the role to our playbook file k8s.yml e.g.:

-
  hosts: k8s:children
  roles:
    -
      role: githubixx.docker
      tags: role-docker
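
To illustrate what those last four Docker variables are about: Docker must not do IP masquerading or manage iptables itself, and its docker0 bridge has to live inside the /24 that flannel leased for the host. A common way to wire this up, and roughly what a flannel-aware Docker systemd unit does (this is just a sketch, not the exact unit file my role ships), is to source flannel's subnet.env and hand the values to dockerd:

# sketch of a flannel-aware Docker systemd unit (not the exact file of the role)
[Service]
EnvironmentFile=/run/flannel/subnet.env
ExecStart=/usr/local/bin/dockerd \
  --bip=${FLANNEL_SUBNET} \
  --mtu=${FLANNEL_MTU} \
  --ip-masq=false \
  --iptables=false \
  --storage-driver=overlay \
  --log-level=error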

A word about docker_storage_driver: Since we use Ubuntu 16.04 at Scaleway we should already have a pretty recent kernel running (at the time of writing this blog post it was kernel 4.8.x or 4.10.x on my VPS instances). It makes sense to use a recent kernel for Docker in general. Ubuntu 16.04 additionally provides kernels 4.4.x and 4.8.x. I recommend using 4.10.x if possible. Verify that you have the overlay filesystem available on your worker instances (execute cat /proc/filesystems | grep overlay; if you see output you should be fine). In my case overlayfs is compiled into the kernel. If it's not compiled in you can normally load it via modprobe -v overlay (-v gives us a little bit more information). We'll configure Docker to use overlayfs by default because it's one of the best choices (Docker 1.13.x started to use overlayfs by default if available). But you can change the storage driver via the docker_storage_driver variable if you like. Again: use kernel >= 4.8 if possible!

Now you can roll out the Docker role on all nodes using

ansible-playbook --tags=role-docker k8s.yml
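
Once Docker is up you can verify on a worker that it really uses the overlay storage driver and that its docker0 bridge got an address out of the flannel /24 of that host, e.g.:

docker info | grep -i "storage driver"
ip addr show docker0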

In part 6 we installed the Kubernetes API server, Scheduler and Controller manager on the controller nodes. For the worker I've also prepared an Ansible role which installs the Kubernetes worker components. The Kubernetes part of a worker node needs a kubelet and a kube-proxy daemon. The workers do the "real" work: they run the pods and the Docker containers. So in production and if you do real work it won't hurt to choose bigger iron for the worker hosts ;-) The kubelet is responsible for creating a pod/container on a worker node if the scheduler has chosen that node to run a pod on. kube-proxy takes care of routing: e.g. if a pod or a service is added, kube-proxy updates the iptables rules accordingly.

The workers depend on the infrastructure we installed in part 6. The playbook uses the following variables:

k8s_conf_dir: "/var/lib/kubernetes"
k8s_bin_dir: "/usr/local/bin"
k8s_release: "1.8.0"
k8s_interface: "tap0"

k8s_ca_conf_directory: "/etc/k8s/certs"
k8s_config_directory: "/etc/k8s/configs"

k8s_worker_binaries:
  - kube-proxy
  - kubelet
  - kubectl

k8s_worker_certificates:
  - ca-k8s-apiserver.pem
  - ca-k8s-apiserver-key.pem
  - cert-k8s-apiserver.pem
  - cert-k8s-apiserver-key.pem
  - cert-kube-proxy.pem
  - cert-kube-proxy-key.pem

k8s_worker_download_dir: "/opt/tmp"

k8s_worker_kubelet_conf_dir: "/var/lib/kubelet"
k8s_worker_kubelet_serialize_image_pulls: "false"
k8s_worker_kubelet_allow_privileged: "true"
k8s_worker_kubelet_container_runtime: "docker"
k8s_worker_kubelet_docker: "unix:///var/run/docker.sock"
k8s_worker_kubelet_image_pull_progress_deadline: "2m"
k8s_worker_kubelet_register_node: "true"
k8s_worker_kubelet_runtime_request_timeout: "10m"
k8s_worker_kubelet_cadvisor_port: "4194" # port or "0" to disable
k8s_worker_kubelet_cloud_provider: ""
k8s_worker_kubelet_healthz_port: "10248"

k8s_worker_kubeproxy_conf_dir: "/var/lib/kube-proxy"
k8s_worker_kubeproxy_proxy_mode: "iptables"

k8s_controller_manager_cluster_cidr: "10.200.0.0/16"

k8s_cluster_dns: "10.32.0.254"
k8s_cluster_domain: "cluster.local"

k8s_network_plugin: "cni"

k8s_cni_dir: "/opt/cni"
k8s_cni_bin_dir: "{{k8s_cni_dir}}/bin"
k8s_cni_conf_dir: "/etc/cni/net.d"
k8s_cni_plugin_version: "0.6.0"

The playbook will search for the certificates we created in part 4 in the directory you specify in k8s_ca_conf_directory on the host you run Ansible on. The files used here are listed in k8s_worker_certificates. The Kubernetes worker binaries we need are listed in k8s_worker_binaries. The kubelet can use CNI (the Container Network Interface) to manage machine level networking requirements. The CNI plugin archive we want to download is specified via k8s_cni_plugin_version and will be placed in k8s_cni_dir.
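
For reference, the glue between the kubelet's CNI network plugin and flannel is a small config file placed into k8s_cni_conf_dir (named after flannel_cni_conf_file, so 10-flannel.conf with the defaults). It would look roughly like this (a sketch following the flannel CNI plugin's documented format; the file actually generated by the role may differ in details):

{
  "name": "podnet",
  "type": "flannel",
  "delegate": {
    "isDefaultGateway": true
  }
}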

If you created a different PeerVPN interface (e.g. peervpn0) change k8s_interface.
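
Such an override goes into group_vars/k8s.yml again, e.g.:

# group_vars/k8s.yml
k8s_interface: "peervpn0"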

Now add an entry for your worker hosts into Ansible’s hosts file e.g.:

[k8s_worker]
worker[1:3].your.tld

Install the role via

ansible-galaxy install githubixx.kubernetes-worker

Next add the role to k8s.yml file e.g.:

-
  hosts: k8s_worker
  roles:
    -
      role: githubixx.kubernetes-worker
      tags: role-kubernetes-worker

Run the playbook via

ansible-playbook --tags=role-kubernetes-worker k8s.yml
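
If you want to convince yourself that kube-proxy is doing its job on a worker you can have a look at the iptables NAT chains it maintains. In iptables mode kube-proxy creates a KUBE-SERVICES chain which gets populated as soon as services exist:

sudo iptables -t nat -L KUBE-SERVICES -n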

Now that we've installed basically everything needed for running pods, deployments, services, … we should be able to do a sample deployment. On your laptop run:

kubectl run my-nginx --image=nginx --replicas=4 --port=80

This will deploy 4 pods running nginx. To get an overview of what's running (e.g. pods, services, deployments, …) run:

kubectl get all -o wide

NAME              DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE       CONTAINERS   IMAGES    SELECTOR
deploy/my-nginx   4         4         4            4           1m        my-nginx     nginx     run=my-nginx

NAME                    DESIRED   CURRENT   READY     AGE       CONTAINERS   IMAGES    SELECTOR
rs/my-nginx-5d69b5ff7   4         4         4         1m        my-nginx     nginx     pod-template-hash=182561993,run=my-nginx

NAME                          READY     STATUS    RESTARTS   AGE       IP            NODE
po/my-nginx-5d69b5ff7-66jgk   1/1       Running   0          1m        10.200.25.2   k8s-worker2
po/my-nginx-5d69b5ff7-kphsd   1/1       Running   0          1m        10.200.5.2    k8s-worker1
po/my-nginx-5d69b5ff7-mwcb6   1/1       Running   0          1m        10.200.5.3    k8s-worker1
po/my-nginx-5d69b5ff7-w888j   1/1       Running   0          1m        10.200.25.3   k8s-worker2

You should also be able to run curl on every worker and controller node to get the default page from one of the four nginx webservers. In the case above curl http://10.200.25.2 should work on all nodes in our cluster (flanneld magic ;-) ).
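
If you also want to test the service layer you can expose the deployment and curl the cluster IP allocated for the service (replace the placeholder with the CLUSTER-IP kubectl prints; it will be some address out of your service IP range):

kubectl expose deployment my-nginx --port=80
kubectl get service my-nginx
curl http://<CLUSTER-IP of my-nginx>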

You can output the internal IP of every worker and the pod CIDR that was assigned to that host with:

kubectl get nodes --output=jsonpath='{range .items[*]}{.status.addresses[?(@.type=="InternalIP")].address} {.spec.podCIDR} {"\n"}{end}'

10.3.0.211 10.200.0.0/24 
10.3.0.212 10.200.1.0/24 

The IP addresses 10.3.0.211/212 are the addresses I assigned to the PeerVPN interfaces of worker1/2. That's important since all communication should travel through the PeerVPN interfaces.

If you just want to see if the worker nodes are ready use:

kubectl get nodes -o wide

NAME          STATUS    ROLES     AGE       VERSION   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION    CONTAINER-RUNTIME
k8s-worker1   Ready     <none>    10m       v1.8.0    <none>        Ubuntu 16.04.3 LTS   4.10.8-docker-1   docker://Unknown
k8s-worker2   Ready     <none>    10m       v1.8.0    <none>        Ubuntu 16.04.3 LTS   4.10.8-docker-1   docker://Unknown

Now finally we install KubeDNS to enable services in our K8s cluster to do DNS lookups of internal services (service discovery) in a predictable way. If you already cloned the ansible-kubernetes-playbooks repository you'll find a kubedns directory in there with a playbook file called kubedns.yml. For this to work we need to add a few variables to Ansible's host_vars/all e.g.:

# Service IP of our DNS server
k8s_cluster_dns: 10.32.0.254
# Domain of our cluster
k8s_cluster_domain: cluster.local

Now run

ansible-playbook kubedns.yml

to roll out the KubeDNS deployment. If you run

kubectl get pods -l k8s-app=kube-dns -n kube-system -o wide

you should see something like this:

NAME                       READY     STATUS    RESTARTS   AGE       IP            NODE
kube-dns-d44664bbd-4g4nf   3/3       Running   0          3m        10.200.5.6    k8s-worker1
kube-dns-d44664bbd-xhhwx   3/3       Running   0          3m        10.200.25.6   k8s-worker2
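
To check that DNS resolution inside the cluster actually works you can resolve the kubernetes service from a short-lived busybox pod. The answer should come from 10.32.0.254, the k8s_cluster_dns address we configured:

kubectl run dns-test --image=busybox --restart=Never --rm -it -- nslookup kubernetes.default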

That's it for part 7. What's still missing is an ingress controller (e.g. with nginx or Traefik) to "route" HTTP requests into your cluster/pods and a possibility to work with Kubernetes network policies (e.g. with kube-router), as flannel doesn't enforce network policies. Project Calico is also an option to enforce network policies (and more). I'll address this in a later blog post. Of course updating the whole Kubernetes stack when a new release ships also needs to be addressed. I haven't tested it but if you change the Ansible variables accordingly and roll out role by role and node by node the update could work. But at least you should have a look at kubectl drain to move workload off a node before you update Docker for example.

There’re a lot more things that could/should be done now but running Heptio Sonobuoy could be a good next step. Heptio Sonobuoy is a diagnostic tool that makes it easier to understand the state of a Kubernetes cluster by running a set of Kubernetes conformance tests in an accessible and non-destructive manner.

But for now: Have fun with your K8s cluster! ;-)