Kubernetes the not so hard way with Ansible - The basics - (K8s v1.28)

I created a series of posts about running Kubernetes (K8s for short) managed by Ansible. Part of my hosts run at Hetzner Cloud (mainly because I need a fixed IP for my mailserver and an entrypoint into the K8s cluster). The other part of my Kubernetes cluster VMs runs on some local machines at home to save costs. That’s all possible because all VMs are connected securely through a WireGuard VPN, which basically puts all VMs on a single subnet no matter where they are located. In general you should be able to use the Ansible roles mentioned here with minor or no modifications for other providers, e.g. Scaleway or DigitalOcean. I’ll only test this with Ubuntu 20.04 and 22.04, but with no or minimal modifications it should work with all systemd-based Linux distributions.

I used Kelsey Hightower’s wonderful guide Kubernetes the hard way as a starting point. My goal is to install a Kubernetes cluster with Ansible that could be used in production and is maintainable. The last point was especially important to me, and over the years my Ansible roles have served me well so far 😉

If you need something up and running fast for testing purposes, maybe have a look at these projects:

For production-like setups one can also use one of these:

Specific to Hetzner hosting:

To enable the Kubernetes services to communicate securely between the hosts I’ll use WireGuard. Linux kernels >= 5.6 in general, and also Ubuntu 20.04 LTS (which ships kernel 5.4 by default), have the wireguard module included. Other distributions with a Linux kernel < 5.6 (and no backport of the wireguard module) might need Dynamic Kernel Module Support (DKMS). My WireGuard Ansible role supports other OSes like Debian, CentOS and Fedora (more about that in a later blog post). Kelsey Hightower uses Google Cloud, which supports cool networking options, but those features are normally not available in other environments. WireGuard helps to compensate for this a little bit, as it creates an encrypted layer 3 network between the hosts and it’s easy to install, e.g. with the WireGuard Ansible role mentioned above. WireGuard also integrates very well with the Linux network stack, so every feature the Linux network stack offers can be used with WireGuard too.
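To give a rough idea of how that looks in practice, here is a minimal sketch of a playbook that rolls out the WireGuard role on all Kubernetes hosts. The group name k8s_all matches the group variables file used later in this series; treat everything else as an illustration, not the final playbook:

---
# Sketch: roll out WireGuard on all Kubernetes hosts (group "k8s_all")
- hosts: k8s_all
  become: true
  roles:
    - githubixx.ansible_role_wireguard

Each host then gets its VPN IP via a host variable (e.g. wireguard_address in its host_vars file; I’m assuming the role’s variable name here, so check the role’s README for the exact settings).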

I’ve written a four-part blog series about Virtualization with Arch Linux and QEMU/KVM. It shows how to install and configure three physical hosts for virtualization and installs three Ubuntu Virtual Machines on each physical host (so nine VMs altogether for redundancy). This is a perfect start for this blog series: the setup described there is actually the one I use for this blog post too, and it’s production ready. As a reminder, the host setup looks like this:

k8s-01              # K8s cluster name
  |-> k8s-010100    # Physical host #1
    |-> k8s-010101  # VM #1 running etcd
    |-> k8s-010102  # VM #2 running K8s control plane
    |-> k8s-010103  # VM #3 running K8s worker
  |-> k8s-010200    # Physical host #2
    |-> k8s-010201  # VM #1 running etcd
    |-> k8s-010202  # VM #2 running K8s control plane
    |-> k8s-010203  # VM #3 running K8s worker
  |-> k8s-010300    # Physical host #3
    |-> k8s-010301  # VM #1 running etcd
    |-> k8s-010302  # VM #2 running K8s control plane
    |-> k8s-010303  # VM #3 running K8s worker

If you want to do something “real” (that means something production-like) with your Kubernetes cluster you’ll need at least seven to nine (or more, of course) independent hosts (physical or virtual machines): three for the Kubernetes controller nodes (the control plane with kube-apiserver, kube-scheduler and kube-controller-manager) and another three for etcd (for high availability). It’s also possible to run the etcd service on the controller nodes for smaller clusters, but for production it’s recommended to install etcd on its own hosts. Fast storage is also recommended for etcd; using at least SSD or, even better, NVMe disks for the etcd hosts makes a lot of sense in production. You also need at least one or two worker nodes (the nodes that will run the container workload and do the actual work).

For local testing it’s also possible to set up the mentioned Virtual Machines with Hashicorp’s Vagrant. Vagrant (e.g. with the QEMU/KVM driver) is also used for the Molecule tests in my Ansible roles. To be able to test my Kubernetes roles before releasing them, there is e.g. a Molecule test for my Ansible kubernetes_worker role. This test scenario sets up a fully functional Kubernetes cluster using several of my Ansible roles. So if you want a sneak preview you can check this out.
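If you want to run such a Molecule scenario yourself (assuming Molecule plus a Vagrant driver are installed in a venv and Vagrant with QEMU/KVM is set up), the typical commands look like this; the scenario name is just a placeholder:

# create the test VMs and apply the roles
molecule converge -s kubernetes

# run the verification steps of the scenario
molecule verify -s kubernetes

# tear the test VMs down again
molecule destroy -s kubernetes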

Another cheap option might be Hetzner Cloud. For smaller workloads at Hetzner Cloud, CX11 instances (1x 64-bit core, 2 GB RAM, 20 GB SSD each) are sufficient for the controller nodes if you run Ubuntu 20.04. For Ubuntu 22.04 it might make sense to use a little more memory. If you run production load you should distribute the services across more hosts and use bigger instances for the workers (maybe something like CX31 or even bigger). Choosing the right instance types isn’t that easy, but in general: if you have services that need lots of RAM and little CPU, then fewer instances with more RAM might make sense; if your workloads need more CPU than RAM, then more instances with more cores might make sense.

To set up the hosts at Hetzner one can use the hcloud Ansible collection or the Hetzner Cloud Console UI. Of course you’ll also find Scaleway and DigitalOcean collections, which might be an option too. Besides the various cloud modules you can also manage VMs created via libvirt with the Ansible virt module, to deploy VMs locally or remotely using QEMU/KVM. In my blog series Virtualization with Arch Linux and QEMU/KVM I used virt-manager to set up the Virtual Machines.
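Just to give an impression, a rough sketch of a task using the hcloud collection could look like the one below. Module and parameter names vary a bit between collection versions, so treat this as an assumption and consult the collection documentation before using it; the server name, type and image are examples only:

---
# Rough sketch: create one Hetzner Cloud VM (values are examples)
- hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Create a K8s worker VM
      hetzner.hcloud.hcloud_server:
        api_token: "{{ lookup('env', 'HCLOUD_TOKEN') }}"
        name: k8s-010103
        server_type: cx31
        image: ubuntu-22.04
        state: present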

Another possibility to set up the hosts is Hashicorp’s Terraform. Terraform is a tool for building, changing, and versioning infrastructure safely and efficiently. There is e.g. a Hetzner Cloud Provider available for Terraform.

In this blog post I’ll try to stick with Ansible where ever possible as one can basically manage everything with it.

If you have never heard of Ansible: Ansible is a powerful IT automation engine. Instead of managing and handling your instances or deployments by hand, Ansible will do this for you. This is less error-prone and everything is repeatable. To do something like installing a package you create an Ansible task. These tasks are organized in playbooks. The playbooks can be modified/customized via variables for hosts, host groups, and so on. A very useful feature of Ansible are roles and collections. E.g. if you want to install ten hosts with the Apache webserver, you just assign an Apache role to those ten hosts, maybe adjust some host group variables, and roll out the Apache webserver on all the hosts you specified. Very easy! For more information read Getting started with Ansible. I’ll also add some comments in my blog posts about what’s going on in the roles/playbooks we use.
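To make the Apache example a bit more concrete, a minimal playbook with a single task could look like this (the group name webserver is made up for this example):

---
# Sketch: install the Apache webserver on all hosts in the (made-up) group "webserver"
- hosts: webserver
  become: true
  tasks:
    - name: Install Apache package
      ansible.builtin.apt:
        name: apache2
        state: present
        update_cache: true

A role then bundles such tasks (plus default variables, handlers, templates, and so on) so they can be reused across hosts and projects.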

For Ansible beginners: also have a look at ANSIBLE BEST PRACTICES: THE ESSENTIALS

As a side note: I was also thinking about using ImmutableServer and Immutable infrastructure concepts, but decided to go with Ansible for now. These immutable server/infrastructure concepts have some real advantages, and we’re also using them very successfully in my company together with Google Cloud. Using Virtual Machines like Docker containers and throwing them away at any time is quite cool 😄 . VM images can be created with Hashicorp’s Packer e.g. and rolled out with Ansible, Terraform, or whatever you prefer. When a server starts, a startup-script or cloud-init can set up the VM and its services by reading the instance metadata. But that’s going too far for now. I just wanted to mention it 😉

I’ve already written in my blog post Virtualization with Arch Linux and QEMU/KVM - part 2 about how to set up a Python virtual environment for Ansible. In the blog series about virtualization mentioned above I also created a Python virtual environment with Ansible installed for my Physical Hosts in /opt/scripts/ansible/k8s-01_phy (/opt/scripts/ansible is my base directory for everything Ansible related). Now I’ll create one for my Virtual Machines in /opt/scripts/ansible/k8s-01_vms. So in the /opt/scripts/ansible directory I execute the following command:

python3 -m venv k8s-01_vms

That creates a directory called k8s-01_vms. k8s-01 stands for “Kubernetes Cluster 01” and vms for “Virtual Machines”. So everything related to the “Physical Hosts” will be managed by k8s-01_phy (just FYI, not relevant for this blog series) and everything related to the “Virtual Machines” that run on the Physical Hosts will go into k8s-01_vms. So let’s enter this directory now. Normally you have to activate that venv with

source ./bin/activate

But to make things a bit easier I’ve installed a tool called direnv. It should be available in basically every Linux distribution’s package manager, so just install it with apt, yum, pacman or whatever package manager you use. You also need a hook; it’s just one line in $HOME/.bashrc. E.g. for Bash:

eval "$(direnv hook bash)"

Please read direnv hook on how to set it up (also for other shells). In my venv there is a file called .envrc and it looks like this:

export ANSIBLE_HOME="$(pwd)"

source ./bin/activate

With direnv allow I activate .envrc. So every time I enter this directory, direnv will set the ANSIBLE_HOME environment variable and load the venv. ANSIBLE_HOME is by default the “root” directory for other variables, e.g. roles_path. So Ansible roles will be installed in $ANSIBLE_HOME/roles, for example. If I leave that directory, direnv unloads the environment and the venv settings are gone. To check if the venv is loaded you can run which python3 or which python. It should point to bin/python3 or bin/python within your venv directory.
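Putting it together, the workflow looks like this the first time (paths are from my setup, yours may differ):

cd /opt/scripts/ansible/k8s-01_vms

# tell direnv that it may load .envrc in this directory (needed only once)
direnv allow

# should now print /opt/scripts/ansible/k8s-01_vms/bin/python3
which python3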

Next let’s upgrade the Python package manager to the latest version (this now only happens within the venv and doesn’t touch your system Python):

python3 -m pip install --upgrade pip

Now let’s install Ansible. At the time of writing this blog post the latest version is 9.1.0 (just remove the version number if you don’t want to pin it):

python3 -m pip install ansible==9.1.0

In the venv’s bin directory there should be a few Ansible binaries now:

ls -1A bin/ansible*
bin/ansible
bin/ansible-community
bin/ansible-config
bin/ansible-connection
bin/ansible-console
bin/ansible-doc
bin/ansible-galaxy
bin/ansible-inventory
bin/ansible-playbook
bin/ansible-pull
bin/ansible-test
bin/ansible-vault
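
A quick check that the venv’s Ansible is the one being used and which version got installed (Ansible 9.x is based on ansible-core 2.16):

ansible --version

The first line should mention ansible [core 2.16.x] and the listed Python interpreter should point into the venv.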

A very helpful tool is ansible-lint. Ansible Lint is a command-line tool for linting playbooks, roles and collections, aimed at any Ansible user. Its main goal is to promote proven practices, patterns and behaviors while avoiding common pitfalls that can easily lead to bugs or make code harder to maintain. It can be installed with python3 -m pip install ansible-lint. Just run ansible-lint in the venv directory and the tool will give you hints on how to improve your code if needed.

To make life a little bit easier let’s create an ansible.cfg file in the venv directory. For now it will only have one entry. E.g.:

[defaults]
inventory = hosts

This avoids having to specify -i hosts every time when running ansible or ansible-playbook. To generate an example ansible.cfg one can use ansible-config init --disabled > /tmp/ansible.cfg. Afterwards inspect /tmp/ansible.cfg for other options you might want to set. Also see Ansible Configuration Settings.

Ubuntu Cloud Images already have python3 installed, so there is normally no need to install it. But if it’s not there, please check Install Python on remote hosts.
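Once the hosts inventory file is filled (more on that further below), a quick ad-hoc command can verify that Python is really available on all hosts. The raw module works without Python on the target, which makes it a good fit for this check:

ansible all -m ansible.builtin.raw -a "python3 --version"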

A tool for installing Ansible roles or collections is ansible-galaxy, which is included when you install Ansible. Also have a look at Ansible Galaxy for more information (you can also browse the available roles/collections there).
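The roles used throughout this series (see the roles directory in the layout below) can be installed one by one, e.g. with ansible-galaxy install githubixx.etcd, or all at once via a requirements file. A shortened sketch of such a requirements.yml:

---
roles:
  - name: githubixx.harden_linux
  - name: githubixx.ansible_role_wireguard
  - name: githubixx.etcd

Installing everything listed there is then a single command: ansible-galaxy install -r requirements.yml. With ANSIBLE_HOME set as described above, the roles end up in $ANSIBLE_HOME/roles.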

My Ansible directory structure will look like this at the end of the blog series when everything is set up (excluding everything related to the Python venv; also see Ansible directory layout best practice):

.
├── ansible.cfg
├── certificates
├── .envrc
├── factscache
├── group_vars
│   ├── all.yml
│   ├── cert_manager.yml
│   ├── cilium.yml
│   ├── k8s_all.yml
│   ├── k8s_ca.yml
│   ├── k8s_controller.yml
│   ├── k8s_etcd.yml
│   ├── k8s_worker.yml
│   └── traefik.yml
├── hosts
├── host_vars
│   ├── k8s-010101.i.domain.tld # VM #1 running etcd
│   ├── k8s-010102.i.domain.tld # VM #2 running K8s control plane
│   ├── k8s-010103.i.domain.tld # VM #3 running K8s worker
│   ├── k8s-010201.i.domain.tld # VM #4 running etcd
│   ├── k8s-010202.i.domain.tld # VM #5 running K8s control plane
│   ├── k8s-010203.i.domain.tld # VM #6 running K8s worker
│   ├── k8s-010301.i.domain.tld # VM #7 running etcd
│   ├── k8s-010302.i.domain.tld # VM #8 running K8s control plane
│   └── k8s-010303.i.domain.tld # VM #9 running K8s worker
├── k8s.yml
├── kubeconfig
├── playbooks
│   └── ansible-kubernetes-playbooks
│       └── coredns
└── roles
    ├── githubixx.ansible_role_wireguard
    ├── githubixx.cert_manager_kubernetes
    ├── githubixx.cfssl
    ├── githubixx.cilium_kubernetes
    ├── githubixx.cni
    ├── githubixx.containerd
    ├── githubixx.etcd
    ├── githubixx.haproxy
    ├── githubixx.harden_linux
    ├── githubixx.kubectl
    ├── githubixx.kubernetes_ca
    ├── githubixx.kubernetes_controller
    ├── githubixx.kubernetes_worker
    ├── githubixx.longhorn_kubernetes
    ├── githubixx.lvm
    ├── githubixx.runc
    └── githubixx.traefik_kubernetes

Don’t worry if your directory doesn’t contain all these files yet. We’ll get there. Just make sure that at least the top level directories group_vars, host_vars, playbooks and roles exist (see the commands below). As you can see from the output above, these are directories, while hosts and k8s.yml are files. I’ll explain what the directories and files are good for while walking through the blog posts, and I’ll also tell a little bit more about Ansible along the way.
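Creating that initial skeleton boils down to a few commands inside the venv directory:

mkdir -p group_vars host_vars playbooks roles
touch hosts k8s.yml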

Hint: A few variables are needed by more than one role and playbook. You can put these kinds of variables into group_vars/all.yml, for example. But it’s of course up to you where you place the variables, as long as the roles/playbooks find them when they’re needed 😉 It’s also possible to put the variables into Ansible’s hosts file. Variables that are used by all hosts I’ll put into group_vars/all.yml. Later there will be more files, e.g. one for every host group. All etcd hosts will be in the group k8s_etcd, and variables that are only needed for these hosts will be placed in group_vars/k8s_etcd.yml. You may organize variables differently of course (also see Using variables). A sketch of how such a grouped hosts inventory file could look follows below.
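The group names match the group_vars files shown in the directory layout above. The hosts file will be developed step by step in the following posts, but a minimal sketch could look like this:

[k8s_etcd]
k8s-010101.i.domain.tld
k8s-010201.i.domain.tld
k8s-010301.i.domain.tld

[k8s_controller]
k8s-010102.i.domain.tld
k8s-010202.i.domain.tld
k8s-010302.i.domain.tld

[k8s_worker]
k8s-010103.i.domain.tld
k8s-010203.i.domain.tld
k8s-010303.i.domain.tld

[k8s_all:children]
k8s_etcd
k8s_controller
k8s_worker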

That’s it for the basics. Continue with harden the instances.