Kubernetes the not so hard way with Ansible - Harden the instances - (K8s v1.24)

2020-07-23

  • Fix typos

2020-05-09

  • Updated text to reflect current state of technology

In part 1 I created at least four hosts (or seven hosts if you set up the etcd cluster on its own hosts) for the Kubernetes controller and worker nodes and gave a first introduction into Ansible configuration management. Before I install all the Kubernetes services, the hosts should be hardened a little bit and a firewall should be set up. While my Kubernetes roles by default try to bind all Kubernetes services to the VPN interface (I'll use WireGuard, but you can also use PeerVPN, OpenVPN or whatever VPN solution you want), it still makes sense to have a firewall in place just in case.

I'll use two different DNS entries for every host. As you may have recognized in part 1 of this tutorial, I have files like controller01.i.domain.tld in the host_vars directory. controller01.i.domain.tld is the DNS entry for the first K8s controller node. The .i. in this hostname means internal. I'll use the WireGuard IP for this DNS record. All my K8s Ansible roles will use the host names specified in Ansible's hosts file.

This will ensure that all Kubernetes services communicate only via the VPN connection. To repeat: It doesn't matter what kind of VPN software or VPC (like AWS VPC or Google Cloud Network) you use. Just make sure that the connection between the hosts is encrypted. In my case this will be ensured by WireGuard. The important part here is that the host names you use in Ansible's hosts file point to the internal (WireGuard) IP of your VM. If you use AWS VPC or Google's Cloud Networking you don't really need a fully meshed VPN solution like WireGuard, as you can already configure such a VPC so that all the hosts can talk to each other. So in this case you can skip installing WireGuard.

One hint about Hetzner Cloud Networks: Just because they are private doesn’t mean that they are secure 😉 Traffic is NOT encrypted in that case. So for secure communication you still need something like WireGuard.

I also have DNS records for the public IP of each host, e.g. controller01.p.domain.tld. .p. means public in my case. I use these DNS records for Ansible to provide the ansible_host variable value. E.g. for the host controller01.i.domain.tld there exists a file host_vars/controller01.i.domain.tld which contains (besides other entries) this entry:

ansible_host: "controller01.p.domain.tld"

So all Ansible roles and playbooks will use the internal DNS entry. If Ansible needs to connect to the host controller01.i.domain.tld via SSH to manage it, it will use the value from ansible_host, and that is the public DNS record of that host (which means controller01.p.domain.tld in my case).

One of the reasons for this is that I'll also set up the WireGuard VPN via Ansible and it simply doesn't exist yet 😉 So I need the public IP of the host to manage it via Ansible. On the other hand the K8s-internal communication should only use the WireGuard VPN and therefore I use the internal DNS records in Ansible's hosts file. My Ansible roles will use these internal host names in various places, e.g. to build a list of hosts that make up the etcd cluster.
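Since this setup relies on both DNS records resolving correctly, a quick sanity check with dig (or host) doesn't hurt. A minimal example (the IPs are just the ones used throughout this tutorial):

dig +short controller01.i.domain.tld   # should return the internal (WireGuard) IP, e.g. 10.8.0.101
dig +short controller01.p.domain.tld   # should return the public IP of that VM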

Here is an example of what host_vars/worker01.i.domain.tld could look like:

---
wireguard_address: "10.8.0.111/24"
wireguard_port: 51820
wireguard_endpoint: "worker01.p.domain.tld"
wireguard_persistent_keepalive: "30"

ansible_host: "worker01.p.domain.tld"
ansible_port: 22222
ansible_user: dauser
ansible_become: true
ansible_become_method: sudo
ansible_python_interpreter: /usr/bin/python3

So Ansible uses the ansible_host value when it connects to that host via SSH, and that value points to the public IP of the host. The same is true for WireGuard.

To make our Kubernetes nodes a little bit more secure we'll add some basic security settings and of course we'll use Ansible to roll them out. Ansible directory layout best practices suggest having separate inventory files for production and staging servers. Since I don't have a staging system I just create a single hosts file now, e.g.:

[vpn]
assets.i.domain.tld
controller0[1:3].i.domain.tld
worker0[1:2].i.domain.tld
workstation

[k8s_kubectl]
workstation

[k8s_ca]
workstation

[k8s_etcd]
controller0[1:3].i.domain.tld

[k8s_controller]
controller0[1:3].i.domain.tld

[k8s_worker]
worker0[1:2].i.domain.tld

[k8s:children]
k8s_etcd
k8s_controller
k8s_worker

Adjust the file to your needs of course! The headings in brackets are group names, which are used to classify systems.

One of the first hosts is assets.i.domain.tld. That one contains my Docker registry. It’s also part of the VPN mesh so that Kubernetes hosts are able to fetch container images from there.

As you can see you can use ranges controller0[1:3]... instead of specifying every node here. I also created a group of host groups for all our Kubernetes hosts called [k8s:children]. Now a little bit more information about the host groups mentioned above:

The host group [vpn] includes all hosts that will be part of my fully meshed WireGuard VPN. Fully meshed means that every host can talk to every other host in this group, with one exception: workstation. That is basically only a VPN client host (my laptop) from which I run the ansible and ansible-playbook commands. But the real reason why my workstation is part of the WireGuard VPN is that this enables me to run kubectl locally, which in turn connects to the K8s API server over a secure WireGuard VPN connection. I'll explain in another blog post how this works. Stay tuned for now…

The [k8s_kubectl] group (well, in this case there is only one host in the group…) is used to execute THE Kubernetes control utility called kubectl later. E.g. if you configured kubectl correctly you can run it directly on the shell of your workstation, or through Ansible, which in turn executes kubectl locally (if the shell or command module is used) or uses the K8s API directly via the k8s module. kubectl stores its configuration in $HOME/.kube/config by default (the k8s module also uses this config).

So if Ansible starts kubectl as user root it will search in the wrong $HOME directory! That’s why it is important to tell Ansible to execute kubectl as the user which generated $HOME/.kube/config. I will use the host specified in the [k8s_kubectl] group later. Of course replace the hostname workstation with the real hostname of your workstation/laptop/chromebook/whatever!
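Just to illustrate that point (a hypothetical sketch, not part of any of the roles used later): a task that runs kubectl via Ansible could explicitly switch to the user that owns $HOME/.kube/config, e.g.:

- hosts: k8s_kubectl
  tasks:
    - name: Run kubectl as the user that owns $HOME/.kube/config
      ansible.builtin.command: kubectl get nodes
      become: true
      become_user: dauser   # assumption: the regular (non-root) user on the workstation
      changed_when: false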

Like the group before, [k8s_ca] also contains only workstation. I'll create some certificates in a later blog post. The generated certificates will be stored locally on my workstation and copied to the Kubernetes hosts by Ansible as needed.

The [k8s_etcd] group contains the hosts which will run the etcd cluster, as you probably already guessed ;-) In my case those are the same nodes as in the [k8s_controller] group. For production you should place the etcd cluster on separate hosts, as already mentioned.

The hosts in the [k8s_controller] group run at least three important Kubernetes components: kube-apiserver, kube-scheduler and kube-controller-manager. The roles harden-linux and kubernetes-controller will be applied to those Kubernetes hosts.

I also define a group [k8s_worker]. That group contains the nodes which will run the container workloads and do the heavy lifting. The roles harden-linux, kubernetes-worker, cilium-kubernetes and containerd will be applied to those Kubernetes hosts.

Here are example host_vars files for the hosts in Ansible's hosts file:

Ansible host file: host_vars/controller01.i.domain.tld:

---
wireguard_address: "10.8.0.101/24"
wireguard_endpoint: "controller01.p.domain.tld"
wireguard_port: 51820

ansible_host: "controller01.p.domain.tld"
ansible_port: 22222
ansible_user: dauser
ansible_become: true
ansible_become_method: sudo
ansible_python_interpreter: /usr/bin/python3

Ansible host file: host_vars/controller02.i.domain.tld:

---
wireguard_address: "10.8.0.102/24"
wireguard_endpoint: "controller02.p.domain.tld"
wireguard_port: 51820

ansible_host: "controller02.p.domain.tld"
ansible_port: 22222
ansible_user: dauser
ansible_become: true
ansible_become_method: sudo
ansible_python_interpreter: /usr/bin/python3

Ansible host file: host_vars/controller03.i.domain.tld:

---
wireguard_address: "10.8.0.103/24"
wireguard_endpoint: "controller03.p.domain.tld"
wireguard_port: 51820

ansible_host: "controller03.p.domain.tld"
ansible_port: 22222
ansible_user: dauser
ansible_become: true
ansible_become_method: sudo
ansible_python_interpreter: /usr/bin/python3

Ansible host file: host_vars/worker01.i.domain.tld:

---
wireguard_address: "10.8.0.111/24"
wireguard_endpoint: "worker01.p.domain.tld"
wireguard_port: 51820
wireguard_persistent_keepalive: "30"

ansible_host: "worker01.p.domain.tld"
ansible_port: 22222
ansible_user: dauser
ansible_become: true
ansible_become_method: sudo
ansible_python_interpreter: /usr/bin/python3

Ansible host file: host_vars/worker02.i.domain.tld:

---
wireguard_address: "10.8.0.112/24"
wireguard_endpoint: "worker02.p.domain.tld"
wireguard_port: 51820
wireguard_persistent_keepalive: "30"

ansible_host: "worker02.p.domain.tld"
ansible_port: 22222
ansible_user: dauser
ansible_become: true
ansible_become_method: sudo
ansible_python_interpreter: /usr/bin/python3

Ansible host file: host_vars/workstation

---
wireguard_address: "10.8.0.2/24"
wireguard_endpoint: ""
ansible_connection: local
ansible_become_user: root
ansible_become: true
ansible_become_method: sudo

These are of course just examples. Adjust them to your needs! wireguard_address will be used by the WireGuard role to assign the specified IP and netmask to the wg0 interface (that's the default interface name). As you can see all hosts will be part of the network 10.8.0.0/24 (the WireGuard VPN network for internal K8s communication). You'll see later what wireguard_endpoint is used for. ansible_host I already explained above. ansible_python_interpreter is needed if you want to use Ansible with Ubuntu 18.04. For my workstation I also defined ansible_connection: local as my workstation is basically localhost for Ansible. To allow Ansible to become root on my workstation I also defined some ansible_become_* variables as you can see above.

I just want to mention that it is possible to use a dynamic inventory plugin for Ansible like ansible-hcloud-inventory, Scaleway dynamic inventory for Ansible or scw_inventory. In that case you just need to tag your instances and the dynamic inventory plugin will discover the hosts to use for a specific role/task.

To specify what Ansible should install/modify on the hosts I created a playbook file. You already saw it in part 1 in the directory structure; it's called k8s.yml. It basically maps the host groups (or hosts) to the roles they get, e.g. being a Kubernetes controller or worker node.

For the impatient the file should look like this at the end (when we’re done with the whole tutorial):

---
-
  hosts: k8s_ca
  roles:
    -
      role: githubixx.cfssl
      tags: role-cfssl
    -
      role: githubixx.kubernetes-ca
      tags: role-kubernetes-ca

-
  hosts: k8s_kubectl
  roles:
    -
      role: githubixx.kubectl
      tags: role-kubectl

-
  hosts: vpn
  roles:
    -
      role: githubixx.ansible_role_wireguard
      tags: role-wireguard

-
  hosts: k8s_etcd
  roles:
    -
      role: githubixx.etcd
      tags: role-etcd

-
  hosts: k8s_controller
  roles:
    -
      role: githubixx.harden-linux
      tags: role-harden-linux
    -
      role: githubixx.kubernetes-controller
      tags: role-kubernetes-controller

-
  hosts: k8s_worker
  roles:
    -
      role: githubixx.harden-linux
      tags: role-harden-linux
    -
      role: githubixx.containerd
      tags: role-containerd
    -
      role: githubixx.cilium_kubernetes
      tags: role-cilium-kubernetes
    -
      role: githubixx.kubernetes-worker
      tags: role-kubernetes-worker

- 
  hosts: traefik
  roles:
    - role: githubixx.traefik_kubernetes
      tags: role-traefik-kubernetes

-
  hosts: cert_manager
  roles:
    - role: githubixx.cert_manager_kubernetes
      tags: role-cert-manager-kubernetes

As already mentioned I created an Ansible role for hardening a Linux installation (see ansible-role-harden-linux). It applies some basic security settings. It's not perfect of course, but it will secure our Kubernetes cluster quite a bit. Feel free to create a pull request if you want to contribute or fork your own version. Install the role via

ansible-galaxy install githubixx.harden-linux

and then include the role in your playbook (k8s.yml) like in the example above.

In the example above you see that Ansible should apply the role githubixx.harden-linux to all Kubernetes hosts - controller and worker (you really want to harden all hosts of the Kubernetes cluster). Hardening doesn't end here, of course. There are further things you can do, like installing a rootkit and vulnerability scanner, an IDS (intrusion detection system), or collecting and shipping logs to other hosts for later analysis in case a host was compromised. But that's not in the scope of that role.

Regarding the syntax I used above: Later, when there is not only one role but a few more, or during testing, it's sometimes very handy to apply only one role at a time. That's possible with the syntax above, because if you only want to apply the harden-linux role you can run

ansible-playbook --tags=role-harden-linux k8s.yml

This will only run the harden-linux role on the specified hosts.

Additional Ansible hint: Sometimes you only want to apply a role to one specific host e.g. because you only want to test it there before rolling it out on all hosts. Another case could be that you want to upgrade node by node. That’s possible with e.g.

ansible-playbook --tags=role-harden-linux --limit=controller01.i.domain.tld k8s.yml

This works fine as long as you don't need facts from other hosts. But the etcd role, for example, needs to know the IPs of all hosts in the [k8s_etcd] group to render the etcd systemd service file. If you limit the run to one host, Ansible won't gather the facts of the other hosts and the run will fail.
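To give an idea why that fails (a simplified sketch, not the actual template of the etcd role): such a template typically loops over all members of the group and reads their addresses from the gathered facts, e.g. the facts of the WireGuard interface:

# Illustrative Jinja2 snippet - builds a comma-separated list of etcd member IPs
{% for host in groups['k8s_etcd'] %}
{{ hostvars[host]['ansible_wg0']['ipv4']['address'] }}{% if not loop.last %},{% endif %}
{% endfor %}

If the facts for controller02 and controller03 were never gathered (because of --limit), hostvars[host]['ansible_wg0'] simply doesn't exist and the template rendering fails.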

One possible workaround is to cache the facts of all hosts. For this to work you need to adjust a few settings in ansible.cfg (/etc/ansible/ansible.cfg by default, or wherever you have put that file, e.g. a local ./ansible.cfg in the current working directory):

[defaults]
gathering = smart

fact_caching = jsonfile
fact_caching_connection = /opt/scripts/ansible
fact_caching_timeout = 86400

If you now run

ansible -m setup all

Ansible will gather facts (like network addresses, disk information, RAM, CPU, …) of all hosts. It will store one file per host (named after the host name) in the directory you specified in fact_caching_connection and cache the entries for fact_caching_timeout seconds (in the example above 1 day). This is very useful and I recommend using this workaround as it saves quite some pain, especially while doing your first experiments.

I recommend running ansible -m setup all after you add a new host or after major changes like changing the IP address of a host. It's also important to update the cache after you applied the wireguard or peervpn role for the first time. That's because these roles add a new network interface which Ansible needs to be aware of.

One final hint: If you add a new host, the default login is again root and maybe a different SSH port than on your already hardened hosts (if you changed it via the harden-linux role). You can specify SSH settings like port and login user in the Ansible host_vars directory and the matching hosts file. E.g. if you added the file worker04.i.domain.tld you can temporarily define these Ansible variables:

ansible_port: 22
ansible_user: "root"

As you can see, worker04 uses different SSH port and user login settings. So you can now apply the harden-linux role to the fresh and unmodified node worker04 and later remove/change these entries and extend the node range from worker0[1:3]... to worker0[1:4]..., because worker04 now has the harden-linux role applied and should behave like the older nodes. Of course you can also use various parameters with the ansible-playbook command to achieve the same effect (see the example below).
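For example, something like this should work (a sketch; extra vars passed with -e take precedence over anything defined in host_vars):

ansible-playbook --tags=role-harden-linux \
  --limit=worker04.i.domain.tld \
  -e "ansible_port=22 ansible_user=root" \
  k8s.yml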

If you start a new host at Hetzner Cloud or Scaleway you log in as root by default (and I guess that's true for some other providers too). That's normally not considered good practice and it's one of the things the role changes. In general the role has the following features:

  • Change root password
  • Add a regular (or call it deploy) user used for administration (e.g. for Ansible or login via SSH)
  • Allow the regular user mentioned above to execute commands via sudo
  • Adjust APT update intervals
  • Setup UFW firewall and allow only SSH access by default (add more ports/networks if you like)
  • Adjust sysctl settings (/proc filesystem)
  • Change SSH default port (if requested)
  • Disable SSH password authentication
  • Disable SSH root login
  • Disable SSH PermitTunnel
  • Install sshguard and adjust whitelist
  • Deletes the /root/.pw file (it contains the root password on a Scaleway host but has no impact on hosts of other providers - it just deletes the file if present)
  • Optional: Install/configure Network Time Synchronization (NTP) e.g. openntpd/ntp/systemd-timesyncd

Ansible roles can be customized via variables. Let's talk briefly about the variables that need to be specified (some variables have no default values). Since I want to apply the harden-linux role to all of my Kubernetes hosts, I can create a file in the group_vars directory, e.g. group_vars/k8s.yml (already mentioned above). Variables in the group_vars directory will be applied to a group of hosts. In the example that's the host group k8s. That's the group specified in the Ansible hosts file above.

But for now I would recommend putting all variables into the group_vars/all.yml file as it makes things easier at the beginning. I just wanted to mention that you can group variables.
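For orientation, the relevant part of the directory layout (as introduced in part 1) would then look roughly like this - just a sketch, adjust it to your setup:

.
|-- ansible.cfg          (optional, see the fact caching hint above)
|-- hosts                (the inventory shown above)
|-- k8s.yml              (the playbook)
|-- group_vars/
|   `-- all.yml          (variables for all hosts)
`-- host_vars/
    |-- controller01.i.domain.tld
    |-- ...
    `-- workstation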

I'll now start to fill this file with the variables that have no defaults but are needed for the harden-linux role to work, e.g.:

harden_linux_root_password: crypted_pw
harden_linux_deploy_user: deploy
harden_linux_deploy_user_password: crypted_pw
harden_linux_deploy_user_home: /home/deploy
harden_linux_deploy_user_public_keys:
  - /home/deploy/.ssh/id_rsa.pub

With harden_linux_root_password and harden_linux_deploy_user_password I specify the password for the root user and the deploy user. Ansible won't encrypt the password for you. To create an encrypted (hashed) password you can use e.g.

ansible localhost -m debug -a "msg={{ 'mypassword' | password_hash('sha512', 'mysecretsalt') }}"
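If you prefer not to use the Ansible one-liner, the mkpasswd utility produces the same kind of SHA-512 hash (assumption: mkpasswd from the whois package is installed, as is the case on most Debian/Ubuntu systems):

mkpasswd --method=sha-512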

Just to mention it: Passwords or secrets in general can be stored and managed with ansible-vault but that is out of scope of this tutorial.

harden_linux_deploy_user specifies the user I want to use to log in at the remote hosts. As already mentioned, the harden-linux role will disable root login via SSH for a good reason. So I need a different user. This user will get sudo permission which is needed for Ansible to do its work.

harden_linux_deploy_user_public_keys specifies a list of public SSH key files that should be added to $HOME/.ssh/authorized_keys of the deploy user on the remote host. If you specify /home/deploy/.ssh/id_rsa.pub as an argument, e.g., the content of that local file will be added to $HOME/.ssh/authorized_keys of the deploy user on the remote host.

The following variables have defaults (for all possible settings see the defaults/main.yml file of that role). Only change them if you need different values.

The role changes some SSHd settings by default:

harden_linux_sshd_settings:
  # Disable password authentication 
  "^PasswordAuthentication": "PasswordAuthentication no"
  # Disable SSH root login  
  "^PermitRootLogin": "PermitRootLogin no"
  # Disable tun(4) device forwarding
  "^PermitTunnel": "PermitTunnel no"
  # Set SSHd port
  "^Port ": "Port 22"

Personally I always change the default SSH port, as a lot of brute force attacks take place against this port (but to be clear: this won't prevent attacks, as there are mass scanners out there that are able to scan the whole internet in a few minutes…). So if you want to change the port setting, for example, you can do so:

harden_linux_sshd_settings_user:
  "^Port ": "Port 22222"

(Please notice the whitespace after "^Port"!) The playbook will combine harden_linux_sshd_settings and harden_linux_sshd_settings_user, and the settings in harden_linux_sshd_settings_user take precedence, which means they override the ^Port setting/key in harden_linux_sshd_settings.

As you may have noticed, all the keys in harden_linux_sshd_settings and harden_linux_sshd_settings_user begin with ^. That's because they are regular expressions (regex). One of the playbook's tasks will search for a line in /etc/ssh/sshd_config, e.g. ^Port (where ^ means "a line starting with …"), and replace that line (if found) with e.g. Port 22222. This makes the playbook very flexible for adjusting settings in sshd_config (you can basically replace every setting). You'll see this pattern for other tasks too, so everything mentioned here holds true in those cases as well.
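Just to illustrate the mechanism (a simplified sketch, not the role's actual task): the combination of the two dictionaries and the regex-based replacement could be expressed roughly like this:

- name: Adjust sshd settings (illustrative sketch only)
  ansible.builtin.lineinfile:
    path: /etc/ssh/sshd_config
    regexp: "{{ item.key }}"    # e.g. "^Port "
    line: "{{ item.value }}"    # e.g. "Port 22222"
  loop: "{{ harden_linux_sshd_settings | combine(harden_linux_sshd_settings_user) | dict2items }}"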

The role uses UFW - Uncomplicated Firewall - and Ansible's ufw module to set up firewall rules. UFW is basically just a frontend for Linux iptables. So here are some defaults for the firewall:

harden_linux_ufw_defaults:
  "^IPV6": 'IPV6=yes'
  "^DEFAULT_INPUT_POLICY": 'DEFAULT_INPUT_POLICY="DROP"'
  "^DEFAULT_OUTPUT_POLICY": 'DEFAULT_OUTPUT_POLICY="ACCEPT"'
  "^DEFAULT_FORWARD_POLICY": 'DEFAULT_FORWARD_POLICY="DROP"'
  "^DEFAULT_APPLICATION_POLICY": 'DEFAULT_APPLICATION_POLICY="SKIP"'
  "^MANAGE_BUILTINS": 'MANAGE_BUILTINS=no'
  "^IPT_SYSCTL": 'IPT_SYSCTL=/etc/ufw/sysctl.conf'
  "^IPT_MODULES": 'IPT_MODULES="nf_conntrack_ftp nf_nat_ftp nf_conntrack_netbios_ns"'

These settings basically change the values in /etc/default/ufw. While these are good default settings, I need to change one for Kubernetes networking to work: DEFAULT_FORWARD_POLICY="ACCEPT". To override this default setting I add the following text to group_vars/all.yml:

harden_linux_ufw_defaults_user:
  "^DEFAULT_FORWARD_POLICY": 'DEFAULT_FORWARD_POLICY="ACCEPT"'

As already mentioned above, the playbook will combine harden_linux_ufw_defaults and harden_linux_ufw_defaults_user, and the settings in harden_linux_ufw_defaults_user take precedence, which means they override the ^DEFAULT_FORWARD_POLICY setting in harden_linux_ufw_defaults.

Next I specify some firewall rules with harden_linux_ufw_rules. This is the default:

harden_linux_ufw_rules:
  - rule: "allow"
    to_port: "22"
    protocol: "tcp"

So by default only SSH access is allowed. If you changed the SSH port setting above to e.g. 22222, you need to add a firewall rule for that port too to allow the incoming traffic. Additionally I add a firewall rule for WireGuard (which uses port 51820/udp by default) which I'll use in a later blog post:

harden_linux_ufw_rules:
  - rule: "allow"
    to_port: "22222"
    protocol: "tcp"
  - rule: "allow"
    to_port: "51820"
    protocol: "udp"

You can add more settings for a rule like interface, from_ip, … Please have a look at ufw.yml (search for Apply firewall rules) for all possible settings.

You can also allow hosts to communicate on specific networks (without port restrictions). E.g. I add the IP range used for the K8s network (the WireGuard subnet in my case) here, since that's the range I'll use for Kubernetes services and they should be able to communicate without restrictions. I also add the range I'll use later for Cilium (which handles pod-to-pod communication when the pods that want to communicate are located on different hosts). The IP range used for Kubernetes Pods later will be e.g. 10.200.0.0/16 (every Pod will get an IP address out of this range). If you use the range 10.8.0.0/24 for the WireGuard network (every Kubernetes host gets an IP address out of this range), harden_linux_ufw_allow_networks would have the following entries:

harden_linux_ufw_allow_networks:
  - "10.8.0.0/24"
  - "10.200.0.0/16"

If you want to avoid problems regarding the firewall rules blocking your Kubernetes traffic you can start with more relaxed settings and simply allow all three private IP ranges defined in RFC1918 e.g.:

harden_linux_ufw_allow_networks:
  - "10.0.0.0/8"
  - "172.16.0.0/12"
  - "192.168.0.0/16"

Next I change some system variables (sysctl.conf / proc filesystem). These settings are recommendations from Google which they use for their Google Compute Cloud OS images (see GCP - Requirements to build custom images and Configure security best practices). These are the default settings (if you are happy with them you don't have to do anything, but I recommend verifying that they work for your setup):

harden_linux_sysctl_settings:
  "net.ipv4.tcp_syncookies": 1                    # Enable syn flood protection
  "net.ipv4.conf.all.accept_source_route": 0      # Ignore source-routed packets
  "net.ipv6.conf.all.accept_source_route": 0      # IPv6 - Ignore ICMP redirects
  "net.ipv4.conf.default.accept_source_route": 0  # Ignore source-routed packets
  "net.ipv6.conf.default.accept_source_route": 0  # IPv6 - Ignore source-routed packets
  "net.ipv4.conf.all.accept_redirects": 0         # Ignore ICMP redirects
  "net.ipv6.conf.all.accept_redirects": 0         # IPv6 - Ignore ICMP redirects
  "net.ipv4.conf.default.accept_redirects": 0     # Ignore ICMP redirects
  "net.ipv6.conf.default.accept_redirects": 0     # IPv6 - Ignore ICMP redirects
  "net.ipv4.conf.all.secure_redirects": 1         # Ignore ICMP redirects from non-GW hosts
  "net.ipv4.conf.default.secure_redirects": 1     # Ignore ICMP redirects from non-GW hosts
  "net.ipv4.ip_forward": 0                        # Do not allow traffic between networks or act as a router
  "net.ipv6.conf.all.forwarding": 0               # IPv6 - Do not allow traffic between networks or act as a router
  "net.ipv4.conf.all.send_redirects": 0           # Don't allow traffic between networks or act as a router
  "net.ipv4.conf.default.send_redirects": 0       # Don't allow traffic between networks or act as a router
  "net.ipv4.conf.all.rp_filter": 1                # Reverse path filtering - IP spoofing protection
  "net.ipv4.conf.default.rp_filter": 1            # Reverse path filtering - IP spoofing protection
  "net.ipv4.icmp_echo_ignore_broadcasts": 1       # Ignore ICMP broadcasts to avoid participating in Smurf attacks
  "net.ipv4.icmp_ignore_bogus_error_responses": 1 # Ignore bad ICMP errors
  "net.ipv4.icmp_echo_ignore_all": 0              # Ignore bad ICMP errors
  "net.ipv4.conf.all.log_martians": 1             # Log spoofed, source-routed, and redirect packets
  "net.ipv4.conf.default.log_martians": 1         # Log spoofed, source-routed, and redirect packets
  "net.ipv4.tcp_rfc1337": 1                       # Implement RFC 1337 fix
  "kernel.randomize_va_space": 2                  # Randomize addresses of mmap base, heap, stack and VDSO page
  "fs.protected_hardlinks": 1                     # Provide protection from ToCToU races
  "fs.protected_symlinks": 1                      # Provide protection from ToCToU races
  "kernel.kptr_restrict": 1                       # Make locating kernel addresses more difficult
  "kernel.perf_event_paranoid": 2                 # Set perf only available to root

You can override every single setting. For Kubernetes we’ll override the following settings to allow packet forwarding which is needed for the pod network (again put it into group_vars/all.yml):

harden_linux_sysctl_settings_user:
  "net.ipv4.ip_forward": 1
  "net.ipv6.conf.default.forwarding": 1
  "net.ipv6.conf.all.forwarding": 1

One of the role's tasks will combine harden_linux_sysctl_settings and harden_linux_sysctl_settings_user, where again the harden_linux_sysctl_settings_user settings take precedence. Again, have a look at the defaults/main.yml file of the role for more information about the settings.
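Once the role has been applied (see further below), you can quickly verify that such an override really made it to the hosts with an ad-hoc command, e.g.:

ansible -m command -a "sysctl net.ipv4.ip_forward" k8s_worker

Every worker should then report net.ipv4.ip_forward = 1.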

If you want UFW (firewall) logging enabled set:

harden_linux_ufw_logging: 'on'

Possible values are on, off, low, medium, high and full.

And finally there are the SSHGuard settings. SSHGuard protects from brute force attacks against SSH (and other services). To avoid locking yourself out for a while, you can add IPs or IP ranges to a whitelist. By default it's basically only "localhost":

harden_linux_sshguard_whitelist:
  - "127.0.0.0/8"
  - "::1/128"

I recommend additionally adding at least your WireGuard ranges here. Also think about adding the IP of the host from which you administer the Kubernetes cluster and/or the IP of the host you run Ansible from (maybe a Jenkins host e.g.), as shown in the example below.
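For the setup used in this tutorial that could look like this (the public IP is just a placeholder from the documentation range - replace it with your own):

harden_linux_sshguard_whitelist:
  - "127.0.0.0/8"
  - "::1/128"
  - "10.8.0.0/24"        # the WireGuard VPN range used above
  - "203.0.113.10/32"    # e.g. the workstation / Ansible host you administer the cluster from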

Now I can apply the role to the hosts:

ansible-playbook --tags=role-harden-linux k8s.yml

Afterwards I configure Ansible to use the user provided as the value of the harden_linux_deploy_user variable. That user will have sudo permissions to run all commands. If you add a new worker sometime in the future, you should apply this role first (and only this role!) to the new host. To limit the execution of the playbook to the new host execute

ansible-playbook --tags=role-harden-linux --limit=host.i.domain.tld k8s.yml

(replace host.i.domain.tld with the actual hostname of course).

After the harden-linux role was applied, add a few more settings to group_vars/all.yml (or, for a specific host, to host_vars/host.i.domain.tld). If you followed my recommendation to change the SSHd port to e.g. "22222" you also need to change the ansible_port variable:

ansible_user: deploy
ansible_become: true
ansible_become_method: sudo
ansible_port: 22222

Feel free to specify additional Ansible parameters here too of course. ansible_user tells Ansible which user to use to log in at the remote hosts. With ansible_become set, Ansible will execute tasks with sudo by default. As you might remember, the harden-linux role gives the deploy user sudo permissions to execute tasks. I also changed the default SSH port via the harden-linux role as already stated. This normally reduces the amount of SSH login attacks. This is of course security by obscurity, but at least at the moment it still makes sense IMHO.
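With these settings in place, a quick ad-hoc check (a sketch; --become is needed because ufw status requires root) confirms that Ansible can still reach the hardened hosts with the new port and user and that UFW is active:

ansible -m command -a "ufw status verbose" --become k8s_controller,k8s_worker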

"But now I need to specify the port parameter every time I want to SSH to the host", you may object. Don't fear. Just create a file $HOME/.ssh/config and add e.g.:

Host *.i.domain.tld *.p.domain.tld
  Port 22222
  User deploy

Of course you need to replace *.(i|p).domain.tld with more useful values (e.g. .example.net if you own that domain ;-) ). Now you can use SSH as you did before and don’t need to worry about the SSH port anymore.

Now that the hosts are secured a little bit head over to the next part of the tutorial! There I’ll install WireGuard and use it as a (not so poor man’s) replacement for AWS VPC or Google Cloud Networking.