Contents

Kubernetes the Not So Hard Way With Ansible - Ingress with Traefik v2 and cert-manager (Part 1) [Updated for Traefik v2.4]

CHANGELOG

2021-05-19

  • update Traefik to v2.4
  • update cert-manager to v1.3

This is an updated version of my older blog post which used Traefik v1.7 for Ingress and also for managing Let’s Encrypt TLS certificates. This new blog post uses Traefik v2.x and cert-manager for managing Let’s Encrypt TLS certificates.

If you have followed my blog series Kubernetes the Not So Hard Way With Ansible so far, the Kubernetes installation can only handle internal requests. But most people want to make their services publicly available. For this to work we need Kubernetes Ingress. More information is provided in the Kubernetes Ingress Resources documentation.

In short: Typically, services and pods have IPs only routeable by the cluster network. All traffic that ends up at an edge router is either dropped or forwarded elsewhere. An Ingress is a collection of rules that allow inbound connections to reach the cluster services. It can be configured to give services externally-reachable URLs, load balance traffic, terminate SSL, offer name based virtual hosting, and more. Users request ingress by POSTing the Ingress resource to the Kubernetes API server. An Ingress controller is responsible for fulfilling the Ingress, usually with a loadbalancer, though it may also configure your edge router or additional frontends to help handle the traffic in an HA manner.
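To make this more concrete, a minimal Ingress manifest could look like the following sketch. The host, Service name and port are placeholders:

```yaml
# Hypothetical example: route requests for "www.example.com" to the
# Service "my-service" on port 80. The annotation tells Traefik to
# handle this Ingress (more on that further below).
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-ingress
  annotations:
    kubernetes.io/ingress.class: "traefik"
spec:
  rules:
    - host: www.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-service
                port:
                  number: 80
```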

Traefik is such a reverse proxy and load balancer. It supports several backends (Docker, Swarm mode, Kubernetes, Marathon, Consul, Etcd, Rancher, Amazon ECS, and a lot more) to manage its configuration automatically and dynamically. If you submit an Ingress object to the Kubernetes API server, Traefik as Ingress controller will pick it up and configure the proxy accordingly.

In my previous blog post with Traefik v1 I also used Traefik to dynamically manage Let’s Encrypt TLS certificates to secure the communication between server and client. This is still possible, but you either need persistent storage for the acme.json file or the Enterprise version of Traefik if you still want to use etcd as I did in my older blog post. The Enterprise Edition of Traefik wasn’t an option for me because the costs are simply too high for a private person like me.

cert-manager has been an option for managing Let’s Encrypt certificates for quite some time. It reached version v1 in the second half of 2020, so it can be considered stable now. cert-manager also supports HashiCorp Vault, Venafi, self-signed and internal certificate authorities, so you get more flexibility. When cert-manager is installed on Kubernetes, it ships its resources as custom resource definitions (CRDs). So you’ll get new Kubernetes objects like Issuer, Certificate and CertificateRequest. We’ll come back to these topics later.
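Just to give a first impression of these new objects, a cert-manager Certificate could look like this sketch. All names and the domain are placeholders, and the referenced Issuer will only be configured in part 2:

```yaml
# Sketch: ask cert-manager for a certificate for "www.example.com" and
# store it in the Secret "www-example-com-tls". "letsencrypt-prod" is a
# placeholder for a (Cluster)Issuer that doesn't exist yet at this point.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: www-example-com
  namespace: default
spec:
  secretName: www-example-com-tls
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
    - www.example.com
```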

Install Traefik

I’ve prepared an Ansible role to install Traefik. It’s available at Ansible Galaxy and can be installed via

ansible-galaxy install githubixx.traefik_kubernetes

or you just clone the Github repository into your roles directory:

git clone https://github.com/githubixx/ansible-role-traefik-kubernetes

You also need the Helm 3 binary installed on the host where ansible-playbook runs. You can use your favorite package manager if your distribution includes helm in its repository, use one of the Ansible Helm roles, or download the binary from Helm releases and put it into the /usr/local/bin/ directory. On Archlinux, for example, Helm can be installed via sudo pacman -S helm.

kubectl should be installed as well. It’s normally also available via your package manager, or you can use my kubectl role. You also need a properly configured KUBECONFIG, which is located at ${HOME}/.kube/config by default (also see Kubernetes the not so hard way with Ansible - Control plane).
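Before running the playbook you can quickly verify that the required tools are present on the Ansible control host. A minimal sketch (it only inspects the local PATH and doesn’t talk to the cluster):

```shell
#!/bin/sh
# Check that the tools the role relies on are available in PATH.
for bin in helm kubectl ansible-playbook; do
  if command -v "$bin" >/dev/null 2>&1; then
    echo "$bin: found"
  else
    echo "$bin: MISSING"
  fi
done
```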

Behind the scenes the role uses the official Helm chart. Currently, procedures like installing, updating/upgrading and deleting the Traefik deployment are supported.

The provided default settings are optimized for a bare-metal, on-premise or otherwise self-hosted Kubernetes cluster where Traefik is the public entry point for the Kubernetes services. While the configuration can of course be customized as with any Helm chart, the default settings will set up Traefik with the following configuration:

  • Traefik instances will be deployed as DaemonSet
  • Traefik pods use hostPort
  • Traefik listens on port 80 on all interfaces of the host for incoming HTTP requests
  • Traefik listens on port 443 on all interfaces of the host for incoming HTTPS requests
  • Traefik dashboard is enabled but is not exposed to the public internet
  • TLS certificates are provided by cert-manager (see part 2)

So let’s first have a look at the default variables this role provides (which can of course be customized):

# Helm chart version (uses Traefik v2.4.8)
traefik_chart_version: "9.19.1"

# Helm release name
traefik_release_name: "traefik"

# Helm repository name
traefik_repo_name: "traefik"

# Helm chart name
traefik_chart_name: "{{ traefik_repo_name }}/{{ traefik_release_name }}"

# Helm chart URL
traefik_chart_url: "https://helm.traefik.io/traefik"

# Kubernetes namespace where Traefik resources should be installed
traefik_namespace: "traefik"

# Directory that contains Helm chart values file. If you specify this
# variable Ansible will try to locate a file called "values.yml.j2" or
# "values.yaml.j2" in the specified directory (".j2" because you can
# use the usual Jinja2 template stuff there). The content of this file
# will be provided to "helm install/template" command as values file.
# By default the directory is the users "$HOME" directory plus
# "/traefik/helm". If the task doesn't find such a file it uses
# the values in "templates/traefik_values_default.yml.j2" by default.
traefik_chart_values_directory: "{{ '~/traefik/helm' | expanduser }}"

If a newer Helm chart version is available, adjust traefik_chart_version accordingly. You can adjust the other variables too, but normally there is no need to do so. If you want to install the Traefik resources in a different namespace, adjust traefik_namespace.

The role contains a default values file for the Helm chart in templates/traefik_values_default.yml.j2. Helm will later use these values to render the YAML manifests needed for Traefik, like ServiceAccount, Deployment, and so on. So instead of creating the Kubernetes manifests manually as YAML files, Helm renders these templates with the specified values. But those are details you normally don’t need to care about, as the Ansible role abstracts them away. If you want to use different values, just create a directory somewhere and place a file called values.yml.j2 or values.yaml.j2 there. Set traefik_chart_values_directory to this directory. You can also use templates/traefik_values_default.yml.j2 as a template and adjust it to your needs.
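For example, preparing a custom values file could look like this sketch. The roles path is an assumption and depends on where Ansible Galaxy installed the role on your machine:

```shell
#!/bin/sh
# Create the directory the role looks for by default ($HOME/traefik/helm).
VALUES_DIR="${HOME}/traefik/helm"
mkdir -p "$VALUES_DIR"

# Start from the role's default values template if it can be found
# (the path below assumes the role lives in ./roles; adjust as needed).
DEFAULT="roles/githubixx.traefik_kubernetes/templates/traefik_values_default.yml.j2"
if [ -f "$DEFAULT" ]; then
  cp "$DEFAULT" "$VALUES_DIR/values.yml.j2"
else
  # Otherwise start with an empty file and fill it in manually.
  : > "$VALUES_DIR/values.yml.j2"
fi
ls "$VALUES_DIR"
```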

So let’s have a look at the values. Where needed I’ll add further explanations to the comments that are already in the templates/traefik_values_default.yml.j2 file:

# All possible Helm chart values here can be found at:
# https://github.com/traefik/traefik-helm-chart/blob/master/traefik/values.yaml

# These arguments are passed to Traefik's binary. For all options see:
# https://doc.traefik.io/traefik/reference/static-configuration/cli/
#
# First one sets log level accordingly.
#
# Second one sets value of "kubernetes.io/ingress.class"
# annotation to watch for. If a "standard" Kubernetes "Ingress" object
# is submitted to Kubernetes API server (instead of Traefik's own ingress
# implementation called "IngressRoute"), Traefik will handle these requests
# and route them accordingly.
additionalArguments:
  - "--log.level=INFO"
  - "--providers.kubernetesingress.ingressclass=traefik"

When you deploy the role for the first time it may make sense to set log.level to DEBUG. In case of problems you’ll get more information about what is going on. This should be changed back once you go into production to avoid excessive logging.

The value of providers.kubernetesingress.ingressclass becomes important later when cert-manager gets installed and a “http solver” is configured. cert-manager will manage all Let's Encrypt TLS certificates, and of course it needs to generate one if it doesn’t exist yet or if it needs to be renewed. Let's Encrypt needs to verify that you own the domain for which you want a certificate, so cert-manager will create a certificate request accordingly and Let's Encrypt will “call back” for verification. Before this happens, cert-manager has already created an Ingress to intercept this verification request. This Ingress object contains an annotation called kubernetes.io/ingress.class. If the value of this annotation is traefik, that is basically the “signal” for Traefik to handle this Ingress. So this value and the cert-manager http01 solver value need to match. I’ll come back to this when cert-manager gets configured.
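As a preview, a sketch of the matching (Cluster)Issuer that will be configured in part 2 could look like this. The issuer name, email and Secret name are placeholders; the important bit is that class matches the ingressclass above:

```yaml
# Sketch only: the http01 solver of the ClusterIssuer configured in part 2.
# The "class" value must match "--providers.kubernetesingress.ingressclass".
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    email: you@example.com
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      - http01:
          ingress:
            class: traefik
```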

# Global arguments passed to Traefik's binary.
# https://doc.traefik.io/traefik/reference/static-configuration/cli/
#
# First one disables periodical check if a new version has been released.
#
# Second one disables anonymous usage statistics. 
globalArguments:
  - "--global.checknewversion=false"
  - "--global.sendanonymoususage=false"

These two options should be quite obvious. We don’t need to check for newer versions, as updating/upgrading is managed by this role anyway when a newer version is specified and the role is deployed again.

# This creates the Traefik deployment. As "DaemonSet" is specified here
# this will create a Traefik instance on all Kubernetes worker nodes. If
# only a subset of nodes should be used specify "affinity", "nodeSelector"
# or "toleration's" accordingly. See link to Helm chart values above.
deployment:
  enabled: true
  kind: DaemonSet
  dnsPolicy: ClusterFirstWithHostNet

Instead of a Kubernetes Deployment I chose DaemonSet as the deployment type. If you don’t have the luxury of running on AWS or Google Cloud, where you can use a Service of type LoadBalancer to make your services externally available, a DaemonSet is a good alternative.

First of all: what is a DaemonSet? A DaemonSet ensures that all (or some) nodes run a copy of a Pod. As nodes are added to the cluster, Pods are added to them. As nodes are removed from the cluster, those Pods are garbage collected. Deleting a DaemonSet will clean up the Pods it created. For the Traefik pods this means that exactly one Traefik pod will run on every worker node. As I only have a few worker nodes, that’s ok for me. If you have tens or hundreds of worker nodes this probably doesn’t make much sense ;-) But that’s not a problem.

As already mentioned in the comment above, affinity, nodeSelector or tolerations can be used to deploy Traefik only on a few nodes. Maybe it even makes sense to have a dedicated pool of nodes that only run Traefik pods, if you can afford it. Providers like Hetzner, DigitalOcean, Scaleway, and so on also offer load balancers which can be put “in front” of the Traefik instances. This way you can achieve high availability (HA): if one worker node goes down, the load balancer can shift traffic to the remaining nodes. For more information also see Assigning Pods to Nodes.
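For example, restricting Traefik to a subset of nodes could look like this sketch added to the Helm values. The label is hypothetical and needs to be applied to the nodes first:

```yaml
# Hypothetical example: only nodes labeled with "traefik=enabled" run
# Traefik pods. Label a node beforehand with:
#   kubectl label node worker01 traefik=enabled
nodeSelector:
  traefik: "enabled"
```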

# Instructs Traefik to listen on various ports.
ports:
  # The name of this one can't be changed as it is used for the readiness
  # and liveness probes, but you can adjust its config to your liking.
  # To access the dashboard you can use "kubectl port-forward" e.g.:
  # kubectl -n traefik port-forward $(kubectl get pods --selector "app.kubernetes.io/name=traefik" --output=name -A | head -1) 9000:9000
  # Opening http://localhost:9000/dashboard/ should show the dashboard.
  traefik:
    port: 9000
    expose: false
    protocol: TCP
  # Unsecured incoming HTTP traffic. If you uncomment "redirectTo: websecure"
  # all traffic that arrives at this port will be redirected to "websecure"
  # entry point which means to the entry point that handles secure HTTPS traffic.
  # But be aware that this could be problematic for cert-manager.
  # Also "hostPort" is used. As "DaemonSet" was specified above that basically
  # means that Traefik pods will answer requests on port 80 and 443 on all
  # Kubernetes worker nodes. So if the hosts have a public IP and port 80/443
  # are not protected by firewall, Traefik is available for requests from the
  # Internet (what you normally want in case of Traefik ;-) ) For other
  # options see link above. These settings are useful for baremetal or
  # on-premise solutions with no further loadbalancer.
  web:
    port: 30080
    hostPort: 80
    expose: true
    protocol: TCP
    # redirectTo: websecure
  # Entry point for HTTPS traffic.
  websecure:
    port: 30443
    hostPort: 443
    expose: true
    protocol: TCP

If you really want to access the dashboard (the traefik port above) from outside of your cluster, create a secure Ingress. That means the Ingress should at least be TLS secured (which you can do with the help of cert-manager, which I’ll talk about later) and protected by the BasicAuth middleware.
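As a rough sketch of what the BasicAuth part could look like with Traefik v2’s CRDs (nothing of this is part of the role’s defaults; the names are placeholders):

```yaml
# Hypothetical BasicAuth middleware for the dashboard. The referenced
# Secret "dashboard-auth" must contain a "users" key with htpasswd-style
# entries.
apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: dashboard-auth
  namespace: traefik
spec:
  basicAuth:
    secret: dashboard-auth
```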

# These security settings are basically best practice to limit the attack
# surface as much as possible.
# The attack surface can be further limited with "seccomp", which is stable
# since Kubernetes v1.19 and allows limiting system calls to a bare minimum.
# See: https://kubernetes.io/docs/tutorials/clusters/seccomp/
securityContext:
  capabilities:
    drop:
      - ALL
  readOnlyRootFilesystem: true
  runAsGroup: 65532
  runAsNonRoot: true
  runAsUser: 65532

# All processes of the container are also part of this supplementary group ID.
podSecurityContext:
  fsGroup: 65532

The Kubernetes documentation Configure a Security Context for a Pod or Container has more information about these settings. As Traefik will be your entry point from the Internet into your Kubernetes cluster, make sure to keep permissions at the lowest possible level. And Restrict a Container’s Syscalls with Seccomp can restrict the permissions of the Traefik binary/container even further.
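If you want to go one step further, the pod security context can additionally request the container runtime’s default seccomp profile. This is a sketch, not part of the role’s default values; seccompProfile is available since Kubernetes v1.19:

```yaml
# Sketch: extend the pod security context with a seccomp profile so the
# container runtime's default syscall filter is applied to Traefik.
podSecurityContext:
  fsGroup: 65532
  seccompProfile:
    type: RuntimeDefault
```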

# Set log level of general log and enable access log.
logs:
  general:
    level: INFO
  access:
    enabled: true

# As Traefik web/websecure ports are exposed by "hostPort" a service isn't
# needed.
service:
  enabled: false

# CPU and RAM resource requests and limits. These should be set to
# reasonable values, e.g. to limit the damage in case of a memory leak.
resources:
  requests:
    cpu: "100m"
    memory: "50Mi"
  limits:
    cpu: "300m"
    memory: "150Mi"

But nothing is set in stone ;-) To use your own values, just create a file called values.yml.j2 or values.yaml.j2 and put it into the directory specified in traefik_chart_values_directory (which is $HOME/traefik/helm by default). The role will then use that file to render the Helm values. You can use templates/traefik_values_default.yml.j2 as a template or just start from scratch. As mentioned above, you can modify all settings of the Helm chart that differ from the default ones, which are located here. And since the source template is just a Jinja2 template, you can of course use all the usual Ansible template magic.

After the values file is in place and the defaults/main.yml values are checked, the role can be deployed. Most of the role’s tasks are executed locally, so to speak, as quite a few tasks need to communicate with the Kubernetes API server or execute Helm commands.

So my traefik entry in Ansible’s hosts file is just

[traefik]
localhost

And in k8s.yml I added

- 
  hosts: traefik
  roles:
    - role: githubixx.traefik_kubernetes
      tags: role-traefik-kubernetes

which I already mentioned in a previous post.

The default action of the role is to just render the Kubernetes resources YAML file after replacing all Jinja2 variables and the like. As you can see above, the role githubixx.traefik_kubernetes has the tag role-traefik-kubernetes assigned. Assuming that the values for the Helm chart should only be rendered (nothing will be installed in this case) and the playbook is called k8s.yml, execute the following command:

ansible-playbook --tags=role-traefik-kubernetes k8s.yml

One of the final tasks is called TASK [githubixx.traefik-kubernetes : Output rendered template]. This allows you to check the YAML file before Traefik gets deployed. As you might notice, the output isn’t that “pretty”. To get around this, you can either set the ANSIBLE_STDOUT_CALLBACK=debug environment variable or stdout_callback = debug in ansible.cfg. If you run the ansible-playbook command again, the YAML output should look way better.

If the rendered output contains everything you need the role can be installed which finally deploys Traefik:

ansible-playbook --tags=role-traefik-kubernetes --extra-vars action=install k8s.yml

To check if everything was deployed use the usual kubectl commands like kubectl -n <traefik_namespace> get pods -o wide. E.g.:

kubectl -n traefik get pods -o wide
NAME            READY   STATUS    RESTARTS   AGE   IP             NODE       NOMINATED NODE   READINESS GATES
traefik-7fpnt   1/1     Running   0          1m    10.200.2.109   worker01   <none>           <none>
traefik-g8c8n   1/1     Running   0          1m    10.200.3.174   worker02   <none>           <none>
traefik-hskmx   1/1     Running   0          1m    10.200.1.159   worker03   <none>           <none>
traefik-m2gbs   1/1     Running   0          1m    10.200.0.187   worker04   <none>           <none>

To check if Traefik delivers something on port 80, a simple curl request should output at least “something” ;-) E.g.:

curl -v worker01

Or use the IP address of a worker node. But make sure that ports 80 and 443 are accessible from other hosts (see Firewall settings below).

As Traefik releases new versions every few weeks/months, the role can also do updates/upgrades. The same method can be used to change existing values without upgrading the Traefik version. Have a look at the Traefik releases before updating Traefik. Changes to the Helm chart can be found in its commit history.

If you want to upgrade the Traefik Helm chart, you basically only need to change the traefik_chart_version variable, e.g. from 9.12.3 to 9.13.0. If only parameters should be changed, update the values accordingly.

So to do the Traefik update or to roll out the new values, run:

ansible-playbook --tags=role-traefik-kubernetes --extra-vars action=upgrade k8s.yml

For more information about the Ansible role see the README.

Firewall settings

After the Traefik role is deployed we need to make sure that ports 80 and 443 are open, so that the Let’s Encrypt servers can reach the Traefik instances for the ACME challenge requests and of course to serve HTTP/HTTPS requests later. E.g. for the harden-linux role that means extending the harden_linux_ufw_rules variable to finally have something like this (if you changed the SSHd port to 22222 as recommended otherwise the value will certainly be