Kubernetes the Not So Hard Way With Ansible - Ingress with Traefik v2 and cert-manager (Part 2) [Updated for Traefik v2.8]

2022-02-02

  • update cert-manager to v1.6.1

2021-09-12

  • update Traefik to v2.5
  • update cert-manager to v1.5

2021-05-19

  • update Traefik to v2.4
  • update cert-manager to v1.3

In part 1 I installed Traefik proxy. So it's now basically possible to expose Kubernetes services to the Internet. But nowadays traffic should be encrypted whenever possible. And even if you don't think you need it, think about SEO: Google, for example, ranks sites with encrypted traffic higher.

So cert-manager can be installed to automatically obtain TLS certificates from Let's Encrypt. The certificates can then be used by Traefik to enable TLS for an Ingress. cert-manager also takes care of renewing them before they expire.

As with Traefik I've also prepared an Ansible role to install cert-manager. It's available on Ansible Galaxy and can be installed via

ansible-galaxy install githubixx.cert_manager_kubernetes

or you can just clone the GitHub repository into your roles directory:

git clone https://github.com/githubixx/ansible-role-cert-manager-kubernetes

As with the Traefik role you need Helm 3 and kubectl plus a properly configured KUBECONFIG.
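
To quickly verify that the tooling is in place something like this can be used (the KUBECONFIG path below is just an example, adjust it to your setup):

# Point kubectl and helm to the right cluster (path is an example)
export KUBECONFIG="$HOME/.kube/config"

# Should report a Helm v3.x version and a reachable cluster
helm version --short
kubectl cluster-info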

Behind the scenes it uses the official Helm chart. Currently installing, updating/upgrading and deleting the cert-manager deployment are supported.

So let's have a look at the available role variables:

# Helm chart version
cert_manager_chart_version: "v1.6.1"

# Helm release name
cert_manager_release_name: "cert-manager"

# Helm repository name
cert_manager_repo_name: "jetstack"

# Helm chart name
cert_manager_chart_name: "{{ cert_manager_repo_name }}/{{ cert_manager_release_name }}"

# Helm chart URL
cert_manager_chart_url: "https://charts.jetstack.io"

# Kubernetes namespace where cert-manager resources should be installed
cert_manager_namespace: "cert-manager"

# The following list contains the configurable parameters of the cert-manager
# Helm chart. For all possible values see:
# https://artifacthub.io/packages/helm/jetstack/cert-manager#configuration
# But for most users "installCRDs=true" should be sufficient.
# If true, CRD resources will be installed as part of the Helm chart.
# If enabled, when uninstalling CRD resources will be deleted causing all
# installed custom resources to be DELETED.
cert_manager_values:
  - installCRDs=true
  - global.leaderElection.namespace="{{ cert_manager_namespace }}"

# To install "ClusterIssuer" for Let's Encrypt (LE) "cert_manager_le_clusterissuer_options"
# needs to be defined. The variable contains a list of hashes and can be defined
# in "group_vars/all.yml" e.g.
#
# name:   Defines the name of the "ClusterIssuer"
# email:  Use a valid e-mail address to be alerted by LE in case a certificate
#         expires
# server: Hostname part of the LE URL
# private_key_secret_ref_name:  Name of the secret which stores the private key
# solvers_http01_ingress_class: Value of "kubernetes.io/ingress.class" annotation.
#                               Depends on your ingress controller. Common values
#                               are "traefik" for Traefik or "nginx" for nginx.
#
# Besides "email" the following values can be used as is and will create valid
# "ClusterIssuer" for Let's Encrypt staging and production. Only "email" needs
# to be adjusted if Traefik is used as ingress controller. For other ingress
# controllers "solvers_http01_ingress_class" needs to be adjusted too. Currently
# only "ClusterIssuer" and "http01" solver is implemented. For definition also
# see "tasks/install-issuer.yml".
#
cert_manager_le_clusterissuer_options:
  - name: letsencrypt-prod
    email: insert@your-e-mail-address.here
    server: acme-v02
    private_key_secret_ref_name: letsencrypt-account-key
    solvers_http01_ingress_class: "traefik"
  - name: letsencrypt-staging
    email: insert@your-e-mail-address.here
    server: acme-staging-v02
    private_key_secret_ref_name: letsencrypt-staging-account-key
    solvers_http01_ingress_class: "traefik"

First check if you want to change any of the default values in defaults/main.yml. As usual those values can be overridden in host_vars or group_vars. Normally there is no need to change much. Besides cert_manager_chart_version you might want to add a few options to cert_manager_values. It contains the configurable parameters of the cert-manager Helm chart. The list is submitted "as is" to the helm binary for the template, install or upgrade commands.
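
For example, a minimal override in group_vars/all.yml could look like this (the prometheus.enabled entry is just an illustration of an additional chart parameter, check the chart's configuration reference for valid keys):

cert_manager_chart_version: "v1.6.1"

cert_manager_values:
  - installCRDs=true
  - global.leaderElection.namespace="{{ cert_manager_namespace }}"
  - prometheus.enabled=true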

My cert_manager entry in Ansible’s hosts file is just

[cert_manager]
localhost

And in k8s.yml I added

-
  hosts: cert_manager
  roles:
    - role: githubixx.cert_manager_kubernetes
      tags: role-cert-manager-kubernetes

The default action is to just render the Kubernetes resources YAML files after replacing all Jinja2 variables and the like (that means not specifying any value via --extra-vars action=... to ansible-playbook).

So to render the YAML files that WOULD be applied (nothing will be installed at this point), and assuming the playbook is called k8s.yml, execute the following command (as mentioned in the previous blog post you may set the ANSIBLE_STDOUT_CALLBACK=debug environment variable or stdout_callback = debug in ansible.cfg to get pretty printed output):

ansible-playbook --tags=role-cert-manager-kubernetes k8s.yml
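
If you only want the pretty printed output for this one run, the environment variable can also be set ad hoc:

ANSIBLE_STDOUT_CALLBACK=debug ansible-playbook --tags=role-cert-manager-kubernetes k8s.yml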

If the rendered output contains everything you need, the role can be installed which finally deploys cert-manager (still assuming the playbook file is called k8s.yml - if not please adjust accordingly):

ansible-playbook --tags=role-cert-manager-kubernetes --extra-vars action=install k8s.yml

To check if everything was deployed use the usual kubectl commands like kubectl -n <cert_manager_namespace> get pods -o wide. E.g.

kubectl -n cert-manager get pods -o wide

NAME                                       READY   STATUS    RESTARTS   AGE   IP             NODE       NOMINATED NODE   READINESS GATES
cert-manager-76d899dd6c-8q8bx              1/1     Running   0          1m    10.200.3.146   worker01   <none>           <none>
cert-manager-cainjector-68c96b7844-wrnr5   1/1     Running   0          1m    10.200.2.32    worker02   <none>           <none>
cert-manager-webhook-5bb449596f-5pbqx      1/1     Running   0          1m    10.200.3.249   worker01   <none>           <none>

Before the playbook finishes it waits for the first cert-manager-webhook pod to become ready. In general wait until all cert-manager pods are ready before you try to get the first certificate.
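
Instead of polling manually, kubectl can wait for all pods in the namespace to become ready (the timeout is an arbitrary example value):

kubectl -n cert-manager wait --for=condition=Ready pods --all --timeout=300s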

The role currently supports deploying ClusterIssuer resources for Let's Encrypt (LE) staging and production. The most relevant variable in this case is cert_manager_le_clusterissuer_options. Please see the role variables above for more information.

After the cert_manager_le_clusterissuer_options variable has been adjusted accordingly the ClusterIssuer resources can be installed:

ansible-playbook --tags=role-cert-manager-kubernetes --extra-vars action=install-issuer k8s.yml
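
Behind the scenes this renders and applies one ClusterIssuer per entry in cert_manager_le_clusterissuer_options. Roughly sketched (based on the cert-manager v1 ACME API; the authoritative template is in the role's tasks/install-issuer.yml), the staging issuer looks like this:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    # E-mail address Let's Encrypt uses e.g. for expiry notices
    email: insert@your-e-mail-address.here
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      # Secret that stores the ACME account private key
      name: letsencrypt-staging-account-key
    solvers:
      - http01:
          ingress:
            class: traefik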

After deploying the issuers for the first time it takes a little while until they are ready. To check whether they are ready kubectl can be used:

kubectl get clusterissuer.cert-manager.io

NAME                  READY   AGE
letsencrypt-prod      True    10m
letsencrypt-staging   True    11m

Before a Certificate can be requested make sure that the DNS entry for the domain you want to get a certificate for points to one of the Traefik instances or to the loadbalancer IP that you might have placed "in front" of the Traefik instances.
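
A quick check can be done with dig (www.domain.name is of course a placeholder):

# Should return the public IP of a Traefik instance or of the loadbalancer
dig +short www.domain.name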

Now a certificate can be issued. This happens outside of this Ansible role. E.g. to get a certificate for the domain www.domain.name from the Let's Encrypt staging server (which is only meant for testing and doesn't issue certificates that browsers will trust) create a YAML file (e.g. domain-name.yaml) like this:

---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: cert-name
  namespace: namespace-name
spec:
  commonName: www.domain.name
  secretName: secret-name
  dnsNames:
    - www.domain.name
  issuerRef:
    name: letsencrypt-staging
    kind: ClusterIssuer

issuerRef.name: letsencrypt-staging points to the Let's Encrypt staging API. Before switching to the production API (letsencrypt-prod) make sure that staging works fine. The production API has rate limits, so if you experiment too much with this issuer Let's Encrypt might block you for a while. After changing the values to your needs, apply this file with kubectl apply -f domain-name.yaml.
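
Once the staging certificate is issued successfully, switching to production is just a matter of changing the issuerRef in domain-name.yaml:

  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer

After re-applying the file with kubectl apply -f domain-name.yaml cert-manager should request a new certificate from the production API and update the referenced Secret.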

If you create a (Cluster)Issuer or request a Certificate you can watch the cert-manager logs to see what's going on, e.g. (in case you use a different namespace for cert-manager change the namespace accordingly):

kubectl -n cert-manager logs --tail=5 -f $(kubectl -n cert-manager get pods -l app=cert-manager --output=jsonpath='{.items..metadata.name}')

To get information about a Certificate this command can be used:

kubectl -n your-namespace get certificate cert-name -o json

Especially watch out whether the Certificate is ready, e.g.:

kubectl -n your-namespace get certificate your-certificate -o json | jq '.status.conditions'

[
  {
    "lastTransitionTime": "2021-01-03T22:05:59Z",
    "message": "Certificate is up to date and has not expired",
    "reason": "Ready",
    "status": "True",
    "type": "Ready"
  }
]

For more information also see the README of the role.

Now that the (staging) certificate is in place we can finally create an IngressRoute. IngressRoute is a Traefik-specific custom resource that replaces the standard Ingress. The IngressRoute will use the certificate that cert-manager fetched from Let's Encrypt and stored as a Kubernetes Secret:

---
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: www-domain-name
  namespace: namespace-name
spec:
  entryPoints:
    - web
    - websecure
  routes:
    - kind: Rule
      match: Host(`www.domain.name`)
      services:
        - kind: Service
          name: service-name
          namespace: namespace-name
          passHostHeader: true
          port: 80
  tls:
    secretName: secret-name

This manifest specifies an IngressRoute called www-domain-name in namespace namespace-name. It's bound to the web and websecure entrypoints. There is also a Rule: it triggers if an incoming request asks for a page from www.domain.name and forwards the request to a Service called service-name in namespace namespace-name. Finally tls.secretName is set to secret-name. That's the reference to the Kubernetes Secret which cert-manager created for the Certificate above and which contains the TLS key and certificate.

If you save the manifest to ingressroute.yaml it can now be applied: kubectl apply -f ingressroute.yaml. This creates the resource and you should see the IngressRoute in the Traefik dashboard (how to access it was covered in the previous blog post).
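
A quick test from outside the cluster can be done with curl. As long as the staging issuer is used the certificate isn't trusted by browsers or curl, so -k (insecure) is needed:

# -v shows the TLS handshake including the certificate issuer
curl -kv https://www.domain.name/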

That’s it basically! :-)

You probably already figured out that the whole setup is okay so far but not perfect. If you point your website's DNS record to one of the Traefik instances (which basically means to one of the Traefik DaemonSet members) and that host dies you're out of business for a while. Also if you use DNS round robin to distribute the requests across all Traefik nodes you still have the problem that if one node fails you lose at least the requests to that node. One solution to this problem could be a managed loadbalancer as already mentioned further above.

If you can change your DNS records via an API (which is the case for Google Cloud DNS or OVH DNS e.g.) you could deploy a Kubernetes CronJob that monitors all Traefik instances and changes the DNS records if one of the nodes fails, or you can implement a watchdog yourself and deploy the program as a pod into your K8s cluster. Such a watchdog could also be driven by Prometheus metrics.

But one of the best options to solve the problem is probably MetalLB. Also see Configuring HA Kubernetes cluster on bare metal servers with GlusterFS & MetalLB and What you need to know about MetalLB.

If you use Hetzner Cloud, hcloud-fip-controller is a possible option that might be sufficient for quite a few use cases. hcloud-fip-controller is a small controller to handle floating IP management in a Kubernetes cluster on Hetzner Cloud virtual machines.

There is also kube-vip. The kube-vip project provides High-Availability and load-balancing for both inside and outside a Kubernetes cluster.

Next up: Kubernetes the Not So Hard Way With Ansible - Upgrading Kubernetes