Kubernetes the Not So Hard Way With Ansible - Ingress with Traefik [Updated for v1.7]

2021-01-13

IMPORTANT NOTE:

  • I no longer update this blog post but I’ll leave it here for reference. Traefik v1 will run out of support in 2021 so it’s time to move on. Also, for automatic TLS certificates it makes more sense to use cert-manager, which has also reached version 1 now. Please see my new blog post with Traefik v2 and cert-manager.

2020-08-13

  • update to Traefik v1.7.26 from v1.7.11
  • update etcd_version to 3.4.7

2019-05-07

  • update to Traefik v1.7.11 from v1.7.8

2019-01-31

  • update to Traefik v1.7.8 from v1.7.7

2019-01-16

  • update to Traefik v1.7.7 from v1.6
  • as the web module/provider is deprecated I replaced it with ping and api (which also contains the dashboard)
  • ping is used for liveness and readiness probes on port 8080
  • api provides the dashboard on port 9090 now (was 8080) and can be secured with basic challenge
  • reduced resources.requests.cpu to 100m
  • removed securityContext privileged: true and replaced with needed Linux capabilities for increased security
  • acme (Let’s encrypt) now uses tlsChallenge (TLS-ALPN-01)
  • add basic authentication to dashboard

Currently the Kubernetes cluster I’ve built so far only answers internal requests. But most people want to deliver their websites or applications through a webserver/appserver running in the cluster. For this to work we need something called an Ingress. More information is provided in the Kubernetes Ingress Resources documentation.

Typically, services and pods have IPs only routeable by the cluster network. All traffic that ends up at an edge router is either dropped or forwarded elsewhere. An Ingress is a collection of rules that allow inbound connections to reach the cluster services. It can be configured to give services externally-reachable URLs, load balance traffic, terminate SSL, offer name based virtual hosting, and more. Users request ingress by POSTing the Ingress resource to the API server. An Ingress controller is responsible for fulfilling the Ingress, usually with a loadbalancer, though it may also configure your edge router or additional frontends to help handle the traffic in an HA manner.

There are several options to implement this, e.g. Kube-Lego, which uses nginx as a reverse proxy and automatically requests free certificates for Kubernetes Ingress resources from Let’s Encrypt. Traefik supports several backends (Docker, Swarm mode, Kubernetes, Marathon, Consul, Etcd, Rancher, Amazon ECS, and a lot more) to manage its configuration automatically and dynamically. As with Kube-Lego it’s also possible to request and renew certificates from Let’s Encrypt automatically.

There are some posts/docs out there which describe such setups, e.g.:

Automated certificate provisioning in Kubernetes using kube-lego
Kubernetes with Traefik and Let’s Encrypt
Traefik as Kubernetes Ingress Controller on ARM with automatic TLS certificates
Traefik Kubernetes Ingress Controller documentation

As I already have etcd in place for Kubernetes to store its state it of course makes sense to reuse it for Traefik as well. As of Traefik 1.5.0-rc1 this is possible.

First I need to define a few Ansible variables. I put them in group_vars/all.yml. As of Kubernetes 1.8 RBAC (Role-Based Access Control) is stable.

traefik_clusterrole: |
  ---
  kind: ClusterRole
  apiVersion: rbac.authorization.k8s.io/v1
  metadata:
    name: traefik-ingress-controller
  rules:
    - apiGroups:
        - ""
      resources:
        - services
        - endpoints
        - secrets
      verbs:
        - get
        - list
        - watch
    - apiGroups:
        - extensions
      resources:
        - ingresses
      verbs:
        - get
        - list
        - watch  

traefik_clusterrolebinding: |
  ---
  kind: ClusterRoleBinding
  apiVersion: rbac.authorization.k8s.io/v1
  metadata:
    name: traefik-ingress-controller
  roleRef:
    apiGroup: rbac.authorization.k8s.io
    kind: ClusterRole
    name: traefik-ingress-controller
  subjects:
  - kind: ServiceAccount
    name: traefik-ingress-controller
    namespace: kube-system  

The playbook I’ll execute later will use these two variables (and some others to come) to create the ClusterRole and ClusterRoleBinding for Traefik. In short: a Role is basically a set of permissions, and a RoleBinding grants the permissions defined in a Role to a user or set of users. Roles and RoleBindings are used if you want to grant permissions within a namespace. Without namespaces the settings are cluster-wide and that’s why they’re called ClusterRole and ClusterRoleBinding ;-) As you can see above, for Traefik we define the role and binding cluster-wide. To serve content from all services in all namespaces Traefik needs access to all namespaces.
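
Once the ClusterRole and ClusterRoleBinding are applied you can double-check that the permissions work as intended with kubectl’s built-in authorization check (just a sanity check, run from your admin workstation):

# Both commands should answer "yes" if the ClusterRole/ClusterRoleBinding are in place
kubectl auth can-i list services --as=system:serviceaccount:kube-system:traefik-ingress-controller
kubectl auth can-i watch ingresses.extensions --as=system:serviceaccount:kube-system:traefik-ingress-controller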

Now we add a variable we need for the service account:

traefik_serviceaccount: |
  ---
  apiVersion: v1
  kind: ServiceAccount
  metadata:
    name: traefik-ingress-controller
    namespace: kube-system  

When you (a human) access the cluster (for example, using kubectl), you are authenticated by the kube-apiserver as a particular user account. Processes in containers inside pods can also contact the kube-apiserver. When they do, they are authenticated as a particular Service Account. In our case we create and use the service account traefik-ingress-controller and it will be placed in the kube-system namespace.
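
Once the playbook (see below) has created it you can quickly verify that the service account exists:

kubectl get serviceaccount traefik-ingress-controller --namespace=kube-system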

And now comes the interesting part: the DaemonSet running the Traefik daemon. Again I define a variable with the following content (adjust the settings according to your needs, especially the value of acme.email; how to create an encrypted password and replace the value crypted_password_here is described below in the detailed parameter description):

traefik_daemonset: |
  ---
  kind: DaemonSet
  apiVersion: apps/v1
  metadata:
    name: traefik-ingress-controller
    namespace: kube-system
    labels:
      k8s-app: traefik-ingress-lb
  spec:
    selector:
      matchLabels:
        k8s-app: traefik-ingress-lb
    updateStrategy:
      type: RollingUpdate
    template:
      metadata:
        labels:
          k8s-app: traefik-ingress-lb
          name: traefik-ingress-lb
      spec:
        serviceAccountName: traefik-ingress-controller
        terminationGracePeriodSeconds: 60
        hostNetwork: true
        dnsPolicy: ClusterFirstWithHostNet
        containers:
        - image: traefik:v1.7.26-alpine
          name: traefik-ingress-lb
          imagePullPolicy: IfNotPresent
          livenessProbe:
            failureThreshold: 2
            httpGet:
              path: /ping
              port: 8080
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 5
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /ping
              port: 8080
              scheme: HTTP
            periodSeconds: 10
          resources:
            requests:
              memory: "64Mi"
              cpu: "100m"
            limits:
              memory: "64Mi"
              cpu: "250m"
          ports:
          - name: http
            containerPort: 80
            hostPort: 80
          - name: https
            containerPort: 443
            hostPort: 443
          - name: ping
            containerPort: 8080
          - name: admin
            containerPort: 9090
          securityContext:
            capabilities:
              drop:
              - ALL
              add:
              - NET_BIND_SERVICE
          volumeMounts:
          - name: tls
            mountPath: {{k8s_conf_dir}}
            readOnly: true
          args:
          - --checknewversion=false
          - --loglevel=INFO
          - --defaultentrypoints=http,https
          - --entrypoints=Name:http Address::80 Redirect.EntryPoint:https
          - --entrypoints=Name:https Address::443 TLS
          - --entrypoints=Name:ping Address::8080
          - --entrypoints=Name:admin Address::9090 Compress:true Auth.Basic.Users:admin:crypted_password_here
          - --etcd=true
          - --etcd.prefix=/traefik
          - --etcd.watch=true
          - --etcd.endpoint={{groups.k8s_etcd|first}}:2379
          - --etcd.tls=true
          - --etcd.tls.ca={{k8s_conf_dir}}/ca-etcd.pem
          - --etcd.tls.cert={{k8s_conf_dir}}/cert-traefik.pem
          - --etcd.tls.key={{k8s_conf_dir}}/cert-traefik-key.pem
          - --etcd.useapiv3=true
          - --kubernetes=true
          - --kubernetes.watch=true
          - --acme=true
          - --acme.acmeLogging=true
          - --acme.caserver=https://acme-staging.api.letsencrypt.org/directory
          - --acme.entrypoint=https
          - --acme.httpChallenge=true
          - --acme.httpchallenge.entrypoint=http
          - --acme.email=_YOUR_@_DOMAIN_._TLD_
          - --acme.onHostRule=true
          - --acme.storage=/traefik/acme/account
          - --ping=true
          - --ping.entryPoint=ping
          - --api=true
          - --api.entrypoint=admin
          - --api.dashboard=true
        volumes:
          - name: tls
            secret:
              secretName: traefik-etcd  

First of all: What is a DaemonSet? A DaemonSet ensures that all (or some) nodes run a copy of a Pod. As nodes are added to the cluster, Pods are added to them. As nodes are removed from the cluster, those Pods are garbage collected. Deleting a DaemonSet will clean up the Pods it created. For our Traefik pods this means that exactly one Traefik pod will run on every worker node. As I only have a few worker nodes that’s ok for me. If you have tens or hundreds of worker nodes this probably doesn’t make much sense ;-) But that’s not a problem as you can attach labels to a Kubernetes worker/node and assign a nodeSelector to your pod configuration to run the Traefik DaemonSet only on specific nodes (see the sketch below). For more information see the Assigning Pods to Nodes documentation.
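
A minimal sketch of such a setup (the label key and value are just an illustration, not something the playbooks set up):

# Label the worker nodes that should run Traefik
kubectl label node worker01 ingress=traefik
kubectl label node worker02 ingress=traefik

and then add a matching nodeSelector to the DaemonSet’s pod template (.spec.template.spec):

nodeSelector:
  ingress: "traefik"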

Now what’s in our Traefik DaemonSet specification? Let me briefly explain the important parts from top to bottom:

updateStrategy:
  type: RollingUpdate

This specifies how the pods of the DaemonSet should be updated. I’ve chosen the RollingUpdate strategy. RollingUpdate means that after you update the DaemonSet template, old DaemonSet pods will be killed and new DaemonSet pods will be created automatically, in a controlled fashion. You can further fine-tune the update process by setting maxUnavailable (defaults to 1) and minReadySeconds (defaults to 0) as well. maxUnavailable: 1 means that only one pod of the whole DaemonSet is updated at a time, and only when the updated pod is healthy again does the update process continue with the next pod (also see Perform a Rolling Update on a DaemonSet).
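
A sketch of what such fine-tuning would look like in the DaemonSet spec (the values here are just examples):

spec:
  minReadySeconds: 30        # wait 30s after a new pod became Ready before continuing
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1      # update at most one Traefik pod at a time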

hostNetwork: true

These pods will use the host network directly and not the “pod network” (the term “pod network” is a little bit misleading as there is no such thing; it basically just comes down to routing network packets and namespaces). So we can bind the Traefik ports on the host interface on ports 80, 443, 8080 and 9090 (see below). That of course also means that no other pods and no other services on the worker nodes can use these ports. But that’s what we want here as Traefik is basically our “external” loadbalancer for our “internal” services, our tunnel to the rest of the internet so to say ;-)
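
Once the DaemonSet is running you can verify the bindings directly on a worker node, e.g.:

# Run on a worker node: shows the ports Traefik bound on the host network
sudo ss -tlnp | grep traefik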

serviceAccountName: traefik-ingress-controller

Remember the service account variable we defined above? Here we define that the pods in this DaemonSet should use this service account, which is also referenced in the ClusterRoleBinding we defined. The ClusterRoleBinding has the ClusterRole traefik-ingress-controller assigned which in turn means that Traefik is allowed to execute all the actions defined in the ClusterRole traefik-ingress-controller.

dnsPolicy: ClusterFirstWithHostNet

This setting is important. It configures the Traefik pods to use the Kubernetes cluster-internal DNS server (most likely KubeDNS or CoreDNS). That means the pods’ /etc/resolv.conf will be configured to use the Kubernetes DNS server. Otherwise the DNS server of the Kubernetes node will be used (basically the /etc/resolv.conf of the worker node, which can’t resolve cluster.local names, for example).
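
You can check this by looking at the resolver configuration inside one of the Traefik pods (replace the pod name with one of yours):

# The nameserver should be the cluster DNS service IP, not the node's resolver
kubectl exec traefik-ingress-controller-wqsxt --namespace=kube-system -- cat /etc/resolv.conf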

image: traefik:v1.7.26-alpine

We’ll use the Traefik v1.7.26 Alpine Linux container image.

The livenessProbe and readinessProbe are important for the update process to decide if a pod update was successful.

resources defines e.g. how much CPU and RAM a pod can acquire (also see Managing Compute Resources for Containers). You should almost always define limits for your pods!

ports:
- name: http
  containerPort: 80
  hostPort: 80
- name: https
  containerPort: 443
  hostPort: 443
- name: ping
  containerPort: 8080
- name: admin
  containerPort: 9090

Here I map the containerPort to the hostPort and also give the different ports a name. Ports 80 and 443 should be self-explanatory. On port 8080 the API/ping services are listening, e.g. for healthchecks (see livenessProbe and readinessProbe above). On port 9090 you can connect to the Traefik dashboard. Make sure to have a firewall in place that blocks access to port 8080 and especially to 9090!
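
Once the DaemonSet is up you can quickly verify the ping entrypoint directly on a worker node (Traefik should answer with a plain OK; port 8080 shouldn’t be reachable from outside if your firewall is set up correctly):

curl http://localhost:8080/ping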

securityContext:
  capabilities:
    drop:
    - ALL
    add:
    - NET_BIND_SERVICE

Without this setting I wouldn’t be able to bind Traefik to ports 80 and 443 (which is basically true for all services that want to use ports < 1024). We could also use privileged: true but that makes the attack surface way bigger. Using Linux capabilities gives us fine-grained control over superuser permissions and reduces the permissions to a minimum.

volumeMounts:
- name: tls
  mountPath: {{k8s_conf_dir}}
  readOnly: true

The traefik playbook which I’ll execute later will import the etcd CA (certificate authority), certificate and certificate key file into Kubernetes and store them as a secret. The three files will then be available for Traefik in {{k8s_conf_dir}} (which is /var/lib/kubernetes by default) as you can see above (and later in the Traefik options). For this to work we also need a volumes specification that references the secret the traefik playbook created:

volumes:
  - name: tls
    secret:
      secretName: traefik-etcd

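The traefik playbook creates this secret for you; just for reference, a roughly equivalent manual command would look like this (file names as used in the volume mount and Traefik options, run from the directory containing the files):

kubectl create secret generic traefik-etcd \
  --namespace=kube-system \
  --from-file=ca-etcd.pem \
  --from-file=cert-traefik.pem \
  --from-file=cert-traefik-key.pem
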
Now to the options we supply to Traefik itself. If you want to see all options just run:

docker run -it --rm traefik:v1.7.26-alpine --help

So let’s quickly walk through the options I used in my example:

--entrypoints=Name:http Address::80
--entrypoints=Name:https Address::443 TLS

I guess this is quite obvious: Traefik should listen on ports 80 and 443 and for port 443 also enable TLS/SSL. If you want to redirect incoming HTTP requests to HTTPS use these options instead:

--entrypoints=Name:http Address::80 Redirect.EntryPoint:https
--entrypoints=Name:https Address::443 TLS
--entrypoints=Name:ping Address::8080
--entrypoints=Name:admin Address::9090 Compress:true Auth.Basic.Users:admin:crypted_password_here

Additionally I’ve defined entrypoints that listen on ports 8080 and 9090. On port 8080 the API/ping service of Traefik is listening. Among other things it offers a /ping endpoint which I use for the Kubernetes healthchecks so Kubernetes can decide if a service needs to be restarted or if it is still healthy. On port 9090 I have the dashboard. I also configured Traefik to compress the content before delivery (gzip compression) and enabled basic authentication. In the example above there is a username called admin. To create an encrypted password (and replace the text crypted_password_here) you can use htpasswd, which you normally get if you have the Apache webserver installed, or just use the Apache httpd Docker container. E.g.:

docker run --rm httpd:2.4 htpasswd -B -nb admin s3cr3t

Of course replace s3cr3t with the password you want to use. -B forces bcrypt encryption, -n displays the result on stdout and with -b you can supply the password on the command line.

- --etcd=true
- --etcd.prefix=/traefik
- --etcd.watch=true
- --etcd.endpoint={{groups.k8s_etcd|first}}:2379
- --etcd.tls=true
- --etcd.tls.ca={{k8s_conf_dir}}/ca-etcd.pem
- --etcd.tls.cert={{k8s_conf_dir}}/cert-traefik.pem
- --etcd.tls.key={{k8s_conf_dir}}/cert-traefik-key.pem
- --etcd.useapiv3=true

Here I enable the etcd backend (where Traefik stores the Let's Encrypt certificates and some other configuration settings). The etcd prefix is /traefik (that’s basically the root for all further Traefik keys in the etcd key/value store). As mentioned above I’ll use the first hostname in Ansible’s k8s_etcd host group as the etcd endpoint. And finally I tell Traefik to use a TLS connection with the certificates and the CA I created above. I also add a flag to instruct Traefik to use etcd’s v3 API.

--kubernetes=true
--kubernetes.watch=true

This enables the Kubernetes backend. If Traefik should only watch for Ingress events in the default namespace you can add --kubernetes.namespaces=default. If you have other namespaces for which you also want ingress, add them to this list, or omit --kubernetes.namespaces entirely which causes Traefik to watch all namespaces.

--api=true
--api.entrypoint=admin
--api.dashboard=true

This enables the Traefik dashboard and binds it to port 9090 as we defined an admin entrypoint listening on port 9090 above.

--acme=true
--acme.acmeLogging=true
--acme.caserver=https://acme-staging.api.letsencrypt.org/directory
--acme.entrypoint=https
--acme.httpChallenge=true
--acme.httpchallenge.entrypoint=http
--acme.email=_YOUR_@_DOMAIN_._TLD_
--acme.onHostRule=true
--acme.storage=/traefik/acme/account

If you want automatic TLS/SSL configuration with free certificates from Let’s Encrypt then keep these lines. As Let’s Encrypt has some rate limiting you should keep --acme.caserver=https://acme-staging.api.letsencrypt.org/directory while testing the configuration.
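
Once everything works you can remove the --acme.caserver flag entirely (Traefik then falls back to its default, the Let’s Encrypt production CA), or, as far as I know, point it explicitly at the production directory:

--acme.caserver=https://acme-v02.api.letsencrypt.org/directory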

Once you are confident that everything works as expected you should remove the Let’s Encrypt settings from etcd as they only contain staging data with invalid certificates. The playbook delete_etcd.yml (if you use etcd as backend) can do this for you and delete only the keys mentioned, by using the tag traefik-etcd-key and defining the variable delete_keys=true:

ansible-playbook -v --extra-vars="delete_keys=true" -t traefik-etcd-key delete_etcd.yml

After you’re done with testing and want to go into production it makes sense to remove everything related to Traefik like the DaemonSet, Service, ServiceAccount, … and re-install from scratch. So if you run

ansible-playbook -v --extra-vars="delete_keys=true" delete_etcd.yml

all Kubernetes Traefik resources will be deleted. By default the playbook won’t delete the Let’s Encrypt data from etcd because in production there is normally no need to do so, but with the variable delete_keys=true specified it will also be deleted.

But be aware that at the moment you can only create certificates for 20 registered domains per week. A registered domain is, generally speaking, the part of the domain you purchased from your domain name registrar. For instance, in the name www.example.com, the registered domain is example.com. --acme.entrypoint=https is the entrypoint the ACME challenge is proxied to.

Replace the value of --acme.email=_YOUR_@_DOMAIN_._TLD_ with your e-mail address of course. --acme.onHostRule will request a Let’s Encrypt certificate if an Ingress resource provides a host rule (you’ll see an example below). Finally --acme.storage=/traefik/acme/account causes the certificates to be stored in a key/value store backend, which is etcd in my case. So as long as etcd is available Traefik can fetch the certificates from the backend as long as they are still valid (Traefik takes care of the renewal process, but it makes sense to have some monitoring in place to be sure the certificates are replaced in time).
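
If you want to see what Traefik has stored, you can query etcd directly on one of the etcd hosts, e.g. (a sketch only; adjust the endpoint and certificate paths to your etcd setup):

# List all ACME-related keys Traefik stored below the /traefik prefix
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/etcd/ca-etcd.pem \
  --cert=/etc/etcd/cert-etcd.pem \
  --key=/etc/etcd/cert-etcd-key.pem \
  get --prefix /traefik/acme --keys-only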

Before we roll out the whole thing make sure to open ports 80 and 443 so that the Let’s Encrypt servers can reach the Traefik instances for the ACME challenge requests. E.g. for the harden-linux role that means extending the harden_linux_ufw_rules variable to finally have something like this (assuming you changed the SSHd port to 22222 as recommended, otherwise the value will be 22):

harden_linux_ufw_rules:
  - rule: "allow"
    to_port: "22222"
    protocol: "tcp"
  - rule: "allow"
    to_port: "51820"
    protocol: "udp"
  - rule: "allow"
    to_port: "80"
    protocol: "tcp"
  - rule: "allow"
    to_port: "443"
    protocol: "tcp"

Be aware that it can (and will) take a few minutes when a new certificate is requested from Let’s Encrypt for the first time! Have a look at the pod logs of the DaemonSet regarding the registration process (e.g. kubectl logs traefik-ingress-controller-XXXXX -f --namespace=kube-system).

Before we now install all Kubernetes Traefik resources make sure that you have specified the following variables in group_vars/all.yml:

etcd_version: "3.4.7"
etcd_bin_dir: "/usr/local/bin"
etcd_client_port: "2379"

The latest supported and tested etcd version for Kubernetes v1.18.x is 3.4.7. The playbooks also need to locate the etcdctl utility on the first etcd host and will run the binary from the directory specified in etcd_bin_dir. They also need the etcd_client_port. But if you have followed my blog series so far then you almost certainly have these variables set already, or the etcd role’s default values will be used.

To get the playbook I created to install Traefik and all its required resources clone my ansible-kubernetes-playbooks repository, e.g.:

git clone https://github.com/githubixx/ansible-kubernetes-playbooks.git

Then

cd traefik

and run the playbook. If you want to use etcd as the Traefik backend then run

ansible-playbook install_or_update_etcd.yml

This will install all the resources we defined above and of course the Traefik DaemonSet. The Traefik dashboard should be available on all worker nodes on port 9090 shortly after the playbook ran successfully.

You can easily access the Traefik dashboard now using e.g.

kubectl port-forward traefik-ingress-controller-wqsxt 9090:9090 --namespace=kube-system

Of course replace traefik-ingress-controller-wqsxt with the name of one of your Traefik pods (use kubectl get pods --namespace=kube-system to get a list of pods in kube-system namespace).

Also

kubectl logs traefik-ingress-controller-wqsxt --namespace=kube-system

will show you the logs of pod traefik-ingress-controller-wqsxt (again replace with one of your Traefik pod names).

Be aware that Traefik is quite picky about its configuration but sometimes isn’t very good at telling you what you did wrong ;-) So you may need to experiment a little bit until you get what you want. At least the options above should work as I’ve tested them quite a few times… The --debug option can also help to get more information.

Now that we have deployed our Traefik loadbalancer we of course want to expose a website to the world. Before you proceed make sure that the DNS entry of the domain you want to expose points to one of the Traefik instances, or create a round-robin DNS entry pointing the same domain name to two or three different A records. This step is especially important if you configured Traefik to automatically create Let’s Encrypt certificates, as the Let’s Encrypt servers will contact the Traefik instances to verify that you own the domain (ACME challenge responses).

The first thing we need is a Deployment. As an example we will deploy two nginx webservers. Let’s assume you own the domain example.com and the two nginx servers should deliver that site.

Here is an example Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-com
  namespace: default
  labels:
    app: example-com
spec:
  replicas: 2
  selector:
    matchLabels:
      app: example-com
  template:
    metadata:
      labels:
        app: example-com
    spec:
      containers:
      - name: nginx
        image: nginx:1.17
        ports:
        - containerPort: 80

As you can see in the metadata we name the deployment example-com (but could be any name you like of course) and it will be deployed in the default namespace. There will be two pods as stated in replicas: 2. The selector field defines how the Deployment knows what Pods to manage. In this case, we simply select on one label defined in the Pod template: app: example-com. The Pod template’s specification, or template: spec field, indicates that the Pods run one container, nginx, which runs the nginx Docker Hub image at version 1.17.

To roll out the example above copy it into your favorite text editor and adjust it to your needs. Save it as deployment.yml and create the Deployment via

kubectl create -f deployment.yml

Verify that the pods are up and running e.g.:

kubectl get pods -l 'app=example-com' --namespace=default -o wide

NAME                          READY     STATUS    RESTARTS   AGE       IP             NODE
example-com-d8d7c48c4-j7brl   1/1       Running   0          7d        10.200.25.15   worker02
example-com-d8d7c48c4-vdb4k   1/1       Running   0          7d        10.200.5.14    worker01

Next we need a Service. Here is an example:

kind: Service
apiVersion: v1
metadata:
  name: example-com
  namespace: default
spec:
  selector:
    app: example-com
  ports:
  - name: http
    port: 80

A Kubernetes Service is an abstraction which defines a logical set of Pods and a policy by which to access them - sometimes called a microservice. The set of Pods targeted by a Service is (usually) determined by a Label Selector.

As you already guessed the Service will be called example-com and lives in the default namespace. We defined a Pod selector with app: example-com. The incoming Service port is 80 and we name it http. If you don’t specify a targetPort then targetPort is equal to port. So if your pods listen on port 8080, for example, you also need to add targetPort: 8080 to the Service’s .spec.ports entry (see the sketch below).
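
As a small sketch, for pods listening on port 8080 the ports section of the Service above would look like this:

ports:
- name: http
  port: 80          # port the Service exposes inside the cluster
  targetPort: 8080  # port the pods are actually listening on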

If you don’t specify a clusterIP you get a stable IP address automatically for this service (it lasts as long as the service exists).

To roll out the Service example above, again copy it into your favorite text editor and adjust it to your needs. Save it as service.yml and create the Service via

kubectl create -f service.yml

Verify that the service was created:

kubectl get svc --namespace=default

NAME          TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
example-com   ClusterIP   10.32.226.13    <none>        80/TCP    7d

Before we create the Ingress resource make (again) sure that your DNS entries point to the correct IP address. In our example above that means that we need to make sure that example.com and www.example.com are pointing to the public IP of one of the worker nodes Traefik runs on and that ports 80 and 443 are not blocked at the firewall.
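
A quick way to verify the DNS records before creating the Ingress, e.g.:

# Should return the public IP(s) of the worker node(s) running Traefik
dig +short example.com
dig +short www.example.com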

Finally we define the Kubernetes Ingress resource:

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: example-com
  namespace: default
  annotations:
    kubernetes.io/ingress.class: "traefik"
spec:
  rules:
  - host: example.com
    http:
      paths:
      - path: /
        backend:
          serviceName: example-com
          servicePort: 80
  - host: www.example.com
    http:
      paths:
      - path: /
        backend:
          serviceName: example-com
          servicePort: 80

Again we name the Ingress resource example-com. The kubernetes.io/ingress.class annotation can be attached to any Ingress object in order to control whether Traefik should handle it. If the annotation is missing, contains an empty value, or the value traefik, then the Traefik controller will take responsibility and process the associated Ingress object. If the annotation contains any other value (usually the name of a different Ingress controller), Traefik will ignore the object. In ingress.spec.rules I defined two host rules: one for example.com and one for www.example.com. Both define that all requests (path: /) should go to the service example-com we defined above on port 80.

To roll out the Ingress example copy it into your favorite text editor and adjust it to your needs. Save it as ingress.yml and create the Ingress via

kubectl create -f ingress.yml

That’s basically it! You should now be able to curl http://example.com (which of course you have to change to match your domain) and get the default nginx homepage. A few minutes later you should also be able to curl https://example.com. As already mentioned it takes Traefik a few minutes to get the SSL certificate from Let’s Encrypt, but as soon as it is stored in etcd the renewal process shouldn’t interrupt normal website operations.

As etcd now contains quite important data (your SSL certificate(s)) you should think about a backup and restore process and disaster recovery ;-).
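
A periodic etcd snapshot is a good starting point for such a backup (a sketch only; adjust the endpoint and certificate paths to your etcd setup):

# Run on one of the etcd hosts to take a point-in-time snapshot
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/etcd/ca-etcd.pem \
  --cert=/etc/etcd/cert-etcd.pem \
  --key=/etc/etcd/cert-etcd-key.pem \
  snapshot save /var/backups/etcd-snapshot.db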

You probably already figured out that the whole setup is ok so far but not perfect. If you point your website’s DNS record to one of the Traefik instances (which basically means to one of the Traefik DaemonSet members) and that host dies you’re out of business for a while. Also, if you use DNS round-robin to distribute the requests to all Traefik nodes, you still have the problem that if one node fails you lose at least the requests going to that node.

keepalived-vip could be a solution to that problem but this project is no longer maintained.

If you can change your DNS records via API (which is the case for Google Cloud DNS or OVH DNS e.g.) you could deploy a Kubernetes CronJob that monitors all Traefik instances and changes the DNS records if one of the nodes fails, or you can implement a watchdog yourself and deploy it as a pod into your K8s cluster.

But one of the best options to solve the problem is probably MetalLB. Also see Configuring HA Kubernetes cluster on bare metal servers with GlusterFS & MetalLB and What you need to know about MetalLB.

hcloud-fip-controller is another option that might be sufficient for quite a few use cases. It is a small controller to handle floating IP management in a Kubernetes cluster on Hetzner Cloud virtual machines.

Next up: Kubernetes the Not So Hard Way With Ansible - Upgrading Kubernetes