Kubernetes the Not So Hard Way With Ansible (at Scaleway) - Part 8 - Ingress with Traefik

Make your services/pods available to the internet with automatic TLS certificate generation

October 29, 2017

CHANGELOG

2017-12-02

  • Consul now configured with TLS support
  • Traefik now configured to use TLS for communication and authorization with Consul
  • Ansible role to create CA (certificate authority) and certificates for Consul/Traefik

2017-11-26

  • Update about bug using Traefik with etcd K/V

2017-10-10

  • Added example service, ingress and deployment resources

Currently the Kubernetes cluster we’ve built so far only answers internal requests. But most people want to deliver their website or application through a webserver/appserver running in the cluster. For this to work we need something called ingress. More information is provided in the Kubernetes Ingress Resources documentation. Typically, services and pods have IPs only routable by the cluster network. All traffic that ends up at an edge router is either dropped or forwarded elsewhere. An Ingress is a collection of rules that allow inbound connections to reach the cluster services. It can be configured to give services externally-reachable URLs, load balance traffic, terminate SSL, offer name based virtual hosting, and more. Users request ingress by POSTing the Ingress resource to the API server. An Ingress controller is responsible for fulfilling the Ingress, usually with a loadbalancer, though it may also configure your edge router or additional frontends to help handle the traffic in an HA manner.

There are several options to implement this, e.g. Kube-Lego, which uses nginx as a reverse proxy and automatically requests free certificates for Kubernetes Ingress resources from Let’s Encrypt. But I’ll use Traefik because it is a modern HTTP reverse proxy and load balancer for the cloud age made to deploy microservices with ease (ok, and I know two of the maintainers in person ;-) ). It supports several backends (Docker, Swarm mode, Kubernetes, Marathon, Consul, Etcd, Rancher, Amazon ECS, and a lot more) to manage its configuration automatically and dynamically. As with Kube-Lego it’s also possible to request and renew certificates from Let’s Encrypt automatically.

There are some posts out there which describe similar setups, e.g.:

Automated certificate provisioning in Kubernetes using kube-lego

Kubernetes with Traefik and Let’s Encrypt

Træfik as Kubernetes Ingress Controller on ARM with automatic TLS certificates

Traefik Kubernetes Ingress Controller documentation

A word about etcd as Traefik backend

It would have made sense to use etcd as the backend to store the Let’s Encrypt SSL certificates as we already use it for the Kubernetes API service, but there is still a bug (as of 20171016) which prevents us from doing so. Basically it works, but we would not be able to automatically generate Let’s Encrypt certificates and store them in etcd. So as long as this bug exists we need something different as a backend for Traefik, and Consul by HashiCorp is a valid option for now (Update 20171126: Looks like the fix for the bug mentioned above is merged and will be in Traefik 1.5 according to this pull request: https://github.com/containous/traefik/pull/2407 Update 20171201: Traefik 1.5 RC1 is out and should work with etcd. I’ll update this blog post when I’ve tested it and have time ;-) ).

One of the blog posts above uses Kubernetes StatefulSets but I’ll stay with the Ansible route to install and manage Consul (as we did with etcd). I think it’s a valid option to manage this kind of backend service outside of Kubernetes. If there is a problem with the Kubernetes setup I still want to be able to handle all critical Kubernetes components (which could get interesting if e.g. kubectl doesn’t work anymore for whatever reason or you have authentication problems).

Creating a certificate authority (CA) and certificates for Consul

The first thing we need is a certificate authority (CA) to issue and sign certificates for Consul. We need the certificates to secure the communication between the Consul members. Additionally we also need them to secure the communication from the clients (Traefik in our case) to the Consul members.

To make the task a little bit easier I’ve created an Ansible role. Install the role via

ansible-galaxy install githubixx.consul-ca

Now we need to define a few Ansible variables (put them in group_vars/all.yml or where it fits best for you):

# Where to store the CA and certificate files. By default this
# will expand to the user's LOCAL $HOME (the user that runs "ansible-playbook ...")
# plus "/consul/ssl". That means if the user's $HOME directory is e.g.
# "/home/da_user" then "consul_ca_conf_directory" will have a value of
# "/home/da_user/consul/ssl".
consul_ca_conf_directory: "{{ '~/consul/ssl' | expanduser }}"

# The user who owns the certificate directory and files
consul_ca_certificate_owner: "root"
# The group which owns the certificate directory and files
consul_ca_certificate_group: "root"

# Expiry for Consul root certificate
ca_consul_expiry: "87600h"

#
# Certificate authority for Consul certificates
#
ca_consul_csr_cn: "Consul"
ca_consul_csr_key_algo: "rsa"
ca_consul_csr_key_size: "2048"
ca_consul_csr_names_c: "DE"
ca_consul_csr_names_l: "The_Internet"
ca_consul_csr_names_o: "Consul"
ca_consul_csr_names_ou: "BY"
ca_consul_csr_names_st: "Bayern"

#
# CSR parameter for Consul certificate
#
consul_csr_cn: "server.dc1.consul"
consul_csr_key_algo: "rsa"
consul_csr_key_size: "2048"
consul_csr_names_c: "DE"
consul_csr_names_l: "The_Internet"
consul_csr_names_o: "Consul"
consul_csr_names_ou: "BY"
consul_csr_names_st: "Bayern"

One note about consul_csr_cn: Consul wants the certs to have server.<data_center>.consul as common name (CN) value (also see Consul: Adding TLS to Consul using Self Signed Certificates). In Consul’s config.json there is a parameter datacenter (see below) which is often dc1 by default. E.g. if you have "datacenter": "par1" specified in Consul’s configuration, the first value in consul_csr_cn should be server.par1.consul. You can specify additional common names afterwards separated by commas. Even wildcards are possible, e.g. consul_csr_cn: "server.dc1.consul,*.example.com", but the first value should still be as mentioned.

So besides server.<data_center>.consul it is important that you specify the domain here that you use to connect to Consul. As we will see later, by default we define the Consul endpoint for Traefik like this: --consul.endpoint={{groups.consul_instances|first}}:8443. This takes the first host in Ansible’s consul_instances host group. Now if the first hostname is e.g. consul1.example.com then consul_csr_cn should be server.dc1.consul,*.example.com. If the domain name doesn’t match, Traefik won’t connect to Consul because of a certificate mismatch.
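
For example, with "datacenter": "par1" in Consul’s configuration and the Consul servers reachable as consul1.example.com to consul3.example.com (hypothetical hostnames), the variable could look like this:

consul_csr_cn: "server.par1.consul,*.example.com"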

Now we can create the certificate authority (CA) and the certificates:

ansible-playbook --tags=role-consul-ca k8s.yml

Installing Consul

Next we want to install Consul. There are quite a few Ansible Consul roles out there but I have chosen the one from Brian Shumate. It’s up to date and looks well maintained. So we first install it via

ansible-galaxy install brianshumate.consul

Consul 1.0.0 was just released and Brian already updated the role to use 1.0.0 as the default. First we add the Consul hosts to Ansible’s hosts file, e.g.:

[consul_instances]
controller[1:3].your-domain.tld consul_node_role=server consul_bootstrap_expect=true

We create a new host group called consul_instances. Do not change the group name! The role expects this group name. As you can see we’ll use the three Kubernetes controller nodes to install Consul on (if you can afford it, install Consul on its own hosts - which is also true for etcd). We also specify some parameters which let the servers elect a leader among themselves (so there’s no need to specify which one is the bootstrap node and which ones are the other servers). Next we add an entry to our site playbook k8s.yml:

-
  hosts: consul_instances
  any_errors_fatal: true
  become: yes
  become_user: root
  roles:
    -
      role: brianshumate.consul
      tags: role-consul

The any_errors_fatal play option will mark all hosts as failed if any of them fails, causing an immediate abort. Next we’ll fine tune the role with some variables in group_vars/all.yml (or wherever it makes sense for you). I’ve set the following variables (see the comments for what these variables are good for):

# We want to stay with 1.0.0 until we explicitly state otherwise
consul_version: "1.0.0"

# Set Consul user
consul_user: "consul"

# Use the Scaleway datacenter name here (in this case: Amsterdam called "ams1" or "par1" for Paris)
consul_datacenter: "ams1"

# We want to bind Consul service to the PeerVPN interface
# as all our Kubernetes services are bound to this interface
consul_iface: "{{peervpn_conf_interface}}"

# Consul should listen on our PeerVPN interface to make it accessible for the worker nodes
consul_client_address: "{{hostvars[inventory_hostname]['ansible_' + peervpn_conf_interface].ipv4.address}}"
consul_bind_address: "{{hostvars[inventory_hostname]['ansible_' + peervpn_conf_interface].ipv4.address}}"

# Download the files for installation directly on the remote hosts
consul_install_remotely: "true"

# We know when to update ;-)
consul_disable_update_check: "true"

# The HTTPS port. Traefik will use this port as endpoint port.
consul_ports_https: "8443"

# Enable TLS communication
consul_tls_enable: "true"

# Directory where certificates are stored locally (the location
# where the role copies the certificates). We just use the
# "consul_ca_conf_directory" variable value here
# specified in our Consul CA role.
consul_tls_src_files: "{{consul_ca_conf_directory}}"

# Location of Consul certificate files on remote hosts
consul_tls_dir: "/etc/consul/ssl"

# CA certificate filename (normally no need to change)
consul_tls_ca_crt: "ca-consul.pem"

# Server certificate filename (normally no need to change)
consul_tls_server_crt: "cert-consul.pem"

# Server key filename (normally no need to change)
consul_server_key: "cert-consul-key.pem"

# Verify incoming connections
consul_tls_verify_incoming: "true"

# Verify outgoing connections
consul_tls_verify_outgoing: "true"

# Verify server hostname
consul_tls_verify_server_hostname: "true"

Before installing the role I had to pip2 install netaddr on my local host. That’s because I run Archlinux and Ansible still uses Python 2.7 on my laptop. The role will execute pip3 install netaddr which installs the library for Python 3.x only. So I installed it manually for Python 2.7 which made the role happy. We’re now set up and can install the role:

ansible-playbook --tags=role-consul k8s.yml

This takes a while, but once done you can have a look with journalctl -f -t consul on one of the controller/consul nodes (depending on where you installed Consul) to check whether the nodes have joined the cluster and elected a leader (you should see something like this: consul: New leader elected: controller1).

ClusterRole and ClusterRoleBinding for Traefik

Next we need to define a few Ansible variables. Put them in group_vars/all.yml or where you think they fit best for your setup. As of Kubernetes 1.8 RBAC (Role-Based Access Control) is stable. If you use Kubernetes 1.7 you need to specify apiVersion: rbac.authorization.k8s.io/v1beta1 for ClusterRole and ClusterRoleBinding:

traefik_clusterrole: |
  ---
  kind: ClusterRole
  apiVersion: rbac.authorization.k8s.io/v1
  metadata:
    name: traefik-ingress-controller
  rules:
    - apiGroups:
        - ""
      resources:
        - services
        - endpoints
        - secrets
      verbs:
        - get
        - list
        - watch
    - apiGroups:
        - extensions
      resources:
        - ingresses
      verbs:
        - get
        - list
        - watch

traefik_clusterrolebinding: |
  ---   
  kind: ClusterRoleBinding
  apiVersion: rbac.authorization.k8s.io/v1
  metadata:
    name: traefik-ingress-controller
  roleRef:
    apiGroup: rbac.authorization.k8s.io
    kind: ClusterRole
    name: traefik-ingress-controller
  subjects:
  - kind: ServiceAccount
    name: traefik-ingress-controller
    namespace: kube-system

The playbook we execute later will use these two variables to create the ClusterRole and ClusterRoleBinding for Traefik. In short, a Role is basically a set of permissions which we assign to a RoleBinding which grants the permissions defined in the Role to a user or set of users. Roles and RoleBindings are used if you want to grant permissions within namespaces. Without namespaces the settings are cluster-wide and that’s why they’re called ClusterRole and ClusterRoleBinding ;-) As you can see above, for Traefik we define the role and binding cluster-wide. To serve content from all services in all namespaces Traefik needs access to all namespaces.

ServiceAccount for Traefik

Now we add a variable we need for the service account:

traefik_serviceaccount: |
  ---
  apiVersion: v1
  kind: ServiceAccount
  metadata:
    name: traefik-ingress-controller
    namespace: kube-system

When you (a human) access the cluster (for example, using kubectl), you are authenticated by the apiserver as a particular User Account. Processes in containers inside pods can also contact the apiserver. When they do, they are authenticated as a particular Service Account. In our case we create and use the service account traefik-ingress-controller and it will be placed in the kube-system namespace.

DaemonSet for Traefik

And now comes the interesting part - the DaemonSet running the Traefik daemon. Again we define a variable with the following content (adjust the settings according to your needs, especially the value of acme.email):

traefik_daemonset: |
  ---
  kind: DaemonSet
  apiVersion: extensions/v1beta1
  metadata:
    name: traefik-ingress-controller
    namespace: kube-system
    labels:
      k8s-app: traefik-ingress-lb
  spec:
    updateStrategy:
      type: RollingUpdate
    template:
      metadata:
        labels:
          k8s-app: traefik-ingress-lb
          name: traefik-ingress-lb
      spec:
        serviceAccountName: traefik-ingress-controller
        terminationGracePeriodSeconds: 60
        hostNetwork: true
        dnsPolicy: ClusterFirstWithHostNet
        containers:
        - image: traefik:1.4.4-alpine
          name: traefik-ingress-lb
          imagePullPolicy: IfNotPresent
          livenessProbe:
            failureThreshold: 2
            httpGet:
              path: /ping
              port: 8080
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 5
          readinessProbe:
            failureThreshold: 2
            httpGet:
              path: /ping
              port: 8080
              scheme: HTTP
            periodSeconds: 5
          resources:
            requests:
              memory: "64Mi"
              cpu: "250m"
            limits:
              memory: "64Mi"
              cpu: "250m"
          ports:
          - name: http
            containerPort: 80
            hostPort: 80
          - name: https
            containerPort: 443
            hostPort: 443
          - name: admin
            containerPort: 8080
          securityContext:
            privileged: true
          volumeMounts:
          - name: tls
            mountPath: {{consul_tls_dir}}
            readOnly: true
          args:
          - --checknewversion=false
          - --loglevel=INFO
          - --defaultentrypoints=http,https
          - --entrypoints=Name:http Address::80
          - --entrypoints=Name:https Address::443 TLS
          - --consul=true
          - --consul.prefix=traefik
          - --consul.watch=true
          - --consul.endpoint={{groups.consul_instances|first}}:8443
          - --consul.tls=true
          - --consul.tls.ca={{consul_tls_dir}}/ca-consul.pem
          - --consul.tls.cert={{consul_tls_dir}}/cert-consul.pem
          - --consul.tls.key={{consul_tls_dir}}/cert-consul-key.pem
          - --kubernetes=true
          - --kubernetes.watch=true
          - --kubernetes.namespaces=default
          - --web=true
          - --web.readonly
          - --web.address=:8080
          - --acme=true
          - --acme.acmelogging=true
          - --acme.caserver=https://acme-staging.api.letsencrypt.org/directory
          - --acme.entrypoint=https
          - --acme.email=_YOUR_@_DOMAIN_._TLD_
          - --acme.onhostrule
          - --acme.storage=traefik/acme/account
        volumes:
          - name: tls
            secret:
              secretName: consul

First of all: What is a DaemonSet? A DaemonSet ensures that all (or some) Nodes run a copy of a Pod. As nodes are added to the cluster, Pods are added to them. As nodes are removed from the cluster, those Pods are garbage collected. Deleting a DaemonSet will clean up the Pods it created. For our Traefik pods this means that exactly one Traefik pod will run on every worker node. As I only have a few worker nodes that’s ok for me. If you have tens or hundreds of worker nodes this probably doesn’t make much sense ;-) But that’s not a problem as you can attach labels to a Kubernetes worker/node and assign a nodeSelector to your pod configuration to run the Traefik DaemonSet only on specific nodes. For more information see the Assigning Pods to Nodes documentation.
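
To give a rough idea (the label key/value traefik=enabled is made up for this example): you would label the nodes that should run Traefik, e.g.

kubectl label node k8s-worker1 traefik=enabled

and then add a matching nodeSelector to the pod template’s spec in the DaemonSet:

nodeSelector:
  traefik: enabled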

Now what’s in our Traefik DaemonSet specification? Let me briefly explain the important parts from top to bottom:

updateStrategy:
  type: RollingUpdate

This specifies how the pods of the DaemonSet should be updated. I’ve chosen the RollingUpdate strategy. RollingUpdate means that after you update a DaemonSet template, old DaemonSet pods will be killed and new DaemonSet pods will be created automatically, in a controlled fashion. You can further fine tune the update process by setting maxUnavailable (defaults to 1) and minReadySeconds (defaults to 0) as well. maxUnavailable equal to 1 means that only one pod of the whole DaemonSet will be updated at a time, and only when the updated pod is healthy again does the update process continue with the next pod (also see Perform a Rolling Update on a DaemonSet).
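
A fine-tuned variant could look like this (the values are just an example; minReadySeconds lives directly under the DaemonSet spec, maxUnavailable under rollingUpdate):

spec:
  minReadySeconds: 10
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1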

hostNetwork: true

These pods will use the host network directly and not the pod network. So we can bind the Traefik ports to the host interface on ports 80, 443 and 8080 (see below). That of course also means that no further pods of a DaemonSet can use these ports, and neither can any other service on the worker nodes. But that’s what we want here as Traefik is basically our “external” loadbalancer for our “internal” services - our tunnel to the rest of the internet so to say ;-)

serviceAccountName: traefik-ingress-controller

Remember the service account variable we defined above? Here we define that the pods in this DaemonSet should use this service account which is also configured in ClusterRoleBinding we defined. The ClusterRoleBinding has the ClusterRole traefik-ingress-controller assigned which in turn means that Traefik is allowed to execute all the actions defined in ClusterRole traefik-ingress-controller.

dnsPolicy: ClusterFirstWithHostNet

This setting is important. It will configure the Traefik pods to use the Kubernetes cluster internal DNS server (most likely KubeDNS or maybe CoreDNS). That means the pods’ /etc/resolv.conf will be configured to use the Kubernetes DNS server. Otherwise the DNS server of the Kubernetes node would be used (basically the /etc/resolv.conf of the worker node, which can’t resolve cluster.local names, for example).
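
Once the DaemonSet is running you can verify this by looking at the resolver configuration of one of the Traefik pods (replace the pod name with one of yours):

kubectl exec traefik-ingress-controller-wqsxt --namespace=kube-system -- cat /etc/resolv.conf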

image: traefik:1.4.4-alpine

We’ll use the Traefik 1.4.4 Alpine Linux container image.

The livenessProbe and readinessProbe are important for the update process to decide if a pod update was successful.

resources define e.g. how much CPU and RAM resources a pod can acquire (also see Managing Compute Resources for Containers). You should almost always define limits for your pods!

ports:
- name: http
  containerPort: 80
  hostPort: 80
- name: https
  containerPort: 443
  hostPort: 443
- name: admin
  containerPort: 8080

Here we map the containerPort to the hostPort and also give the different ports a name. Ports 80 and 443 should be self-explanatory. Port 8080 is the admin UI of Traefik. It will bind to the PeerVPN interface by default.

securityContext:
  privileged: true

Without this setting we won’t be able to bind Traefik on port 80 and 443 (which is basically true for all services that want to use ports < 1024).

volumeMounts:
- name: tls
  mountPath: {{consul_tls_dir}}
  readOnly: true

The Traefik playbook we will execute later will import the Consul CA, certificate and certificate key file into Kubernetes and store them as a secret. These three files will then be available for Traefik to use in {{consul_tls_dir}} (which is /etc/consul/ssl by default) as you can see above (and later in the Traefik options). For this to work we also need a volumes specification that references the secret that the Traefik playbook created:

volumes:
  - name: tls
    secret:
      secretName: consul
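
The playbook takes care of creating that secret, but just to illustrate what ends up in the cluster: creating it manually would look roughly like this (a sketch, assuming the certificate files are still in $HOME/consul/ssl as configured via consul_ca_conf_directory):

kubectl create secret generic consul --namespace=kube-system \
  --from-file=ca-consul.pem=$HOME/consul/ssl/ca-consul.pem \
  --from-file=cert-consul.pem=$HOME/consul/ssl/cert-consul.pem \
  --from-file=cert-consul-key.pem=$HOME/consul/ssl/cert-consul-key.pem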

Now to the options we supply to Traefik itself. If you want to see all options just run:

docker run -it --rm traefik:1.4.4-alpine --help

So let’s walk quickly through the options I used in my example:

--entrypoints=Name:http Address::80
--entrypoints=Name:https Address::443 TLS

I guess this is quite obvious: Traefik should listen on ports 80 and 443, and for port 443 also enable TLS/SSL.

--consul=true
--consul.prefix=traefik
--consul.watch=true
--consul.endpoint={{groups.consul_instances|first}}:8443
--consul.tls=true
--consul.tls.ca={{consul_tls_dir}}/ca-consul.pem
--consul.tls.cert={{consul_tls_dir}}/cert-consul.pem
--consul.tls.key={{consul_tls_dir}}/cert-consul-key.pem

Here we enable the Consul backend (where Traefik stores the Let’s Encrypt certificates and some other configuration settings). We use the prefix traefik (that’s basically the root for all further Traefik keys in Consul’s key/value store). E.g. you can get the Traefik leader directly from Consul via consul kv get traefik/leader. As mentioned above we will use the first hostname in Ansible’s consul_instances host group as the Consul endpoint. And finally we tell Traefik to use a TLS connection with the certificates and the CA we created above.
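
Since Consul now only talks TLS on port 8443, the consul CLI also needs to know about the certificates for such a query. On one of the Consul nodes this could look roughly like this (a sketch; consul1.example.com and the file paths are just placeholders - adjust them to your setup):

export CONSUL_HTTP_ADDR=consul1.example.com:8443
export CONSUL_HTTP_SSL=true
export CONSUL_CACERT=/etc/consul/ssl/ca-consul.pem
export CONSUL_CLIENT_CERT=/etc/consul/ssl/cert-consul.pem
export CONSUL_CLIENT_KEY=/etc/consul/ssl/cert-consul-key.pem

consul kv get traefik/leader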

--kubernetes=true
--kubernetes.watch=true
--kubernetes.namespaces=default

This enables the Kubernetes backend. We’re only interested in ingress events for the default namespace. If you have other namespaces for which you also want ingress, add them to this list or remove --kubernetes.namespaces entirely, which causes Traefik to watch all namespaces.

--web=true
--web.readonly
--web.address=:8080

This enables the Traefik UI and binds it to port 8080.

--acme=true
--acme.acmelogging=true
--acme.caserver=https://acme-staging.api.letsencrypt.org/directory
--acme.entrypoint=https
--acme.email=_YOUR_@_DOMAIN_._TLD_
--acme.onhostrule
--acme.storage=traefik/acme/account

If you want automatic TLS/SSL configuration with free certificates from Let’s Encrypt then keep these lines. As Let’s Encrypt has some rate limiting you should keep --acme.caserver=https://acme-staging.api.letsencrypt.org/directory while testing the configuration. If you are confident that everything works as expected just remove this entry and update the DaemonSet. But be aware that you can only create certificates for 20 registered domains per week ATM. A registered domain is, generally speaking, the part of the domain you purchased from your domain name registrar. For instance, in the name www.example.com, the registered domain is example.com. --acme.entrypoint=https is the entrypoint to proxy ACME challenges to. Replace the value of --acme.email=_YOUR_@_DOMAIN_._TLD_ with your e-mail address of course. --acme.onhostrule will request a Let’s Encrypt certificate if an Ingress resource provides a host rule (you’ll see an example below). Finally --acme.storage=traefik/acme/account causes the certificates to be stored in a key/value store backend, which is Consul in our case. So as long as Consul is available Traefik can fetch the certificates from Consul as long as their TTLs are valid (Traefik will take care of the renewal process of the certificates, but it makes sense to have some monitoring of this process to be sure the certificates are replaced in time).

Firewall settings

Before we now roll out the whole thing make sure to open ports 80 and 443 so that the Let’s Encrypt servers can reach the Traefik instances for the ACME challenge requests. E.g. for the harden-linux role that means extending the harden_linux_ufw_rules variable to finally have something like this:

harden_linux_ufw_rules:
  - rule: "allow"
    to_port: "22222"
    protocol: "tcp"
  - rule: "allow"
    to_port: "7000"
    protocol: "udp"
  - rule: "allow"
    to_port: "80"
    protocol: "tcp"
  - rule: "allow"
    to_port: "443"
    protocol: "tcp"

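To apply the changed firewall rules run the harden-linux role again, e.g. (assuming you tagged the role role-harden-linux in your playbook, as in the earlier parts of this series):

ansible-playbook --tags=role-harden-linux k8s.yml
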
Be aware that it can (and probably will) take a few minutes when a new certificate is requested from Let’s Encrypt for the first time! Have a look at the pod logs of the DaemonSet to follow the registration process.

To get the playbook I created to install Traefik and all its required resources clone my ansible-kubernetes-playbooks repository, e.g.:

git clone https://github.com/githubixx/ansible-kubernetes-playbooks.git

Then

cd traefik

and run the playbook with

ansible-playbook install_or_update.yml

This will install all the resources we defined above and of course the Traefik DaemonSet. The Traefik UI should be available on all worker nodes on port 8080 on the PeerVPN interface shortly after the playbook ran successfully.
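
You can quickly verify that the resources exist, e.g.:

kubectl get clusterrole traefik-ingress-controller
kubectl get clusterrolebinding traefik-ingress-controller
kubectl get serviceaccount traefik-ingress-controller --namespace=kube-system
kubectl get daemonset traefik-ingress-controller --namespace=kube-system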

You can easily access the Traefik UI now using e.g.

kubectl port-forward traefik-ingress-controller-wqsxt 8080:8080 --namespace=kube-system

Of course replace traefik-ingress-controller-wqsxt with the name of one of your Traefik pods (use kubectl get pods --namespace=kube-system to get a list of pods in kube-system namespace).

Also

kubectl logs traefik-ingress-controller-wqsxt --namespace=kube-system

will show you the logs of pod “traefik-ingress-controller-wqsxt” (again replace with one of your Traefik pod names).

Be aware that Traefik is very picky on the one hand, but on the other hand it sometimes isn’t very good at telling you what you did wrong ;-) So you may need to experiment a little bit until you get what you want. At least the options above should work as I’ve tested them quite a few times… Also the --debug option could help to get more information.

Example deployment and service

Now that we have deployed our Traefik loadbalancer we of course want to expose a website to the world. Before you proceed make sure that the DNS entry of the domain you want to expose points to one of the Traefik instances, or create a round robin DNS entry that points the same domain name to the public IPs of several worker nodes. This step is especially important if you configured Traefik to automatically create Let’s Encrypt certificates, as the Let’s Encrypt servers will contact the Traefik instances to verify that you own the domain (ACME challenge responses).
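
As a rough sketch, a round robin setup in your DNS zone could look like this (the public IPs of the worker nodes are of course made up here):

example.com.      300  IN  A      203.0.113.10
example.com.      300  IN  A      203.0.113.11
www.example.com.  300  IN  CNAME  example.com.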

The first thing we need is a Deployment. As an example we will deploy two nginx webservers. Let’s assume you own the domain example.com and the two nginx servers should deliver that site.

Here is an example deployment:

apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: example-com
  namespace: default
  labels:
    app: example-com
spec:
  replicas: 2
  selector:
    matchLabels:
      app: example-com
  template:
    metadata:
      labels:
        app: example-com
    spec:
      containers:
      - name: nginx
        image: nginx:1.12.0
        ports:
        - containerPort: 80

As you can see in the metadata we name the deployment example-com (it could be any name you like of course) and it will be deployed in the default namespace. There will be two pods as stated in replicas: 2. The selector field defines how the Deployment knows which Pods to manage. In this case, we simply select on one label defined in the Pod template: app: example-com. The Pod template’s specification, or template: spec field, indicates that the Pods run one container, nginx, which runs the nginx Docker Hub image at version 1.12.0.

To roll out the example above copy it, open your favorite text editor and adjust it to your needs. Save it as deployment.yml and create the deployment via kubectl apply -f deployment.yml.

Verify that the pods are up and running e.g.:

kubectl get pods -l 'app=example-com' -o wide --namespace=default

NAME                          READY     STATUS    RESTARTS   AGE       IP             NODE
example-com-d8d7c48c4-j7brl   1/1       Running   0          7d        10.200.25.15   k8s-worker2
example-com-d8d7c48c4-vdb4k   1/1       Running   0          7d        10.200.5.14    k8s-worker1

Next we need a Service. Here is an example:

kind: Service
apiVersion: v1
metadata:
  name: example-com
  namespace: default
spec:
  selector:
    app: example-com
  ports:
  - name: http
    port: 80

A Kubernetes Service is an abstraction which defines a logical set of Pods and a policy by which to access them - sometimes called a micro-service. The set of Pods targeted by a Service is (usually) determined by a Label Selector.

As you already guessed, the Service will be called example-com and lives in the default namespace. We defined a Pod selector with app: example-com. The incoming Service port is 80 and we name it http. If you don’t specify a targetPort then targetPort is equal to port. So if your Pod’s port is e.g. 8080 you also need to add targetPort: 8080 to .service.spec.ports.
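
For instance, if the nginx containers listened on port 8080 instead of 80 the ports section would look like this (just an illustration, not needed for our example):

  ports:
  - name: http
    port: 80
    targetPort: 8080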

If you don’t specify a clusterIP you automatically get a stable IP address for this service (it lasts as long as the service exists).

To roll out the service example, again copy the example above, open your favorite text editor and adjust it to your needs. Save it as service.yml and create the service via kubectl apply -f service.yml.

Verify that the service was created:

kubectl get svc --namespace=default
NAME          TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
example-com   ClusterIP   10.32.226.13    <none>        80/TCP    7d

Prepare DNS

Before we create the Ingress resource make sure that your DNS entries point to the correct IP address. In our example above that means that we need to make sure that example.com and www.example.com point to the public IP of one of the worker nodes on which Traefik runs.

Example ingress

Finally we define the Kubernetes Ingress resource:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: example-com
  namespace: default
  annotations:
    kubernetes.io/ingress.class: "traefik"
spec:
  rules:
  - host: example.com
    http:
      paths:
      - path: /
        backend:
          serviceName: example-com
          servicePort: 80
  - host: www.example.com
    http:
      paths:
      - path: /
        backend:
          serviceName: example-com
          servicePort: 80

Again we name the Ingress resource name: example-com. The kubernetes.io/ingress.class annotation can be attached to any Ingress object in order to control whether Traefik should handle it. If the annotation is missing, contains an empty value, or has the value traefik, then the Traefik controller will take responsibility and process the associated Ingress object. If the annotation contains any other value (usually the name of a different Ingress controller), Traefik will ignore the object. In ingress.spec.rules we defined two host rules: one for example.com and one for www.example.com. Both define that all requests (path: /) should go to the service example-com that we defined above on port 80.

To roll out the Ingress example copy the example above, open your favorite text editor and adjust it to your needs. Save it as ingress.yml and create the Ingress via kubectl apply -f ingress.yml.

That’s it basically! You should now be able to curl http://example.com and get the default nginx homepage. A few minutes later you should also be able to curl https://example.com. As already mentioned it takes Traefik a few minutes to get the SSL certificate from Let’s Encrypt, but as soon as it is stored in Consul the renewal process shouldn’t interrupt normal website operations.
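
If you want to check which certificate is actually served (e.g. to see whether it still comes from the Let’s Encrypt staging CA), something like this should do (with example.com replaced by your domain):

echo | openssl s_client -connect example.com:443 -servername example.com 2>/dev/null | openssl x509 -noout -issuer -dates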

As Consul now contains quite important data (your SSL certificate(s)) you should think about a backup and restore process - and I should think about securing the Consul communication with SSL certificates as we did with etcd and the API server ;-).

You probably already figured out that the whole setup is ok so far but not perfect. If you point your website DNS record to one of the Traefik instances (which basically means to one of the Traefik DaemonSet members) and that host dies, you’re out of business for a while. Also if you use DNS round robin and distribute the requests to all Traefik nodes you still have the problem that if one node fails you lose at least the requests to that node. keepalived-vip could be a solution to that problem but I haven’t looked into it. If you can change your DNS records via API (which is the case for Google Cloud DNS e.g.) you could deploy a Kubernetes CronJob that monitors all Traefik instances and changes DNS records if one of the nodes fails, or you could implement a watchdog functionality yourself and deploy the program as a pod into your K8s cluster. The journey doesn’t end here ;-)