Kubernetes the Not So Hard Way With Ansible (at Scaleway) - Part 8 - Ingress with Traefik
Make your services/pods available to the internet with automatic TLS certificate generation
October 29, 2017
CHANGELOG
2018-09-05
- I’ll no longer update this text as I migrated my hosts to Hetzner Online because of constant network issues with Scaleway. I’ve created a new blog series about how to set up a Kubernetes cluster at Hetzner Online, but since my Ansible playbooks are not provider dependent the blog text should work for Scaleway too if you still want to use it. The new blog post is here.
2018-06-04
- Update text about the latest version of etcd supported by Kubernetes
2018-03-09
- Fix which script to use to install Traefik DaemonSet with etcd or Consul backend in the text
- Added Traefik option to redirect HTTP to HTTPS
2018-01-16
- Update to Traefik v1.5 which is needed to use Let’s Encrypt HTTP-01 challenge support (TLS-SNI-01 was permanently disabled - see 2018.01.09 Issue with TLS-SNI-01 and Shared Hosting Infrastructure)
- Added etcd as backend (as an alternative to Consul). Requires Traefik >= v1.5!
- Kubernetes ClusterRole now stable (v1)
- Kubernetes ClusterRoleBinding now stable (v1)
- Kubernetes DaemonSet now stable (v1) (added selector in the DaemonSet spec, which is now required)
2017-12-02
- Consul now configured with TLS support
- Traefik now configured to use TLS for communication and authorization with Consul
- Ansible role to create CA (certificate authority) and certificates for Consul/Traefik
2017-11-26
- Update about bug using Traefik with etcd K/V
2017-10-10
- Added example service, ingress and deployment resources
Currently the Kubernetes cluster we have built so far only answers internal requests. But most people want to deliver their website or similar workloads through a webserver/appserver running in the cluster. For this to work we need something called ingress. More information is provided in the Kubernetes Ingress Resources documentation. Typically, services and pods have IPs only routable by the cluster network. All traffic that ends up at an edge router is either dropped or forwarded elsewhere. An Ingress is a collection of rules that allow inbound connections to reach the cluster services. It can be configured to give services externally-reachable URLs, load balance traffic, terminate SSL, offer name based virtual hosting, and more. Users request ingress by POSTing the Ingress resource to the API server. An Ingress controller is responsible for fulfilling the Ingress, usually with a loadbalancer, though it may also configure your edge router or additional frontends to help handle the traffic in an HA manner.
There’re several options to implement this, e.g. Kube-Lego which uses nginx as reverse proxy and automatically requests free certificates for Kubernetes Ingress resources from Let’s Encrypt. But I’ll use Traefik because it is a modern HTTP reverse proxy and load balancer for the cloud age made to deploy microservices with ease (ok, and I know two of the maintainers in person ;-) ). It supports several backends (Docker, Swarm mode, Kubernetes, Marathon, Consul, Etcd, Rancher, Amazon ECS, and a lot more) to manage its configuration automatically and dynamically. As with Kube-Lego it’s also possible to request and renew certificates from Let’s Encrypt automatically.
There are some posts/docs out there which describe such setups, e.g.:
Automated certificate provisioning in Kubernetes using kube-lego
Kubernetes with Traefik and Let’s Encrypt
Træfik as Kubernetes Ingress Controller on ARM with automatic TLS certificates
Traefik Kubernetes Ingress Controller documentation
Why I implemented it this way
One of the blog posts above uses Kubernetes StatefulSets but I’ll stay with the Ansible route to install and manage Consul (as we did with etcd). I think it’s a valid option to manage this kind of backend service outside of Kubernetes. If there is a problem with the Kubernetes setup I still want to be able to handle all critical Kubernetes components (which could get interesting if e.g. kubectl doesn’t work anymore for whatever reason or you have authentication problems).
Traefik backends for storing configuration and Let’s Encrypt certificates
As we already have etcd in place for Kubernetes to store its state, it of course makes sense to reuse it for Traefik as well. As of Traefik 1.5.0-rc1 this is possible. If you want to use etcd as backend you can skip some of the steps below. I’ll add a note in the headline if you can skip the topic.
The second option is Consul as backend. I suspect that Traefik’s Consul implementation is more stable and mature, as it has been included in Traefik for some time now while etcd support was basically just (re-)added. If you’re already using Consul it of course makes sense to use it here instead of etcd.
Creating a certificate authority (CA) and certificates for Consul (skip if you use etcd backend)
The first thing we need if we want to use Consul as backend is a certificate authority (CA) to issue and sign certificates for Consul. We need the certificates to secure the communication between the Consul members. Additionally we also need them to secure the communication from the clients (Traefik in our case) to the Consul members.
To make the task a little bit easier I’ve created an Ansible role. Install the role via
ansible-galaxy install githubixx.consul-ca
Now we need to define a few Ansible variables (put them in group_vars/all.yml or wherever it fits best for you):
# Where to store the CA and certificate files. By default this
# will expand to the user's LOCAL $HOME (the user that runs "ansible-playbook ..."
# plus "/consul/ssl". That means if the user's $HOME directory is e.g.
# "/home/da_user" then "consul_ca_conf_directory" will have a value of
# "/home/da_user/consul/ssl".
consul_ca_conf_directory: "{{ '~/consul/ssl' | expanduser }}"
# The user who owns the certificate directory and files
consul_ca_certificate_owner: "root"
# The group which owns the certificate directory and files
consul_ca_certificate_group: "root"
# Expiry for Consul root certificate
ca_consul_expiry: "87600h"
#
# Certificate authority for Consul certificates
#
ca_consul_csr_cn: "Consul"
ca_consul_csr_key_algo: "rsa"
ca_consul_csr_key_size: "2048"
ca_consul_csr_names_c: "DE"
ca_consul_csr_names_l: "The_Internet"
ca_consul_csr_names_o: "Consul"
ca_consul_csr_names_ou: "BY"
ca_consul_csr_names_st: "Bayern"
#
# CSR parameter for Consul certificate
#
consul_csr_cn: "server.dc1.consul"
consul_csr_key_algo: "rsa"
consul_csr_key_size: "2048"
consul_csr_names_c: "DE"
consul_csr_names_l: "The_Internet"
consul_csr_names_o: "Consul"
consul_csr_names_ou: "BY"
consul_csr_names_st: "Bayern"
One note about consul_csr_cn: Consul wants the certs to have server.<data_center>.consul as common name (CN) value (also see Consul: Adding TLS to Consul using Self Signed Certificates). In Consul’s config.json you have a parameter datacenter (see below) which is often dc1 by default. E.g. if you have "datacenter": "par1" specified in Consul’s configuration, the first value in consul_csr_cn should be server.par1.consul. You can specify additional common names afterwards separated by commas. Even wildcards are possible, e.g. consul_csr_cn: "server.dc1.consul,*.example.com", but the first value should still be as mentioned.
Besides server.<data_center>.consul it is important that you also specify the domain here that you use to connect to Consul. As we will see later, by default we define the Consul endpoint for Traefik like this: --consul.endpoint={{groups.consul_instances|first}}:8443. This takes the first host in Ansible’s consul_instances host group. Now if the first hostname is e.g. consul1.example.com then consul_csr_cn should be server.dc1.consul,*.example.com. If the domain name doesn’t match, Traefik won’t connect to Consul because of a certificate mismatch.
Now we can create the certificate authority (CA) and the certificates:
ansible-playbook --tags=role-consul-ca k8s.yml
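If you want to double check what was generated you can inspect the files with openssl. This is just a quick sanity check and assumes the default consul_ca_conf_directory (~/consul/ssl) and the default file names used later in this post (ca-consul.pem, cert-consul.pem):

# Show subject and validity of the Consul server certificate
openssl x509 -in ~/consul/ssl/cert-consul.pem -noout -subject -dates
# Verify that the server certificate was signed by our Consul CA
openssl verify -CAfile ~/consul/ssl/ca-consul.pem ~/consul/ssl/cert-consul.pem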
Installing Consul (skip if you use etcd backend)
Next we want to install Consul. There’re quite a few Ansible Consul roles out there but I have chosen the one from Brian Shumate. It’s up to date and looks like it’s also well maintained. So we first install it via
ansible-galaxy install brianshumate.consul
Consul 1.0.0 was just released and Brian already updated the role to use 1.0.0 as default. First we add the Consul hosts to Ansible’s hosts file, e.g.:
[consul_instances]
controller[1:3].your-domain.tld consul_node_role=server consul_bootstrap_expect=true
We create a new host group called consul_instances. Do not change the group name! The role expects this group name. As you can see we’ll use the three Kubernetes controller nodes to install Consul on (if you can afford it, install Consul on its own hosts - which is also true for etcd). We also specify some parameters which let the servers elect a leader among themselves (so no need to specify which one is the bootstrap node and which ones are the other servers). Next we add an entry to our site playbook k8s.yml:
-
hosts: consul_instances
any_errors_fatal: true
become: yes
become_user: root
roles:
-
role: brianshumate.consul
tags: role-consul
The any_errors_fatal play option will mark all hosts as failed if any fails, causing an immediate abort. Next we’ll fine-tune the role with some variables in group_vars/all.yml (or wherever it makes sense for you). I’ve set the following variables (see the comments for what these variables are good for):
# We want to stay with 1.0.0 until we explicitly state otherwise
consul_version: "1.0.0"
# Set Consul user
consul_user: "consul"
# Use the Scaleway datacenter name here (in this case: Amsterdam called "ams1" or "par1" for Paris)
consul_datacenter: "ams1"
# We want to bind Consul service to the PeerVPN interface
# as all our Kubernetes services are bound to this interface
consul_iface: "{{peervpn_conf_interface}}"
# Consul should listen on our PeerVPN interface to make it accessible for the worker nodes
consul_client_address: "{{hostvars[inventory_hostname]['ansible_' + peervpn_conf_interface].ipv4.address}}"
consul_bind_address: "{{hostvars[inventory_hostname]['ansible_' + peervpn_conf_interface].ipv4.address}}"
# Download the files for installation directly on the remote hosts
consul_install_remotely: "true"
# We know when to update ;-)
consul_disable_update_check: "true"
# The HTTPS port. Traefik will use this port as endpoint port.
consul_ports_https: "8443"
# Enable TLS communication
consul_tls_enable: "true"
# Directory where certificates are stored locally (the location
# where the role copies the certificates). We just use the
# "consul_ca_conf_directory" variable value here
# specified in our Consul CA role.
consul_tls_src_files: "{{consul_ca_conf_directory}}"
# Location of Consul certificate files on remote hosts
consul_tls_dir: "/etc/consul/ssl"
# CA certificate filename (normally no need to change)
consul_tls_ca_crt: "ca-consul.pem"
# Server certificate filename (normally no need to change)
consul_tls_server_crt: "cert-consul.pem"
# Server key filename (normally no need to change)
consul_server_key: "cert-consul-key.pem"
# Verify incoming connections
consul_tls_verify_incoming: "true"
# Verify outgoing connections
consul_tls_verify_outgoing: "true"
# Verify server hostname
consul_tls_verify_server_hostname: "true"
Before installing the role I had to run pip2 install netaddr on my local host. That’s because I run Archlinux and Ansible still uses Python 2.7 on my laptop. The role will execute pip3 install netaddr which installs the library for Python 3.x. So I did this manually which made the role happy. We’re now set up and can install the role:
ansible-playbook --tags=role-consul k8s.yml
This takes a while but once done you can have a look with journalctl -f -t consul on one of the controller/consul nodes (depending on where you installed Consul) to check if the nodes have joined the cluster and elected a leader (you should see something like this: consul: New leader elected: controller1).
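If you prefer the Consul CLI over journalctl you can also query the cluster through the TLS port. This is just a sketch; it assumes the consul binary is in the PATH on the node and that the role placed the certificates in /etc/consul/ssl as configured above (consul_tls_dir):

# Run on one of the Consul nodes. Point the CLI at the HTTPS port (8443)
# and at the CA/client certificates deployed by the role.
export CONSUL_HTTP_ADDR="https://<peervpn-ip-of-this-node>:8443"
export CONSUL_CACERT="/etc/consul/ssl/ca-consul.pem"
export CONSUL_CLIENT_CERT="/etc/consul/ssl/cert-consul.pem"
export CONSUL_CLIENT_KEY="/etc/consul/ssl/cert-consul-key.pem"
consul members
consul operator raft list-peers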
ClusterRole and ClusterRoleBinding for Traefik
Next we need to define a few Ansible variables. Put them in group_vars/all.yml or where you think they fit best for your setup. As of Kubernetes 1.8 RBAC (Role-Based Access Control) is stable. If you use Kubernetes 1.7 you need to specify apiVersion: rbac.authorization.k8s.io/v1beta1 for ClusterRole and ClusterRoleBinding:
traefik_clusterrole: |
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: traefik-ingress-controller
rules:
- apiGroups:
- ""
resources:
- services
- endpoints
- secrets
verbs:
- get
- list
- watch
- apiGroups:
- extensions
resources:
- ingresses
verbs:
- get
- list
- watch
traefik_clusterrolebinding: |
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: traefik-ingress-controller
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: traefik-ingress-controller
subjects:
- kind: ServiceAccount
name: traefik-ingress-controller
namespace: kube-system
The playbook we execute later will use these two variables (and some others to come) to create the ClusterRole and ClusterRoleBinding for Traefik. In short, a Role is basically a set of permissions which we assign to a RoleBinding which grants the permissions defined in a Role to a user or set of users. Roles and RoleBindings are used if you want to grant permissions within a namespace. Without namespaces the settings are cluster-wide and that’s why they’re called ClusterRole and ClusterRoleBinding ;-) As you can see above, for Traefik we define the role and binding cluster-wide. To serve content from all services in all namespaces Traefik needs access to all namespaces.
ServiceAccount for Traefik
Now we add a variable we need for the service account:
traefik_serviceaccount: |
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: traefik-ingress-controller
namespace: kube-system
When you (a human) access the cluster (for example, using kubectl), you are authenticated by the apiserver as a particular User Account. Processes in containers inside pods can also contact the apiserver. When they do, they are authenticated as a particular Service Account. In our case we create and use the service account traefik-ingress-controller and it will be placed in the kube-system namespace.
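Once the playbook (which we run later) has created the resources you can verify that the service account exists and that the ClusterRoleBinding references it, e.g.:

kubectl get serviceaccount traefik-ingress-controller --namespace=kube-system
kubectl describe clusterrolebinding traefik-ingress-controller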
DaemonSet for Traefik
And now comes the interesting part - the DaemonSet running the Traefik daemon. Again we define a variable with the following content (adjust the settings according to your needs, especially the value of acme.email):
Use this variable content if you use CONSUL as backend (for etcd see below)!
traefik_daemonset: |
---
kind: DaemonSet
apiVersion: extensions/v1beta1
metadata:
name: traefik-ingress-controller
namespace: kube-system
labels:
k8s-app: traefik-ingress-lb
spec:
selector:
matchLabels:
k8s-app: traefik-ingress-lb
updateStrategy:
type: RollingUpdate
template:
metadata:
labels:
k8s-app: traefik-ingress-lb
name: traefik-ingress-lb
spec:
serviceAccountName: traefik-ingress-controller
terminationGracePeriodSeconds: 60
hostNetwork: true
dnsPolicy: ClusterFirstWithHostNet
containers:
- image: traefik:v1.5-alpine
name: traefik-ingress-lb
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 2
httpGet:
path: /ping
port: 8080
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 5
readinessProbe:
failureThreshold: 2
httpGet:
path: /ping
port: 8080
scheme: HTTP
periodSeconds: 5
resources:
requests:
memory: "64Mi"
cpu: "250m"
limits:
memory: "64Mi"
cpu: "250m"
ports:
- name: http
containerPort: 80
hostPort: 80
- name: https
containerPort: 443
hostPort: 443
- name: admin
containerPort: 8080
securityContext:
privileged: true
volumeMounts:
- name: tls
mountPath: {{consul_tls_dir}}
readOnly: true
args:
- --checknewversion=false
- --loglevel=INFO
- --defaultentrypoints=http,https
- --entrypoints=Name:http Address::80
- --entrypoints=Name:https Address::443 TLS
- --consul=true
- --consul.prefix=traefik
- --consul.watch=true
- --consul.endpoint={{groups.consul_instances|first}}:8443
- --consul.tls=true
- --consul.tls.ca={{consul_tls_dir}}/ca-consul.pem
- --consul.tls.cert={{consul_tls_dir}}/cert-consul.pem
- --consul.tls.key={{consul_tls_dir}}/cert-consul-key.pem
- --kubernetes=true
- --kubernetes.watch=true
- --kubernetes.namespaces=default
- --web=true
- --web.readonly
- --web.address=:8080
- --acme=true
- --acme.acmelogging=true
- --acme.caserver=https://acme-staging.api.letsencrypt.org/directory
- --acme.entrypoint=https
- --acme.httpchallenge=true
- --acme.httpChallenge.entryPoint=http
- --acme.email=_YOUR_@_DOMAIN_._TLD_
- --acme.onhostrule
- --acme.storage=traefik/acme/account
volumes:
- name: tls
secret:
secretName: traefik-consul
Use this variable content if you use ETCD as backend!
traefik_daemonset: |
---
kind: DaemonSet
apiVersion: apps/v1
metadata:
name: traefik-ingress-controller
namespace: kube-system
labels:
k8s-app: traefik-ingress-lb
spec:
selector:
matchLabels:
k8s-app: traefik-ingress-lb
updateStrategy:
type: RollingUpdate
template:
metadata:
labels:
k8s-app: traefik-ingress-lb
name: traefik-ingress-lb
spec:
serviceAccountName: traefik-ingress-controller
terminationGracePeriodSeconds: 60
hostNetwork: true
dnsPolicy: ClusterFirstWithHostNet
containers:
- image: traefik:v1.5-alpine
name: traefik-ingress-lb
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 2
httpGet:
path: /ping
port: 8080
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 5
readinessProbe:
failureThreshold: 2
httpGet:
path: /ping
port: 8080
scheme: HTTP
periodSeconds: 5
resources:
requests:
memory: "64Mi"
cpu: "250m"
limits:
memory: "64Mi"
cpu: "250m"
ports:
- name: http
containerPort: 80
hostPort: 80
- name: https
containerPort: 443
hostPort: 443
- name: admin
containerPort: 8080
securityContext:
privileged: true
volumeMounts:
- name: tls
mountPath: {{k8s_conf_dir}}
readOnly: true
args:
- --checknewversion=false
- --loglevel=INFO
- --defaultentrypoints=http,https
- --entrypoints=Name:http Address::80
- --entrypoints=Name:https Address::443 TLS
- --etcd=true
- --etcd.prefix=/traefik
- --etcd.watch=true
- --etcd.endpoint={{groups.k8s_etcd|first}}:2379
- --etcd.tls=true
- --etcd.tls.ca={{k8s_conf_dir}}/ca-etcd.pem
- --etcd.tls.cert={{k8s_conf_dir}}/cert-etcd.pem
- --etcd.tls.key={{k8s_conf_dir}}/cert-etcd-key.pem
- --etcd.useapiv3=true
- --kubernetes=true
- --kubernetes.watch=true
- --kubernetes.namespaces=default
- --web=true
- --web.readonly
- --web.address=:8080
- --acme=true
- --acme.acmelogging=true
- --acme.caserver=https://acme-staging.api.letsencrypt.org/directory
- --acme.entrypoint=https
- --acme.httpchallenge=true
- --acme.httpChallenge.entryPoint=http
- --acme.email=_YOUR_@_DOMAIN_._TLD_
- --acme.onhostrule
- --acme.storage=/traefik/acme/account
volumes:
- name: tls
secret:
secretName: traefik-etcd
First of all: What is a DaemonSet? A DaemonSet ensures that all (or some) Nodes run a copy of a Pod. As nodes are added to the cluster, Pods are added to them. As nodes are removed from the cluster, those Pods are garbage collected. Deleting a DaemonSet will clean up the Pods it created. For our Traefik pods this means that exactly one Traefik pod will run on every worker node. As I only have a few worker nodes that’s ok for me. If you have tens or hundreds of worker nodes this probably doesn’t make much sense ;-) But that’s not a problem as you can attach labels to a Kubernetes worker/node and assign a nodeSelector to your pod configuration to run the Traefik DaemonSet only on specific nodes. For more information see the Assigning Pods to Nodes documentation, and the short sketch below.
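A minimal sketch of how that could look (the label key/value ingress=traefik is just an example I made up here): label the nodes that should run Traefik and add a nodeSelector to the pod template of the DaemonSet.

# Label the worker node(s) that should run Traefik
kubectl label node k8s-worker1 ingress=traefik

And in the DaemonSet’s pod specification (spec.template.spec) add:

nodeSelector:
  ingress: traefik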
Now what’s in our Traefik DaemonSet specification? Let me briefly explain the important parts from top to bottom:
updateStrategy:
type: RollingUpdate
This specifies how the pods of the DaemonSet should be updated. I’ve chosen the RollingUpdate strategy. RollingUpdate means that after you update a DaemonSet template, old DaemonSet pods will be killed and new DaemonSet pods will be created automatically, in a controlled fashion. You can further fine-tune the update process by setting maxUnavailable (defaults to 1) and minReadySeconds (defaults to 0) as well. maxUnavailable equal to 1 means that only one pod of the whole DaemonSet is updated at a time, and only when the updated pod is healthy again does the update process continue with the next pod (also see Perform a Rolling Update on a DaemonSet).
hostNetwork: true
These pods will use the host network directly and not the “pod network” (the term “pod network” is a little bit misleading as there is no such thing - it basically just comes down to routing network packets and namespaces). So we can bind the Traefik ports on the host interface on ports 80, 443 and 8080 (see below). That also means of course that no further pods of a DaemonSet can use these ports and of course also no other services on the worker nodes. But that’s what we want here as Traefik is basically our “external” loadbalancer for our “internal” services - our tunnel to the rest of the internet so to say ;-)
serviceAccountName: traefik-ingress-controller
Remember the service account variable we defined above? Here we define that the pods in this DaemonSet should use this service account, which is also configured in the ClusterRoleBinding we defined. The ClusterRoleBinding has the ClusterRole traefik-ingress-controller assigned, which in turn means that Traefik is allowed to execute all the actions defined in the ClusterRole traefik-ingress-controller.
dnsPolicy: ClusterFirstWithHostNet
This setting is important. It will configure the Traefik pods to use the Kubernetes cluster internal DNS server (most likely KubeDNS or maybe CoreDNS). That means the pods’ /etc/resolv.conf will be configured to use the Kubernetes DNS server. Otherwise the DNS server of the Kubernetes node will be used (basically the /etc/resolv.conf of the worker node, which can’t resolve cluster.local DNS names e.g.).
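You can verify this later once the pods are running by looking at the resolver configuration inside one of the Traefik pods (replace the pod name with one of yours):

kubectl exec traefik-ingress-controller-wqsxt --namespace=kube-system -- cat /etc/resolv.conf
# The nameserver should point to the cluster DNS service IP and the search
# list should contain entries like svc.cluster.local and cluster.local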
image: traefik:v1.5-alpine
We’ll use Traefik 1.5 Alpine Linux container.
The livenessProbe and readinessProbe are important for the update process to decide if a pod update was successful.
resources defines e.g. how much CPU and RAM resources a pod can acquire (also see Managing Compute Resources for Containers). You should almost always define limits for your pods!
ports:
- name: http
containerPort: 80
hostPort: 80
- name: https
containerPort: 443
hostPort: 443
- name: admin
containerPort: 8080
Here we map the containerPort to the hostPort and also give the different ports a name. Ports 80 and 443 should be self-explanatory. Port 8080 is the admin UI of Traefik. It will bind to the PeerVPN interface by default.
securityContext:
privileged: true
Without this setting we won’t be able to bind Traefik to ports 80 and 443 (which is basically true for all services that want to use ports < 1024).
# For CONSUL
volumeMounts:
- name: tls
mountPath: {{consul_tls_dir}}
readOnly: true
# For ETCD
volumeMounts:
- name: tls
mountPath: {{k8s_conf_dir}}
readOnly: true
The traefik playbook we will execute later will import the Consul or etcd CA, certificate and certificate key file into Kubernetes and store them as a secret. The three files will then be available for Traefik to use in {{consul_tls_dir}} (which is /etc/consul/ssl by default) or {{k8s_conf_dir}} (which is /var/lib/kubernetes by default) as you can see above (and later in the Traefik options). For this to work we also need a volumes specification that references the secret the traefik playbook created:
# For CONSUL
volumes:
- name: tls
secret:
secretName: traefik-consul
# For ETCD
volumes:
- name: tls
secret:
secretName: traefik-etcd
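Just for reference: creating such a secret by hand would look roughly like this (a sketch for the Consul variant, assuming the default certificate locations of the consul-ca role; the playbook does this for you):

kubectl create secret generic traefik-consul --namespace=kube-system \
  --from-file=ca-consul.pem=$HOME/consul/ssl/ca-consul.pem \
  --from-file=cert-consul.pem=$HOME/consul/ssl/cert-consul.pem \
  --from-file=cert-consul-key.pem=$HOME/consul/ssl/cert-consul-key.pem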
Now to the options we supply to Traefik itself. If you want to see all options just run:
docker run -it --rm traefik:v1.5-alpine --help
Let’s quickly walk through the options I used in my example:
--entrypoints=Name:http Address::80
--entrypoints=Name:https Address::443 TLS
I guess this is quite obvious: Traefik should listen on ports 80 and 443 and for port 443 also enable TLS/SSL. If you want to redirect incoming HTTP requests to HTTPS use these options instead:
--entrypoints=Name:http Address::80 Redirect.EntryPoint:https
--entrypoints=Name:https Address::443 TLS
--consul=true
--consul.prefix=traefik
--consul.watch=true
--consul.endpoint={{groups.consul_instances|first}}:8443
--consul.tls=true
--consul.tls.ca={{consul_tls_dir}}/ca-consul.pem
--consul.tls.cert={{consul_tls_dir}}/cert-consul.pem
--consul.tls.key={{consul_tls_dir}}/cert-consul-key.pem
Here we enable the Consul backend (where Traefik stores the Let’s Encrypt certificates and some other configuration settings). We use the prefix traefik (that’s basically the root for all further Traefik keys in Consul’s key/value store). E.g. you can get the Traefik leader directly from Consul via consul kv get traefik/leader. As mentioned above we will use the first hostname in Ansible’s consul_instances host group as the Consul endpoint. And finally we tell Traefik to use a TLS connection with the certificates and the CA we created above.
If you use etcd as backend you’ll of course not use the Consul flags but these instead:
- --etcd=true
- --etcd.prefix=/traefik
- --etcd.watch=true
- --etcd.endpoint={{groups.k8s_etcd|first}}:2379
- --etcd.tls=true
- --etcd.tls.ca={{k8s_conf_dir}}/ca-etcd.pem
- --etcd.tls.cert={{k8s_conf_dir}}/cert-etcd.pem
- --etcd.tls.key={{k8s_conf_dir}}/cert-etcd-key.pem
- --etcd.useapiv3=true
They’re basically similar to the ones we used for Consul. We only add a flag to instruct Traefik to use etcd’s v3 API.
--kubernetes=true
--kubernetes.watch=true
--kubernetes.namespaces=default
This enables the Kubernetes backend. We’re only interested in ingress events of the default namespace. If you have other namespaces for which you also want ingress, add them to this list, or remove --kubernetes.namespaces altogether which causes Traefik to watch all namespaces.
--web=true
--web.readonly
--web.address=:8080
This enables the Traefik UI and binds it to port 8080.
--acme=true
--acme.acmelogging=true
--acme.caserver=https://acme-staging.api.letsencrypt.org/directory
--acme.entrypoint=https
--acme.httpchallenge=true
--acme.httpChallenge.entryPoint=http
--acme.email=_YOUR_@_DOMAIN_._TLD_
--acme.onhostrule
--acme.storage=/traefik/acme/account
If you want automatic TLS/SSL configuration with free certificates from Let’s Encrypt then keep these lines. As Let’s Encrypt has some rate limiting you should keep --acme.caserver=https://acme-staging.api.letsencrypt.org/directory while testing the configuration. Once you’re confident that everything works as expected you should remove the Let’s Encrypt settings from etcd as they only contain staging data with invalid certificate data. The playbooks delete_etcd.yml (if you use etcd as backend) or delete_consul.yml (for the Consul backend) can do this for you and delete only the keys mentioned: use the tag traefik-etcd-key and define the variable delete_keys=true:
ansible-playbook -v --extra-vars="delete_keys=true" -t traefik-etcd-key delete_etcd.yml
But after you’re done with testing and want to go into production it makes sense to remove everything related to Traefik like the DaemonSet, Service, ServiceAccount, … and re-install from scratch. So if you run
ansible-playbook -v --extra-vars="delete_keys=true" delete_etcd.yml
all Kubernetes Traefik resources will be deleted (by default the playbook won’t delete the Let’s Encrypt data from etcd because in production there is normally no need to do so, but with the variable delete_keys=true specified it will also be deleted).
But be aware that ATM you can only create certificates for 20 registered domains per week. A registered domain is, generally speaking, the part of the domain you purchased from your domain name registrar. For instance, in the name www.example.com the registered domain is example.com. --acme.entrypoint=https is the entrypoint to proxy the ACME challenge to.
As mentioned in the changelog above, the Let’s Encrypt TLS-SNI-01 challenge we used so far was permanently disabled. So now we use the HTTP challenge instead. For this we need the flags --acme.httpchallenge=true and --acme.httpChallenge.entryPoint=http. If the HTTP-01 challenge is used, acme.httpChallenge.entryPoint has to be defined and reachable by Let’s Encrypt through port 80. These are Let’s Encrypt limitations (see http://v1-5.archive.docs.traefik.io/configuration/acme/#acmehttpchallenge).
Replace the value of --acme.email=_YOUR_@_DOMAIN_._TLD_ with your e-mail address of course. --acme.onhostrule will request a Let’s Encrypt certificate if an Ingress resource provides a host rule (you’ll see an example below). Finally --acme.storage=/traefik/acme/account causes the certificates to be stored in the key/value store backend, which is Consul or etcd in our case. So as long as Consul or etcd is available Traefik can fetch the certificates from the backend as long as their TTLs are valid (Traefik will take care of the certificate renewal process but it makes sense to have some monitoring in place to be sure the certificates are replaced in time).
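A very simple way to keep an eye on the renewal (just a sketch; run it from anywhere that can reach your site, and replace example.com with your domain) is to check the issuer and expiry dates of the certificate Traefik currently serves:

echo | openssl s_client -connect example.com:443 -servername example.com 2>/dev/null \
  | openssl x509 -noout -issuer -dates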
Firewall settings
Before we roll out the whole thing make sure to open ports 80 and 443 so that the Let’s Encrypt servers can reach the Traefik instances for the ACME challenge requests. E.g. for the harden-linux role that means extending the harden_linux_ufw_rules variable to finally have something like this:
harden_linux_ufw_rules:
- rule: "allow"
to_port: "22222"
protocol: "tcp"
- rule: "allow"
to_port: "7000"
protocol: "udp"
- rule: "allow"
to_port: "80"
protocol: "tcp"
- rule: "allow"
to_port: "443"
protocol: "tcp"
Be aware that it can (and also will) take a few minutes when a new certificate is requested from Let’s Encrypt for the first time! Have a look at the pod logs of the DaemonSet regarding the registration process (e.g. kubectl logs traefik-ingress-controller-XXXXX -f --namespace=kube-system).
Before we install all Kubernetes Traefik resources make sure that you have specified the following variables in group_vars/all.yml:
etcd_version: "3.1.12"
etcd_bin_dir: "/usr/local/bin"
etcd_client_port: "2379"
The latest supported and tested etcd version is 3.1.12 for Kubernetes 1.10.x. The playbooks also need to locate the etcdctl utility on the first etcd host and will expect the binary in the directory specified in etcd_bin_dir. The playbooks also need the etcd_client_port.
To get the playbook I created to install Traefik and all its required resources, clone my ansible-kubernetes-playbooks repository, e.g.:
git clone https://github.com/githubixx/ansible-kubernetes-playbooks.git
Then
cd traefik
and run the playbook. If you want to use ‘etcd’ as Traefik backend then run
ansible-playbook install_or_update_etcd.yml
or if you want to use Consul run
ansible-playbook install_or_update_consul.yml
This will install all the resources we defined above and of course the Traefik DaemonSet. The Traefik UI should be available on all worker nodes on port 8080 on the PeerVPN interface shortly after the playbook ran successfully.
You can easily access the Traefik UI now using e.g.
kubectl port-forward traefik-ingress-controller-wqsxt 8080:8080 --namespace=kube-system
Of course replace traefik-ingress-controller-wqsxt with the name of one of your Traefik pods (use kubectl get pods --namespace=kube-system to get a list of pods in the kube-system namespace).
Also
kubectl logs traefik-ingress-controller-wqsxt --namespace=kube-system
will show you the logs of pod “traefik-ingress-controller-wqsxt” (again replace with one of your Traefik pod names).
Be aware that Traefik is very picky on the one hand but on the other hand sometimes isn’t very good at telling you what you did wrong ;-) So you may need to experiment a little bit until you get what you want. At least the options above should work as I’ve tested them quite a few times… Also the --debug option could help to get more information.
Example deployment and service
Now that we have deployed our Traefik loadbalancer we of course want to expose a website to the world. Before you proceed make sure that the DNS entry of the domain you want to expose points to one of the Traefik instances, or create a round robin DNS entry that points the same domain name to e.g. three different A records. This step is especially important if you configured Traefik to automatically create Let’s Encrypt certificates, as the Let’s Encrypt servers will contact the Traefik instances to verify that you own the domain (ACME challenge responses).
The first thing we need is a Deployment. As an example we will deploy two nginx webservers. Let’s assume you own the domain example.com and the two nginx servers should deliver that site. Here is an example deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
name: example-com
namespace: default
labels:
app: example-com
spec:
replicas: 2
selector:
matchLabels:
app: example-com
template:
metadata:
labels:
app: example-com
spec:
containers:
- name: nginx
image: nginx:1.12.0
ports:
- containerPort: 80
As you can see in the metadata we name the deployment example-com (it could be any name you like of course) and it will be deployed in the default namespace. There will be two pods as stated in replicas: 2. The selector field defines how the Deployment knows which Pods to manage. In this case we simply select on one label defined in the Pod template: app: example-com. The Pod template’s specification, or template: spec field, indicates that the Pods run one container, nginx, which runs the nginx Docker Hub image at version 1.12.0.
To roll out the example above, copy it into your favorite text editor and adjust it to your needs. Save it as deployment.yml and create the deployment via
kubectl apply -f deployment.yml
Verify that the pods are up and running e.g.:
kubectl get pods -l 'app=example-com' --namespace=default
NAME READY STATUS RESTARTS AGE IP NODE
example-com-d8d7c48c4-j7brl 1/1 Running 0 7d 10.200.25.15 k8s-worker2
example-com-d8d7c48c4-vdb4k 1/1 Running 0 7d 10.200.5.14 k8s-worker1
Next we need a Service. Here is an example:
kind: Service
apiVersion: v1
metadata:
name: example-com
namespace: default
spec:
selector:
app: example-com
ports:
- name: http
port: 80
A Kubernetes Service is an abstraction which defines a logical set of Pods and a policy by which to access them - sometimes called a micro-service. The set of Pods targeted by a Service is (usually) determined by a Label Selector.
As you already guessed, the Service will be called example-com and lives in the default namespace. We defined a Pod selector with app: example-com. The incoming Service port is 80 and we name it http. If you don’t specify a targetPort then targetPort is equal to port. So if your Pod ports are e.g. 8080 you also need to add targetPort: 8080 to .service.spec.ports.
If you don’t specify a clusterIP you automatically get a stable IP address for this service (it lasts as long as the service exists).
To roll out the service example above again copy the example, open your favorite text editor and adjust to your needs. Save it as service.yml
and create the service via
kubectl apply -f service.yml
Verify that the service was created:
kubectl get svc --namespace=default
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
example-com ClusterIP 10.32.226.13 <none> 80/TCP 7d
Prepare DNS
Before we create the Ingress resource make (again) sure that your DNS entries point to the correct IP address. In our example above that means we need to make sure that example.com and www.example.com point to the public IP of one of the worker nodes Traefik runs on and that ports 80 and 443 are not blocked by the firewall.
Example ingress
Finally we define the Kubernetes Ingress resource:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
name: example-com
namespace: default
annotations:
kubernetes.io/ingress.class: "traefik"
spec:
rules:
- host: example.com
http:
paths:
- path: /
backend:
serviceName: example-com
servicePort: 80
- host: www.example.com
http:
paths:
- path: /
backend:
serviceName: example-com
servicePort: 80
Again we name the Ingress resource example-com. The kubernetes.io/ingress.class annotation can be attached to any Ingress object in order to control whether Traefik should handle it. If the annotation is missing, contains an empty value, or the value traefik, then the Traefik controller will take responsibility and process the associated Ingress object. If the annotation contains any other value (usually the name of a different Ingress controller), Traefik will ignore the object. In ingress.spec.rules we defined two host rules: one for example.com and one for www.example.com. Both define that all requests (path /) should go to the service example-com we defined above on port 80.
To roll out the Ingress example, copy it into your favorite text editor and adjust it to your needs. Save it as ingress.yml and create the Ingress via
kubectl apply -f ingress.yml
That’s it basically! You should now be able to curl http://example.com and get the default nginx homepage. A few minutes later you should also be able to curl https://example.com. As already mentioned, it takes Traefik a few minutes to get the SSL certificate from Let’s Encrypt, but as soon as it is stored in Consul the renewal process shouldn’t interrupt normal website operations.
As Consul or etcd now contains quite important data (your SSL certificate(s)) you should think about a backup and restore process and disaster recovery ;-).
You probably already figured out that the whole setup is ok so far but not perfect. If you point your website’s DNS record to one of the Traefik instances (which basically means to one of the Traefik DaemonSet members) and that host dies, you’re out of business for a while. Also, if you use DNS round robin and distribute the requests to all Traefik nodes, you still have the problem that if one node fails you lose at least the requests going to that node. keepalived-vip could be a solution to that problem but I haven’t looked into it. If you can change your DNS records via API (which is the case for Google Cloud DNS e.g.) you could deploy a Kubernetes CronJob that monitors all Traefik instances and changes the DNS records if one of the nodes fails, or you can implement a watchdog functionality yourself and deploy the program as a pod into your K8s cluster. The journey doesn’t end here ;-)
Next up: Kubernetes the Not So Hard Way With Ansible at Scaleway Part 9 - Upgrading Kubernetes