Kubernetes the not so hard way with Ansible - Certificate authority (CA) - (K8s v1.28)

2018-09-04 2024-01-03 2750 words 13 minutes

Introduction

This post is based on Kelsey Hightower’s Kubernetes The Hard Way - Installing the Client Tools and Kubernetes The Hard Way - Provisioning a CA and Generating TLS Certificates.

Now that I’ve done some preparation for our Kubernetes cluster, I need a PKI (public key infrastructure) to secure the communication between the Kubernetes components.

Install kubectl

I’ll use CloudFlare’s CFSSL PKI toolkit to bootstrap the certificate authorities and generate TLS certificates. ansible-role-cfssl will generate a few files for that purpose. You can generate the files on any host you want, but I’ll use a directory on my workstation that runs Ansible because other roles need to copy a few of the generated files to the Kubernetes hosts later. So it makes sense to keep the files in a place Ansible has access to (but of course you can also use a network share or something like that).

First I install the most important Kubernetes utility called kubectl. I’ll configure it later. At the moment I just install it. I’ve created an Ansible role to install kubectl locally. Add the following content to Ansible’s hosts file:

yaml

k8s_kubectl:
  hosts:
    k8s-01-ansible-ctrl.i.example.com:
      ansible_connection: local

k8s-01-ansible-ctrl is the hostname of my local workstation/laptop. Actually, if ansible_connection: local or ansible_host is specified, the hostname doesn’t really matter. You can even call it bob or sam 😉. But of course Ansible uses this name internally, and it also determines the name of the host variables file in host_vars. So in my case the Ansible host_vars file looks like this and has only one entry (host_vars/k8s-01-ansible-ctrl.i.example.com):

yaml

---
ansible_python_interpreter: "/opt/scripts/ansible/k8s-01_vms/bin/python"

This makes sure that the python binary of my Python venv environment will be used when tasks are executed on this host.

As already mentioned in the previous part, my workstation could be part of the fully meshed WireGuard network that connects every Kubernetes node to all the other nodes. I’d then be able to access the Kubernetes API server (kube-apiserver) via VPN and wouldn’t need SSH forwarding, or to expose kube-apiserver to my network, to make kubectl work. But I decided not to do so and instead make kube-apiserver available to my internal network by binding the service to all network interfaces. The connection to kube-apiserver is encrypted via TLS anyway. Additionally, firewall rules can be applied so that only some hosts are allowed to connect to kube-apiserver.

Then install the role with

bash

ansible-galaxy install githubixx.kubectl

The role has a few variables you can change if you like (normally not needed). Just add the variables and values you want to change to host_vars/k8s-01-ansible-ctrl.i.example.com (if that is the name of your workstation) or wherever it fits best for you. To get an overview see the kubectl role homepage at GitHub.

Finally, to deploy the kubectl binary, simply run

bash

ansible-playbook --tags=role-kubectl k8s.yml

Setup cfssl

Next we add an additional entry to the Ansible hosts file:

yaml

k8s_ca:
  hosts:
    k8s-01-ansible-ctrl.i.example.com:
      ansible_connection: local

k8s_ca (short for kubernetes certificate authority) is an Ansible host group (in this case the group contains only one host). As you can see it’s again my workstation/laptop. It will store all certificate authority files.

Let’s install the cfssl role via

bash

ansible-galaxy install githubixx.cfssl

Add

yaml

- hosts: k8s_ca
  roles:
    -
      role: githubixx.cfssl
      tags: role-cfssl

to your k8s.yml file. This adds the role githubixx.cfssl to the hosts group k8s_ca (which is only one host in my case as already mentioned). Have a look at the README file of that role for all variables you can change.

Now we can install the cfssl binaries via

bash

ansible-playbook --tags=role-cfssl k8s.yml

Setup certificate authorities

Next I generate the certificate authorities (CA) for etcd and Kubernetes to secure the communication between the services. DigitalOcean provides a good diagram of the Kubernetes operations flow:

(from Using Vault as a Certificate Authority for Kubernetes). Have a look at the diagram to get a better understanding of the K8s communication workflow.

As always I’ve prepared an Ansible role to generate the certificate authorities and certificates. Install the role via

bash

ansible-galaxy install githubixx.kubernetes_ca

Add the role to k8s.yml:

yaml

- hosts: k8s_ca
  roles:
    -
      role: githubixx.kubernetes_ca
      tags: role-kubernetes-ca

As with the cfssl role this role will also be applied to the Ansible k8s_ca host (which is again my workstation/laptop as you may remember from above).

This role has quite a few variables. But that’s mainly information needed for the certificates like algorithm (algo) and key size used, country (C), location (L), organization (O), organizational unit (OU) or state (ST). You can read more about how and for what the certificates are used in How certificates are used by your cluster.

In contrast to Kelsey Hightower’s guide Provisioning a CA and Generating TLS Certificates, I create separate certificate authorities for etcd and the Kubernetes API server (kube-apiserver). Since only the Kubernetes API server talks to etcd directly, it makes sense not to use the same CA to sign certificates for etcd and the Kubernetes API server. This adds an additional layer of security. All variables are documented at the kubernetes-ca role homepage at GitHub. So I’ll only discuss the important parts here.

I’ll put all variables for this role into group_vars/k8s_ca.yml as most of them are just used by the Ansible Controller node. There are very few exceptions and I’ll mention them accordingly in the following text.

k8s_ca_conf_directory specifies where to store the certificates. I created a directory certificates in my Python venv. So the value for this variable is /opt/scripts/ansible/k8s-01_vms/certificates in my case. With k8s_ca_conf_directory_perm, k8s_ca_file_perm, k8s_ca_certificate_owner and k8s_ca_certificate_group you can specify who owns that directory and what permissions the directory and the files should have. Since k8s_ca_conf_directory is used by a few roles that target different hosts I’ll put this variable into group_vars/all.yml.

Some certificates need to include the IP addresses and the host names (also see All certificates). So this role needs to know the Ansible hosts group for the Kubernetes Controller (k8s_ca_controller_nodes_group), the hosts group for the Kubernetes Worker (k8s_ca_worker_nodes_group) and the one for the etcd hosts (k8s_ca_etcd_nodes_group). By default the values are k8s_controller, k8s_worker and k8s_etcd, and those are the hosts groups I already used before. It also needs to know the interface name specified by k8s_interface. In my case it’s the WireGuard interface wg0. This way the role can figure out the IP addresses and host names involved and can include this information in the certificates where necessary. As with k8s_ca_conf_directory I’ll also put k8s_interface into group_vars/all.yml.

ca_etcd_expiry and ca_k8s_apiserver_expiry are set to 87600h by default. That’s ten years. That’s the time after which the root certificate authorities of etcd and kube-apiserver will expire.
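
As a quick sanity check of that default, the following sketch converts the expiry from hours to years. It assumes 365-day years and ignores leap days. The variable names expiry_hours and expiry_years are only illustrative and not part of the role.

```bash
# Default expiry from the role, without the trailing "h".
expiry_hours=87600

# 24 hours per day, 365 days per year (leap days ignored).
expiry_years=$((expiry_hours / 24 / 365))

echo "${expiry_hours}h is roughly ${expiry_years} years"
```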

In general all these (ca|etcd|k8s)_*_csr_names_(c|l|ou|st) settings are mostly for informational purposes. For all *_csr_names_c variables enter your country code, e.g. US or DE. Accordingly, for *_csr_names_l enter your location, e.g. New York or Berlin. For *_csr_names_ou enter an organizational unit like Engineering Department, and for *_csr_names_st enter a state like California or Bayern.

The default value for all *_csr_key_algo variables is rsa and for *_csr_key_size it’s 2048. These values should be just fine (also see Manage TLS Certificates in a Cluster). An alternative value for all *_csr_key_algo variables is ecdsa. ecdsa (Elliptic Curve Digital Signature Algorithm) is based on “Elliptic Curve Cryptography” (ECC) and creates smaller keys and files. ecdsa has been around for quite a few years (maybe 15+ years), although rsa is way older. So as long as no quite dated software needs to connect to kube-apiserver, it should be fine to go with ecdsa. In this case the values for *_csr_key_size can be 256, 384 and 521. I’m using ecdsa and a key size of 384 for my certificates. For rsa the key size can be 2048 up to 8192 (you normally increase in multiples of 1024, so 4096 is also a valid value).
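
To avoid a typo in these settings, the algorithm and key size can be checked before running the role. The helper below is only a sketch and not part of the role; valid_key_request is a made-up name. It encodes the combinations that cfssl accepts as far as I know: 256, 384 and 521 (the P-521 curve) for ecdsa, and 2048 up to 8192 for rsa.

```bash
# Returns 0 if the algorithm/key size pair is an accepted combination,
# 1 otherwise. Hypothetical helper, not part of the role.
valid_key_request() {
  algo="$1"
  size="$2"
  case "$algo" in
    ecdsa)
      case "$size" in
        256|384|521) return 0 ;;
      esac
      ;;
    rsa)
      # 2048 up to 8192 bits
      if [ "$size" -ge 2048 ] && [ "$size" -le 8192 ]; then
        return 0
      fi
      ;;
  esac
  return 1
}

valid_key_request ecdsa 384 && echo "ecdsa/384 is fine"
valid_key_request rsa 1024 || echo "rsa/1024 is rejected"
```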

Let’s talk about the values that shouldn’t be changed. First is k8s_admin_csr_names_o: "system:masters". The role will create a certificate for the “admin” user. That’s basically the very first and most powerful certificate/user. Note: There is actually nothing like a Kubernetes user. It’s just certificates that identify you when you connect to kube-apiserver. But normally just because you have a certificate doesn’t mean that you can do much 😉 That’s different for the “admin” certificate as it specifies that you’re in the system:masters group, and that basically means superpowers. So it makes sense to keep this certificate in a secure place and create a new certificate/user right after the Kubernetes cluster is set up. For more information see User-facing roles.

Then there is k8s_worker_csr_names_o: "system:nodes". kubelet runs on every worker node. These processes need to be in the system:nodes group. They’ll also get a username system:node:<nodeName>. For more information see Using Node Authorization.

Next there is k8s_kube_proxy_csr_cn: "system:kube-proxy", k8s_kube_proxy_csr_names_o: "system:node-proxier", k8s_scheduler_csr_cn: "system:kube-scheduler", k8s_scheduler_csr_names_o: "system:kube-scheduler", k8s_controller_manager_csr_cn: "system:kube-controller-manager" and k8s_controller_manager_csr_names_o: "system:kube-controller-manager". Here we have a “common name” (cn) and an organization (o). Kubernetes uses the cn of a client certificate as the user name and o as the group. Kubernetes ships with some default ClusterRoles and ClusterRoleBindings, and the default ClusterRoleBindings refer to exactly these names. So cn is basically the “who” and the ClusterRole bound to it defines “what” is allowed. This way the services identify themselves accordingly and have the permissions they need to do their job. For more information see Core Components.

Note: Short version: It’s very important that the values of ca_k8s_apiserver_csr_cn and k8s_apiserver_csr_cn are different! Otherwise Python’s urllib3 won’t connect to kube-apiserver unless insecure-skip-tls-verify: true is set in kubeconfig.

Long version: Since urllib3 is also used by the requests library (and most probably a lot of others) for HTTP(S) connections, this means that for example Python’s kubernetes library and therefore Ansible’s k8s_* modules won’t work without issues. While Python <= v3.8 worked fine, this is no longer true with Python >= 3.9. For HTTPS connections Python uses OpenSSL, which issues an error if a self-signed certificate doesn’t contain the X509v3 Authority Key Identifier in the X509v3 extensions. Trying to connect to kube-apiserver with Ansible’s k8s_* modules will show an error like this (e.g. when trying to retrieve information about K8s namespaces):

plain

Max retries exceeded with url: /api/v1/namespaces (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate (_ssl.c:1006)')))

cfssl is used for creating the certificates in the githubixx.kubernetes_ca role. It’s written in Go. And the README states: As of Go 1.7, self-signed certificates will not include the AKI. (AKI = Authority Key Identifier). But as I figured out here, it is possible to get the AKI included in self-signed certificates if the issuer and subject common name are different. That makes OpenSSL used by Python >= 3.9 happy again 😉 I wanted to write that down as it really took me hours to figure out. At the very end of this blog post I’ll also show a command to verify the certificates. But let’s continue with the more important stuff…
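
To see whether a certificate carries the AKI, the text output of openssl can be searched for the extension. The helper below is only a sketch; has_aki is a made-up name, and the file name in the usage comment is taken from the file list further down in this post. It reads the openssl output from stdin, so it only relies on grep.

```bash
# Reads the output of "openssl x509 -noout -text" on stdin and succeeds
# if the certificate carries an Authority Key Identifier extension.
has_aki() {
  grep -q 'X509v3 Authority Key Identifier'
}

# Usage against a certificate generated by the role:
#   openssl x509 -noout -text -in cert-k8s-apiserver.pem | has_aki \
#     && echo "AKI present" || echo "AKI missing"
```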

etcd_cert_hosts contains a list of additional IP addresses or host names you’d like to include in the etcd certificate. As mentioned above already, the certificate won’t only include the values specified here but also some automatically collected IP addresses and hostnames. If you intend to install a loadbalancer in front of the etcd services, you should also include the IP address and the DNS name of that loadbalancer. Otherwise you most probably will get certificate errors. Note: Currently the role doesn’t include the IPv6 addresses. So you might need to include them manually.

The same is basically true for k8s_apiserver_cert_hosts. Later I’ll install the haproxy loadbalancer on all nodes. For all services that need to connect to kube-apiserver, like kubelet, kube-scheduler, and so on, I’ll configure the loadbalancer as the target for kube-apiserver. So in case one kube-apiserver goes down, e.g. for maintenance, the loadbalancer can switch to one of the remaining two kube-apiservers. In my case haproxy will listen on localhost on every host. As you can see it’s already included in the default list. But if you have some hardware loadbalancer that handles your load balancing needs, then include that IP and DNS name too. Also, if you use Kubernetes as OIDC provider, you might want to include the IP and/or hostname you specify in the --service-account-issuer flag for kube-apiserver later (e.g. api.k8s-01.example.com).

Note: Maybe you’re wondering what this IP 10.32.0.1 is all about. Actually it’s the first IP address of the IP range specified in the --service-cluster-ip-range option for kube-apiserver (happens in one of the next blog posts). The default service cluster IP range is 10.32.0.0/16 (see kubernetes-controller role). E.g. if the Kubernetes cluster was successfully deployed and one executes ping kubernetes or ping kubernetes.default.svc.cluster.local (if you keep the default cluster.local domain) within a Pod, the names will resolve to 10.32.0.1. So that’s basically the “internal” IP address of the kube-apiservers. Actually it’s a Service IP, which in the end is a load balancer managed by Kubernetes. That’s why it is important to have this IP in the list, and also the other kubernetes.default variants.

And finally there is etcd_additional_clients. This list should contain all etcd clients that want to connect to the etcd cluster. The most important client is kube-apiserver of course. So you definitely want to keep k8s-apiserver-etcd in this list. But I’ll also generate certificates for Cilium and Traefik which I’ll install later (it’s documented in the role’s README). Cilium will be my solution for Kubernetes networking and Traefik for everything Ingress related (allows external users to access your services running in the Kubernetes cluster). While you can also install a separate etcd cluster for each of these services, that’s additional effort of course. And since I already have one around it makes sense to use it 😉 Security-wise this might be an issue for some environments as one needs to allow connections from the Kubernetes worker nodes to the etcd cluster. So you have to decide if this is acceptable or not. In my case it’ll look like this:
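
If you change the service cluster IP range, the address to put into k8s_apiserver_cert_hosts changes too. The sketch below computes the first address of an IPv4 range with plain shell arithmetic. first_service_ip is a made-up helper name, and it assumes that the part before the slash is already the network address (as in 10.32.0.0/16).

```bash
# Prints the first address after the network address of an IPv4 CIDR range.
# Assumes the address part is already the network address.
first_service_ip() {
  network="${1%/*}"            # strip the prefix length, e.g. "/16"
  old_ifs="$IFS"
  IFS=.
  set -- $network              # split into the four octets
  IFS="$old_ifs"
  ip_int=$(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 + 1 ))
  echo "$(( (ip_int >> 24) & 255 )).$(( (ip_int >> 16) & 255 )).$(( (ip_int >> 8) & 255 )).$(( ip_int & 255 ))"
}

first_service_ip 10.32.0.0/16
```

For the default range this prints 10.32.0.1.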

yaml

etcd_additional_clients:
  - k8s-apiserver-etcd
  - traefik
  - cilium
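
Judging from the file list further down in this post, every entry in this list ends up as four files named after the client. The following sketch prints the file names to expect. The cert-<name> pattern is an assumption derived from that list, and expected_client_files is a made-up helper name.

```bash
# Client names as listed in etcd_additional_clients.
clients="k8s-apiserver-etcd traefik cilium"

# Print the four files expected per client, following the
# cert-<name>* naming pattern (an assumption based on the role's output).
expected_client_files() {
  for name in $1; do
    printf '%s\n' \
      "cert-${name}.csr" \
      "cert-${name}-csr.json" \
      "cert-${name}-key.pem" \
      "cert-${name}.pem"
  done
}

expected_client_files "$clients"
```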

If you’re done with setting all variables the CSRs and the certificates can be generated via

bash

ansible-playbook --tags=role-kubernetes-ca k8s.yml

This only runs the Ansible kubernetes_ca role which was tagged as role-kubernetes-ca. After running the role there will be quite a few files in k8s_ca_conf_directory. The filenames should give a good hint about what’s in these files and what each file is used for (also see the defaults/main.yml file of the role for more information). Here is an overview of the files you should get at least:

bash

ca-etcd-config.json
ca-etcd.csr
ca-etcd-csr.json
ca-etcd-key.pem
ca-etcd.pem
ca-k8s-apiserver-config.json
ca-k8s-apiserver.csr
ca-k8s-apiserver-csr.json
ca-k8s-apiserver-key.pem
ca-k8s-apiserver.pem
cert-admin.csr
cert-admin-csr.json
cert-admin-key.pem
cert-admin.pem
cert-cilium.csr
cert-cilium-csr.json
cert-cilium-key.pem
cert-cilium.pem
cert-etcd-peer.csr
cert-etcd-peer-csr.json
cert-etcd-peer-key.pem
cert-etcd-peer.pem
cert-etcd-server.csr
cert-etcd-server-csr.json
cert-etcd-server-key.pem
cert-etcd-server.pem
cert-k8s-010102.i.example.com.csr
cert-k8s-010102.i.example.com-csr.json
cert-k8s-010102.i.example.com-key.pem
cert-k8s-010102.i.example.com.pem
cert-k8s-010103.i.example.com.csr
cert-k8s-010103.i.example.com-csr.json
cert-k8s-010103.i.example.com-key.pem
cert-k8s-010103.i.example.com.pem
cert-k8s-010202.i.example.com.csr
cert-k8s-010202.i.example.com-csr.json
cert-k8s-010202.i.example.com-key.pem
cert-k8s-010202.i.example.com.pem
cert-k8s-010203.i.example.com.csr
cert-k8s-010203.i.example.com-csr.json
cert-k8s-010203.i.example.com-key.pem
cert-k8s-010203.i.example.com.pem
cert-k8s-010302.i.example.com.csr
cert-k8s-010302.i.example.com-csr.json
cert-k8s-010302.i.example.com-key.pem
cert-k8s-010302.i.example.com.pem
cert-k8s-010303.i.example.com.csr
cert-k8s-010303.i.example.com-csr.json
cert-k8s-010303.i.example.com-key.pem
cert-k8s-010303.i.example.com.pem
cert-k8s-apiserver.csr
cert-k8s-apiserver-csr.json
cert-k8s-apiserver-etcd.csr
cert-k8s-apiserver-etcd-csr.json
cert-k8s-apiserver-etcd-key.pem
cert-k8s-apiserver-etcd.pem
cert-k8s-apiserver-key.pem
cert-k8s-apiserver.pem
cert-k8s-controller-manager.csr
cert-k8s-controller-manager-csr.json
cert-k8s-controller-manager-key.pem
cert-k8s-controller-manager.pem
cert-k8s-controller-manager-sa.csr
cert-k8s-controller-manager-sa-csr.json
cert-k8s-controller-manager-sa-key.pem
cert-k8s-controller-manager-sa.pem
cert-k8s-proxy.csr
cert-k8s-proxy-csr.json
cert-k8s-proxy-key.pem
cert-k8s-proxy.pem
cert-k8s-scheduler.csr
cert-k8s-scheduler-csr.json
cert-k8s-scheduler-key.pem
cert-k8s-scheduler.pem
cert-traefik.csr
cert-traefik-csr.json
cert-traefik-key.pem
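
With that many files it’s easy to overlook a missing one. As a rough check, the sketch below lists every private key that has no certificate next to it. keys_without_cert is a made-up helper name, and it assumes the *-key.pem / *.pem naming seen in the list above.

```bash
# Lists every private key in a directory that has no matching certificate.
# Hypothetical helper; pass the directory configured in k8s_ca_conf_directory.
keys_without_cert() {
  dir="$1"
  for key in "$dir"/*-key.pem; do
    [ -e "$key" ] || continue          # glob matched nothing
    cert="${key%-key.pem}.pem"
    [ -e "$cert" ] || echo "missing: ${cert##*/}"
  done
}

# Usage (path taken from this post, adjust to your setup):
#   keys_without_cert /opt/scripts/ansible/k8s-01_vms/certificates
```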

For the curious ones 😉 : If you would like to know what’s in the certificate files (the .pem files) you can use this command:

bash

openssl x509 -noout -text -in cert-k8s-apiserver.pem

This will show you the “content” of the file in plain text. E.g. in case of cert-k8s-apiserver.pem it’ll show you the X509v3 Subject Alternative Name. It contains a list of all IP addresses and host names that were included in this certificate. Especially for cert-k8s-apiserver.pem it is important to have all IP addresses and host names listed that you want to use later to connect to that service! For other .pem files it looks different. There the Subject key is the more relevant one. This also offers the possibility to check if everything needed is included in the certificate files before deploying them.

And as promised above already, here is a command that helps you to verify if OpenSSL is fine with your certificates. It checks if the Certificate Authority (CA) and the certificate (normally the .pem files) match. E.g.:
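
Checking the Subject Alternative Name by eye gets tedious with a long list. The following sketch filters the openssl text output and tests whether a given host name or IP address is listed exactly. san_entries and san_contains are made-up helper names, and the parsing assumes that all entries are printed on the single line after the X509v3 Subject Alternative Name header.

```bash
# Reads "openssl x509 -noout -text" output on stdin and prints one
# Subject Alternative Name entry per line, without the DNS:/IP Address: prefix.
san_entries() {
  grep -A1 'X509v3 Subject Alternative Name' \
    | tail -n 1 \
    | tr ',' '\n' \
    | sed -e 's/^ *//' -e 's/^DNS://' -e 's/^IP Address://'
}

# Succeeds if the given host name or IP address is listed exactly.
san_contains() {
  san_entries | grep -qxF -- "$1"
}

# Usage:
#   openssl x509 -noout -text -in cert-k8s-apiserver.pem | san_contains 10.32.0.1 \
#     || echo "10.32.0.1 is missing"
```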

bash

openssl verify -verbose -x509_strict -CAfile ca-k8s-apiserver.pem cert-k8s-apiserver.pem

As mentioned above if ca_k8s_apiserver_csr_cn and k8s_apiserver_csr_cn values are the same you’ll get this error:

plain

C=DE, ST=Bayern, L=The_Internet, O=Kubernetes, OU=BY, CN=kubernetes
error 18 at 0 depth lookup: self-signed certificate
error cert-k8s-apiserver.pem: verification failed

While the kubectl utility is fine with that, Python >= 3.9, OpenSSL and urllib3 are not 😉 But this can be fixed as mentioned above: use different values for the two variables, deploy the role again, and then it should be fine - hopefully.
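
To catch this before generating the certificates, the two values can be compared in the variables file. This is only a sketch with a deliberately naive YAML lookup: yaml_value and cn_values_differ are made-up helper names, and it assumes both variables are set as simple top-level key: value lines in one file. If you rely on the role defaults, the variables won’t be in your file and the check reports a failure.

```bash
# Prints the value of a simple top-level "key: value" line from a YAML file.
# Naive on purpose: no nesting, no multi-line values.
yaml_value() {
  awk -v key="$1" -F': *' '$1 == key { gsub(/"/, "", $2); print $2; exit }' "$2"
}

# Succeeds only if the CA and the kube-apiserver certificate use different CNs.
cn_values_differ() {
  ca_cn=$(yaml_value ca_k8s_apiserver_csr_cn "$1")
  cert_cn=$(yaml_value k8s_apiserver_csr_cn "$1")
  [ -n "$ca_cn" ] && [ -n "$cert_cn" ] && [ "$ca_cn" != "$cert_cn" ]
}

# Usage (file name as used in this post):
#   cn_values_differ group_vars/k8s_ca.yml || echo "CNs are equal or unset"
```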

That’s it for now. In the next chapter I’ll install the etcd cluster and use the first CA and certificates that were generated in this part.