Kubernetes the not so hard way with Ansible (at Scaleway) - Part 5 - etcd cluster [updated for Kubernetes v1.10.x]

Install the etcd cluster that stores the state of the Kubernetes components

January 17, 2017

CHANGELOG

2018-09-05

  • I’ll no longer update this text as I migrated my hosts to Hetzner Online because of constant network issues with Scaleway. I’ve created a new blog series about how to set up a Kubernetes cluster at Hetzner Online, but since my Ansible playbooks are not provider dependent, this text should work for Scaleway too if you still want to use it.

2018-06-12

  • supported etcd release for Kubernetes v1.10.x is v3.1.12
  • parameter for etcdctl endpoint status ... changed (see end of this page) as client endpoint changed from http to https

2018-01-18

  • add note in variables that Kubernetes v1.9.1 supports/is tested with etcd v3.1.11
  • fix bug etcd_data_dir variable missing

2018-01-14

  • introduce flexible etcd parameter settings via etcd_settings/etcd_settings_user variables. This way all flags/settings of current and future etcd versions can be set, and there is no need to adjust the etcd systemd service file template with every release.

2018-01-03

  • updated etcd to 3.2.13
  • added new etcd flags (see role variables below)
  • change default for k8s_ca_conf_directory (see role variables below). If you already defined k8s_ca_conf_directory by yourself in group_vars/k8s.yml or group_vars/all.yml nothing changes for you.
  • more documentation for role variables

2017-10-08

  • updated etcd to 3.2.8
  • rename local_cert_dir to k8s_ca_conf_directory and change default location
  • smaller changes needed for Kubernetes v1.8

This post is based on Kelsey Hightower’s Kubernetes The Hard Way - Bootstrapping the etcd cluster.

In part 4 we installed our PKI (public key infrastructure) in order to secure communication between our Kubernetes components/infrastructure. Now we use the certificate authorities (CA) and generated keys for the first and very important component - the etcd cluster. etcd is basically a distributed key/value database. The Kubernetes components are stateless. All state is stored in etcd, so you should take good care of your etcd cluster in production. If you lose all etcd nodes you lose the whole Kubernetes state… So making a snapshot/backup from time to time is - at least - recommended ;-)
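
Such a snapshot can be taken with the etcdctl v3 API. The following is only a minimal sketch and not part of the role: the helper names run and etcd_snapshot as well as the paths in the example call are assumptions made for illustration. It expects the certificate file names used throughout this series.

```shell
# Use the etcdctl v3 API (needed for "snapshot save").
export ETCDCTL_API=3

# Hypothetical helper: with DRY_RUN=1 the command is only printed.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: save a snapshot of ONE etcd member.
# Arguments: <endpoint> <certificate directory> <snapshot file>
etcd_snapshot() {
  run etcdctl snapshot save "$3" \
    --endpoints="$1" \
    --cacert="$2/ca-etcd.pem" \
    --cert="$2/cert-etcd.pem" \
    --key="$2/cert-etcd-key.pem"
}

# Example call (endpoint and paths are placeholders):
# etcd_snapshot https://etcd-node1:2379 /etc/etcd "/var/backups/etcd-$(date +%Y%m%d).db"
```

A snapshot is taken from a single member, so pass exactly one endpoint. Copy the resulting file off the etcd hosts, otherwise the backup is gone together with the nodes.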

I want to mention that if your etcd nodes won’t join, a possible reason could be the certificate. If it isn’t your firewall blocking traffic between your etcd nodes, the certificate’s host list could be the problem. The error message isn’t always clear about the issue.
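
To see which hosts a certificate is actually valid for, you can print its Subject Alternative Name entries with openssl. This is just a sketch: the function name cert_sans is made up for illustration and the path in the example call is a placeholder.

```shell
# Hypothetical helper: print the host list (Subject Alternative Names)
# of a PEM encoded certificate. Argument: <certificate file>
cert_sans() {
  openssl x509 -in "$1" -noout -text \
    | grep -A1 'Subject Alternative Name'
}

# Example call (the path is a placeholder):
# cert_sans /etc/etcd/cert-etcd.pem
```

The hostnames and IP addresses the etcd nodes use to reach each other (e.g. the addresses of the interface set in etcd_interface) should show up in that list.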

As usual we add the role ansible-role-etcd to the k8s.yml file e.g.:

-
  hosts: k8s_etcd
  roles:
    -
      role: githubixx.etcd
      tags: role-etcd

Next install the role via

ansible-galaxy install githubixx.etcd

(or just clone the GitHub repo, whatever you like). Basically you don’t need to change a lot of variables, but of course you can if you want:

# The directory from where to copy the K8s certificates. By default this
# will expand to user's LOCAL $HOME (the user that runs "ansible-playbook ...")
# plus "/k8s/certs". That means if the user's $HOME directory is e.g.
# "/home/da_user" then "k8s_ca_conf_directory" will have a value of
# "/home/da_user/k8s/certs".
k8s_ca_conf_directory: "{{ '~/k8s/certs' | expanduser }}"

# etcd version (Kubernetes 1.10.x supports etcd 3.1.12. So you may
# change this value accordingly if you use etcd for Kubernetes -
# see: https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.10.md)
etcd_version: "3.1.12"
# Port where etcd is listening for clients
etcd_client_port: "2379"
# Port where etcd is listening for its peers
etcd_peer_port: "2380"
# Interface to bind etcd ports to
etcd_interface: "tap0"
# Directory for etcd configuration
etcd_conf_dir: "/etc/etcd"
# Directory to store downloaded etcd archive
# Should not be deleted to avoid downloading over and over again
etcd_download_dir: "/opt/etcd"
# Directory to store etcd binaries
etcd_bin_dir: "/usr/local/bin"
# etcd data directory (etcd database files so to say)
etcd_data_dir: "/var/lib/etcd"

# etcd flags/settings. These parameters are directly passed
# to "etcd" daemon during startup. To see all possible settings/flags
# either run "etcd --help" or have a look at the documentation.
# The dictionary keys below are just the flag names without "--". E.g.
# the first "name" flag below will be passed as "--name=whatever-hostname"
# to "etcd" daemon.
etcd_settings:
  "name": "{{ansible_hostname}}"
  "cert-file": "{{etcd_conf_dir}}/cert-etcd.pem"
  "key-file": "{{etcd_conf_dir}}/cert-etcd-key.pem"
  "peer-cert-file": "{{etcd_conf_dir}}/cert-etcd.pem"
  "peer-key-file": "{{etcd_conf_dir}}/cert-etcd-key.pem"
  "peer-trusted-ca-file": "{{etcd_conf_dir}}/ca-etcd.pem"
  "peer-client-cert-auth": "true" # Enable peer client cert authentication
  "client-cert-auth": "true" # Enable client cert authentication
  "trusted-ca-file": "{{etcd_conf_dir}}/ca-etcd.pem"
  "advertise-client-urls": "{{'https://' + hostvars[inventory_hostname]['ansible_' + etcd_interface].ipv4.address + ':' + etcd_client_port}}"
  "initial-advertise-peer-urls": "{{'https://' + hostvars[inventory_hostname]['ansible_' + etcd_interface].ipv4.address + ':' + etcd_peer_port}}"
  "listen-peer-urls": "{{'https://' + hostvars[inventory_hostname]['ansible_' + etcd_interface].ipv4.address + ':' + etcd_peer_port}}"
  "listen-client-urls": "{{'https://' + hostvars[inventory_hostname]['ansible_' + etcd_interface].ipv4.address + ':' + etcd_client_port + ',https://127.0.0.1:' + etcd_client_port}}"
  "initial-cluster-token": "etcd-cluster-0" # Initial cluster token for the
                                            # etcd cluster during bootstrap.
  "initial-cluster-state": "new"   # Initial cluster state ('new' or
                                   # 'existing')
  "data-dir": "{{etcd_data_dir}}"  # etcd data directory (etcd database files)
  "wal-dir": ""                    # Dedicated wal directory ("" means no
                                   # separate WAL directory)
  "auto-compaction-retention": "0" # Auto compaction retention in hours.
                                   # 0 means disable auto compaction.
  "snapshot-count": "100000"  # Number of committed transactions to
                              # trigger a snapshot to disk
  "heartbeat-interval": "100" # Time (in milliseconds) of a
                              # heartbeat interval
  "election-timeout": "1000"  # Time (in milliseconds) for an election
                              # to timeout. See tuning documentation
                              # for details
  "max-snapshots": "5"        # Maximum number of snapshot files to
                              # retain (0 is unlimited)
  "max-wals": "5"             # Maximum number of wal files to
                              # retain (0 is unlimited)
  "cors": ""                  # Comma-separated whitelist of origins
                              # for CORS (cross-origin resource sharing)
  "quota-backend-bytes": "0"  # Raise alarms when backend size exceeds
                              # the given quota (0 defaults to low
                              # space quota)
  "log-package-levels": ""    # Specify a particular log level for
                              # each etcd package
                              # (eg: 'etcdmain=CRITICAL,etcdserver=DEBUG')
  "log-output": "default"     # Specify 'stdout' or 'stderr' to skip
                              # journald logging even when running under
                              # systemd

# Certificate authority and certificate files for etcd
etcd_certificates:
  - ca-etcd.pem        # client server TLS trusted CA cert file/peer
                       # server TLS trusted CA cert file
  - ca-etcd-key.pem    # CA key file
  - cert-etcd.pem      # client/peer server TLS cert file
  - cert-etcd-key.pem  # client/peer server TLS key file

The etcd default flags/settings defined in etcd_settings can be overridden by defining a variable called etcd_settings_user. You can also add additional settings with this variable. E.g. to override the default value of the log-output setting and add a new setting like grpc-keepalive-min-time, add the following to group_vars/k8s.yml:

etcd_settings_user:
  "log-output": "stdout"
  "grpc-keepalive-min-time": "10s"
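
The effect of the two variables can be pictured like this: defaults first, user settings second, later values win, and every key ends up as a --key=value flag. The sketch below only imitates that behaviour with awk. It is not the role's actual template, and the function name render_flags and the sample values are assumptions.

```shell
# Hypothetical helper: read "key value" pairs from stdin and print them as
# etcd flags. A key that appears again later replaces the earlier value.
render_flags() {
  awk '
    {
      if (!($1 in value)) order[++n] = $1   # remember first appearance
      value[$1] = $2                        # later lines override
    }
    END {
      for (i = 1; i <= n; i++) printf "--%s=%s\n", order[i], value[order[i]]
    }
  '
}

# Two defaults (excerpt) followed by the two user settings from above.
printf '%s\n' \
  'name etcd-node1' \
  'log-output default' \
  'log-output stdout' \
  'grpc-keepalive-min-time 10s' \
  | render_flags
# prints:
# --name=etcd-node1
# --log-output=stdout
# --grpc-keepalive-min-time=10s
```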

The role will search for the certificates we created in part 4 in the directory you specify in k8s_ca_conf_directory on the host where you run Ansible. The files used here are listed in etcd_certificates. If you used a different name for the PeerVPN interface we created in part 3, you will want to change etcd_interface too.

We can deploy the role now via

ansible-playbook --tags=role-etcd k8s.yml

This will install the etcd cluster and start the etcd daemons. Have a look at the logs of your etcd hosts to see whether everything worked and the etcd nodes are connected. Use journalctl --no-pager, journalctl -f or journalctl -t etcd to check the systemd log.

Log into one of the etcd nodes and check the cluster status e.g. via

# The value of this variable should be the value of Ansible variable "k8s_ca_conf_directory"
export CERTIFICATE_DIR="/path/where/your/certificates/are/located"

ETCDCTL_API=3 etcdctl endpoint health \
  --endpoints=https://etcd-node1:2379,https://etcd-node2:2379,https://etcd-node3:2379 \
  --cacert=${CERTIFICATE_DIR}/ca-etcd.pem \
  --cert=${CERTIFICATE_DIR}/cert-etcd.pem \
  --key=${CERTIFICATE_DIR}/cert-etcd-key.pem

Of course, replace etcd-node(1-3) with your etcd node names or IPs. You should now see output similar to this:

https://etcd-node1:2379 is healthy: successfully committed proposal: took = 9.416983ms
https://etcd-node2:2379 is healthy: successfully committed proposal: took = 6.206849ms
https://etcd-node3:2379 is healthy: successfully committed proposal: took = 8.409447ms
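
For scripted checks this kind of output can be reduced to a single exit code. The sketch below is a hypothetical helper (the name all_healthy is made up): it counts the lines that start with an endpoint URL and succeeds only if every one of them reports "is healthy".

```shell
# Hypothetical helper: read "etcdctl endpoint health" output from stdin.
# Prints a summary and returns 0 only if every endpoint line is healthy.
all_healthy() {
  awk '
    /^https?:\/\// { total++; if ($0 ~ / is healthy/) ok++ }
    END {
      printf "%d/%d endpoints healthy\n", ok, total
      exit !(total > 0 && ok == total)
    }
  '
}

# Example call; 2>&1 is used in case the etcdctl version in use writes
# some of the lines to stderr:
# <the etcdctl endpoint health command> 2>&1 | all_healthy
```

With the three lines shown above as input, the helper would print "3/3 endpoints healthy" and return 0.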

Next we’ll install the Kubernetes control plane.