- I’ll no longer update this text as I migrated my hosts to Hetzner Online because of constant network issues with Scaleway. I’ve created a new blog series about how to set up a Kubernetes cluster at Hetzner Online, but since my Ansible playbooks are not provider-dependent the blog text should work for Scaleway too if you still want to use it. The new blog post is here.
- Mention the possibility to use the Scaleway dynamic inventory for Ansible to discover hosts
`k8s.yml` file to install `kubectl` tool locally.
v2.0.0 - Major refactoring of the Ansible role. Use https://github.com/githubixx/ansible-role-harden-linux/tree/v1.0.0 if you need the old version.
v1.0.0 - Initial blog post and Ansible role.
In part 1 we created five hosts (or eight hosts if you set up the etcd cluster on its own hosts) for our Kubernetes controller and worker nodes at Scaleway, and gave a first introduction to Ansible configuration management. As I mentioned in part 1, forget about the Scaleway security groups. They don’t really make much sense. We’ll address this in this post.
All our hosts at Scaleway have public IPs (the Scaleway CLI lists them for you). You can also grab the public DNS entry of a host via the Scaleway CLI, e.g.
```shell
scw --region="ams1" inspect server:k8s-controller1 | jq '.[0].dns_public'
```
if you followed my server naming in part 1 of this tutorial (needs the `jq` utility installed; for the Paris region use `--region="par1"`). You can use these public DNS records in your Ansible `hosts` file, or create your own DNS entries now pointing to your public instance IPs, or maybe create a CNAME record pointing to the public DNS record Scaleway created for you.
To make our Kubernetes nodes a little bit more secure we’ll add some basic security settings, and of course we’ll use Ansible to roll them out. Ansible directory layout best practices tell you to have separate inventory files for production and staging servers. Since I don’t have a staging system I just create a single `hosts` file now, e.g.:
```ini
[k8s_kubectl]
localhost ansible_connection=local ansible_become_user=_YOUR-USER_

[k8s_ca]
localhost ansible_connection=local ansible_become_user=_YOUR-USER_

[k8s_etcd]
controller[1:3].your-domain.tld

[k8s_controller]
controller[1:3].your-domain.tld

[k8s_worker]
worker[1:2].your-domain.tld

[k8s:children]
k8s_controller
k8s_worker
```
Adjust the file to your needs of course! The headings in brackets are group names, which are used to classify systems. As you can see you can use ranges like `controller[1:3]...` instead of specifying every node here. We also create a group of host groups for all our Kubernetes hosts called `[k8s:children]`. Regarding the entries here:
The `[k8s_kubectl]` group (well, in this case there is only one host in the group…) is used to execute THE Kubernetes control utility, `kubectl`, later. E.g. if you configured `kubectl` correctly you can run it directly in a shell on your workstation, or through Ansible, which in turn executes `kubectl` locally. That’s why we set `ansible_become_user` to `_YOUR-USER_`, the username you use on your workstation. `kubectl` stores its configuration in `$HOME/.kube/config` by default. So if Ansible starts `kubectl` as user `root` it will search in the wrong `$HOME` directory! That’s why it is important to tell Ansible to execute `kubectl` as the user which generated `$HOME/.kube/config`. We will use the host specified in the `[k8s_kubectl]` group later.
Like the group before, `[k8s_ca]` also specifies only `localhost`. We’ll create some certificates in a later blog post. The generated certificates will be stored locally on your machine and copied to the Kubernetes hosts as needed by Ansible. That’s why we set the option `ansible_connection=local` for this “host”.
The `[k8s_etcd]` group contains the hosts which will run the etcd cluster, as you probably already guessed ;-) In our case those are the same nodes as in the `[k8s_controller]` group. For production you should place the etcd cluster on separate hosts, as already mentioned.
The hosts in the `[k8s_controller]` group run at least three important Kubernetes components: `kube-apiserver`, `kube-scheduler` and `kube-controller-manager`.
We also define a group for our worker nodes: `[k8s_worker]`. Those are the nodes which will run the Docker containers and do the heavy lifting.
Finally, if Ansible should execute tasks on all of our Kubernetes nodes, we have a group called `[k8s:children]` which contains our worker and controller node groups. If you run the etcd cluster on separate nodes you should also add the `k8s_etcd` group here. We’ll apply roles such as `docker` to all of our Kubernetes hosts; that’s what this group is used for.
I just want to mention that it is possible to use a dynamic inventory plugin for Ansible, like the Scaleway dynamic inventory for Ansible or scw_inventory. In this case you just need to tag your instances and the dynamic inventory plugin will discover the hosts to use for a specific role/task.
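As a sketch of how such a dynamic inventory could look: the `scaleway` inventory plugin from Ansible’s community collections can be configured roughly like this (the regions and tag names below are assumptions — adjust them to your instances):

```yaml
# scaleway_inventory.yml - hypothetical sketch of a Scaleway dynamic
# inventory configuration; regions and tags must match your own setup.
plugin: scaleway
regions:
  - ams1
  - par1
tags:
  - k8s_controller
  - k8s_worker
```

It would then be passed to Ansible via `ansible-playbook -i scaleway_inventory.yml k8s.yml` instead of the static `hosts` file.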
To specify what Ansible should install/modify on the hosts we create a playbook file. You already saw it in part 1 in the directory structure; it’s called `k8s.yml`. It basically contains the host group names and what roles each host group (or host) has (e.g. being a Kubernetes controller or worker node). For the start this file will contain just about nothing ;-) But this will change in a moment… For the impatient, the file should look like this at the end (when we’re done with the whole tutorial):
```yaml
---
- hosts: k8s_ca
  roles:
    - role: githubixx.cfssl
      tags: role-cfssl
    - role: githubixx.kubernetes-ca
      tags: role-kubernetes-ca

- hosts: k8s_kubectl
  roles:
    - role: githubixx.kubectl
      tags: role-kubectl

- hosts: k8s:children
  roles:
    - role: githubixx.harden-linux
      tags: role-harden-linux
    - role: githubixx.peervpn
      tags: role-peervpn
    - role: githubixx.kubernetes-flanneld
      tags: role-kubernetes-flanneld
    - role: githubixx.docker
      tags: role-docker

- hosts: k8s_etcd
  roles:
    - role: githubixx.etcd
      tags: role-etcd

- hosts: k8s_controller
  roles:
    - role: githubixx.kubernetes-controller
      tags: role-kubernetes-controller

- hosts: k8s_worker
  roles:
    - role: githubixx.kubernetes-worker
      tags: role-kubernetes-worker

- hosts: consul_instances
  any_errors_fatal: true
  become: yes
  become_user: root
  roles:
    - role: brianshumate.consul
      tags: role-consul
```
I created an Ansible role for hardening an Ubuntu 16.04 installation (see ansible-role-harden-linux). It applies some basic security settings. It’s not perfect of course but it will secure our Kubernetes cluster quite a bit. Feel free to create a pull request if you want to contribute, or fork your own version. If you installed the role via `ansible-galaxy install githubixx.harden-linux`, include the role in your playbook (`k8s.yml`) like in this example:
```yaml
---
- hosts: k8s:children
  roles:
    - role: githubixx.harden-linux
      tags: role-harden-linux
```
In our example we’re specifying that Ansible should apply the role `githubixx.harden-linux` to all our Kubernetes hosts (you really want to harden all the Kubernetes hosts ;-) ). Regarding the syntax I used above: later, when we have not just one role but a few more, or during testing, it is sometimes very handy to apply only one role at a time. That’s possible with the syntax above: if you only want to apply the harden-linux role you can run
```shell
ansible-playbook --tags=role-harden-linux k8s.yml
```
This will only run the
harden-linux role on the specified hosts.
Additional Ansible hint: Sometimes you only want to apply a role to one specific host e.g. because you only want to test it there before rolling it out on all hosts. Another case could be that you want to upgrade node by node. That’s possible with e.g.
```shell
ansible-playbook --tags=role-harden-linux --limit=controller1.your-domain.tld k8s.yml
```
This works fine as long as you don’t need facts from other hosts. But for the etcd role e.g. you do need facts from other hosts: we need to know the IPs of all hosts in the `[k8s_etcd]` group to render the etcd systemd service file. If you limit the run to one host, Ansible won’t gather the facts of the other hosts and will fail. One possible workaround is to cache the facts of all hosts. For this to work you need to adjust a few settings in Ansible’s configuration file (`/etc/ansible/ansible.cfg` by default):
```ini
[defaults]
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /opt/scripts/ansible
fact_caching_timeout = 86400
```
If you now run
```shell
ansible -m setup all
```
Ansible will gather facts (like network addresses, disk information, RAM, CPU, …) of all hosts. It will store one file per host (named after the host name) in the directory you specified in `fact_caching_connection` and cache the entries for `fact_caching_timeout` seconds (in the example above one day). This is very useful and I recommend using this workaround, as it saves quite some pain, especially while doing your first experiments. I recommend running `ansible -m setup all` at least once a day, or after you add a new host or make major changes like changing the IP address of a host. It’s also important to update the cache after you applied the `peervpn` role for the first time, because that role adds a new host interface which Ansible needs to be aware of.
One final hint: if you add a new host, the default login is again `root`, and maybe a different SSH port applies (if you change it via the `harden-linux` role). You can specify host-specific SSH settings like port and login user in the Ansible `hosts` file, e.g.:
```ini
[k8s_worker]
worker[1:3].your-domain.tld
worker4.your-domain.tld ansible_port=22 ansible_user=root
```
As you can see, worker4 uses different SSH port and user login settings. So you can now apply the `harden-linux` role to the fresh and unmodified node worker4, and later remove this entry and extend the node range to `worker[1:4]...`, because worker4 then has the `harden-linux` role applied and should behave like the older nodes.
If you start a new host at Scaleway you log in as `root` by default (and I guess that’s true for some other hosting providers too). That’s normally not considered good practice and it’s one of the things the role changes. In general the role has the following features:
- Change root password
- Add a regular (or call it deploy) user used for administration (e.g. for Ansible or login via SSH)
- Allow the regular user mentioned above to execute commands via sudo
- Adjust APT update intervals
- Setup ufw firewall and allow only SSH access by default (add more ports/networks if you like)
- Adjust sysctl settings (/proc filesystem)
- Change SSH default port (if requested)
- Disable password authentication
- Disable root login
- Disable PermitTunnel
- Delete /root/.pw file (contains the root password on a Scaleway host but has no impact on hosts of other providers - the file is just deleted if present)
- Install SSHGuard and adjust whitelist
Ansible roles can be customized via variables. Let’s briefly talk about the variables you need to specify (some variables have no default values). Since we want to apply the
harden-linux role to all of our Kubernetes hosts we create a file in
group_vars directory e.g.
group_vars/k8s.yml (already mentioned above). Variables in
group_vars directory will be applied to a group of hosts. In our example that’s the host group
k8s. That’s the group we specified in the Ansible
hosts file above. As already explained above the
`k8s` group basically contains all our hosts. So if we define a variable in `group_vars/k8s.yml`, all our hosts can use these variables.
We now start to fill this file with the variables that have no defaults but are needed for the `harden-linux` role to work, e.g.:
```yaml
harden_linux_root_password: crypted_pw
harden_linux_deploy_user: deploy
harden_linux_deploy_user_password: crypted_pw
harden_linux_deploy_user_home: /home/deploy
harden_linux_deploy_user_public_keys:
  - /home/deploy/.ssh/id_rsa.pub
```
With `harden_linux_root_password` and `harden_linux_deploy_user_password` we specify the passwords for the `root` user and the `deploy` user respectively. Ansible won’t encrypt the password for you. To create an encrypted password you can use e.g. `python -c 'import crypt; print crypt.crypt("This is my Password", "$1$SomeSalt$")'` (this is Python 2 syntax; you may need to call `python2` instead of `python`, e.g. on Arch Linux).
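If no Python 2 is around, `openssl` can generate a hash in the same MD5-crypt format (a sketch; the salt `SomeSalt` and the password are placeholders — use your own values):

```shell
# Generate an MD5-crypt hash compatible with /etc/shadow.
# "SomeSalt" and the password are placeholders - replace them.
openssl passwd -1 -salt SomeSalt "This is my Password"
# prints a hash starting with $1$SomeSalt$
```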
`harden_linux_deploy_user` specifies the user we want to use to log in to the remote hosts. As already mentioned, the `harden-linux` role will disable root login via SSH for a good reason, so we need a different user. This user will get sudo permission, which Ansible needs to do its work.
`harden_linux_deploy_user_public_keys` specifies a list of public SSH key files you want to add to `$HOME/.ssh/authorized_keys` of the deploy user on the remote host. If you specify e.g. `/home/deploy/.ssh/id_rsa.pub` as an argument, the content of that local file will be added to `.ssh/authorized_keys` of the deploy user on the remote host.
The following variables have defaults; change them only if you need a different value.
harden_linux_required_packages specifies the packages this playbook requires to work (so only add packages but don’t remove the ones already specified):
```yaml
harden_linux_required_packages:
  - ufw
  - sshguard
  - unattended-upgrades
```
The role changes some SSHd settings by default:
```yaml
harden_linux_sshd_settings:
  "^PasswordAuthentication": "PasswordAuthentication no"  # Disable password authentication
  "^PermitRootLogin": "PermitRootLogin no"                # Disable SSH root login
  "^PermitTunnel": "PermitTunnel no"                      # Disable tun(4) device forwarding
  "^Port ": "Port 22"                                     # Set SSHd port
```
Personally I always change the default SSH port, as a lot of brute force attacks take place against this port. So if you want to change the port setting, for example, you can do so:
```yaml
harden_linux_sshd_settings_user:
  "^Port ": "Port 22222"
```
(Please notice the whitespace after “Port”!) The playbook will combine `harden_linux_sshd_settings` and `harden_linux_sshd_settings_user`, with the settings in `harden_linux_sshd_settings_user` taking preference, which means they override the `^Port ` setting/key in `harden_linux_sshd_settings`. As you may have noticed, all the keys in `harden_linux_sshd_settings_user` begin with `^`. That’s because they are regular expressions (regex). One of the playbook tasks will search for a line matching `^Port ` (the `^` means “a line starting with …”) and replace the line (if found) with e.g. `Port 22222`. This makes the playbook very flexible for adjusting settings in `sshd_config` (you can basically replace every setting). You’ll see this pattern for other tasks too, so everything mentioned here holds true in such cases.
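The combine-and-override pattern described above can be sketched in a few lines of Python (a hypothetical illustration, not the role’s actual code — the role does this with Ansible tasks):

```python
import re

# User settings are merged over the defaults; each key is a regex
# matched against the beginning of an sshd_config line.
defaults = {
    "^PasswordAuthentication": "PasswordAuthentication no",
    "^Port ": "Port 22",
}
user = {"^Port ": "Port 22222"}

# Merge: keys in "user" override the same keys in "defaults".
settings = {**defaults, **user}

def apply_settings(lines, settings):
    """Replace every config line that matches one of the regex keys."""
    result = []
    for line in lines:
        for pattern, replacement in settings.items():
            if re.match(pattern, line):
                line = replacement
                break
        result.append(line)
    return result

sshd_config = ["Port 22", "PasswordAuthentication yes", "X11Forwarding no"]
print(apply_settings(sshd_config, settings))
# → ['Port 22222', 'PasswordAuthentication no', 'X11Forwarding no']
```

Note how `"^Port "` only ever matches at the start of a line, so a setting like `X11Forwarding no` passes through untouched.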
Next we have some defaults for our firewall/iptables:
```yaml
harden_linux_ufw_defaults:
  "^IPV6": 'IPV6=yes'
  "^DEFAULT_INPUT_POLICY": 'DEFAULT_INPUT_POLICY="DROP"'
  "^DEFAULT_OUTPUT_POLICY": 'DEFAULT_OUTPUT_POLICY="ACCEPT"'
  "^DEFAULT_FORWARD_POLICY": 'DEFAULT_FORWARD_POLICY="DROP"'
  "^DEFAULT_APPLICATION_POLICY": 'DEFAULT_APPLICATION_POLICY="SKIP"'
  "^MANAGE_BUILTINS": 'MANAGE_BUILTINS=no'
  "^IPT_SYSCTL": 'IPT_SYSCTL=/etc/ufw/sysctl.conf'
  "^IPT_MODULES": 'IPT_MODULES="nf_conntrack_ftp nf_nat_ftp nf_conntrack_netbios_ns"'
```
These settings basically change the values in `/etc/default/ufw`. While they are good defaults, we need to change one of them for Kubernetes networking to work: `DEFAULT_FORWARD_POLICY="ACCEPT"`. To override this default setting we add the following to `group_vars/k8s.yml`:

```yaml
harden_linux_ufw_defaults_user:
  "^DEFAULT_FORWARD_POLICY": 'DEFAULT_FORWARD_POLICY="ACCEPT"'
```
As already mentioned above, the playbook will combine `harden_linux_ufw_defaults` and `harden_linux_ufw_defaults_user`, with the settings in `harden_linux_ufw_defaults_user` taking preference, which means they override the `^DEFAULT_FORWARD_POLICY` setting in `harden_linux_ufw_defaults`.
Next we can specify some firewall rules with
harden_linux_ufw_rules. This is the default:
```yaml
harden_linux_ufw_rules:
  - rule: "allow"
    to_port: "22"
    protocol: "tcp"
```
So by default only the SSH port is allowed. If you changed the SSH “Port” setting above to e.g. “22222”, you need to add a firewall rule too to allow that incoming traffic. Additionally we also add a firewall rule for PeerVPN, which we’ll use in a later blog post:
```yaml
harden_linux_ufw_rules:
  - rule: "allow"
    to_port: "22222"
    protocol: "tcp"
  - rule: "allow"
    to_port: "7000"
    protocol: "udp"
```
You can add more settings for a rule like
from_ip, … Please have a look at
tasks/main.yml (search for “Apply firewall rules”) for all possible settings.
You can also allow hosts to communicate on specific networks (without port restrictions). E.g. you should add the IP range you will use for PeerVPN here, since that’s the range we’ll also use for Kubernetes services, and they should be able to communicate without restrictions. Also add the range you will use later for the Flannel overlay network (which is used for pod-to-pod communication). The default IP range used in the Flannel role is `10.200.0.0/16`. If you use the range `10.3.0.0/24` for your PeerVPN network, `harden_linux_ufw_allow_networks` would have the following entries:
```yaml
harden_linux_ufw_allow_networks:
  - "10.3.0.0/24"
  - "10.200.0.0/16"
```
If you want to avoid problems with the firewall rules blocking your Kubernetes traffic, you can start with more relaxed settings and simply allow all three private IP ranges defined in RFC 1918, e.g.:
```yaml
harden_linux_ufw_allow_networks:
  - "10.0.0.0/8"
  - "172.16.0.0/12"
  - "192.168.0.0/16"
```
Next we change some system variables (`sysctl.conf` / proc filesystem). These settings are recommendations from Google which they use for their Google Compute Cloud OS images (see https://cloud.google.com/compute/docs/images/building-custom-os and https://cloud.google.com/compute/docs/images/configuring-imported-images). These are the default settings (if you are happy with them you don’t have to do anything, but I recommend verifying that they work for your setup):
```yaml
harden_linux_sysctl_settings:
  "net.ipv4.tcp_syncookies": 1                    # Enable syn flood protection
  "net.ipv4.conf.all.accept_source_route": 0      # Ignore source-routed packets
  "net.ipv6.conf.all.accept_source_route": 0      # IPv6 - Ignore source-routed packets
  "net.ipv4.conf.default.accept_source_route": 0  # Ignore source-routed packets
  "net.ipv6.conf.default.accept_source_route": 0  # IPv6 - Ignore source-routed packets
  "net.ipv4.conf.all.accept_redirects": 0         # Ignore ICMP redirects
  "net.ipv6.conf.all.accept_redirects": 0         # IPv6 - Ignore ICMP redirects
  "net.ipv4.conf.default.accept_redirects": 0     # Ignore ICMP redirects
  "net.ipv6.conf.default.accept_redirects": 0     # IPv6 - Ignore ICMP redirects
  "net.ipv4.conf.all.secure_redirects": 1         # Ignore ICMP redirects from non-GW hosts
  "net.ipv4.conf.default.secure_redirects": 1     # Ignore ICMP redirects from non-GW hosts
  "net.ipv4.ip_forward": 0                        # Do not allow traffic between networks or act as a router
  "net.ipv6.conf.all.forwarding": 0               # IPv6 - Do not allow traffic between networks or act as a router
  "net.ipv4.conf.all.send_redirects": 0           # Don't allow traffic between networks or act as a router
  "net.ipv4.conf.default.send_redirects": 0       # Don't allow traffic between networks or act as a router
  "net.ipv4.conf.all.rp_filter": 1                # Reverse path filtering - IP spoofing protection
  "net.ipv4.conf.default.rp_filter": 1            # Reverse path filtering - IP spoofing protection
  "net.ipv4.icmp_echo_ignore_broadcasts": 1       # Ignore ICMP broadcasts to avoid participating in Smurf attacks
  "net.ipv4.icmp_ignore_bogus_error_responses": 1 # Ignore bad ICMP errors
  "net.ipv4.icmp_echo_ignore_all": 0              # Do not ignore ICMP echo requests
  "net.ipv4.conf.all.log_martians": 1             # Log spoofed, source-routed, and redirect packets
  "net.ipv4.conf.default.log_martians": 1         # Log spoofed, source-routed, and redirect packets
  "net.ipv4.tcp_rfc1337": 1                       # Implement RFC 1337 fix
  "kernel.randomize_va_space": 2                  # Randomize addresses of mmap base, heap, stack and VDSO page
  "fs.protected_hardlinks": 1                     # Provide protection from ToCToU races
  "fs.protected_symlinks": 1                      # Provide protection from ToCToU races
  "kernel.kptr_restrict": 1                       # Make locating kernel addresses more difficult
  "kernel.perf_event_paranoid": 2                 # Set perf only available to root
```
You can override every single setting. For Kubernetes we’ll override the following settings to allow packet forwarding, which is needed for the pod network (again, put it into `group_vars/k8s.yml`):

```yaml
harden_linux_sysctl_settings_user:
  "net.ipv4.ip_forward": 1
  "net.ipv6.conf.default.forwarding": 1
  "net.ipv6.conf.all.forwarding": 1
```
One of the playbook tasks will combine `harden_linux_sysctl_settings` and `harden_linux_sysctl_settings_user`, with the `harden_linux_sysctl_settings_user` settings again taking preference. Have a look at the `defaults/main.yml` file of the role for more information about the settings.
If you want UFW logging enabled, set the corresponding role variable (see `defaults/main.yml`). Possible values are the usual ufw log levels: `on`, `off`, `low`, `medium`, `high` and `full`.
And finally we have the SSHGuard settings. SSHGuard protects from brute force attacks against SSH. To avoid locking yourself out for a while, you can add IPs or IP ranges to a whitelist. By default it basically contains only “localhost”:
```yaml
harden_linux_sshguard_whitelist:
  - "127.0.0.0/8"
  - "::1/128"
```
I recommend additionally adding at least your PeerVPN and Flannel IP ranges here too. Also think about adding the IP of the host from which you administer the Kubernetes cluster and/or the IP of the host where you run Ansible.
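Using the example ranges from above, an extended whitelist might look like this (the PeerVPN and Flannel ranges are the ones assumed earlier in this post, and the workstation entry is a placeholder — adjust everything to your network):

```yaml
harden_linux_sshguard_whitelist:
  - "127.0.0.0/8"     # localhost (IPv4)
  - "::1/128"         # localhost (IPv6)
  - "10.3.0.0/24"     # PeerVPN network (example range from above)
  - "10.200.0.0/16"   # Flannel overlay network (default range)
  - "x.x.x.x/32"      # placeholder: IP of your workstation / Ansible host
```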
Now we can apply the role to the hosts:
```shell
ansible-playbook --tags=role-harden-linux k8s.yml
```
Afterwards we configure Ansible to use the user provided as the value of the `harden_linux_deploy_user` variable. That user will have sudo permissions to run all commands. If you add a new worker sometime in the future, you should apply this role first (and only this role!) to the new host. To limit the execution of the playbook to the new host, execute `ansible-playbook --tags=role-harden-linux --limit=host.your-domain.tld k8s.yml` (replace `host.your-domain.tld` with the actual hostname, of course).
After the `harden-linux` role was applied, we add a few more settings to `group_vars/k8s.yml`. If you followed my recommendation to change the SSHd port to e.g. “22222”, you also need to set the `ansible_port` variable:
```yaml
ansible_user: deploy
ansible_become: true
ansible_port: 22222
```
Feel free to specify additional Ansible parameters here too of course.
`ansible_user` tells Ansible which user to use to log in to the remote hosts. With `ansible_become` Ansible will execute tasks with sudo by default. As you remember, the `harden-linux` role gives the deploy user sudo permissions to execute tasks. I also changed the default SSH port in the `harden-linux` role, as already stated. This really reduces the amount of SSH login attacks a lot. It is of course security by obscurity, but at the moment it still makes sense IMHO. “But now I need to specify the port parameter every time I ssh to a host,” you may object. Don’t fear: just create a file `$HOME/.ssh/config` and add e.g.:
```
Host *.subdomain.tld
    Port 22222
    User deploy
```
Of course you need to replace `*.subdomain.tld` with more useful values (e.g. `*.example.net` if you own that domain ;-) ). Now you can use ssh as you did before and don’t need to worry about the SSH port anymore.
Now that we have secured our hosts head over to part 3 of the tutorial!