Kubernetes DNS and TLS automation 🚀
NOTE: this article assumes that you have some basic knowledge of Kubernetes, DNS providers and TLS certificates.
In recent years, the DevOps community has grown a lot and created new tools and ways to accomplish cumbersome tasks like managing DNS records or issuing TLS (formerly SSL) certificates. Nowadays, doing infrastructure work is like coding, thanks to tools such as Terraform, Chef, Serverless, etc.
However, even with those tools, managing domains still requires a certain amount of boilerplate and effort, because it remains a static configuration. What we want, though, is something more dynamic that is easy to use and to manage.
Going declarative
How nice would it be to simply state what you want and let the system take care of how it's done?
With Kubernetes, we use YAML files to describe the resources that we would like to have installed/deployed. The YAML syntax is very declarative and easy to read, and therefore a perfect match for what we are aiming to achieve.
So how does this relate to “managing domains”?
Well, we can declare which domain we want to be created and that we want a valid TLS certificate for it. In the Kubernetes world, we can do that with annotations, which allow other services to hook into the created resources through the Kubernetes API.
This approach brings certain advantages:
- ✂️ almost zero management hassle (= happier Engineers 🎉)
- 📜 declarative configuration
- 🚒 automatic renewal of expiring TLS certificates (e.g. Let’s Encrypt issues certificates with a short lifetime of 90 days, which improves security)
Use case: branch deployments
To demonstrate how we can implement this and what we need to set up, let's take branch deployments as an example. This is a common use case when you are developing features that should be reviewed and tested by developers and non-developers alike.
PS: if you look around, there are already awesome solutions out there that deploy a branch after a PR is opened and integrate very well with e.g. GitHub. Off the top of my head, I’d really recommend Netlify and Now. If that’s good enough for your use case, go ahead and use one of those! (or both) 😉
Spoiler: the article will explore things from a “branch deployment” point of view. However, the final result can be used for any kind of application.
Requirements for branch deployments
Our requirements are quite simple:
- opt-in to deploy a branch after opening a PR (to reduce resource usage)
- use the same tools and setup as our staging/production environment
- have a unique “user-friendly” URL (e.g. pr-1234.example.com)
- the URL must run on HTTPS
The first two points are covered by the CI setup. For example, on CircleCI you can use an “approval” step to trigger the deployment whenever you want to.
The challenge lies in the last two points: dynamically managing DNS records and issuing valid TLS certificates so that the URL can run on HTTPS.
To solve those problems, we’re going to look at two specific tools that integrate very well with Kubernetes:
- external-dns to manage DNS records with DNS providers
- cert-manager to manage TLS certificates with certificate authorities
External DNS
This tool works like a bridge between your Kubernetes resources and your existing DNS providers (Google CloudDNS, AWS Route 53, etc.). It uses the Kubernetes API to retrieve metadata from Kubernetes resource annotations and performs actions based on that, such as creating or deleting DNS records.
NOTE: we are going to install this tool in a Kubernetes namespace called external-dns using the related helm chart.
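The kubectl create secret command further down targets the external-dns namespace, so that namespace presumably has to exist before it runs (Helm would only create it during the later install step, and Helm 3 only with --create-namespace). A minimal sketch:

```shell
# Create the namespace that will hold the credentials secret and the chart
kubectl create namespace external-dns
```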
At commercetools we use Google Cloud, therefore we’re going to look at the integration with Google CloudDNS.
Create a DNS zone
The first thing to do is to create a DNS zone where all the DNS entries will be created.
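As a sketch, a public managed zone can be created with the gcloud CLI. The zone name example-com, the DNS name example.com. and the project ID my-gcp-project are placeholders, not values from the original setup.

```shell
# Create a public Cloud DNS managed zone (note the trailing dot in the DNS name)
# ASSUMPTION: zone name, domain and project ID are placeholders
gcloud dns managed-zones create example-com \
  --dns-name="example.com." \
  --description="Zone managed by external-dns" \
  --project=my-gcp-project
```

Once the zone exists, gcloud dns managed-zones describe example-com lists the name servers that your domain registrar needs to delegate to.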
Create a service account
Once you have the DNS zone set up, you need credentials to access CloudDNS. For that, we need to create a service account with the role roles/dns.admin.
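One way to do this with the gcloud CLI, sketched under the assumption that the project ID is my-gcp-project (a placeholder standing in for the <project-key> used below) and that the account is named external-dns to match the --iam-account of the next command:

```shell
# ASSUMPTION: placeholder project ID
PROJECT_ID=my-gcp-project

# Create the service account used by external-dns
gcloud iam service-accounts create external-dns \
  --display-name="external-dns" \
  --project="$PROJECT_ID"

# Grant it the DNS admin role on the project
gcloud projects add-iam-policy-binding "$PROJECT_ID" \
  --member="serviceAccount:external-dns@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role="roles/dns.admin"
```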
Then, we need to generate a private key for that service account.
$ gcloud iam service-accounts keys create \
~/key.json \
--iam-account external-dns@<project-key>.iam.gserviceaccount.com
Create a Kubernetes secret
The private key of the service account should be stored in a Kubernetes secret, which can be safely referenced by the external-dns service.
$ kubectl create secret generic external-dns-credentials \
--from-file=key.json=$HOME/key.json \
--namespace external-dns
Install the external DNS chart
The last step is to install the helm chart.
$ helm upgrade \
--install external-dns \
--namespace external-dns \
-f external-dns-values.yaml \
stable/external-dns
Here, external-dns-values.yaml contains the chart configuration, such as the DNS provider, the domains to manage and a reference to the credentials secret.
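A minimal sketch of external-dns-values.yaml, assuming the key names of the stable/external-dns chart (later maintained as the Bitnami chart); key names differ between chart versions, so check the values.yaml of the version you install. The project ID, domain and owner ID are placeholders. Two settings matter for branch deployments: policy: sync, because the chart may default to upsert-only, which never deletes records, and the reference to the credentials secret (some chart versions expect the key file inside that secret to be named credentials.json rather than key.json).

```shell
# Write the helm values for external-dns
# ASSUMPTIONS: key names follow the stable/external-dns chart; project, domain
# and owner ID are placeholders
cat > external-dns-values.yaml <<'EOF'
provider: google
google:
  project: my-gcp-project
  # Kubernetes secret holding the service account key
  serviceAccountSecret: external-dns-credentials
# Only manage records inside this domain
domainFilters:
  - example.com
# Resource types watched for hostnames
sources:
  - ingress
  - service
# "sync" also deletes records when the Ingress is deleted;
# "upsert-only" would only ever create or update them
policy: sync
# TXT records mark which DNS records this instance owns
registry: txt
txtOwnerId: my-cluster
rbac:
  create: true
EOF
```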
That’s it: the service now runs in the cluster and starts watching Kubernetes resources and their annotations to determine which DNS entries it needs to create. 🎉
Referencing a DNS entry from an Ingress resource
With external-dns running in the cluster, we can manage DNS entries from the Ingress of the service that we want to deploy.
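A minimal Ingress sketch, assuming the networking.k8s.io/v1 API, an nginx ingress class and a Service named foobar listening on port 80 (all placeholders). With Ingress listed as a source, external-dns reads the hostnames from the host rules, so no extra annotation is needed for the DNS part.

```shell
# Write an Ingress whose host rule external-dns turns into a DNS record
# ASSUMPTIONS: ingress class "nginx" and Service "foobar" on port 80 are placeholders
cat > foobar-ingress.yaml <<'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: foobar
spec:
  ingressClassName: nginx
  rules:
    - host: foobar.example.com   # external-dns creates a record for this host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: foobar
                port:
                  number: 80
EOF
```

Applying it with kubectl apply -f foobar-ingress.yaml is what triggers external-dns.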
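For example, for an Ingress with the host foobar.example.com, a new DNS entry foobar.example.com will be created in Google CloudDNS. When the Ingress resource is deleted, external-dns takes care of removing the DNS entry (as long as its policy allows deletions, i.e. sync rather than upsert-only).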
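To check what external-dns did, you could query the zone and the controller logs. The zone name example-com is a placeholder, and the Deployment name external-dns assumes the Helm release name used in the install command above.

```shell
# List the records for the Ingress host (note the trailing dot)
# ASSUMPTION: placeholder zone name
gcloud dns record-sets list --zone=example-com --name=foobar.example.com.

# The external-dns logs show the changes it applies to the zone
kubectl logs --namespace external-dns deployment/external-dns
```

Besides the A record, expect a TXT record that external-dns uses to mark which records it owns; its exact name depends on the external-dns version and registry settings.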
Cert manager
This tool also works like a bridge, this time between your Kubernetes resources and TLS certificate issuers (e.g. Let’s Encrypt). It uses the Kubernetes API to retrieve metadata from resource annotations and performs actions based on that, such as provisioning and validating TLS certificates for a given domain and automatically renewing expiring certificates.
NOTE: we are going to install this tool in a Kubernetes namespace called cert-manager using the related helm chart.
At commercetools we decided to use Let’s Encrypt, therefore we’re going to look at the integration with their service.
Certificate Authority Authorization (CAA record)
Let’s Encrypt is going to issue certificates for our domain, and if a domain has CAA records, only the certificate authorities listed there may do so. Therefore, we define a CAA record in our CloudDNS that authorizes Let’s Encrypt (this assumes that you have already set up a DNS zone).
Note that you can optionally specify an email address with the iodef property, where you want to receive reports about certificate requests that violate this policy.
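A sketch of the CAA record with the gcloud CLI. It assumes a gcloud version that has the gcloud dns record-sets create command (older releases only offered the record-sets transaction workflow); the zone, domain and mailbox are placeholders. The first value authorizes Let’s Encrypt, the second is the optional iodef contact.

```shell
# CAA records at the zone apex also cover subdomains such as pr-1234.example.com,
# because CAs climb the DNS tree when no CAA record exists on the exact name
# ASSUMPTION: zone, domain and mailbox are placeholders
gcloud dns record-sets create example.com. \
  --zone=example-com \
  --type=CAA \
  --ttl=300 \
  --rrdatas='0 issue "letsencrypt.org"','0 iodef "mailto:security@example.com"'
```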
Create a service account
cert-manager needs to access the DNS zone as well, in order to perform a so-called ACME DNS-01 challenge (more on that later). Therefore, we need to create a new service account with the role roles/dns.admin.
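A sketch of the gcloud steps for an account named cert-manager (to match the --iam-account of the next command), with my-gcp-project as a placeholder project ID. Since the secret further down is created in the cert-manager namespace, the sketch creates that namespace as well.

```shell
# ASSUMPTION: placeholder project ID
PROJECT_ID=my-gcp-project

# Service account used by cert-manager to solve DNS-01 challenges
gcloud iam service-accounts create cert-manager \
  --display-name="cert-manager" \
  --project="$PROJECT_ID"

gcloud projects add-iam-policy-binding "$PROJECT_ID" \
  --member="serviceAccount:cert-manager@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role="roles/dns.admin"

# Namespace for the credentials secret and the cert-manager chart
kubectl create namespace cert-manager
```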
Then, we need to generate a private key for that service account.
$ gcloud iam service-accounts keys create \
~/key.json \
--iam-account cert-manager@<project-key>.iam.gserviceaccount.com
Create a Kubernetes secret
The private key of the service account should be stored in a Kubernetes secret, which can be safely referenced by the cert-manager service.
$ kubectl create secret generic cert-manager-credentials \
--from-file=key.json=$HOME/key.json \
--namespace cert-manager
Install the cert manager chart
We can now proceed with installing the helm chart.
$ helm upgrade \
--install cert-manager \
--namespace cert-manager \
-f cert-manager-values.yaml \
stable/cert-manager
Here, cert-manager-values.yaml holds the chart settings that you want to override.
Now the chart is installed, but that’s not enough: we still need to configure the actual “issuer”.
Configure the issuer
This is the crucial piece of the puzzle. An Issuer or ClusterIssuer represents a certificate authority from which signed x509 certificates can be obtained.
The difference between an Issuer and a ClusterIssuer is that the former only works within a single namespace, whereas the latter works across all namespaces. Depending on the setup of your cluster, you can choose one or the other.
In our case, we went with a ClusterIssuer to be able to use the issuer across our namespaces.
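A sketch of such a ClusterIssuer, written to a file so it can be reviewed before running kubectl apply -f cluster-issuer.yaml. It uses the current cert-manager.io/v1 API. The cert-manager releases shipped by the stable chart used the older certmanager.k8s.io/v1alpha1 API, where the same DNS settings sat under spec.acme.dns01.providers as a named provider (such as google-clouddns); current releases express them as an entry in solvers. The issuer name, email address and project ID are placeholders.

```shell
# Write a ClusterIssuer for Let's Encrypt using ACME DNS-01 with Google CloudDNS
# ASSUMPTIONS: cert-manager.io/v1 API; issuer name, email and project are placeholders
cat > cluster-issuer.yaml <<'EOF'
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    # Let's Encrypt production API; for testing use the staging endpoint
    # https://acme-staging-v02.api.letsencrypt.org/directory
    server: https://acme-v02.api.letsencrypt.org/directory
    # Used for the automatic ACME account registration
    email: ops@example.com
    # Secret where cert-manager stores the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      - dns01:
          cloudDNS:
            project: my-gcp-project
            # The service account secret created in the cert-manager namespace
            serviceAccountSecretRef:
              name: cert-manager-credentials
              key: key.json
EOF
```

For a ClusterIssuer, cert-manager looks up referenced secrets in its own namespace by default (configurable with the controller's --cluster-resource-namespace flag), which is why the credentials secret lives in cert-manager.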
Let’s break down a couple of things in the issuer configuration:
- we define the ClusterIssuer resource kind to use the ACME protocol
- the ACME server points to the Let’s Encrypt production API; there is also a staging environment that you can use for testing
- the privateKeySecretRef is managed by cert-manager to store the private key (tls.key) of the issuer account (account registration is done automatically; you simply need to define a valid email address)
- the dns01 configuration contains a list of providers that can be used to solve DNS challenges. A challenge is the procedure by which you prove to the CA that you control the domain: for DNS-01, cert-manager creates a TXT record named _acme-challenge.<domain> through the DNS provider, which the CA then looks up
- the provider google-clouddns references the service account secret that we created beforehand
Referencing a TLS entry from an Ingress resource
With cert-manager now up and running in the cluster, we can manage TLS certificates from the Ingress of the service that we want to deploy.
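A sketch of an Ingress that requests a certificate, assuming a ClusterIssuer named letsencrypt-prod, an nginx ingress class and a Service named foobar on port 80 (all placeholders). The cert-manager.io/cluster-issuer annotation (current cert-manager API) tells cert-manager's ingress-shim to create a Certificate for the hosts in the tls section and store the result in the named secret. Because the host rule is present too, external-dns picks up the same Ingress.

```shell
# Write an Ingress that asks cert-manager for a TLS certificate
# ASSUMPTIONS: issuer name, ingress class and Service are placeholders
cat > foobar-ingress-tls.yaml <<'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: foobar
  annotations:
    # ingress-shim creates a Certificate issued by this ClusterIssuer
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - foobar.example.com
      # cert-manager stores the issued certificate and key in this secret
      secretName: foobar-example-com-tls
  rules:
    - host: foobar.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: foobar
                port:
                  number: 80
EOF
```

Apply it with kubectl apply -f foobar-ingress-tls.yaml in the namespace where the application runs.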
For example, for an Ingress that lists foobar.example.com in its tls section and references the issuer in an annotation, a certificate for that hostname will be issued by Let’s Encrypt and stored as a secret within the namespace of the Ingress.
Note that it can take a few minutes for the certificate to be issued and become available.
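While waiting, the progress can be followed with kubectl. This sketch assumes the Ingress lives in the default namespace and uses the TLS secret name foobar-example-com-tls (a placeholder); cert-manager names the Certificate it creates from an Ingress after that secret.

```shell
# Watch until the READY column shows True (Ctrl+C to stop watching)
kubectl get certificate --namespace default --watch

# If it stays not ready, the events and the pending ACME challenges show why
kubectl describe certificate foobar-example-com-tls --namespace default
kubectl get challenges --namespace default
```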
Final setup
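As a final sketch for the branch deployment use case, a CI job could render one Ingress per pull request that combines both pieces: the host rule picked up by external-dns and the annotation picked up by cert-manager. The function name render_pr_ingress, the PR_NUMBER variable, the Service app, the ingress class nginx and the issuer letsencrypt-prod are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Render a per-PR Ingress (hypothetical helper; all names are placeholders)
set -euo pipefail

render_pr_ingress() {
  local pr="$1"
  local host="pr-${pr}.example.com"
  # Unquoted EOF so that ${pr} and ${host} are expanded
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: pr-${pr}
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - ${host}
      secretName: pr-${pr}-tls
  rules:
    - host: ${host}
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: app
                port:
                  number: 80
EOF
}

# PR_NUMBER would come from the CI environment; 1234 is only a fallback
render_pr_ingress "${PR_NUMBER:-1234}" > pr-ingress.yaml
```

The CI job would then run kubectl apply -f pr-ingress.yaml; deleting the Ingress when the PR is closed lets external-dns remove the DNS record again, given a policy that allows deletions.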
Conclusions
Managing DNS and TLS does not have to be hard and painful. More importantly, it should be easy for every developer to do, not only for the DevOps team. Using a combination of these tools, we were able to abstract away all the management hassles of DNS and TLS and focus on getting our application (for branch deployments) up and running with a declarative configuration.