Kubernetes DNS and TLS automation 🚀

Nicola Molinari
commercetools tech

--

NOTE: this article assumes that you have some basic knowledge of Kubernetes, DNS providers and TLS certificates.

In recent years, the DevOps community has grown a lot and created new tools and ways to accomplish cumbersome tasks like managing DNS records or issuing TLS (previously SSL) certificates. Nowadays, doing infrastructure work is like coding, thanks to tools such as Terraform, Chef, Serverless, etc.

However, even with those tools, managing domains still requires a certain amount of boilerplate and management effort because the configuration remains static. What we want instead is something more dynamic that is easy to use and to manage.

Going declarative

How nice would it be to simply declare what you want to have and let the system take care of how it is done?

With Kubernetes, we use YAML files to describe the resources that we would like to have installed/deployed. The YAML syntax is declarative and easy to read, therefore a perfect match for what we are aiming to achieve.

So how does this relate to “managing domains”?

Well, we can declare which domain we want to be created and that we want a valid TLS certificate for it. In the Kubernetes world, we can do that by using annotations, which allow other services to hook into the created resources through the Kubernetes API.

This approach brings certain advantages, as we will see in the following sections.

Use case: branch deployments

To demonstrate how we can implement this and what we need to set up, let's imagine deploying branches. This is a common use case when you are developing features that should be reviewed and tested by developers and non-developers alike.

PS: if you look around, there are already awesome solutions out there that deploy a branch after opening a PR and integrate very well with e.g. GitHub. Off the top of my head, I’d really recommend Netlify and Now. If that’s good enough for your use case, go ahead and use one of those! (or both) 😉

Spoiler: the article will explore things from a “branch deployment” point of view. However, the final result can be used for any kind of application.

Requirements for branch deployments

Our requirements are quite simple:

  • opt-in to deploy a branch after opening a PR (to reduce the number of used resources)
  • use the same tools and setup of our staging/production environment
  • have a unique “user-friendly” URL (e.g. pr-1234.example.com)
  • the URL must run on HTTPS

The first two points are covered by the CI setup. For example, on CircleCI you can use an “approval” step to trigger the deployment whenever you want to.
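As a rough sketch, such a CircleCI workflow could look like the following (the job names build and deploy-branch are placeholders for your own jobs):

# Sketch of a CircleCI workflow with a manual approval gate
workflows:
  version: 2
  build-and-deploy:
    jobs:
      - build
      - approve-branch-deploy:
          type: approval          # manual "approval" step in the CircleCI UI
          requires:
            - build
      - deploy-branch:            # runs only after the approval is given
          requires:
            - approve-branch-deploy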

The challenge lies in the last two points: dynamically managing DNS records and issuing valid TLS certificates so that the deployment can run on HTTPS.

To solve those problems, we’re going to look at two specific tools that integrate very well with Kubernetes:

  • external-dns to manage DNS resources with DNS providers
  • cert-manager to manage TLS certificates with Certificate authorities

External DNS

This tool works like a bridge between your Kubernetes resources and your existing DNS providers (Google CloudDNS, AWS Route 53, etc.). It will use the Kubernetes API to retrieve metadata from Kubernetes resource annotations and perform actions based on that, such as creating or deleting a DNS record.

NOTE: we are going to install this tool in a Kubernetes namespace called external-dns using the related helm chart.

At commercetools we use Google Cloud, therefore we’re going to look at the integration with the Google CloudDNS.

Create a DNS zone

The first thing to do is to create a DNS zone where all the DNS entries will be created.

Google CloudDNS basic setup for example.com
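If you prefer the command line over the Cloud Console, the same zone can be created with gcloud (the zone name example-zone is just an example):

$ gcloud dns managed-zones create example-zone \
  --dns-name "example.com." \
  --description "Zone for branch deployments"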

Create a service account

Once you have the DNS zone set up, you need to define credentials to access your CloudDNS. For that we need to create a service account with the role roles/dns.admin.
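A sketch of how this could look with gcloud (replace <project-key> with your GCP project ID):

$ gcloud iam service-accounts create external-dns \
  --display-name "external-dns"

$ gcloud projects add-iam-policy-binding <project-key> \
  --member serviceAccount:external-dns@<project-key>.iam.gserviceaccount.com \
  --role roles/dns.admin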

Then, we need to generate a private key for that service account.

$ gcloud iam service-accounts keys create \
~/key.json \
--iam-account external-dns@<project-key>.iam.gserviceaccount.com

Create a Kubernetes secret

The private key of the service account should be stored in a Kubernetes secret, which can be safely referenced by the external-dns service.

$ kubectl create secret generic external-dns-credentials \
--from-file=key.json \
--namespace external-dns

Install the external DNS chart

The last step is to install the helm chart.

$ helm upgrade \
--install external-dns \
--namespace external-dns \
-f external-dns-values.yaml \
stable/external-dns

where the external-dns-values.yaml file contains the chart configuration.
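A minimal sketch of such a values file, assuming the stable/external-dns chart exposes these keys (double-check the chart's values.yaml for your chart version):

# external-dns-values.yaml (sketch)
provider: google
google:
  project: <project-key>                          # GCP project that hosts the DNS zone
  serviceAccountSecret: external-dns-credentials  # the secret created above (key.json)
domainFilters:
  - example.com                                   # only manage records for this zone
policy: sync                                      # also remove records when resources are deleted
rbac:
  create: true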

That’s it, the service now runs in the cluster and will start looking into Kubernetes metadata annotations to check which DNS entries it needs to create. 🎉

Referencing a DNS entry from an Ingress resource

With external-dns running in the cluster, we can manage DNS entries from within the Ingress of the service that we want to deploy.
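A sketch of such an Ingress (the service name foobar and the hostname foobar.example.com are made up for the example):

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: foobar
  annotations:
    # external-dns reads the desired hostname from this annotation and/or the rules below
    external-dns.alpha.kubernetes.io/hostname: foobar.example.com
spec:
  rules:
    - host: foobar.example.com
      http:
        paths:
          - backend:
              serviceName: foobar
              servicePort: 80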

In the example above, a new DNS entry foobar.example.com will be created in Google CloudDNS. When the Ingress resource is deleted, external-dns will take care of removing the DNS entry.

Cert manager

This tool also works like a bridge between your Kubernetes resources and TLS certificate issuers (e.g. Let’s Encrypt). It will use the Kubernetes API to retrieve metadata from Kubernetes resource annotations and perform actions based on that, such as provisioning and validating TLS certificates for a given domain and automatically renewing expiring certificates.

NOTE: we are going to install this tool in a Kubernetes namespace called cert-manager using the related helm chart.

At commercetools we decided to use Let’s Encrypt, therefore we’re going to look at the integration with their service.

High level overview of the cert-manager integration with Let’s Encrypt (source)

Certificate Authority Authorization (CAA record)

Since Let’s Encrypt is going to issue certificates for our domain, it needs to be authorized to do so. Therefore, we need to define a CAA record in our CloudDNS zone to trust Let’s Encrypt (this assumes that you have already set up a DNS zone).

CAA record to authorize Let’s Encrypt

Note that you can optionally specify an email address with iodef, to which error reports about the certificates will be sent.
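In zone-file notation the records could look like this (the iodef address is just an example):

example.com.  CAA  0 issue "letsencrypt.org"
example.com.  CAA  0 iodef "mailto:security@example.com"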

Create a service account

The cert-manager needs access to the DNS zone as well in order to perform a so-called ACME DNS-01 challenge (more on that later). Therefore, we need to create a new service account with the role roles/dns.admin, following the same steps as for external-dns.

Then, we need to generate a private key for that service account.

$ gcloud iam service-accounts keys create \
~/key.json \
--iam-account cert-manager@<project-key>.iam.gserviceaccount.com

Create a Kubernetes secret

The private key of the service account should be stored in a Kubernetes secret, which can be safely referenced by the cert-manager service.

$ kubectl create secret generic cert-manager-credentials \
--from-file=key.json \
--namespace cert-manager

Install the cert manager chart

We can now proceed with installing the helm chart.

$ helm upgrade \
--install cert-manager \
--namespace cert-manager \
-f cert-manager-values.yaml \
stable/cert-manager

where the cert-manager-values.yaml file contains the chart configuration.
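A minimal sketch of such a values file, assuming the stable/cert-manager chart exposes the ingress-shim defaults under ingressShim (double-check the chart's values.yaml for your chart version):

# cert-manager-values.yaml (sketch)
ingressShim:
  defaultIssuerName: letsencrypt-prod   # the ClusterIssuer we define in the next section
  defaultIssuerKind: ClusterIssuer
rbac:
  create: true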

Now the chart is installed, but that’s not enough: we still need to configure the actual “issuer”.

Configure the issuer

This is the crucial piece of the puzzle. An Issuer or ClusterIssuer represents a certificate authority from which signed x509 certificates can be obtained.

The difference between an Issuer and a ClusterIssuer is that the former only works within a single namespace whereas the latter works across all namespaces. Depending on the setup of your cluster, you can choose one or the other.

In our case, we went with a ClusterIssuer to be able to use the issuer across our namespaces.
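A sketch of such a ClusterIssuer, based on the cert-manager API available at the time of writing (certmanager.k8s.io/v1alpha1; newer cert-manager releases use cert-manager.io/v1 with a slightly different dns01 schema, so adapt accordingly). The email address and resource names are examples:

apiVersion: certmanager.k8s.io/v1alpha1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    # Let's Encrypt production endpoint; swap in the staging endpoint for testing
    server: https://acme-v02.api.letsencrypt.org/directory
    email: devops@example.com             # a valid email address for the ACME account
    privateKeySecretRef:
      name: letsencrypt-prod              # secret managed by cert-manager (tls.key)
    dns01:
      providers:
        - name: google-clouddns
          clouddns:
            project: <project-key>
            serviceAccountSecretRef:
              name: cert-manager-credentials   # the secret created earlier
              key: key.json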

Let’s break down a couple of things here:

  • we define the ClusterIssuer resource kind to use the ACME protocol
  • the ACME server points to Let’s Encrypt production API, however there is also a staging environment that you can use for testing
  • the privateKeySecretRef is managed by cert-manager to store the private tls.key of the issuer account (account registration is done automatically, you simply need to define a valid email address)
  • the dns01 configuration contains a list of providers that can be used to solve DNS challenges. A challenge is the procedure by which the CA verifies ownership of the domain, in this case by checking a DNS record created at the DNS provider
  • the provider google-clouddns references the service account secret that we created beforehand

Referencing a TLS entry from an Ingress resource

With cert-manager now up and running in the cluster, we can manage TLS certificates from within the Ingress of the service that we want to deploy.
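Extending the earlier Ingress, a sketch of how the TLS part could look (the certmanager.k8s.io/cluster-issuer annotation and the secret name are based on the issuer defined above):

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: foobar
  annotations:
    external-dns.alpha.kubernetes.io/hostname: foobar.example.com
    # tells cert-manager's ingress-shim which issuer to use for this Ingress
    certmanager.k8s.io/cluster-issuer: letsencrypt-prod
spec:
  tls:
    - hosts:
        - foobar.example.com
      secretName: foobar-example-com-tls   # cert-manager stores the signed certificate here
  rules:
    - host: foobar.example.com
      http:
        paths:
          - backend:
              serviceName: foobar
              servicePort: 80

If you configured the ingressShim defaults in the chart values, the generic kubernetes.io/tls-acme: "true" annotation can typically be used instead of referencing the issuer explicitly.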

In the example above, a new Certificate resource for the hostname foobar.example.com will be created by cert-manager, the certificate will be issued by Let’s Encrypt, and the resulting TLS secret will be stored within the namespace of the Ingress.

Note that it can take a few minutes for the certificate to be issued and become available.
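To follow the progress, you can inspect the Certificate resources created by cert-manager; the name below assumes the secretName used in the Ingress sketch above, which the ingress-shim typically reuses for the Certificate:

$ kubectl get certificates --namespace <your-namespace>
$ kubectl describe certificate foobar-example-com-tls --namespace <your-namespace>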

Final setup

Overview of the Kubernetes cluster setup, showing how the different components interact with the cloud services

Conclusions

Managing DNS and TLS does not have to be hard and painful. More importantly, it should be easy for every developer, not only for the DevOps team. Using a combination of these tools, we were able to abstract away all the management hassles of DNS and TLS and focus on getting our application (for branch deployments) up and running with a declarative configuration.
