Kubernetes: DNSSEC and SPLIT DNS setup

By Jörg Mertin posted Oct 11, 2021 04:07 AM

  

What happened

When you set up DNS today, you will typically publish a DS (Delegation Signer) record through your registrar. It tells DNS servers around the world that your DNS servers provide RRSIG records for each entry in your domain. When a correctly configured DNS server requests the IP for a specific hostname, it also validates this RRSIG. If the signature is valid, it uses the returned answer and propagates it. If the RRSIG does not exist or is invalid, the DNS server assumes an interception attempt (man in the middle) and invalidates the entry, returning a SERVFAIL to the requester.

A split DNS setup usually designates an environment where two or more networks exist and each network provides its own FQDN resolution based on the requester's source subnet.

Imagine you have one server that can be accessed from both the Internet and the intranet. In Kubernetes, you will set up two ingress-nginx configurations: one listening on IP 10.0.0.253, dedicated to LAN requests, and one listening on IP 192.168.0.253, dedicated to public (Internet) requests.
You do not, however, want to use different hostnames to access the server www.domain.tld, so everyone will use www.domain.tld to access the service.

This means that if a user in the intranet wants to go to the website www.domain.tld, the DNS server has to return 10.0.0.253.
If a user wants to reach the website from the DMZ, the DNS server inside the DMZ has to return 192.168.0.253, and on the Internet the public DNS servers return the public IP assigned to that hostname.
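
One way to confirm that a split DNS setup behaves as described is to ask each resolver directly for the same name and compare the answers. The sketch below assumes dig is installed. The helper names and the resolver addresses in the usage comment (10.0.0.1 and 192.168.0.1) are hypothetical placeholders, not values from this setup.

```shell
# Ask one specific resolver for the A records of a hostname.
# $1 = resolver IP, $2 = hostname
resolve_via() {
    dig @"$1" "$2" A +short
}

# Succeeds (exit 0) when both resolvers answer and the answers differ,
# which is what a working split DNS setup should produce.
# $1 = internal resolver, $2 = external resolver, $3 = hostname
split_dns_differs() {
    internal=$(resolve_via "$1" "$3" | sort)
    external=$(resolve_via "$2" "$3" | sort)
    [ -n "$internal" ] && [ -n "$external" ] && [ "$internal" != "$external" ]
}

# Usage (resolver IPs are hypothetical placeholders):
#   split_dns_differs 10.0.0.1 192.168.0.1 www.domain.tld && echo "split DNS active"
```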

This was usually not a problem with "old" DNS setups: all you needed to do was set up the appropriate DNS server and configure the IP for the hostname.

With DNSSEC, if a DNS server is not configured with the key to produce RRSIGs, the public DNS servers will return a SERVFAIL. This is fairly easy to spot if you look for the RRSIG.

The problem you will now face with Kubernetes is that if
  1. a DS record has been configured for the domain,
  2. resources inside Kubernetes are set up using FQDNs (ingress configuration), and
  3. your internal DNS server has not set up DNSSEC,
all requests inside Kubernetes for that resource will fail with a DNS error: SERVFAIL.

What makes this painful to troubleshoot is that most workstations do not care about the DS record and will therefore accept and show the IP upon request.

And that's where the troubleshooting will start to get interesting.

So in the end, if a DS record exists for a domain, every DNS server that resolves names for that domain needs to have DNSSEC configured with the key associated with the DS record at the registrar. Otherwise, all queries to that server will fail.

Troubleshooting DNSSEC

The first thing to check is whether the registrar has a signed delegation record:
$ whois broadcom.com | grep signedDelegation
   DNSSEC: signedDelegation

If the signedDelegation record exists, DNSSEC is set up. Note that depending on the top-level domain, registrars report this in different ways.
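
Because the whois output format varies, querying DNS for the DS record itself can serve as a cross-check. This is a minimal sketch, assuming dig is available; count_ds_records is a hypothetical helper name that counts DS lines in the output of dig +short.

```shell
# Count DS records in the output of `dig DS <domain> +short` read from stdin.
# Each DS line has the form: <key tag> <algorithm> <digest type> <hex digest>
# (dig may split a long digest into space-separated chunks).
count_ds_records() {
    grep -c -E '^[0-9]+ [0-9]+ [0-9]+ [0-9A-Fa-f ]+$' || true
}

# Usage:
#   dig DS broadcom.com +short | count_ds_records
# A result of 0 suggests that no DS record is published for the domain.
```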

To troubleshoot this in your LAN, all you have to do is request the DNSSEC records along with the query:
$ dig A www.broadcom.com +dnssec

; <<>> DiG 9.16.1-Ubuntu <<>> A www.broadcom.com +dnssec
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 1865
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1220
; COOKIE: 045d59b07ec86cf18a8e9d016163edc7036708121effb18e (good)
;; QUESTION SECTION:
;www.broadcom.com.              IN      A

;; ANSWER SECTION:
www.broadcom.com.       3600    IN      CNAME   cdn.broadcom.com.
cdn.broadcom.com.       3600    IN      CNAME   www.broadcom.com.cdn.cloudflare.net.
www.broadcom.com.cdn.cloudflare.net. 68 IN A    104.18.5.158
www.broadcom.com.cdn.cloudflare.net. 68 IN A    104.18.4.158
www.broadcom.com.cdn.cloudflare.net. 68 IN RRSIG A 13 6 300 20211012085055 20211010065055 34505 cloudflare.net. +2tz8fLaPidzw/ditAU0j2KqOfwuwalLpcrO71MQt4eMq2xZGY9K6VN0 fY7Yf5Q0UbHJn5stWFkzhUxEB4Z53w==

;; Query time: 108 msec
;; SERVER: 192.19.189.10#53(192.19.189.10)
;; WHEN: Mon Oct 11 09:54:47 CEST 2021
;; MSG SIZE  rcvd: 282

The interesting entry here is the RRSIG line. If that line shows up, as it should on a well-configured setup, then you're good to go.
If the RRSIG line does not show up, chances are your local resolver does not check the RRSIG. However, as soon as you use that domain inside Kubernetes or OpenShift using CoreDNS, all requests for that domain will return a SERVFAIL and your deployed services will not run.
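
Spotting the RRSIG line by eye works for a single query. For repeated checks, a small filter can count the RRSIG records in the dig output. This is only a sketch; count_rrsig is a hypothetical helper name, and it assumes the record layout shown in the output above (name, TTL, class, type).

```shell
# Count RRSIG records in `dig ... +dnssec` output read from stdin.
# Lines starting with ';' are dig comments and are skipped.
count_rrsig() {
    awk '!/^;/ && $3 == "IN" && $4 == "RRSIG" { n++ } END { print n + 0 }'
}

# Usage:
#   dig A www.broadcom.com +dnssec | count_rrsig
# A result of 0 means no signature was returned with the answer.
```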

Now do the same thing in Kubernetes following this guide:
https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/

Once the environment is set up, check the DNS with:

kubectl exec -i -t dnsutils -- dig A www.broadcom.com +dnssec

which should return a valid entry if DNSSEC is set up correctly.
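
If the query inside the cluster fails, repeating it with dig's +cd option (checking disabled) can help tell a DNSSEC validation failure apart from other DNS problems: a resolver that returns SERVFAIL normally but NOERROR with +cd is most likely rejecting the signatures. The following sketch assumes the dnsutils pod from the guide above and that the cluster DNS passes the CD flag on to the validating resolver; dig_status and classify_dnssec are hypothetical helper names.

```shell
# Extract the response code (NOERROR, SERVFAIL, ...) from dig output on stdin.
dig_status() {
    sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1
}

# Compare the status of a normal query ($1) with the same query sent with +cd ($2).
classify_dnssec() {
    if [ "$1" = "NOERROR" ]; then
        echo "ok"
    elif [ "$1" = "SERVFAIL" ] && [ "$2" = "NOERROR" ]; then
        echo "likely DNSSEC validation failure"
    else
        echo "other DNS problem"
    fi
}

# Usage:
#   normal=$(kubectl exec dnsutils -- dig A www.domain.tld +dnssec | dig_status)
#   nocheck=$(kubectl exec dnsutils -- dig A www.domain.tld +dnssec +cd | dig_status)
#   classify_dnssec "$normal" "$nocheck"
```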
