How is my DNSSEC enabled domain still serving a tiny number of NXDOMAIN response codes?

TL;DR

The lack of NXDOMAIN responses for Cloudflare hosted domains is a consequence of their specific DNSSEC implementation (using so-called “black lies”) and not a property of the DNSSEC protocol itself; hence observations will be different with other providers doing DNSSEC.

Initial questions

How are NXDOMAIN responses still possible?

Why wouldn’t they be possible? DNSSEC or not, if you query for a name that doesn’t exist, you get an NXDOMAIN reply back.

my understanding is that DNSSEC should, at least in theory, eliminate this response code entirely

Why? And from where do you get that feeling?

Live example with a DNSSEC enabled domain

icann.org is DNSSEC enabled right now. If I query for a name that does not exist under it, I get an NXDOMAIN:

$ dig NS icann.org +short
b.icann-servers.net.
c.icann-servers.net.
ns.icann.org.
a.icann-servers.net.

$ dig @a.icann-servers.net does-not-exist-foobar.icann.org

; <<>> DiG 9.18.4 <<>> @a.icann-servers.net does-not-exist-foobar.icann.org
; (1 server found)
;; global options: +cmd
;; Sending:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 38891
;; flags: rd ad; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: 98228e9e0c5ef4e6
;; QUESTION SECTION:
;does-not-exist-foobar.icann.org. IN A

;; QUERY SIZE: 72

;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 38891
                                       ^^^^^^^^

DNSSEC is an extension of DNS in the sense that for a non validating resolver, answers are not different, even if the domain is DNSSEC enabled. So all return codes work in the same way.

Explanations about NSEC/NSEC3/RRSIG

What does change, and what you can see by adding +dnssec to dig (which doesn’t mean “activate DNSSEC” but sets the DO flag in the query, asking the server to include the DNSSEC related records, namely RRSIG, NSEC and NSEC3, which are otherwise not returned), is that the AUTHORITY section of the NXDOMAIN response gives further explanations with NSEC or NSEC3 records:

;; AUTHORITY SECTION:
icann.org.      1h IN SOA sns.dns.icann.org. noc.dns.icann.org. (
                2022070670 ; serial
                10800      ; refresh (3 hours)
                3600       ; retry (1 hour)
                1209600    ; expire (2 weeks)
                3600       ; minimum (1 hour)
                )
j93jujiqg7ge3616mub4r5bei85poet9.icann.org. 1h IN NSEC3 1 0 5 9714B5ACB8F7A193 (
                J9HKD4G746GMUTGGUV6AM37GSJAD6NRR
                A NS SOA MX TXT AAAA RRSIG DNSKEY NSEC3PARAM )
tdr1at6eafsrigdrlj6atpb2dge2aof0.icann.org. 1h IN NSEC3 1 0 5 9714B5ACB8F7A193 (
                TE4FB4PVMU1GQNPG9P01ID48U1BTN2G4
                A RRSIG )
lsrp57e1pe333jadkpdgh3v1i8vs80rd.icann.org. 1h IN NSEC3 1 0 5 9714B5ACB8F7A193 (
                LT4I8S7OTQ7ACOSF73M7LHCIC7C1J17I
                A RRSIG )
icann.org.      1h IN RRSIG SOA 7 2 3600 (
                20220804192816 20220714153322 3425 icann.org.
                NMcD1TeozFyCRDlmqFMoM/V/VmWQUmRNIH0/igPzdj2S
                hemnQHeXDOudBxsUgE/DpSV4KHsgqLQKdgbQruqCO7Dt
                iLK1bCLBZs38LdOadyJs3jWjjuJ9+mEnLXTsqMeeMllw
                YFL6pPyo1TfChZm05KJ+DJNw0SHJw3MWBRtV4iI= )
j93jujiqg7ge3616mub4r5bei85poet9.icann.org. 1h IN RRSIG NSEC3 7 3 3600 (
                20220724054620 20220703065347 58935 icann.org.
                gmo0VP8k9Li9lutMA3uTrMfABMmFBN23GonYo72Twk9l
                wGYqFvlU/naN0KKtEd3g+zOiYB0Jb1J1270Dveew/vYa
                hTmeMYrwUbEt9gZYCvi74zm6Ss0cQ8uxJ5bZw70nZ7oU
                LAtWYVGJMgupfjtne6021AJoLNB1CaMhFwo+TPo= )
tdr1at6eafsrigdrlj6atpb2dge2aof0.icann.org. 1h IN RRSIG NSEC3 7 3 3600 (
                20220724101659 20220703045347 58935 icann.org.
                hGsUeE4di9yFuDMq8ly1YQEs1OvOFAHVctOQrs6Poixl
                STqcErjC20V2CI0YApX6SbiI8AP/dqMjBm3fZh91mtDf
                aSrZypfScBEO/KVdlqbW9G+y8VR65ryjTAA7TZIzqN+z
                7YyTAESWb8E7T4NCtQPPwYpjl/S9krbEGSiKfaw= )
lsrp57e1pe333jadkpdgh3v1i8vs80rd.icann.org. 1h IN RRSIG NSEC3 7 3 3600 (
                20220724151521 20220703105347 58935 icann.org.
                P9qwkFoGkCd+m3aDQkzF/g7SJfn/byt6d4zugLzRKuH1
                rLmYZdlJNOC+fI1saCZySarsP9KavFSBzw6S9GMLobQJ
                hTVpu1ZUkEP9BMOZo28eeRLrGvAbrVb7aB9CWl9TgUMc
                2+s4nG87HTvD2TCJHmyPC1mIbBLYmJoa7iGLGiI= )

NSEC3 is more complicated (less human friendly) as it uses hashes of domain names. But what all the above means in summary is that the name I requested does not exist because its hash lands between the hashes of two names that exist (which can’t be seen directly, because they are hashed), and that no wildcard exists either (which is why you have three NSEC3 records). The RRSIG records sign the NSEC3 ones, so all of the above allows a validating resolver to double check that the NXDOMAIN is legit and not introduced by some on-path attacker, because all the NSEC3 and RRSIG records match the expectations.
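To make the "hashed names" part less abstract, here is a minimal sketch of how an NSEC3 owner hash is derived, assuming hash algorithm 1 (SHA-1) and the parameters visible in the records above (5 extra iterations, salt 9714B5ACB8F7A193). The helper names to_wire and nsec3_hash are mine, not part of any library, and only plain names without escape sequences are handled.

```python
import base64
import hashlib

# Standard base32 alphabet -> "base32hex" alphabet used by NSEC3 owner names.
_B32_TO_B32HEX = bytes.maketrans(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ234567",
    b"0123456789ABCDEFGHIJKLMNOPQRSTUV",
)


def to_wire(name: str) -> bytes:
    """Encode a plain (escape-free) domain name in lowercase DNS wire format."""
    labels = [label for label in name.lower().rstrip(".").split(".") if label]
    return b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"


def nsec3_hash(name: str, salt_hex: str, iterations: int) -> str:
    """Hash a name as described in RFC 5155 (SHA-1), returned as base32hex text."""
    salt = bytes.fromhex(salt_hex)
    digest = hashlib.sha1(to_wire(name) + salt).digest()
    for _ in range(iterations):  # "iterations" counts the additional rounds
        digest = hashlib.sha1(digest + salt).digest()
    encoded = base64.b32encode(digest).translate(_B32_TO_B32HEX)
    return encoded.decode("ascii").lower()


# Parameters read from the NSEC3 records shown above: "1 0 5 9714B5ACB8F7A193".
print(nsec3_hash("does-not-exist-foobar.icann.org", "9714B5ACB8F7A193", 5))
```

If the parameters are read correctly, the printed hash should sort between the owner hash and the "next" hash of one of the NSEC3 records in the response, which is exactly what a validating resolver checks.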

Simpler example with NSEC case

Let us take a domain DNSSEC enabled with NSEC instead of NSEC3: the root itself 🙂

If I do dig @g.root-servers.net foobar. +dnssec right now I get NXDOMAIN, for the same reason as above: that TLD does not exist (yet?).

But let us look in the results and especially one NSEC record:

foo.            1d IN NSEC food. NS DS RRSIG NSEC

This is an affirmative, signed assertion (there is a corresponding RRSIG record) from the nameserver telling me that foobar does not exist in the zone, because both foo and food exist, but nothing in between. Per DNSSEC ordering rules foobar would sort between foo and food, hence the above proves that foobar does not exist. Incidentally it proves that a lot of other names do not exist either, and a resolver could cache this NSEC and derive answers without querying anything.

Why? Because if I know that nothing exists between foo and food I immediately know that fooa doesn’t exist, nor fooa42 or foobie or fooccc or similar…
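The range reasoning above can be sketched in a few lines. This is only an illustration of the canonical ordering from RFC 4034 section 6.1 applied to the foo/food record; canonical_key and nsec_covers are names I made up, and escape sequences in names are not handled.

```python
def canonical_key(name: str) -> tuple:
    """Sort key for DNSSEC canonical ordering (RFC 4034, section 6.1).

    Labels are lowercased and compared from right to left, byte by byte.
    Only plain names without escape sequences are handled here.
    """
    labels = [l for l in name.lower().rstrip(".").split(".") if l]
    return tuple(l.encode("ascii") for l in reversed(labels))


def nsec_covers(owner: str, next_name: str, qname: str) -> bool:
    """True if the NSEC record (owner -> next_name) proves qname does not exist."""
    o, n, q = canonical_key(owner), canonical_key(next_name), canonical_key(qname)
    if o < n:
        return o < q < n
    # Last NSEC of the zone: it points back to the apex, so every in-zone
    # name sorting after the owner is covered.
    return q > o


for candidate in ["foobar", "fooa", "fooa42", "foobie", "fooccc", "fop", "food"]:
    print(candidate, nsec_covers("foo.", "food.", candidate))
```

A resolver doing aggressive caching (RFC 8198) performs essentially this comparison against NSEC records it already holds before deciding whether it needs to ask the authoritative server at all.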

Back to CloudFlare specific case

CloudFlare implements “DNSSEC White Lies” AND “Black Lies”, see https://www.cloudflare.com/dns/dnssec/dnssec-complexities-and-considerations/ and https://blog.cloudflare.com/black-lies/ for their own various reasons (in part because they generate signatures dynamically: the RRSIG records are computed at the moment the request comes in, not in advance; this is a compromise, as both approaches have advantages and drawbacks).

What does that mean? They fake existence of ALL names, hence there is almost never an NXDOMAIN.

Let us see one example:

$ dig dwewgewfgewfee-32cewcewcew-2284.cloudflare.com @ns3.cloudflare.com. +dnssec

; <<>> DiG 9.18.4 <<>> dwewgewfgewfee-32cewcewcew-2284.cloudflare.com @ns3.cloudflare.com. +dnssec
;; global options: +cmd
;; Sending:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 9469
;; flags: rd ad; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 4096
; COOKIE: fd8d36048320c848
;; QUESTION SECTION:
;dwewgewfgewfee-32cewcewcew-2284.cloudflare.com.    IN A

;; QUERY SIZE: 87

;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 9469
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 4, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1232
;; QUESTION SECTION:
;dwewgewfgewfee-32cewcewcew-2284.cloudflare.com.    IN A

;; AUTHORITY SECTION:
cloudflare.com.     5m IN SOA ns3.cloudflare.com. dns.cloudflare.com. (
                2282614227 ; serial
                10000      ; refresh (2 hours 46 minutes 40 seconds)
                2400       ; retry (40 minutes)
                604800     ; expire (1 week)
                300        ; minimum (5 minutes)
                )
dwewgewfgewfee-32cewcewcew-2284.cloudflare.com. 5m IN NSEC \000.dwewgewfgewfee-32cewcewcew-2284.cloudflare.com. RRSIG NSEC

(I removed the RRSIG records).

So what does that tell? First: NOERROR instead of NXDOMAIN, so the nameserver tells me the name I query for exists, but maybe not for the type I asked (A, which is dig’s default type). This is valid and known as NODATA: NOERROR but no ANSWER section, as happens when the name exists but has no record of that type.

The AUTHORITY part, and specifically that NSEC record, tells me that there are no names between dwewgewfgewfee-32cewcewcew-2284.cloudflare.com. (the very name I asked for, not the previous existing one) and \000.dwewgewfgewfee-32cewcewcew-2284.cloudflare.com., which may look like a strange name but 1) is totally valid (it is not a valid hostname, because \000 stands for the byte value 0, written as \000 in presentation format, but it is still a valid domain name, as labels in the DNS specifications can contain arbitrary bytes) and 2) is, per the DNSSEC ordering algorithm, the name “right after” my name (so the range between the two names cannot include any other name).
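Why \000.name is the immediate successor of name can be checked with a small sketch. It assumes the canonical ordering of RFC 4034 section 6.1; the helpers (parse_labels, canonical_key, minimal_next, to_text) are hypothetical names for illustration, and the escaping is a simplified version of what dig prints.

```python
def parse_labels(name: str) -> list:
    """Split a plain (escape-free) name into lowercase labels, leftmost first."""
    return [l.encode("ascii") for l in name.lower().rstrip(".").split(".") if l]


def canonical_key(labels: list) -> tuple:
    """DNSSEC canonical ordering key: labels are compared from right to left."""
    return tuple(reversed(labels))


def minimal_next(labels: list) -> list:
    """Smallest name sorting after `labels`: a child made of a single zero byte."""
    return [b"\x00"] + labels


def to_text(labels: list) -> str:
    """Presentation format, escaping non-printable bytes as \\DDD (simplified)."""
    def esc(label: bytes) -> str:
        return "".join(
            chr(b) if 0x21 <= b <= 0x7E and b not in b'."\\' else f"\\{b:03d}"
            for b in label
        )
    return ".".join(esc(l) for l in labels) + "."


qname = parse_labels("dwewgewfgewfee-32cewcewcew-2284.cloudflare.com")
nxt = minimal_next(qname)
print(to_text(qname), "NSEC", to_text(nxt))

# Any other child of qname still sorts after the synthesized next name.
child = [b"a"] + qname
print(canonical_key(qname) < canonical_key(nxt) < canonical_key(child))
```

The point of the design is that the server needs no lookup in its storage to produce the right-hand side of the NSEC record: it is a pure function of the queried name.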

The RRSIG NSEC part at the end of the NSEC record means that there is no record of type A on the name but there are records of types RRSIG and NSEC, which makes sense because I am exactly looking at the NSEC record of that name, and as we are in DNSSEC land, of course there is an RRSIG.

So this is called a “lie” because the nameserver is replying to you: this name exists, but not this record type. And no matter which record type you ask for (except NSEC and RRSIG) the nameserver will tell you: “this name does not exist for this record type”.
At the end, if it does not exist for any record type (besides NSEC and RRSIG) it is really as if it (the name) does not exist at all, but it is just presented in a different way for reasons quickly detailed below.

I recommend reading the second link, but the gist of it is (I am skipping all the points regarding NSEC/NSEC3 and wildcard records, with all the details on “closest encloser” and so on, but those are important if going deep on NSEC stuff):

NSEC3 was a “close but no cigar” solution to the problem. While it’s true that it made zone walking harder, it did not make it impossible.

(which is why they don’t use NSEC3 and keep NSEC but then still need another solution to avoid walking the zone and hence enumerating all names)

There are two problems with negative answers:

The first is that the authoritative server needs to return the previous and next name. As you’ll see, this is computationally expensive for CloudFlare, and as you’ve already seen, it can leak information about a zone.

The second is that negative answers require two NSEC records and their two subsequent signatures (or three NSEC3 records and three NSEC3 signatures) to authenticate the nonexistence of one name. This means that answers are bigger than they need to be.

So the part above is the basic explanation of why they want to avoid NXDOMAIN and “emulate” it with a success code (NOERROR) while still responding negatively to any query (name+type, for any type requested).
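For someone counting response codes in logs or reading dig output, the distinction can be sketched as a small classifier. This is purely a diagnostic heuristic of mine and an assumption, not something defined by the protocol or by Cloudflare: it treats a NODATA reply whose NSEC record sits on the queried name itself and lists only RRSIG and NSEC as a probable synthesized NXDOMAIN. The function name and its parameters are hypothetical.

```python
from typing import Iterable, Optional


def classify(rcode: str, answer_count: int, qname: str,
             nsec_owner: Optional[str] = None,
             nsec_types: Iterable[str] = ()) -> str:
    """Label a response the way a human reading dig output would."""
    if rcode == "NXDOMAIN":
        return "NXDOMAIN"
    if rcode != "NOERROR":
        return "other"
    if answer_count > 0:
        return "answer"
    same_name = (nsec_owner is not None
                 and nsec_owner.lower().rstrip(".") == qname.lower().rstrip("."))
    # Heuristic (assumption): an NSEC on the queried name itself that lists
    # only RRSIG and NSEC looks like a "black lie" standing in for NXDOMAIN.
    if same_name and set(nsec_types) == {"RRSIG", "NSEC"}:
        return "NODATA (likely synthesized NXDOMAIN)"
    return "NODATA"


# The two live examples from this answer, reduced to the relevant fields.
print(classify("NXDOMAIN", 0, "does-not-exist-foobar.icann.org."))
print(classify("NOERROR", 0,
               "dwewgewfgewfee-32cewcewcew-2284.cloudflare.com.",
               "dwewgewfgewfee-32cewcewcew-2284.cloudflare.com.",
               ["RRSIG", "NSEC"]))
```

This kind of guess is fine for eyeballing statistics, but it is brittle and should not be baked into resolver behaviour.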

The other point, again very specific to CloudFlare, is that it is difficult in their case to compute the “next” name (because NSEC really gives a “range” of two names, as a link between two existing things), so instead of using the real next name as it exists in their storage, they compute the minimal “next” one following the DNSSEC ordering algorithm, hence the strange name above with \000. as prefix, a name that obviously doesn’t exist either, so if you query for it you will again get the same kind of reply, but this time with an NSEC record listing \000.\000. on the right-hand side, and so on…

Further down:

For an NXDOMAIN, we always return \000.(the missing name) as the next name, and because we return an NSEC directly on the missing name, we do not have to return an additional NSEC for the wildcard. This way we only have to return SOA, SOA RRSIG, NSEC and NSEC RRSIG, and we do not need to search the database or precompute dynamic answers.

The goal reached with all that is smaller replies. And this is important in DNS land, because of various problems around fragmentation. From their example they go from 1096 bytes to just 357 bytes with black lies, cutting about 2/3, quite an accomplishment!

All the above may become a “standard” in the future, for those wanting to do the same, as they wrote a document that can become maybe an IETF RFC one day: https://datatracker.ietf.org/doc/html/draft-valsorda-dnsop-black-lies

Do note it has consequences though:

NXDOMAIN is an important signal: various other stuff is built on top of that, see RFC 8020 “NXDOMAIN: There Really Is Nothing Underneath” and RFC 8198 “Aggressive Use of DNSSEC-Validated Cache”, so not having this signal anymore can have side effects (and it wouldn’t be a good idea to change other recursive resolvers to try finding out if the authoritative side is using black lies and then consider them, that would be brittle; that point is exactly discussed in the draft above)
it also impacts ENT or “Empty Non Terminal”, where a name has to exist in the DNS tree not because it has any type attached to it, but just because there are names below it; see https://www.ietf.org/archive/id/draft-huque-dnsop-blacklies-ent-01.html for more details on that topic
no implementation is free of bugs, and DNSSEC is complicated, and tricks around DNSSEC are even more so complicated; now I am not sure anymore and I can’t find references, but I think there was a bug in the beginning, and the returned types (in the NSEC bitmap) were not computed correctly, hence breaking some stuff. Will try to update this if I do find back what I am thinking I have seen, but I could be delusional (easy to be with DNSSEC…); in fact I think it is related to the observation that all their initial examples did put far more types in NSEC last section, where now they put only RRSIG and NSEC. See https://indico.dns-oarc.net/event/40/contributions/899/attachments/862/1563/nsec-bitmaps.pdf for live examples of errors in NSEC bitmaps and their consequences

Ah no, in fact I remembered right: a bug in this NSEC bitmap is right at the source of a recent Slack outage :-), but it was not Cloudflare’s fault, the problem was in AWS Route53. See https://www.potaroo.net/ispcol/2021-12/oarc36.pdf for those details, but in short:

Now you can lie with NSEC records, [..] But what a server should never do is return an empty bit-vector in the NSEC record. Because some resolvers, including Google’s Public DNS service interpret an empty NSEC bit-vector as claiming that there are no resource records at all for that domain name. This is not a Google DNS bug. It’s a perfectly legitimate interpretation of the DNSSEC specification. The problem that Slack encountered was that the Route 53 server was returning a NSEC response with an almost empty RR-type bit-vector when the wildcard entry was used to form the response and the query type was not defined for the wildcard resource. This was a bug in the Route 53 implementation.

So, in short, lying does have bad consequences sometimes 🙂
(and/or: DNSSEC is complicated, and wildcards in the DNS do create all sorts of complications too; in fact DNSSEC + wildcards + CNAME records are like 3 sure signs of apocalypse somehow…).
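Since the quoted incident hinges on what the NSEC "bit-vector" contains, here is a minimal sketch of how a list of record types is turned into that bitmap, following the window/length/bitmap layout of RFC 4034 section 4.1.2. The TYPE_CODES table is a small hand-picked subset and encode_type_bitmap is a name I chose for illustration.

```python
# Subset of RR type codes, enough for the records discussed in this answer.
TYPE_CODES = {"A": 1, "NS": 2, "SOA": 6, "MX": 15, "TXT": 16, "AAAA": 28,
              "DS": 43, "RRSIG": 46, "NSEC": 47, "DNSKEY": 48}


def encode_type_bitmap(types) -> bytes:
    """Encode RR type mnemonics as NSEC type bitmaps (RFC 4034, section 4.1.2)."""
    windows = {}
    for mnemonic in types:
        code = TYPE_CODES[mnemonic]
        bitmap = windows.setdefault(code // 256, bytearray(32))
        low = code % 256
        bitmap[low // 8] |= 0x80 >> (low % 8)  # bit 0 is the most significant bit
    out = bytearray()
    for window in sorted(windows):
        bitmap = bytes(windows[window]).rstrip(b"\x00")  # drop trailing zero octets
        out += bytes([window, len(bitmap)]) + bitmap
    return bytes(out)


print(encode_type_bitmap(["RRSIG", "NSEC"]).hex())             # black lies style
print(encode_type_bitmap(["A", "MX", "RRSIG", "NSEC"]).hex())  # a name with real data
print(encode_type_bitmap([]).hex())                            # empty bit-vector
```

The first line is what a "RRSIG NSEC" bitmap like the Cloudflare one above looks like on the wire; the last one is the empty case that the quote warns should never be returned.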

This is only ONE way to do things: the near absence of NXDOMAIN responses is absolutely not a consequence of the protocol (DNSSEC) but just of their implementation. So don’t take this for granted at all, it will be different with other providers. But does it really change anything for you as owner of the zone or users of it? Not so much. Why were you so worried about NXDOMAIN responses 🙂 ?

PS:

for a theoretical paper on DNSSEC lies: https://casey.byu.edu/papers/2019_pam_dnssec_lies.pdf
for a presentation summarizing things (among others): https://www.slideshare.net/apnic/signing-dnssec-answers-on-the-fly-at-the-edge-challenges-and-solutions
