Re: [DNSOP] scanning doesn't scale, draft-thomassen-dnsop-generalized-dns-notify and draft-ietf-dnsop-dnssec-bootstrapping

Peter Thomassen <peter@desec.io> Mon, 16 October 2023 17:42 UTC

Message-ID: <aaa78045-c80e-4761-8826-ffde6a170550@desec.io>
Date: Mon, 16 Oct 2023 19:42:47 +0200
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Content-Language: en-US
To: John R Levine <johnl@taugh.com>, dnsop@ietf.org
References: <20231013174834.0599E36EEB15@ary.local> <f2f317d1-dafa-4b81-85b8-24281d51f458@desec.io> <a62cdb42-4b5e-d035-8921-ddf0f823d939@taugh.com>
From: Peter Thomassen <peter@desec.io>
Cc: Nils Wisiol <nils@desec.io>
In-Reply-To: <a62cdb42-4b5e-d035-8921-ddf0f823d939@taugh.com>
Content-Type: text/plain; charset="UTF-8"; format="flowed"
Content-Transfer-Encoding: 8bit
Archived-At: <https://mailarchive.ietf.org/arch/msg/dnsop/BVtwrRO5Ll7ONMmWSogOwYk18gk>
Precedence: list

John,

On 10/16/23 18:19, John R Levine wrote:
> On Mon, 16 Oct 2023, Peter Thomassen wrote:
>> 3. the parent obtains a copy of a signaling zone and walks the signaling records published there (at _signal.$NS, such as _signal.jo.ns.cloudflare.com),
> 
> If you think about it for a moment,

I did :-)

> #3 doesn't work very well, since every parent needs to scan all the signalling zones for all the nameservers they might be using.

For bystanders' context, we're discussing Section 4.3 of draft-ietf-dnsop-dnssec-bootstrapping.

No parent needs to scan anything. They might want to do so, if they think it's feasible for them, or they can find other solutions that are better for them, such as outlined below. (Or they might of course not do DS automation at all.)

A DNS operator and a parent may make an arrangement and agree on some method for how the parent can learn about the set of domains under the parent that have new signaling records. One such method is for the parent to AXFR the operator's signaling zone, which is but one of many options. Such an approach seems reasonable for large DNS operators that have many domains changing every day; a parent certainly wouldn't make such arrangements with every DNS operator in the world.

We're getting into a quite interesting topic space here, but it is entirely out of scope for this document. The document is concerned with how the parent can authenticate CDS/CDNSKEY records *after learning about their existence*; it does not deal with how to learn about their existence (beyond the very high-level, illustrative list given in Section 4.3).

> Cloudflare hosts at least a million zones which I'm sure are in every public TLD, so that means we have a thousand TLDs and/or a thousand registrars all scanning Cloudflare's signalling zone which is not going to be small.  How's that going to scale?  For that matter, how's Cloudflare going to give the TSIG or whatever to the thousand scanners to let them do it?

There are a bunch of very specific assumptions in here, "to make it not work". Consider the following:

- DNS operators like Cloudflare could provide a copy of the contents under, say, se._signal.jo.ns.cloudflare.com to the .se registry. If done via a separate zone, there will be no pollution from other TLDs. [*]

- Alternatively, DNS operators can provide TLD-specific catalog zones, or CSV-like feeds, even blockchain. Actually, the method of obtaining the list doesn't matter for the argument in Section 4.3.

- Someone like Cloudflare could allowlist the registry's IP addresses, and nobody needs to set up TSIG.

- The parties can agree to restrict this to only keep entries younger than a certain time period (e.g. 1 day + buffer), and in turn process it regularly. This makes a very well-tailored list of bootstrapping requests.

- It is conceivable that the processing be done by the registry, not the registrar (as is the case for today's deployments), so there is only one ingesting party that needs to be allowlisted (per parent).

- Alternatively, if the registrar does the processing, some registrar endpoint signaling is needed (as is pondered in the NOTIFY draft). But surely this is not in scope of this document.
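
To make the "tailored list" idea in the bullets above a bit more concrete, here is a minimal sketch of how a registry might filter an operator-provided feed. The feed layout (domain name plus publication time), the function name and the 2-hour buffer are assumptions made up for illustration; nothing like this is specified in the draft.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical feed entries: (child domain, time its signaling records appeared).
# This tuple layout is an assumption for illustration, not a format from the draft.
def recent_requests(entries, tld, now, max_age=timedelta(days=1, hours=2)):
    """Return child domains under `tld` whose entry is no older than `max_age`."""
    suffix = "." + tld.strip(".").lower()
    selected = set()
    for name, published in entries:
        name = name.rstrip(".").lower()
        age = now - published
        # Keep only entries for this TLD that fall inside the agreed time window.
        if name.endswith(suffix) and timedelta(0) <= age <= max_age:
            selected.add(name)
    return sorted(selected)

now = datetime(2023, 10, 16, 18, 0, tzinfo=timezone.utc)
feed = [
    ("example.se.", now - timedelta(hours=3)),   # recent, under .se
    ("old.se.", now - timedelta(days=3)),        # too old
    ("example.ch.", now - timedelta(hours=1)),   # different TLD
]
print(recent_requests(feed, "se", now))
```

Whether the list arrives via AXFR, a catalog zone or a CSV-like feed makes no difference to this step; only the parsing in front of it would change.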

The main point here is that *none* of this has anything to do with what the parent does *after* the trigger (whichever one) has fired -- which is what the document describes.

Why talk about triggers at all in Section 4.3? -- The parent needs to consider that, depending on the trigger mechanism, the trigger context will or will not convey reliable knowledge of the delegation's NS RRset, so for some trigger types it will have to match against its local delegation database before proceeding. Pointing this out is relevant for the authentication process, and is the purpose of Section 4.3.
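
The matching step can be sketched as follows. This is only an illustration under stated assumptions: the delegation database is modelled as a plain dict, the function names are invented, and the policy of skipping on any NS mismatch is one possible choice, not something the draft mandates. The name layout (_dsboot.<child>._signal.<ns>) follows the signaling names used in the draft.

```python
def normalize(name):
    """Lower-case a DNS name and drop the trailing dot."""
    return name.rstrip(".").lower()

def signaling_names(child, ns_hosts):
    """Build one signaling name per nameserver: _dsboot.<child>._signal.<ns>."""
    child = normalize(child)
    hosts = sorted({normalize(ns) for ns in ns_hosts})
    return [f"_dsboot.{child}._signal.{ns}." for ns in hosts]

def names_to_query(child, delegation_db, trigger_ns=None):
    """Return the signaling names to query for `child`, or [] to skip it.

    delegation_db: dict mapping child domain -> NS hostnames (hypothetical model
                   of the parent's local delegation data).
    trigger_ns:    NS hostnames claimed by the trigger context, or None if the
                   trigger carries no NS information.
    """
    local_ns = {normalize(ns) for ns in delegation_db.get(normalize(child), ())}
    if not local_ns:
        return []  # not a known delegation
    if trigger_ns is not None and {normalize(ns) for ns in trigger_ns} != local_ns:
        return []  # trigger context disagrees with the delegation's NS RRset
    # Always derive the names from the parent's own view of the delegation.
    return signaling_names(child, local_ns)

db = {"example.se": {"ns1.example.net.", "ns2.example.org."}}
for name in names_to_query("example.se.", db):
    print(name)
```

The point is that the signaling names are derived from the delegation's NS RRset as the parent knows it, regardless of which trigger fired.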

We've digressed quite a bit from your criticism, which was your cognitive dissonance that this document "says to scan for DS bootstrap", which it clearly does not. I hope the cognitive dissonance has been cleared up :-)

As for how to determine which triggers are appropriate, this indeed seems to be a gap where additional standards development could help, and I share some (not all) of your opinions. Let's put it into the NOTIFY draft, or a more general one that discusses other triggers as well (e.g. the feed-like mechanism hinted at above, or something based on catalog zones). But please let's not do it here.

>> 2. the parent chooses to do a scan,
> 
> This is no better.  For CDS scanning the scan is a single query to a single known server only for zones that are already signed.  But for this, it has to scan all the unsigned zones, and since the NS can be anywhere, each scan is a full recursive lookup.  That's a lot of traffic to a lot of servers all over the net.

This doesn't make sense to me. For a conventional CDS lookup, you have to query the child's nameserver(s), whose names you need to resolve before you can query them. The signaling names are subdomains of these nameserver names, so they add little in terms of resolution overhead (unless you construct a case where the signaling zone is delegated to some random other NS -- but why would that be the case?). So, commonly, most of the resolution path will already be cached from the resolution performed for the apex CDS query.

>>> I suggest adjusting the bootstrap draft saying to send NOTIFY(DS) to
>>> the parent of a delegated name to tell it to do the bootstrap rather
>>> than scanning.
>>
>> The draft is not at all concerned about how the child triggers the parent, but only with what the parent does *once the parent has determined* that it is going to attempt DS initialization.
> 
> Right.  We need to fix that gap now, when we can do it easily by updating the drafts in progress.  It's totally normal to have two drafts progressing that depend on each other.

The dependency is artificial. People already do have CDS/CDNSKEY automation for some TLDs, and .ch/.li & Cloudflare (amongst others) have implemented this authentication protocol on parent/child side.

This protocol does not care what trigger mechanism .ch and Cloudflare are using between each other. That's not to say that it shouldn't be standardized; it just shows that the claimed dependency does not exist.

It also neither introduces nor endorses scanning. What it does is fix an authenticity problem for parties processing CDS/CDNSKEY records. I'm not sure why this fix should wait for anything.

As can be seen in this message, the matter of triggers is complicated, and it might take quite some time to arrive at a good system that has consensus. Let's have this important discussion, but let's not hold up things that stand on their own and have no technical dependency -- even more so, as there's running code, and also code running.

> I wouldn't forbid the other approaches but I would note that they are not likely to scale well to large zones like popular TLDs.

A note in that direction seems reasonable. How about adding the following paragraph after the bullet list in Section 4.3:

NEW
    The remainder of this section is concerned with how triggers may
    differ with respect to the context they provide, and in particular
    whether it is safe to infer a Child's set of Signaling Domains from
    the trigger context.  The triggers listed above are intended for
    illustration, and this specification does not recommend any
    particular one.  It is noted that concerns have been expressed
    regarding the scalability of parent-side scans.

Best,
Peter

[*]: This was discussed last year when the structure of the signaling names was figured out. It was noted that if se._signal.$NS is a separate zone, it can't be a signaling name at the same time (because the semantics of the CDS/CDNSKEY records there would be ambiguous). That's why the _dsboot prefix was added.
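
The collision described in this footnote can be shown with a few lines. This is a toy sketch: the helper name is invented, and the child "se" delegated to this nameserver is a hypothetical case chosen only to demonstrate the name clash.

```python
NS = "jo.ns.cloudflare.com"

def signaling_name(child, ns, prefix="_dsboot"):
    """Join the labels of a signaling name; prefix=None gives the unprefixed form."""
    labels = ([prefix] if prefix else []) + [child, "_signal", ns]
    return ".".join(labels) + "."

# Apex of a per-TLD signaling zone handed to the .se registry.
zone_apex = f"se._signal.{NS}."

# Without a prefix, the signaling name for a child called "se" is the zone apex
# itself, so CDS/CDNSKEY records there would be ambiguous.
print(signaling_name("se", NS, prefix=None) == zone_apex)  # True

# With the _dsboot prefix, the two names are distinct.
print(signaling_name("se", NS) == zone_apex)  # False
```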

-- 
https://desec.io/
