Re: [DNSOP] Working Group Last Call for draft-ietf-dnsop-dnssec-validator-requirements

Peter Thomassen <peter@desec.io> Tue, 16 May 2023 17:08 UTC

Message-ID: <38e840f1-eb95-bac5-a230-0344e06a0f00@desec.io>
Date: Tue, 16 May 2023 19:08:06 +0200
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.10.0
Content-Language: en-US
To: dnsop@ietf.org, Viktor Dukhovni <ietf-dane@dukhovni.org>
References: <CADyWQ+FwRaSdpSWXBDqCG9ZPNPiG4pGUx37PVtExbqVPr5ZfmA@mail.gmail.com> <ZF6rFlfM7LKObUnQ@straasha.imrryr.org>
From: Peter Thomassen <peter@desec.io>
In-Reply-To: <ZF6rFlfM7LKObUnQ@straasha.imrryr.org>
Content-Type: text/plain; charset="UTF-8"; format="flowed"
Content-Transfer-Encoding: 7bit
Archived-At: <https://mailarchive.ietf.org/arch/msg/dnsop/dVcRSCxgLeKJ-su-uYTv5zPfziY>
Subject: Re: [DNSOP] Working Group Last Call for draft-ietf-dnsop-dnssec-validator-requirements
Precedence: list

On 5/12/23 23:09, Viktor Dukhovni wrote:
> Repost of my belated comments in the thread, apologies about not doing
> it right the first time...

Inspired by Viktor's comments, I took some time to give the document a thorough review.

I'd like to support Viktor's comments on the dependent RRset TTL cap described in Section 9.

I feel that the recommendation there is potentially harmful while its benefit is unclear. As for the harm, it makes DS updates less flexible because it effectively pushes their TTL towards higher values (so that caches remain effective). While always-low DS TTLs are problematic, too, it doesn't seem like a sound concept for an auth's load to be essentially inversely proportional to the DS TTL when it is set to a low value temporarily.
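
A rough back-of-the-envelope model of the load effect described above. It assumes a busy resolver that refetches a dependent RRset every time its effective TTL expires; the function name, the one-day window, and the example TTLs are illustrative assumptions, not values from the draft.

```python
SECONDS_PER_DAY = 86400

def refetches_per_day(rrset_ttl, ds_ttl, cap_enabled=True):
    """Upstream queries per day for one RRset from one busy resolver.

    With the cap enabled, the RRset is cached for at most the DS TTL,
    so the refetch rate grows as the DS TTL shrinks.
    """
    effective_ttl = min(rrset_ttl, ds_ttl) if cap_enabled else rrset_ttl
    return SECONDS_PER_DAY / effective_ttl

# An RRset signed with a one-day TTL while the DS TTL is lowered to 5 minutes:
print(refetches_per_day(86400, 300, cap_enabled=True))
print(refetches_per_day(86400, 300, cap_enabled=False))
```

Under these assumptions, halving the DS TTL doubles the query rate for every dependent RRset whose own TTL is longer than the DS TTL.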

As for the benefit, the objective appears to be "preponing" the removal of cached RRsets from their scheduled expiry to "as soon as they potentially would no longer validate", as indicated by upstream TTLs related to the trust chain. However, there's no need to do this based on TTLs alone: if one wants to pursue this (optional) objective, it is sufficient to revalidate once an *actual* change in the DNSKEY or DS set is detected. But even in the face of a sudden change in the trust relationship, it's not clear whether ignoring a signed (!) long TTL is beneficial, as that might harm stability and resilience during time periods of configuration errors, which the cache would otherwise help survive.
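
A minimal sketch of the change-triggered alternative, assuming the resolver can hand each freshly fetched DNSKEY or DS RRset to a small helper. The class and function names are hypothetical and do not correspond to any existing resolver API.

```python
import hashlib

def rrset_fingerprint(rdatas):
    """Order-independent digest over an RRset's RDATA (an iterable of bytes)."""
    digest = hashlib.sha256()
    for rdata in sorted(rdatas):
        # Length-prefix each record so record boundaries stay unambiguous.
        digest.update(len(rdata).to_bytes(2, "big"))
        digest.update(rdata)
    return digest.hexdigest()

class TrustChainWatcher:
    """Remembers the last seen DNSKEY/DS set per zone (hypothetical helper)."""

    def __init__(self):
        self._seen = {}

    def needs_revalidation(self, zone, rrtype, rdatas):
        """Return True only when a previously seen set has actually changed."""
        key = (zone.lower(), rrtype)
        new = rrset_fingerprint(rdatas)
        old = self._seen.get(key)
        self._seen[key] = new
        return old is not None and old != new

watcher = TrustChainWatcher()
watcher.needs_revalidation("example.", "DNSKEY", [b"key-1"])            # first sighting
watcher.needs_revalidation("example.", "DNSKEY", [b"key-1"])            # unchanged
watcher.needs_revalidation("example.", "DNSKEY", [b"key-1", b"key-2"])  # changed
```

In this sketch, dependent cache entries keep their signed TTL and are only revalidated when the last call returns True.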

Second, I'm confused about the normative language in this informational document. (There are about 20 occurrences of MUST and about 40 of SHOULD/RECOMMENDED.)

Third, the document contains several inaccurate or contradictory statements. One example is related to Section 7.1.4, which says:
* DNS resolver MUST validate the TA before starting the DNSSEC
resolver, and a failure of TA validity check MUST prevent the
DNSSEC resolver to be started. Validation of the TA includes
coherence between out-out band values, values stored in the DNS as
well as corresponding DS RRsets.

The recommendation says that a resolver may not be started if its trust anchors are incoherent with values obtained from the DNS.

My understanding is that the purpose of a trust anchor is to pin a trusted key for a name, in a self-contained fashion, without relying on its confirmation through some other channel (e.g. corresponding DS records). If a trust anchor is required to be coherent with values stored in the DNS, then the trust anchor doesn't appear to be needed in the first place.

It is also left open how the DRO should check "coherence between out-out band values, values stored in the DNS as well as corresponding DS RRsets" for their root trust anchors. There are no DS records, so you can check ... DNSKEY? Hm. Then, what exactly to check? Also, what about IANA's root-anchors.xml file (RFC 7958)? -- The problem here is that "values stored in the DNS" is underspecified, although one MUST comply with it.

What's more, Section 7.1.2.1 has:
Besides deployments in
networks other than the global public Internet (hence a different
root), operators may want to configure other trust points.

Now, how would the above recommendation (enforce trust anchor coherence with DNS) be enforced in such a setting?

That said, I wrote up some of my (pencil-on-printout) comments from the remainder of the document; you can find them below.

Looking at my scribblings, large parts of the document seem to lack clarity (at least to me). The parts which are not unclear to me (few scribblings) are

- Sections 1-4 (intro material and boilerplate)
- Section 6 (importance of correct wall time)
- Section 7.1 until 7.1.2 included (trust anchor intro)
- Section 13 (transport considerations)
- Section 14 (IANA considerations)

... with the missing sections containing the meat of the recommendations. As I find many of them unclear, I'm not sure I support the document as is, simply because I have a hard time following what it says.

I'd like to emphasize that I appreciate the work and effort that went into the document. I just think that for it to be helpful guidance (and for the actual recommendations and arguments to be discussed), a lot of work on clarity is needed. My review is intended as constructive feedback, not as harsh criticism.

Best,
Peter

Section 5:
A DRO needs to be able to enable DNSSEC validation with sufficient
confidence they will not be held responsible in case their resolver
does not validate the DNSSEC response. The minimization of these
risks

This sounds like a managerial document from a business risk department. -- As the opening paragraph of the section laying out the justification for the different recommendations types, I wonder if this is a sufficiently stringent argument for justifying technical guidance.

In the same section, there are some occurrences of rather obscure language:
The
recommendations do not come with the same level of recommendations

Some recommendations may simply not be
provided by the operated software

I'm not sure what these things mean.

Similarly, in Section 6:
For all recommendations, it is strongly RECOMMENDED that
recommendations are supported by automated processes.

Section 6 also has:
* While operating, a DRO MUST closely monitor time derivations of
the resolvers and maintain the time synchronized.

s/derivations/deviations/

A point that's missing here is (how) to take into account the effects of time adjustments on stored TTLs.
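
One possible way to address this, sketched under the assumption that cache expiry can be tracked on a monotonic clock so that wall-clock corrections do not stretch or shrink stored TTLs. This is an illustration only, not something the draft proposes; `CacheEntry` is a hypothetical name, and RRSIG inception/expiration checks would still need correct wall time.

```python
import time

class CacheEntry:
    """Cache entry whose lifetime is measured on a monotonic clock."""

    def __init__(self, ttl, now=None):
        # time.monotonic() is not affected by system clock adjustments.
        now = time.monotonic() if now is None else now
        self.expires_at = now + ttl

    def remaining_ttl(self, now=None):
        """Seconds left before expiry, never negative."""
        now = time.monotonic() if now is None else now
        return max(0, int(self.expires_at - now))

entry = CacheEntry(ttl=300)
print(entry.remaining_ttl())
```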

Section 7 explains three types of trust-anchor-related recommendations, namely initial provisioning, updates, and reporting. It then says:
Note that TA update and TA reporting only concerns running resolvers.

It's unclear to me why this is written down here. It's perfectly clear that when nothing's running, then no validation is going on, so there's no reporting or updating of validation trust anchors.

This kind of "requirement fencing" is familiar to me from risk management documents, where the requirements author attempts to prevent a manager who may not be familiar with the topic from enforcing certain requirements in contexts where they are not applicable.

I have no idea whether this is the case here, but I remain unconvinced of the need to say that validation-related things only apply to running resolvers. In fact, I find such statements distracting (and as such an anti-pattern), as they make me think "what's this, did I miss something? is there an edge case?".

Section 7.1.2 has:
Although some bootstrapping mechanisms to securely retrieve publish
[RFC7958] and retrieve [UNBOUND-ANCHOR] the Root Zone Trust Anchor
have been defined, it is believed these mechanisms should be extended
to other KSKs or Trust Anchors.

Believed by whom?

Another example of fuzzy language is in 7.1.2.1, which says:
For validators that may be used on the global public Internet (with
"may be" referring to general purpose, general release code),
handling the IANA managed root zone KSK trust anchor is a
consideration.

It's a thing to get right (and not a consideration).

Section 7.1.3:
The generation of a configuration file associated to the TA is
expected to be implementation independent. The necessity of tweaking
the data [...]

In general, TA configuration does not require generation of a configuration file. (An implementation might just as well take them from something like /etc/resolver/trust-anchors.d/, with each file therein containing DS-type records, and the domain somehow encoded in the filename.)
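
A minimal sketch of such a directory-based loader. The filename convention (`<domain>.ds`, with `root.ds` standing for the root zone) and the one-record-per-line layout (`key_tag algorithm digest_type digest`) are assumptions made for illustration; no existing implementation is implied.

```python
from pathlib import Path

def load_trust_anchors(directory):
    """Return {domain: [(key_tag, algorithm, digest_type, digest_hex), ...]}."""
    anchors = {}
    for path in sorted(Path(directory).glob("*.ds")):
        # Hypothetical encoding: the file "example.com.ds" holds anchors
        # for "example.com"; "root.ds" holds anchors for ".".
        domain = path.name[: -len(".ds")].lower()
        if domain == "root":
            domain = "."
        records = []
        for line in path.read_text().splitlines():
            line = line.split(";", 1)[0].strip()  # drop comments and blanks
            if not line:
                continue
            key_tag, algorithm, digest_type, *digest = line.split()
            records.append(
                (int(key_tag), int(algorithm), int(digest_type),
                 "".join(digest).upper())
            )
        if records:
            anchors[domain] = records
    return anchors
```

A call such as `load_trust_anchors("/etc/resolver/trust-anchors.d")` would then yield one entry per configured name, without any generated configuration file.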

It's not clear what "tweaking the data" means (neither which data, nor in what way they are tweaked).

This suggests that the author of this text has a specific context in mind, from which the line of argument descends (similarly to the managerial framing in other sections).

Section 7.2:
This includes for a DRO the ability to
check which TA are in used as well as to resolve in collaboration of
authoritative servers and report the used TAs.

I am not sure what this means. Resolve what -- DNS queries, or trust anchor issues? Something else?

Section 7.2.1:
Trust is inherently a matter of an operations policy. As such, a DRO
will need to be able to update the list of Trust Anchors. TA updates
are not expected to be handled manually. This introduces a
potentially huge vector for configuration errors

Probably the opposite is meant? (No manual handling --> fewer configuration errors?)

Instead DRO will rely on "Automated Updates to DNSSEC
Trust Anchors" [RFC5011]

Well, perhaps; implementation is not mandatory.

The two SHOULD recommendations (check TA publisher commitment to RFC 5011, and enable RFC 5011 automatic updates) in that section are phrased as independent, but they are not. (There's no point to the first recommendation when the second is not conditional on the outcome of the first.)

The first paragraph of Section 7.2.2 says:
A DRO SHOULD regularly check the trust anchor used by the DNSSEC
resolver is up-to-date and that values used by the resolvers are
conform to the ones in the configuration

I find this quite fuzzy. Does this mean that the software should detect configuration changes and reload the trust anchor?

In any case, this section is about "regular checks", but its first explicit recommendation is for "STARTUP", which seems inconsistent.

In the case of a key roll
over, the resolver is moving from an old value to an up-to date
value. This up-to-date value does not need to survive reboot, and
there is no need to update the configuration file of the running
instances - configuration is updated by a separate process. To put
it in other words, the updated value of the TA is only expected to be
stored in the resolver's memory. Avoiding the configuration file to
be updated prevents old configuration file to survive to writing
error on read-only file systems.

I'm not convinced. Rollover from a very old trust anchor to a new one may not be possible indefinitely, like when you reboot three years later.

Also, booting with a trust anchor that was broken long ago is insecure, as an attacker might exploit that by subsequently forging the rollover. It seems more prudent to write rollovers to permanent storage, at least when the algorithm or key size is changed. Not doing so is effectively trusting the old key indefinitely, against better knowledge.

The recommendation there says:
* DRO SHOULD enable "Signaling Trust Anchor Knowledge in DNS
Security Extensions (DNSSEC)" [RFC8145] to provide visibility to
the TA used by the resolver. The TA can be queried using a DNSKEY
query.

This is not about querying the TA.

Note also that [RFC8145] does not only concern Trust Anchor but is
instead generic to DNSKEY RRsets. As a result, unless for the root
zone, it is not possible to determine if the KSK/ZSK or DS is a Trust
Anchor or a KSK/ZSK obtained from regular DNSSEC resolutions.

DROs (who are the subject of the document) can easily determine whether a key or DS has been obtained from a trust anchor or from regular resolution: just look at whether a trust anchor is configured for the name, or whether a DS query was issued.

Transferring the note from my print-out, I realize that perhaps what was meant is that the recipient of an RFC 8145 signal cannot tell whether the signaled key is a trust anchor. I did not realize that the first few times I read it.

A failed key roll over or any other abnormal situation MUST trigger
an alarm.

What does "alarm" mean in this context? (It's underspecified but mandatory.)

If the mismatch is due to a failed key roll-over, this SHOULD be
considered as a bug by the DRO. The DRO MUST restart the resolver
with updated TA.

Why should it be considered a bug? It may just be a misconfiguration.

The situation here is after a failed rollover. Restart with what updated TA? Is the intention here that the DRO handles this manually? (That is discouraged in other parts of the document.)

* A DRO SHOULD be able to check the status of a TA as defined in
Section 3 of [RFC7583].

I can't find anything like that in this section. (It deals with key rollover timing, not with trust anchor checks.)

Section 8:
The intent of this section is to position these
guidelines toward the operational recommendations provided in this
document.

This is not technical advice. It sounds like an internal compliance document. Who is the audience of this document?

* DRO SHOULD set automated procedures to determine the NTA of DNSSEC
resolvers.

What does that mean?

A failure in signaling validation is associated to a mismatch between
the key and the signature.

What signaling?

A validation mismatch is not necessarily between key and signature, it may also be between data and signature.

In addition, DRO are likely to
have specific communication channel with TA maintainer which eases
trouble shooting.

Why should that be so / what's the basis for the likelihood statement?

A signature validation failure is either an attack or a failure in
the signing operation on the authoritative servers.

Or something else, like a misconfiguration of a DS record, or a validation bug, or ...

The last recommendation in this section is MAY (which admits either way), although it is labeled a "recommendation" (which implies a preference for doing it).

Section 9:
the DNSSEC validator performs a DNSSEC query to
the authoritative server that returns the RRset signed with the new
KSK / ZSK. The DNSSEC validator may not be able to retrieve the new
KSK / ZSK

Why should it be the case that the resolver can query some RRset, but not the DNSKEY RRset?

This either results in a bogus resolution or in an
invalid signature check.

What's the difference?

Note that by comparing the Key Tag Fields,
the DNSSEC validator is able to notice the new KSK / ZSK used for
signing differs from the one used to generate the received generated
signature.

The key tags may be the same even when the key differs.
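
To illustrate, the key tag defined in RFC 4034, Appendix B is a 16-bit checksum over the DNSKEY RDATA, so different keys can share a tag. The sketch below uses toy key material (far shorter than a real algorithm 13 key) chosen only to make the collision easy to see.

```python
def key_tag(rdata: bytes) -> int:
    """Key tag per RFC 4034, Appendix B (does not apply to algorithm 1)."""
    acc = 0
    for i, byte in enumerate(rdata):
        # Even-indexed bytes form the high octet, odd-indexed the low octet.
        acc += byte if i & 1 else byte << 8
    acc += (acc >> 16) & 0xFFFF
    return acc & 0xFFFF

# DNSKEY RDATA = flags (2 bytes) | protocol | algorithm | public key
header = bytes([0x01, 0x01, 3, 13])  # flags=257, protocol=3, algorithm=13
key_a = header + bytes([0x11, 0x22, 0x33, 0x44])  # toy public key
key_b = header + bytes([0x33, 0x22, 0x11, 0x44])  # two even-offset bytes swapped

print(key_a != key_b, key_tag(key_a) == key_tag(key_b))  # prints: True True
```

A validator therefore cannot conclude from matching or differing key tags alone which key produced a signature; the tag is only a hint for selecting candidate keys.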

However, the DNSSEC validator is not expected to retrieve
the new ZSK / KSK, as such behavior could be used by an attacker.

I am confused what this could mean.

Note also that even though the data may
not be associated to the KSK / ZSK that has been used to validate the
data, the link between the KSK / ZSK and the data is still stored in
the cache using the RRSIG.

This seems highly implementation-dependent.

All of the comments so far on Section 9 relate to two paragraphs, which I don't think are necessary for what follows. Instead of fixing the inaccuracies, it may be better to just drop them.

Further down in the section, the text mentions "TTL associated with FQDNs", which is not accurate as a name can have several RRsets with different TTLs.

Apart from that, I disagree with the recommendation in this section (see beginning of this message).

Section 10.1:
A DRO MAY regularly report the Trust Anchor used to the authoritative
server. This would at least provide insight to the authoritative
server and provide some context before moving a key roll over
further.

The question is what the authoritative server should do with this information, if lots of trust anchors are reported that have not been updated.

That's probably out of scope for this document, but nevertheless an immediate question: Should the rollover process be stalled? That would open up a trivial path to block the rollover. If not, what then? -- Perhaps it's better to not get into this and drop the last half sentence.

Section 10.2
Similarly, a DRO may be informed by other channel a rogue
or unwilling DNSKEY has been emitted.

What's an unwilling DNSKEY?

* A DRO MUST be able to flush the cached data subtree associated to
a DNSKEY

It seems to me that at the MUST level, it's sufficient to flush the cache as a whole.

Section 11:
* A DRO SHOULD regularly request and monitor the signature scheme
supported by an authoritative server.

What does that mean?

* A DRO SHOULD report a "Unsupported DNSKEY Algorithm" as defined in
[RFC8914] when a deprecated algorithm is used for validation.

Is this meant for rcode 0 responses?

One inconvenient to such strategy i sthat it does not let one DRO to
take advantage of more recent cryptographic.

Why?

Section 12:
12. Invalid Reporting Recommendations

This section title seems confusing.

An invalid response may be the result of an attack or a
misconfiguration, and the DNSSEC validator may play an important role
in sharing this information with the authoritative server or domain
name owner.

I'm not sure I agree with this. It's probably not a good idea if all validating resolvers start contacting a specific domain owner.

Section 13:
RUNTIME: * DRO SHOULD regularly discover MTU

I'm no expert here; does this really need regular checks, or is there a value that's generally considered safe? If regular checking is done, how frequently would be reasonable?

Section 15:
Providing inappropriate information can lead to misconfiguring the
DNSSEC validator, and thus disrupting the DNSSEC resolution service.

Not sure what "providing inappropriate information" means here.

RRSet that were
cached require a DNSSEC resolution over the Internet

when queried.

An attacker may ask the DNSSEC validator to consider a rogue KSK/ZSK,
thus hijacking the DNS zone.

How so?

An attacker (cf. Section 7) can advertise a "known insecure" KSK or
ZSK is "back to secure"

How so?

--
https://desec.io/
