[DNSOP] Feedback on draft-ietf-dnsop-domain-verification-techniques-01

Erik Nygren <erik+ietf@nygren.org> Thu, 30 March 2023 20:53 UTC

From: Erik Nygren <erik+ietf@nygren.org>
Date: Thu, 30 Mar 2023 16:52:43 -0400
Message-ID: <CAKC-DJiNXLzOXBnZ6XCzuqO=YpVbtFGPvVdXhJ8tuhSaJadkFw@mail.gmail.com>
To: dnsop WG <dnsop@ietf.org>, Shumon Huque <shuque@gmail.com>, paul.wouters@aiven.io, shivankaulsahib@gmail.com
Cc: Erik Nygren <nygren@akamai.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/dnsop/BAawUY5zAEppSkzAMenuGeV-ImU>
Subject: [DNSOP] Feedback on draft-ietf-dnsop-domain-verification-techniques-01

Hello,

Thank you for pulling together this draft! Having worked on related
systems a number of times, I think it will be valuable to have something
standardized here.

A number of comments and suggestions:

1) APEX domains, and hostnames vs domains

You define APEX but don't then reference it. This is an important topic
to cover in considerably more detail, however. In particular, some systems
want to validate an apex domain while others want to validate each
particular hostname. It is critical that the validation record and its
contents are unambiguous as to which of these is the case.

As an example, ACME has separate mechanisms for wildcard certs (eg,
"*.example.com") vs individual names (eg, "bar.example.com"). This is
likely to apply across the board to these systems: sometimes they want to
validate usage for a domain and sometimes for just specific names.

For the individual hostname case, it is important to clarify that the
challenge should be "_foo-challenge.bar.example.com".

For the whole-domain case this could be
"_foo-wildcard-challenge.example.com", or the TXT token could carry an
attribute (eg, wildcard=true). ACME (RFC 8555 Section 8.4) doesn't seem to
have this differentiation, which seems unfortunate, unless I'm misreading.
I'd think it should be unambiguous to domain admins whether a challenge is
for just the "example.com" name, for "*.example.com", or for
"example.com, *.example.com, *.*.example.com, etc".
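To make the two options concrete, here is a minimal Python sketch, assuming
a hypothetical service label "foo": one function encodes the scope in the
owner name, the other keeps a single label and carries the scope as a TXT
attribute. Neither the "-wildcard-challenge" label nor the "wildcard="
attribute syntax is defined anywhere; both are illustrative only.

```python
from enum import Enum


class Scope(Enum):
    HOSTNAME = "hostname"  # exactly one name, e.g. bar.example.com
    WILDCARD = "wildcard"  # names below the domain, e.g. *.example.com


def challenge_name(prefix: str, name: str, scope: Scope) -> str:
    """Owner name for the challenge record (option 1: scope in the label).

    `prefix` stands in for the hypothetical service label ("foo" in
    "_foo-challenge").
    """
    name = name.rstrip(".").lower()
    if scope is Scope.HOSTNAME:
        return f"_{prefix}-challenge.{name}"
    return f"_{prefix}-wildcard-challenge.{name}"


def challenge_txt(token: str, scope: Scope) -> str:
    """Option 2: one label, with the scope carried as an attribute inside
    the TXT value (the attribute syntax is invented for this sketch)."""
    wildcard = "true" if scope is Scope.WILDCARD else "false"
    return f"token={token}; wildcard={wildcard}"
```

Either way, a domain admin reading the record can tell whether it covers a
single hostname or everything below the domain.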

What it means to validate a hostname or a domain or a wildcard set of
hostnames may vary widely per application, and we may want to talk more
about the security considerations here.

Ambiguities about whether a given verification token grants powers over a
specific hostname or an entire domain also introduce security challenges
that we may wish to talk about in Security Considerations. DNS domain
administrators need to be able to understand the consequences of adding
particular challenge entries to their domain, especially in cases like a
multi-tenant enterprise environment.

2) Public suffixes

We may wish to encourage (or require) validating against Public Suffix
lists (eg, https://publicsuffix.org/), in the absence of a more general
DBOUND solution.  At a minimum we should discuss this in security
considerations.

One Security Consideration is that services operating a public suffix
should take extreme care about when they allow underscore labels to be
created within a shared domain.  As an example, if a service provider
allows "_foo-challenge.publicsuffix.example" to be registered as a domain
(for a DNS registrar) or to be created as a CNAME or TXT record (eg, for a
dynamic DNS provider or cloud provider) then this might grant unintended
powers over all of "publicsuffix.example".
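A rough sketch of the kind of check such a provider might run before
creating a name, assuming it knows which shared suffix it operates. The
function name and the exact policy are hypothetical, not taken from any
existing service.

```python
def may_create(name: str, shared_suffix: str) -> bool:
    """Hypothetical check for a provider handing out names under a shared
    suffix (a registrar, dynamic DNS service, or cloud provider).

    Refuses any name whose label directly under the shared suffix starts
    with "_", so "_foo-challenge.publicsuffix.example" cannot be created,
    while a customer's own "_foo-challenge.alice.publicsuffix.example"
    remains possible.
    """
    name = name.rstrip(".").lower()
    suffix = shared_suffix.rstrip(".").lower()
    if not name.endswith("." + suffix):
        return False  # the suffix itself, or a name outside it
    relative_labels = name[: -(len(suffix) + 1)].split(".")
    # relative_labels[-1] is the label directly under the shared suffix
    return not relative_labels[-1].startswith("_")
```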

We may also want to (encourage? require?) confirming that a user isn't
trying to place a validation token on a public suffix.  ACME has this as a
"CA Policy Consideration" (Section 10.5 of rfc8555).  There are some
legitimate use-cases here, but caution (and perhaps extra validation?) is
needed.
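A simplified sketch of the validator-side check, assuming a tiny hard-coded
suffix set in place of the real list. It implements only the PSL's
longest-match rule plus the implicit "*" default; wildcard and exception
rules are deliberately left out, so it is not suitable as-is.

```python
# Hypothetical stand-in for the Public Suffix List. A real validator would
# load the full list from publicsuffix.org and also apply its wildcard ("*.")
# and exception ("!") rules, which this sketch omits.
EXAMPLE_SUFFIXES = {"com", "uk", "co.uk", "publicsuffix.example"}


def public_suffix(name: str, suffixes=EXAMPLE_SUFFIXES) -> str:
    """Longest listed suffix of `name`; falls back to the last label,
    mirroring the PSL's implicit "*" default rule."""
    labels = name.rstrip(".").lower().split(".")
    for i in range(len(labels)):  # i = 0 tries the longest candidate first
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            return candidate
    return labels[-1]


def is_public_suffix(name: str, suffixes=EXAMPLE_SUFFIXES) -> bool:
    """True if a validation token for `name` would sit on a public suffix,
    i.e. a case calling for refusal or extra validation."""
    return name.rstrip(".").lower() == public_suffix(name, suffixes)
```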

(For the Appendix, another example would be the PSL itself. Per
https://github.com/publicsuffix/list/wiki/Guidelines it uses
"_psl.alphaexample.com TXT https://github.com/publicsuffix/list/pull/100"
for validation.)

3) SaaS/PaaS/intermediary provider cases (eg, CDNs)

A common use-case is delegation of control to an intermediary. For
example, indicating that a SaaS provider or CDN may manage certificates for
"foo.example.com". One way to handle this is to CNAME the challenge to
that intermediary, which then returns the TXT record. For example, you
might have:

   _acme-challenge.foo.example.com.         IN CNAME ${TOKENA}.intermediate-provider.example.
   ${TOKENA}.intermediate-provider.example. IN TXT   ${TOKENB}

This allows intermediate-provider.example to keep updating TOKENB for each
renewal. (It's not reasonable for the intermediate provider to tell their
customer to go back and update _acme-challenge.foo.example.com every three
months.)

This is often going to be done alongside delegating the hostname
to the intermediate provider.  For example, there will likely also be a
CNAME of:

   foo.example.com. IN CNAME foo-example-com.cdn.example.

The separate CNAMEs (ie, these being distinct labels) are important
because certificate issuance and validation need to happen before the
hostname is actually moved over.

This is a case where the CNAME for "_acme-challenge.foo.example.com"
generally needs to be persistent for frequent/periodic renewals.

Of critical importance is that TOKENA is also secure, has enough entropy,
and is tied to the particular customer account that provisioned
foo.example.com.
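One way an intermediary might meet that requirement, sketched in Python
with hypothetical class and method names: generate TOKENA from 128 random
bits and remember which account and hostname it was issued for, so later
renewals can be tied back to the right customer.

```python
import secrets


class DelegationRegistry:
    """Hypothetical intermediary-side state for the CNAME pattern above."""

    def __init__(self, provider_zone: str):
        self.provider_zone = provider_zone.rstrip(".").lower()
        self._owners = {}  # TOKENA label -> (account, hostname)

    def new_delegation(self, account: str, hostname: str) -> str:
        # 16 random bytes = 128 bits. Hex keeps the label to letters and
        # digits and unaffected by DNS case-insensitivity, which base64
        # (mixed case, "+" and "/") would not be.
        token_a = secrets.token_hex(16)
        self._owners[token_a] = (account, hostname.rstrip(".").lower())
        # CNAME target the customer is asked to publish once.
        return f"{token_a}.{self.provider_zone}."

    def owner_of(self, cname_target: str):
        """(account, hostname) a CNAME target was issued for, else None."""
        label, _, zone = cname_target.rstrip(".").lower().partition(".")
        if zone != self.provider_zone:
            return None
        return self._owners.get(label)
```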

In the draft we probably want to talk about this as the case where there
is a CNAME to the TXT record, and that the target of the CNAME itself
always needs to have a token with adequate cryptographic entropy.
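From the validator's side, the CNAME simply becomes part of the lookup. A
minimal sketch, using an in-memory dictionary as a hypothetical stand-in
for DNS (the record values are placeholders, not real high-entropy tokens):

```python
# Hypothetical in-memory stand-in for DNS: owner name -> (rrtype, values).
# A real validator would get the same effect from its resolver, which
# follows the CNAME when asked for TXT at the challenge name.
EXAMPLE_DNS = {
    "_acme-challenge.foo.example.com": ("CNAME", ["tok-a.intermediate-provider.example"]),
    "tok-a.intermediate-provider.example": ("TXT", ["tok-b"]),
}


def resolve_txt(name, dns=EXAMPLE_DNS, max_hops=8):
    """Return (TXT values, names visited), following CNAMEs."""
    chain = [name]
    for _ in range(max_hops):
        rrtype, values = dns.get(name, (None, []))
        if rrtype != "CNAME":
            return (list(values) if rrtype == "TXT" else []), chain
        name = values[0]
        chain.append(name)
    raise ValueError("CNAME chain too long or looping: " + " -> ".join(chain))


def validate(challenge_name, expected_token, dns=EXAMPLE_DNS):
    values, _ = resolve_txt(challenge_name, dns)
    return expected_token in values
```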

We might mention in A.1.4 (Time-bound checking) that cert renewals are a
case where persistence is required, at least of a CNAME to a provider who
may be managing the renewals.

3a) Leveraging ACME challenges for other purposes

A related question worth considering: when is it acceptable to leverage an
ACME challenge for other purposes? For example, if a domain is being moved
onto a CDN that is going to get a certificate for it prior to the
migration, but which also wants to validate that it is authorized to have
the domain transferred to it, when can the ACME challenge be leveraged for
both purposes? I'm not sure we need to go into this, but perhaps it should
be discussed.

4) Multi-provider / multi-CDN setups

A related and messier corner-case is multi-provider / multi-CDN setups.
For example, "foo.example.com" may CNAME to one of three different CDNs,
each of which needs to be able to manage a certificate and renew it every
three months. This likely applies to some of these other cases as well. I
don't have good answers (ACME doesn't handle this terribly well today),
but it is worth some thought as to how to handle it.

5) Token format / construction

It seems like the actual token contents should have more flexibility. I
don't think we want a "MUST" on that particular construct. It may be worth
a MUST that there are at least 128 bits of secure entropy and that the
token is either base64 or hex encoded. But there may be a need to use other
constructs in the future (eg, not SHA256). Giving the current example as a
MAY seems reasonable.
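As an illustration of that looser requirement (at least 128 bits of secure
entropy, hex or base64 encoded), a sketch using Python's secrets module.
The URL-safe, unpadded base64 variant is an assumption, and the length
check cannot tell whether the bits were actually random.

```python
import base64
import secrets

MIN_TOKEN_BYTES = 16  # 128 bits


def new_token(encoding: str = "hex") -> str:
    raw = secrets.token_bytes(MIN_TOKEN_BYTES)
    if encoding == "hex":
        return raw.hex()
    if encoding == "base64":
        # URL-safe alphabet without padding: an assumption, not a draft rule.
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")
    raise ValueError(f"unsupported encoding: {encoding}")


def meets_minimum(token: str, encoding: str) -> bool:
    """Check that `token` decodes to at least 128 bits.

    This only checks length; a validator cannot tell from the token itself
    whether those bits came from a secure random source.
    """
    try:
        if encoding == "hex":
            raw = bytes.fromhex(token)
        elif encoding == "base64":
            padded = token + "=" * (-len(token) % 4)
            raw = base64.b64decode(padded, altchars=b"-_", validate=True)
        else:
            return False
    except ValueError:  # binascii.Error is a subclass of ValueError
        return False
    return len(raw) >= MIN_TOKEN_BYTES
```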

There may be reasons for other constructs that embed state within the
token. For example, "HMAC-SHA256(private_key, label+account+domain)" may be
appropriate in some cases, although it has enough security considerations
that I'm not sure we want to include it.

6) Binding tokens to requests

We should have a note on the critical importance of binding the token to
the requesting account and to the requested name. At a minimum this should
be in Security Considerations, but it may also warrant normative text.
Usage here typically follows a flow of:
  a) user/account requests a token for a given $name from a service provider
  b) user/account has their DNS admin put their token in for
     _challenge.$name
  c) service validates that _challenge.$name has $token and then grants
     access to the user/account

There are chains of custody and linkages here that need to hold, and they
are exploitable if they break down. For example, if steps (a) and (c)
aren't explicitly linked, then a different user on a different account
could potentially jump in at step (c) and grab access. There may be other
corner cases here, and it may be worth a more detailed formal analysis to
be able to express which properties are critical for safety.
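A minimal sketch of the linkage between steps (a) and (c), with
hypothetical class and method names: the service stores each issued token
under the (account, name) pair and only accepts it back from the same
account for the same name. Making tokens single-use is an extra design
choice here, not something the flow requires.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class PendingChallenge:
    account: str
    name: str
    token: str


class VerificationService:
    """Hypothetical service-side state linking step (a) to step (c)."""

    def __init__(self):
        self._pending = {}  # (account, name) -> PendingChallenge

    def request_token(self, account: str, name: str) -> str:
        """Step (a): issue a fresh token bound to this account and name."""
        name = name.rstrip(".").lower()
        challenge = PendingChallenge(account, name, secrets.token_hex(16))
        self._pending[(account, name)] = challenge
        return challenge.token

    def validate(self, account: str, name: str, published: list) -> bool:
        """Step (c): `published` holds the TXT strings found at
        _challenge.$name; step (b) and the DNS lookup happen elsewhere."""
        name = name.rstrip(".").lower()
        challenge = self._pending.get((account, name))
        if challenge is None:
            return False  # nothing was issued to this account for this name
        if challenge.token not in published:
            return False
        del self._pending[(account, name)]  # single use
        return True
```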

There may be a related Security Consideration that I'm not sure how to
handle where a MitM style attack could jump in before step (a).  For
example, if the user is phished into talking to a different service
provider than they thought they were talking to.  (I'm not sure this needs
to be discussed, but is a risk.)

7) TTL recommendations

We should provide some TTL recommendations for the TXT record, and perhaps
also a warning about long SOA (negative caching) TTLs.

This seems like a case where we'd want to recommend short TTLs on the TXT
record to allow recovering from misconfigurations. These records shouldn't
be polled frequently, so cacheability is unlikely to be an issue, but if
there's a typo and the TTL is long then there may be no way to recover,
since the validator may have the bad entry cached for the full TTL.

A long SOA TTL (ie, negative caching TTL) could also cause issues. Once
the service provider issues the challenge, the validator may start polling
for its presence. The first attempts are likely to get an NXDOMAIN, and if
the NXDOMAIN is cached too long this could cause user confusion and/or
delay the validation. (I'm not sure it's reasonable to suggest that
validators bound the maximum NXDOMAIN caching time?)
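For reference, the negative caching period comes from the SOA record per
RFC 2308: the lesser of the SOA's own TTL and its MINIMUM field. A small
sketch, where the optional cap is a hypothetical parameter modelling the
bound wondered about above, not an existing requirement:

```python
from typing import Optional


def negative_cache_ttl(soa_ttl: int, soa_minimum: int,
                       cap: Optional[int] = None) -> int:
    """Seconds an NXDOMAIN for the challenge name may be cached.

    RFC 2308: the lesser of the SOA record's TTL and its MINIMUM field.
    `cap` is a hypothetical validator-side upper bound.
    """
    ttl = min(soa_ttl, soa_minimum)
    if cap is not None:
        ttl = min(ttl, cap)
    return ttl
```

A validator could only apply such a cap if it controls the cache it queries
through, or queries the authoritative servers directly.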

8) Policy constraints as a variant

Within ACME, challenge tokens exist as only one part of the validation
process. They act as an explicit "allow this particular name to be issued
this particular cert based on a CSR". There is also another safeguard,
however, which is the CAA record. That acts as a policy-based constraint.

As we are generalizing the challenges, it may be worth considering
generalizing the policy-based constraints as well. For example, in an
enterprise environment "example.com" may wish to limit the use of
_foo-challenge under their domain so that bar.quux.example.com can't put in
"_foo-challenge.bar.quux.example.com". (More concretely, example.com may
wish to limit the CDNs and/or SaaS providers that can be used within their
domain.)

This is almost certainly substantial scope creep for this draft, but
without it domain admins may be unable to apply policies or manage the sort
of risks managed with CAA records.

This might be as simple as allowing the definition of
_foo-constraint.example.com as a TXT record, with whoever defines
_foo-challenge also defining the format of _foo-constraint. As part of
validating _foo-challenge.bar.quux.example.com, validators should look for
_foo-constraint.example.com, _foo-constraint.quux.example.com, and
_foo-constraint.bar.quux.example.com, and implement their constraints when
present.
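The lookup order described above can be sketched as follows, assuming the
validator already knows the topmost domain to start from (for example, one
label below a public suffix). The function name is hypothetical, and
interpreting the constraint records is left to whoever defines
_foo-challenge.

```python
from typing import List


def constraint_names(challenge_name: str, prefix: str,
                     top_domain: str) -> List[str]:
    """Names to check for _<prefix>-constraint records when validating
    `challenge_name`, from `top_domain` down to the challenged name."""
    labels = challenge_name.rstrip(".").lower().split(".")
    if labels[0] != f"_{prefix}-challenge":
        raise ValueError(f"not a _{prefix}-challenge name: {challenge_name}")
    target_labels = labels[1:]
    top = top_domain.rstrip(".").lower()
    target = ".".join(target_labels)
    if target != top and not target.endswith("." + top):
        raise ValueError(f"{target} is not at or below {top}")
    top_depth = len(top.split("."))
    # Walk from the top domain (fewest labels) down to the full target name.
    return [
        f"_{prefix}-constraint." + ".".join(target_labels[i:])
        for i in range(len(target_labels) - top_depth, -1, -1)
    ]
```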

9) Registry of labels?

I hate to ask it, but is there a need for a registry of _foo-challenge
labels? It seems like there could be potential security and operational
risks if multiple entities start using "_foo-challenge" for unrelated
purposes.

10) Security review

Given that domain verification is often used as part of security systems,
it seems like it would be worth getting some additional security review,
such as bringing this to SAAG?

Thanks again for working on this much needed draft!

      Erik
