Re: [DNSOP] Feedback on draft-ietf-dnsop-domain-verification-techniques-01

Shumon Huque <shuque@gmail.com> Thu, 06 July 2023 02:10 UTC

MIME-Version: 1.0
References: <CAKC-DJiNXLzOXBnZ6XCzuqO=YpVbtFGPvVdXhJ8tuhSaJadkFw@mail.gmail.com>
In-Reply-To: <CAKC-DJiNXLzOXBnZ6XCzuqO=YpVbtFGPvVdXhJ8tuhSaJadkFw@mail.gmail.com>
From: Shumon Huque <shuque@gmail.com>
Date: Wed, 05 Jul 2023 22:10:32 -0400
Message-ID: <CAHPuVdVC0MzN6N8hXXp4JacNmkeM=-aMt6DRtZ+4aiN6qiUv4A@mail.gmail.com>
To: Erik Nygren <erik+ietf@nygren.org>
Cc: dnsop WG <dnsop@ietf.org>, paul.wouters@aiven.io, shivankaulsahib@gmail.com, Erik Nygren <nygren@akamai.com>
Content-Type: multipart/alternative; boundary="0000000000000b52b105ffc80640"
Archived-At: <https://mailarchive.ietf.org/arch/msg/dnsop/8lF86tKf-_sx1lpsRdd3et3uaps>

On Thu, Mar 30, 2023 at 4:52 PM Erik Nygren <erik+ietf@nygren.org> wrote:

> Hello,
>
> Thank you for pulling together this draft!  Having worked on related
> systems a number of times it will be valuable to have something here
> standardized.
>

Thanks for the review Erik, sorry for our very late response, and thanks to
Tim for prodding us. IETF is around the corner, and it's time to unbury
myself from the day job!

First, I wanted to let the working group know that we've renamed the title
(and topic) of the draft to: "Domain Control Validation using DNS", DCV
being the widely used and more commonly known industry term for this. (In
the github repo, we'll push out the latest version before the draft cutoff).

> A number of comments and suggestions:
>
> 1) APEX domains, and hostnames vs domains
>
> You define APEX but don't then reference this.  This is an important topic
> to cover in considerably more detail, however.  In particular, some systems
> want to validate an apex domain while others want to validate each
> particular hostname.  It is critical that validation record and its
> contents are unambiguous as to which of these is the case.
>
> As an example, ACME has separate mechanisms for wildcard certs (eg,
> "*.example.com") vs individual names (eg, "bar.example.com").
> This is likely to apply across the board to these systems: sometimes they
> want to validate usage for a domain and sometimes for just specific names.
>
> For the individual hostname case, it is important to clarify that the
> challenge should be "_foo-challenge.bar.example.com".
>
> For the whole domain case this could be
> "_foo-wildcard-challenge.example.com" or have an attribute in the TXT
> token (eg, wildcard=true).  ACME (rfc8555 section-8.4) doesn't seem to
> have this differentiation, which seems unfortunate, unless I'm misreading.
> I'd think that it should be unambiguous to domain admins whether a
> challenge is for just the "example.com" name, for "*.example.com", or for
> "example.com, *.example.com, *.*.example.com, etc".
>

Yes, indeed. The authors discussed 'scope of validation' during the initial
drafting of this document and the inclusion of apex in the definitions
section was likely related to that, but the topic apparently was forgotten
along the way somewhere.

I agree that this distinction is very important, and I have to deal with it
in my day job. We frequently need to seek clarification from the requester
of a DNS validation record about whether it applies to a single domain name
or the entire domain rooted at the name, before deciding whether the
request should be granted.

Certificate issuance could benefit from the addition of a wildcard naming
convention or attribute as you suggest. If we propose that, I think it
would be best to get alignment from the ACME folks first.

The more general case of an application service (like Atlassian, Docusign,
Google Workspace, etc) being authorized to provide a service on behalf of a
domain owner is harder to judge without reviewing more details of the
service. But in my experience it is almost always for the entire domain
rooted at the name.

> What it means to validate a hostname or a domain or a wildcard set of
> hostnames may vary widely per application, and we may want to talk more
> about the security considerations here.
>
> Ambiguities about whether a given verification token grants powers over a
> specific hostname or an entire domain also introduce security challenges
> that we may wish to talk about in Security Considerations.  DNS domain
> administrators need to be able to understand the consequences of adding in
> particular challenge entries into their domain, especially in cases like a
> multi-tenant Enterprise environment.
>

Yes, agreed.

> 2) Public suffixes
>
> We may wish to encourage (or require) validating against Public Suffix
> lists (eg, https://publicsuffix.org/), in the absence of a more general
> DBOUND solution.  At a minimum we should discuss this in security
> considerations.
>
> One Security Consideration is that services operating a public suffix
> should take extreme care about when they allow underscore labels to be
> created within a shared domain.  As an example, if a service provider
> allows "_foo-challenge.publicsuffix.example" to be registered as a domain
> (for a DNS registrar) or to be created as a CNAME or TXT record (eg, for a
> dynamic DNS provider or cloud provider) then this might grant unintended
> powers over all of "publicsuffix.example".
>
> We may also want to (encourage? require?) confirming that a user isn't
> trying to place a validation token on a public suffix.  ACME has this as a
> "CA Policy Consideration" (Section 10.5 of rfc8555).  There are some
> legitimate use-cases here, but caution (and perhaps extra validation?) is
> needed.
>
> (For the Appendix, another example would be the PSL itself.  Per
> https://github.com/publicsuffix/list/wiki/Guidelines
> It uses "_psl.alphaexample.com TXT
> https://github.com/publicsuffix/list/pull/100" for validation.)
>

Yes. Application providers should make sure a prospective domain name being
validated is not a public suffix (as long as that application isn't the PSL
itself apparently!).
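
A minimal sketch of such a check, assuming the application has already
loaded a public suffix list into a set. The three entries in
PUBLIC_SUFFIXES and the function names are illustrative assumptions; a
real deployment would load the full list from publicsuffix.org and handle
its wildcard and exception rules, which this sketch does not do:

```python
# Hypothetical, heavily truncated stand-in for the real Public Suffix List.
PUBLIC_SUFFIXES = {"com", "co.uk", "publicsuffix.example"}


def normalize(name: str) -> str:
    """Lowercase a domain name and drop any trailing root dot."""
    return name.rstrip(".").lower()


def is_public_suffix(name: str) -> bool:
    """Return True if the name exactly matches an entry in the list."""
    return normalize(name) in PUBLIC_SUFFIXES


def may_validate(name: str) -> bool:
    """Refuse to start domain control validation on a public suffix."""
    return not is_public_suffix(name)
```

An application with a legitimate reason to validate a public suffix would
need an explicit exception path on top of this, with extra scrutiny.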

> 3) SaaS/PaaS/intermediary provider cases (eg, CDNs)
>
> A common use-case is for delegation of control over to an intermediate.
> For example, indicating that a SaaS provider or CDN may manage certificates
> for "foo.example.com".  One way to handle this is to CNAME the
> challenge to that intermediary and then the intermediary returns the TXT
> record.  For example, you might have:
>
> _acme-challenge.foo.example.com. IN CNAME ${TOKENA}.intermediate-provider.example.
> ${TOKENA}.intermediate-provider.example.  IN TXT  ${TOKENB}
>
> This allows .intermediate-provider.example to keep updating TOKENB
> for each renewal.  (It's not reasonable for the intermediate provider to
> tell their customer
> to go back and require updating _acme-challenge.foo.example.com every
> three months.)
>

Yes, we've encountered a few examples of this in the field already, and the
latest draft includes text about this usage.
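
To make the indirection concrete, here is a rough sketch of how a
validator would arrive at the intermediary's TXT record. It uses a toy
in-memory record table rather than real DNS queries; the RECORDS table,
the placeholder token values, and the lookup_txt helper are all
assumptions made for illustration:

```python
# Toy record store keyed by (owner name, record type). All names and
# token values are placeholders, not real data.
RECORDS = {
    ("_acme-challenge.foo.example.com.", "CNAME"): [
        "tokena.intermediate-provider.example."
    ],
    ("tokena.intermediate-provider.example.", "TXT"): ["tokenb"],
}


def lookup_txt(name: str, max_hops: int = 8) -> list[str]:
    """Follow CNAMEs starting at name until a TXT RRset is found."""
    for _ in range(max_hops):
        txt = RECORDS.get((name, "TXT"))
        if txt is not None:
            return txt
        cname = RECORDS.get((name, "CNAME"))
        if cname is None:
            return []  # no TXT and no CNAME at this name
        name = cname[0]  # a CNAME has exactly one target
    raise RuntimeError("CNAME chain too long")
```

The domain owner publishes the CNAME once; the intermediary can then
rotate the TXT value at its own name for each renewal without further
changes in the customer's zone.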

> This is often going to be done alongside delegating the hostname
> to the intermediate provider.  For example, there will likely also be a
> CNAME of:
>
>    foo.example.com. IN CNAME foo-example-com.cdn.example.
>
> The separate CNAMEs (ie, these being distinct labels) are important
> because
> the certificate and validation needs to happen before actually moving the
> hostname over.
>

Yes. (Although foo.example.com could be an HTTPS record in the new world).

> This is a case where the CNAME for "_acme-challenge.foo.example.com"
> generally needs to be persistent
> for frequent/periodic renewals.
>
>
Yes.

> 3a) Leveraging ACME challenges for other purposes
>
> A related question worth considering:  when is it acceptable to leverage
> ACME challenge for other purposes?  For example, if moving a domain onto a
> CDN that is going to get a certificate for the domain prior to the
> migration but which also wants to validate that it is authorized for the
> domain to be transferred to it, when can the ACME challenge also be
> leveraged for both purposes?
> I'm not sure we need to go into this, but perhaps it should be discussed.
>

If we have something concrete to say, sure. Are there any common solutions
to this scenario from the ACME world already? The delegated CNAME->TXT
indirection model may not work here, since the CNAME is a singleton record
that can only point to one target (or "one target per response" if using
querier specific response "tricks").

> 4) Multi-provider / multi-CDN setups
>
> A related and messier corner-case are multi-provider / multi-CDN setups.
> For example, "foo.example.com" may CNAME to one of three different CDNs.
> Each one of these needs to be able to manage a certificate and renew it
> every three months.
> This likely applies to some of these other cases as well.  I don't have
> good answers --- ACME doesn't
> handle this terribly well today --- but it is worth some thought as to how
> to handle.
>

Can you elaborate on how such setups work? Is the DNS provider here
producing querier specific responses, where some responses provide a CNAME
to CDN1, others to CDN2, etc?

I guess we'd have to think about the emerging world of HTTPS/SVCB directed
multi-CDN too.

> 5) Token format / construction
>
> It seems like the actual token contents should have more flexibility.
> I don't think we want a "MUST" on that particular construct.  It may be
> worth
> a MUST that there is at least 128 bits of secure entropy, and that the
> token is
> either base64 or hex encoded.  But there may be a need to use other
> constructs in the future (eg, not SHA256).   Giving the current example
> as a MAY seems reasonable.
>
> There may be reasons for other constructs that embed state within the
> token.
> For example:  "HMAC-SHA256(private_key, label+account+domain)" may be
> appropriate in some cases,
> although it has enough security considerations that I'm not sure we want to
> include that.
>

I agree on having such flexibility to future proof the draft. The minimum
bits of entropy point is already addressed in the github draft. We should
relax the restrictions on any specific hash algorithm, but can cite SHA256
as an example. The current text mentions base64, but I agree we should not
disallow hex or other potential encodings. Also, since we propose that the
TXT RDATA can contain key=value attributes (token=..., expire=...) I wonder
if it's safer not to use an alphabet that contains the '=' character,
although since that's only a padding character, maybe that is unambiguous
and won't cause any problems.
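
As a rough illustration of these options (not text from the draft), the
sketch below generates a 128-bit token in hex and in unpadded URL-safe
base64, shows one hash-based construction using SHA-256, and parses
key=value attributes by splitting on the first '=' only. The function
names and the space-separated attribute layout are assumptions made for
the example:

```python
import base64
import hashlib
import secrets


def new_token_hex(nbytes: int = 16) -> str:
    """Return nbytes of CSPRNG output as hex (16 bytes = 128 bits)."""
    return secrets.token_hex(nbytes)


def new_token_b64url(nbytes: int = 16) -> str:
    """Return nbytes of CSPRNG output as unpadded URL-safe base64."""
    raw = secrets.token_bytes(nbytes)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def hashed_token(random_value: bytes) -> str:
    """One possible construction: unpadded base64url of SHA-256(value)."""
    digest = hashlib.sha256(random_value).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def parse_attributes(rdata: str) -> dict[str, str]:
    """Parse space-separated key=value pairs, splitting on the first '='."""
    return dict(item.split("=", 1) for item in rdata.split())
```

Splitting on the first '=' keeps a padded base64 value intact, which
suggests the padding character is not ambiguous for a parser written
this way; stripping the padding avoids the question altogether.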

> 6) Binding tokens to requests
>
> We should have a note on the critical importance of binding the token to
> the requesting account and to the requested name.
> At a minimum this should be in Security Considerations, but it may also
> wish to be normative.
> Usage here typically follows a flow of:
>   a) user/account requests a token for a given $name from a service
> provider
>   b) user/account has their DNS admin put their token in for
> _challenge.$name
>   c) service validates that _challenge.$name has $token and then grants
> access to the user/account
>
> There are chains of custody here and linkages that need to happen, and are
> exploitable if they break down.
> For example, if steps (a) and (c) aren't explicitly linked then a
> different user on a different account
> could potentially jump in at step c and grab access.  There may be other
> corner cases here,
> and it may be worth some more detailed formal analysis to be able to
> express what properties are critical
> for safety.
>
> There may be a related Security Consideration that I'm not sure how to
> handle where a MitM style attack could jump in before step (a).  For
> example, if the user is phished into talking to a different service
> provider than they thought they were talking to.  (I'm not sure this needs
> to be discussed, but is a risk.)
>

Such a sequence of steps and associated account linkages are implicit in all
of these schemes, but this may be an important enough subject to call out
in its own normative section. The precise details will surely vary from
provider to provider, and I suspect that's where we may find some corner
cases worth elaborating on.
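
One way a provider might link steps (a) and (c) is to derive the token
from the account and the requested name, along the lines of the HMAC
construction Erik mentions under point 5. The sketch below is only an
illustration under that assumption; the function names, the length-prefixed
encoding of the fields, and the hex output are choices made for the
example, not anything specified in the draft:

```python
import hashlib
import hmac


def bound_token(secret_key: bytes, label: str, account: str, domain: str) -> str:
    """Derive a token as HMAC-SHA256 over label, account, and domain."""
    fields = (label, account, domain.lower())
    # Length-prefix each field so ("ab", "c") and ("a", "bc") cannot collide.
    message = b"".join(
        len(part).to_bytes(2, "big") + part
        for part in (s.encode("utf-8") for s in fields)
    )
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


def verify_token(secret_key: bytes, label: str, account: str,
                 domain: str, presented: str) -> bool:
    """Recompute the token for this account and name; compare in constant time."""
    expected = bound_token(secret_key, label, account, domain)
    return hmac.compare_digest(expected, presented)
```

With this shape, a token found in the DNS for one account's request does
not verify for a different account or a different name, so another user
cannot step in at (c) and claim it.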

> 7) TTL recommendations
>
> We should provide some TTL recommendations for the TXT record, and perhaps
> also provide
> a warning on long SOA (negative caching) TTLs.
>
> This seems like a case where we'd want to recommend using short TTLs on
> the TXT record
> to allow recovering from misconfigurations.  These shouldn't be polled
> frequently so cachability
> is unlikely to be an issue, but if there's a typo and the TTL is long then
> there may not be a way
> to recover since the validator may have the bad entry cached for the TTL.
>
> A long SOA TTL (ie, negative caching TTL) could also cause issues.
> Once the service provider issues the challenge the validator may start
> polling for its presence.
> The first attempts are likely to get an NXDOMAIN, and if the NXDOMAIN is
> cached too long
> this could cause user confusion and/or delay the validation.
> (I'm not sure it's reasonable to suggest that validators bound the maximum
> NXDOMAIN caching time?)
>

Yes, we should say something about TTLs. Recommending a (relatively) short
TTL on the validation record itself sounds reasonable for error recovery.

I'm not sure about the SOA negative TTL. That's a zone wide property, and
often chosen deliberately by zone operators for specific reasons (tradeoffs
between visibility time of new names, query load and cost, etc). I'm not
sure we, for example, would adjust that down just to cater to domain
control validation requests. Similar considerations apply to the max
negative cache setting (if any) deployed by resolver operators.

It may be better to advise application providers attempting DCV to not
immediately query for challenge records and wait for a signal from the
domain owner that they are ready. In fact, I think that's how many of these
systems already work today.

> 8) Policy constraints as a variant
>
> Within ACME, challenge tokens exist as only one part of the validation
> process.
> They act as an explicit "allow this particular name to be issued this
> particular cert based on a CSR".
> There is also another safeguard, however, which is the CAA record.  That
> acts as a policy-based constraint.
>
> As we are generalizing the challenges, it may be worth considering
> generalizing the policy-based constraints.
> For example, in an enterprise environment "example.com" may wish to limit
> the use of _foo-challenge
> under their domain so that bar.quux.example.com can't put in
> "_foo-challenge.bar.quux.example.com".
> (More concretely, example.com may wish to limit the CDNs and/or SaaS
> providers that can be used
> within their domain.)
>
> This is almost certainly substantial scope creep for this draft, but
> without it domain admins
> may be unable to apply policies or manage the sort of risks managed with
> CAA records.
>
> This might be as simple as allowing the definition of
> _foo-constraint.example.com as a TXT record,
> with whoever defines _foo-challenge also defining the format of
> _foo-constraint.
> As part of validating _foo-challenge.bar.quux.example.com, validators
> should look for
> _foo-constraint.example.com and _foo-constraint.quux.example.com and
> _foo-constraint.bar.quux.example.com
> and implement their constraints when present.
>

I understand the rationale for expressing such policies, but yes, that will
be a substantial increase in the scope of this draft. So, I'd recommend
deferring such work until we have a sufficient number of folks interested
in this.
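
For anyone who wants to picture the lookup Erik describes, the set of
names a validator would check can be sketched as below. This assumes the
validator already knows the apex of the domain (which is the DBOUND/PSL
problem again); the function name and the "_foo-constraint" default are
hypothetical, taken from the placeholder label in Erik's message:

```python
def constraint_names(hostname: str, apex: str,
                     label: str = "_foo-constraint") -> list[str]:
    """List the constraint names from the apex down to the hostname."""
    host = hostname.rstrip(".").lower().split(".")
    top = apex.rstrip(".").lower().split(".")
    if host[-len(top):] != top:
        raise ValueError("hostname is not at or below apex")
    names = []
    # Walk from the apex (fewest labels) down to the full hostname.
    for n in range(len(top), len(host) + 1):
        names.append(label + "." + ".".join(host[-n:]))
    return names
```

For bar.quux.example.com with apex example.com this yields the three
names listed in the quoted text, in apex-first order.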

Just from my own experience, despite working in a large enterprise
environment, I don't think we presently have a need for such a feature.
When we've delegated subzones deeper down the hierarchy, we've also granted
the requester autonomous operation of that part of the infrastructure. All
uses of 3rd party services (if they choose to use them) must be approved by
means of a security review anyway (which involves human security engineers
in the loop assessing many details of the nature of the application, not
just a policy that says which CDNs for example are allowed, so automating
such a policy alone doesn't help us very much).

> 9) Registry of labels?
>
> I hate to ask it, but is there a need for a registry of _foo-challenge
> labels?
> It seems like there could be potential security and operational risks
> of multiple entities starting to use "_foo-challenge" for unrelated
> purposes.
>

We asked the same question in the early days of this draft, and I believe
there was a strong feeling that it would be very difficult to manage such a
registry due to the potentially arbitrary space of names of the services
(maybe a first come, first served registry might work?).

> 10) Security review
>
> Given that domain verification is often used as part of security systems,
> it seems like it would be worth getting some additional security review,
> such as bringing this to SAAG?
>

Ben Kaduk has done a review from secdir already, and our (Security AD)
co-author has already responded. I have a few minor additional comments on
Ben's review, which I will follow-up with shortly.

More reviews would be welcome, and I would be fine bringing this to SAAG.
Let's wait until we push out an updated revision incorporating recent
feedback first.

Shumon.
