Re: [DNSOP] Questions / concerns with draft-ietf-dnsop-svcb-https (in RFC Editor queue)

Brian Dickson <brian.peter.dickson@gmail.com> Tue, 23 August 2022 21:51 UTC

References: <CAHw9_iKZJndu1100LBU3TiuhF9ACb0As2deA1oZWD2eA46tBbA@mail.gmail.com>
In-Reply-To: <CAHw9_iKZJndu1100LBU3TiuhF9ACb0As2deA1oZWD2eA46tBbA@mail.gmail.com>
From: Brian Dickson <brian.peter.dickson@gmail.com>
Date: Tue, 23 Aug 2022 14:51:33 -0700
Message-ID: <CAH1iCiqryY=u6MN2mkf7krHLmc7TQkoDaXe0k=ZZ+0e9uiMb-Q@mail.gmail.com>
To: Warren Kumari <warren@kumari.net>
Cc: dnsop <dnsop@ietf.org>, draft-ietf-dnsop-svcb-https.all@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/dnsop/0GpKWg5eLkUlNk9ygHIVIkZ1XrQ>

On Sat, Aug 20, 2022 at 10:07 AM Warren Kumari <warren@kumari.net> wrote:

> Brian Dickson recently reached out to one of the DNSOP chairs to raise
> some technical concerns related to the AliasMode functionality in
> draft-ietf-dnsop-svcb-https.
>
> Although this document has already passed WGLC, IETF LC, IESG Eval, and
> was approved and sent to the RFC Editor, I want to make sure that the DNSOP
> working group has a chance to discuss any lingering concerns.  Accordingly,
> I have asked the RFC Editor to hold publication for now (note that the hold
> itself is not expected to delay publication of the document, which is
> blocked anyway due to missing references).
>
> As the document was already extensively discussed and approved, we should
> only make substantive changes if they are very clearly warranted (e.g
> something that would otherwise be an errata, or "OMG! That clearly doesn't
> work, 1+1 doesn't equal 17…") —  this is *not*  an opportunity to
> re-litigate existing  decisions, make non-required changes, etc.
>
> I believe that Brian is on vacation this week, and I wasn't really able to
> parse his issue with the document, so I ask him to clearly state the issue
> on-list when he returns. I would like to have whatever discussions wrapped
> up within 2 weeks from then so that I can release it back to the RFC
> Editor.
>
> Pausing publication is an unusual, but definitely not unprecedented, step.
> Although we are able to make changes until a document is published as an
> RFC, once it is approved and sent to the RFC Editor, we should only make
> (non-editorial) changes in exceptional circumstances…
>
> I'd like to also thank the authors and WG in advance for their time and
> for keeping this discussion focused,
> W
>
>
Thank you Warren.

I'll try to first raise the highest-level concern, which is that there are
some elements which appear to have some level of ambiguity, that result in
implementations doing different things.

The place where these ambiguities exist is on the client side of things,
meaning the procedures followed by clients, including how to interpret DNS
responses that originate from authoritative DNS servers (either directly,
or via resolvers, or via stub libraries).

To be clear: the wire format parts are fine, and what a zone administrator
should publish is not in any way impacted.

The differences in interpretation, and the client behavior under one of
those interpretations, are the problem.

The easiest way forward, I think, is to try to add enough clarification to
have a discussion about which interpretation has consensus. IMNSHO, the
draft needs to reach the point where only one interpretation is possible,
so that all implementations are in agreement, at least in the fundamental
aspect of the how clients should behave.

Once there is some clarification on the proposed text (e.g. with two
alternative approaches equally clearly described), then the conversation
can progress to "which of these is what DNSOP wants to be published"?

So, having prefaced things this way, here are the specific elements that
are apparently ambiguous.

I'll summarize as much as I can, but I will also include a couple of emails
from a thread between myself and the authors, chairs, one implementer,
included mostly to demonstrate that the interpretation in question is quite
specific and well articulated, i.e. that I'm not possibly mis-interpreting
something someone said.

   - The problem is whether/when/how the DNS queries are considered
   failures, and whether/when/how some sort of fall-back procedure is followed
   in those cases.
   - This includes ambiguity over whether further DNS queries/responses are
   required, if HTTP connection failures occur with resolved TARGET values.
   - The ONLY concern is whether an AliasMode record (particularly at the
   zone apex) is treated EXACTLY the same as a constrained CNAME (i.e.
   unconditional QNAME rewrite if the RRTYPE is appropriate).
      - Unconditional would imply that an HTTPS-aware (or SVCB-aware, if
      you prefer) client never backtracks to the origin name to look up A/AAAA
      records for use, or more precisely, if the client does look up the A/AAAA
      records speculatively, if it gets an AliasMode record, it does not use
      those A/AAAA records under any conditions.
      - Conditional would imply that there are conditions under which the
      client MIGHT use sibling A/AAAA records instead of a valid AliasMode
      record, even if the AliasMode record was cryptographically protected and
      did not have a Chain-Length error. This situation, even if only "under
      certain circumstances", is the ANAME behavior.
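
To make the two readings concrete, here is a minimal sketch (illustrative
only, not taken from the draft or from any implementation). The Lookup
type, its field names, and the candidate_addresses() helper are
hypothetical, and DNS resolution is assumed to have already happened.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Lookup:
    """Hypothetical, already-resolved view of one origin name."""
    origin: str               # name the user asked for, e.g. a zone apex
    origin_addrs: List[str]   # sibling A/AAAA records at the origin name
    alias_target: str         # TargetName of the AliasMode record
    target_addrs: List[str]   # A/AAAA records at the alias target


def candidate_addresses(lookup: Lookup, unconditional: bool) -> List[str]:
    """Return the addresses a client would be willing to try, in order."""
    candidates = list(lookup.target_addrs)
    if not unconditional:
        # "Conditional" reading: the sibling A/AAAA records at the origin
        # remain usable as a last resort (the ANAME-like behavior).
        candidates += lookup.origin_addrs
    # "Unconditional" reading: the origin's A/AAAA records are never used
    # once a valid AliasMode record has been received.
    return candidates
```

With an alias target that has no usable addresses, the unconditional
reading yields an empty list (the site is unreachable), while the
conditional reading yields the apex A/AAAA addresses.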

Here is a longer description where the problems/ambiguities appear to
exist, which should be clarified first, and then discussed to decide what
to do about them.
There are some phrases or terms that are not defined, or inconsistently
used, or less than comprehensively enumerated:

   - In section 3, the term "SVCB-optional" specifically only refers to
   "ServiceMode Records" (FIXME qv section 3.1 and 10.1 exceptions)
   - The enumerated steps use the phrase "SVCB resolution has failed".
   - "whether successful or not" plus appending the final $QNAME without
   SvcParams, is followed by reference to "falling back to non-SVCB connection
   modes".
      - A "connection mode" is an HTTP(S) thing, but this does not specify
      the DNS component of whatever is intended by "falling back".
      - This is immediately followed in the parenthesized text by ensuring
      that SVCB-optional clients will make use of an AliasMode record.
      - Two paragraphs later, we have: "If the client is SVCB-optional, and
      connecting using this list of endpoints has failed, the client now
      attempts to use non-SVCB connection modes."
         - This is not consistent with the use of AliasMode records vs
         CNAME records, meaning a CNAME and an AliasMode record as alternative
         methods of delegating authority, would behave differently.
         - I.e. this behavior conflicts with the stated intent and behavior
         from Introduction (1.), Goals (1.1), and AliasMode (2.4.2).
   - Also, the AliasMode section (2.4.2) has some text that conflates
   multiple issues, which appears to be one potential source of one of the
   major problems (ANAME behavior):
      - "As legacy clients will not know to use this record, service
      operators will likely need to retain fallback AAAA and A records
      alongside this SVCB record, although in a common case the target of
      the SVCB record might offer better performance, and therefore would be
      preferable for clients implementing this specification to use."
      - The conflation is between "legacy clients" and "preferable for
      clients implementing the specification".
      - Legacy support is not "fallback", which is where the conflation is
      introduced
      - Non-legacy clients (which might better be described as SVCB-aware)
      should NOT be using records intended (by the zone administrator) for
      legacy-only usage.
      - Non-legacy clients using legacy-only records (A/AAAA records with
      the same owner name as an AliasMode SVCB record) is what causes the ANAME
      behavior to occur
      - ANAME was soundly rejected by DNSOP. Introducing ANAME-like
      behavior is a major problem
      - This behavior is introduced implicitly, rather than explicitly.
      - Having this documented explicitly is essential to resolving the
      client behavior ambiguity.

The Client behavior section (3.):

   - In section 3.1 (Handling resolution failures), there are some partial
   enumerations that leave unstated what the alternative situation requires.
      - "If DNS responses are not cryptographically protected, clients MAY
      treat SVCB resolution failure as fatal or non-fatal".
         - What if DNS responses ARE cryptographically protected? And does
         that differ between protection mechanisms (DNSSEC vs encrypted
         transport)?
         - The first sentence (regarding cryptographically protected
         responses) only partially enumerates the cases, i.e. specific
         sources of resolution failure.
            - Explicit declaration of NXDOMAIN as being either a resolution
            failure, or not a resolution failure, would clarify this
            considerably (and IMNSHO, should be a non-failure).
            - Similarly, NOERROR/NODATA response handling should be
            described to avoid ambiguity.
      - The case consisting of a single AliasMode record (without CNAME,
      which MUST be handled per 2.4.2 "This limit MUST NOT be zero, i.e.
      implementations MUST be able to follow at least one AliasMode record."),
      which is cryptographically protected, and which does not have any of the
      enumerated resolution failures, appears to not be covered under the
      category of "resolution failure". It also appears to not be covered by
      the "MAY treat as non-fatal" clause.
         - If the TARGET of a single AliasMode is unreachable, or is
         NXDOMAIN, or has NOERROR,NODATA for A/AAAA record queries, how
         should this be handled?
         - *This appears to be the specific place* where the ambiguity can
         result in ANAME-like behavior, and where implementations may
         diverge in behavior.
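
As an illustration of the gap described above, the sketch below encodes
only the four failure causes that the draft text lists ("an authentication
error, SERVFAIL response, transport error, or timeout", as quoted later in
this thread). The Outcome enum and both helper functions are hypothetical
names chosen for this example; how NXDOMAIN and NOERROR/NODATA should be
classified is exactly the open question, so the sketch reports them as
unaddressed rather than picking an answer.

```python
from enum import Enum, auto


class Outcome(Enum):
    """Hypothetical outcomes of a single SVCB/HTTPS query."""
    AUTH_ERROR = auto()
    SERVFAIL = auto()
    TRANSPORT_ERROR = auto()
    TIMEOUT = auto()
    NXDOMAIN = auto()
    NODATA = auto()
    ANSWER = auto()


# The four causes listed in the quoted Section 3.1 text.
LISTED_FAILURES = frozenset({
    Outcome.AUTH_ERROR,
    Outcome.SERVFAIL,
    Outcome.TRANSPORT_ERROR,
    Outcome.TIMEOUT,
})

# Outcomes the quoted text does not mention either way.
UNADDRESSED = frozenset({Outcome.NXDOMAIN, Outcome.NODATA})


def is_listed_resolution_failure(outcome: Outcome) -> bool:
    """True only for the failure causes the quoted text names."""
    return outcome in LISTED_FAILURES


def is_unaddressed(outcome: Outcome) -> bool:
    """True for negative answers whose handling is left implicit."""
    return outcome in UNADDRESSED
```

An implementation has to decide what to do with the outcomes for which
is_unaddressed() is true, and that is where two implementations can
diverge.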

Purposes of A/AAAA records at the zone apex, compared:

   - There are multiple possible reasons for inclusion of A/AAAA records at
   a zone apex:
      - Serving HTTPS-enabled zones to legacy clients, when a CDN serving
      the domain has stable A/AAAA addresses
      - Alerting legacy clients that they are not supported, using error
      pages specific to the client (e.g. User-Agent based response pages)
      - Non-WWW services available via IP address (SMTP, SSH, FTP, etc)
      - If no WWW services are present at such IP addresses, client
      connection attempts could negatively impact other services.
   - Multiple SVCB-compatible RR types may be present at a zone apex
      - Each such SVCB-compatible record type could have equally-legitimate
      fall-back address requirements
      - The current specification for HTTPS effectively forecloses any
      other use of apex A/AAAA records, if the interpretation of "fall-back" is
      to regress all the way to the origin name and using A/AAAA records at the
      zone apex
   - Brittle A/AAAA addressing (fast flux addresses with low TTLs) is
   incompatible with use of apex A/AAAA records for HTTPS-aware clients
   - Everything bad about ANAME would be incorporated if apex A/AAAA
   addresses are required to be applicable to HTTPS-aware clients
   - The existence of an HTTPS AliasMode record at a zone apex SHOULD cause
   an HTTPS-aware client to never use the A/AAAA records at a zone apex, even
   if the SVCB process fails or the client is unable to connect to the service
   over any SVCB ServiceMode end-points or the IP address(es) of the final
   $QNAME.

Here is a summary of what I would like to have happen, personally:

   1. I am strongly in favor of clarifying these issues, and once any
   problems are resolved, quickly moving to publication.
   2. The HTTPS AliasMode record is something we want to start using as
   soon as possible, e.g. as soon as the majority of browser vendors have
   implemented the correct client behavior in a major release that is widely
   adopted (we may be near that point modulo the Chrome issue)
   3. The client implementations appear to have complexity introduced when
   the "fall-back" logic is required, which goes away if/when the "fall-back"
   process is removed. This is likely a gating factor in at least one browser
   deploying the support for AliasMode.
   4. We currently only care about AliasMode, from the perspective of
   authoritative DNS zones operated by us. We have implemented it and deployed
   it.
   5. The main issue is not the interoperable wire format stuff, or the
   implementable state of the specification. Those are fine, and we have
   implemented and deployed HTTPS AliasMode support already.
   6. The main issue is usability of AliasMode records -- putting HTTPS
   records at the zone apex (for lots of zones we manage). The "always follow
   the AliasMode without fallback" for HTTPS-aware clients is the key
   requirement.
   7. Resurrecting ANAME behavior in corner cases is every bit as bad as
   choosing ANAME as the standard.
   8. The fallback that appears to have ended up (in some corner cases) is
   really unfortunate, and effectively useless. If the HTTPS AliasMode record
   results in unreachable web sites, that's an entirely acceptable outcome in
   all cases. Fallback at best would partially mask the problems, making
   identification and correction more difficult, while also resulting in poor
   user experiences.
   9. In the worst case, broken Targets for AliasMode records have the
   ability to cause problems for any legacy-only resources, including
   potential financial impacts (via consuming of unbudgeted resources), again,
   for no real benefit.

Brian Dickson

Included quoted texts: email from me to authors; response by Ben (one of
the authors).

Hi, everyone,
>
> I have been working through some implementation challenges in interpreting
> the prescribed behavior in the current draft.
>
> (I'm with GoDaddy's DNS team, and have been working with the Google Chrome
> folks on handling of AliasMode records.)
>
> The TL;DR: is that there is some ambiguity that needs to be cleared up.
>
> I'm hoping these issues can be cleared up with some additional text.
> The exact wording isn't crucial, so much as that the client resolution
> process can be made unambiguous.
>
> (There is an implicit familiarity expectation with core DNS specs 1033,
> 1034, and 1035, where those specs themselves are somewhat lacking, and
> outside of the DNS industry, not many folks have the necessary experience
> to work around those issues.)
>
> The main issues are as follows:
>
>    - Clarification on NXDOMAIN aka Rcode==3, as relates to Section 3.1.
>       - NXDOMAIN should be explicitly included in 3.1 as "not a
>       resolution failure per se".
>       - AliasMode Targets with NXDOMAIN resolution results MUST be
>       handled the same as CNAME resolution with NXDOMAIN.
>    - Clarification on the overall handling of AliasMode records.
>       - I believe the intent is that AliasMode records should ALWAYS be
>       followed, even if the ServiceMode lookups ultimately fail, i.e.
>       there should not be any backtracking to before any AliasMode lookups.
>       - In other words, there may need to be an extra terminology entry
>       for QNAME that means "QNAME after following all CNAME and AliasMode records"
>       - The resolution steps at the start of Section 3 might need to be
>       cleaned up to distinguish AliasMode and ServiceMode related lookup results.
>    - Inclusion of specification of what is meant by "fall back to
>    non-SVCB connection modes".
>       - This is referenced in a few places, but is not defined or
>       specified.
>       - IMNSHO, this connection mode should be declared as "use that last
>       QNAME from following AliasMode redirects (and CNAME redirects), and make
>       the connection using no SvcParams and using only A/AAAA records resolved
>       by looking up the last QNAME".
>       - Is it possibly the case that the parenthetical portion of the
>       fourth-last paragraph in section 3 is intended to DEFINE the fall-back
>       mode, rather than the last thing to try before falling back to the
>       currently undefined fall-back mode?
>    - Clarification on use of SVCB record not existing, in 3.1
>       - I think the intent here is actually "SVCB ServiceMode record",
>       and to treat the result as if the "SVCB ServiceMode record did not exist",
>       and to use the name of any redirections from CNAME and AliasMode records as
>       the service endpoint to use.
>    - Clarification on soft vs hard failures
>       - The "fatal vs non-fatal" should apply only to ServiceMode records
>       - NXDOMAIN results on AliasMode lookups MUST be treated as hard
>       failures
>       - Any other resolution failures on AliasMode records MUST be
>       treated as hard failures
>    - NXDOMAIN handling of parallel queries
>       - When there are parallel queries for (SVCB or HTTPS) records along
>       with A and AAAA records, an NXDOMAIN response for any of them MUST be
>       treated as an NXDOMAIN result for all of them. (This is a tautology, BTW.)
>       - It may be worth adding words to that effect, so that implementers
>       can avoid delays waiting for now-moot queries. This would allow faster
>       progression to alternative ServiceMode records, and/or terminating all
>       queries (if no path forward exists at an AliasMode record).
>
> Sorry for the late timing of this.
> We (authoritative DNS implementers) considered the spec not terribly clear
> but at least unambiguous.
> It was only when communicating with browser vendors doing implementation
> of the client side (Google Chrome in particular) that the issues surfaced.
>
> I think we all want this to be consistently implemented, and to be
> consistent with the authors' intents.
>
> Please correct me if my overall understanding (AliasMode == CNAME at apex,
> including obeying ALL of the behavior limits and RCODE results associated
> with CNAME) isn't correct.
>
> Thanks,
> Brian Dickson
>

Response from Ben:
[Apologies, my mail client wouldn't quote this correctly, so the rest of
this message is Ben's response, not indented/quoted properly.]


On Sun, Jul 24, 2022 at 4:30 PM Brian Dickson <brian.peter.dickson@gmail.com>
wrote:

> Hi, everyone,
>
> I have been working through some implementation challenges in interpreting
> the prescribed behavior in the current draft.
>
> (I'm with GoDaddy's DNS team, and have been working with the Google Chrome
> folks on handling of AliasMode records.)
>
> The TL;DR: is that there is some ambiguity that needs to be cleared up.
>
> I'm hoping these issues can be cleared up with some additional text.
> The exact wording isn't crucial, so much as that the client resolution
> process can be made unambiguous.
>
> (There is an implicit familiarity expectation with core DNS specs 1033,
> 1034, and 1035, where those specs themselves are somewhat lacking, and
> outside of the DNS industry, not many folks have the necessary experience
> to work around those issues.)
>
> The main issues are as follows:
>
>    - Clarification on NXDOMAIN aka Rcode==3, as relates to Section 3.1.
>       - NXDOMAIN should be explicitly included in 3.1 as "not a
>       resolution failure per se".
>
> The current text is "... fails due to an authentication error, SERVFAIL
response, transport error, or timeout".  It seems to me that NXDOMAIN is
clearly not on that list.  Are you sure this needs clarification?

>
>    - AliasMode Targets with NXDOMAIN resolution results MUST be handled
>    the same as CNAME resolution with NXDOMAIN.
>
> I don't think normative comparisons to CNAME are a good idea.  CNAME and
SVCB have conceptual parallels but work quite differently.  Also, I'm not
sure this would be correct.  The current text says

   If the client is SVCB-optional, and connecting using this list of
   endpoints has failed, the client now attempts to use non-SVCB
   connection modes.

In the event of an AliasMode record pointing to NXDOMAIN, I would expect
SVCB-optional clients to retry with non-SVCB connection.

>
>    - Clarification on the overall handling of AliasMode records.
>       - I believe the intent is that AliasMode records should ALWAYS be
>       followed, even if the ServiceMode lookups ultimately fail, i.e.
>       there should not be any backtracking to before any AliasMode lookups.
>
> As noted above, I believe this would be a substantive change from the
present specification.

>
>    - In other words, there may need to be an extra terminology entry for
>       QNAME that means "QNAME after following all CNAME and AliasMode records"
>       - The resolution steps at the start of Section 3 might need to be
>       cleaned up to distinguish AliasMode and ServiceMode related lookup results.
>    - Inclusion of specification of what is meant by "fall back to
>    non-SVCB connection modes".
>       - This is referenced in a few places, but is not defined or
>       specified.
>
> I can't think of a clearer formal way to say "connect however you would
have connected if this specification did not exist".

>
>    - IMNSHO, this connection mode should be declared as "use that last
>    QNAME from following AliasMode redirects (and CNAME redirects), and make
>    the connection using no SvcParams and using only A/AAAA records resolved
>    by looking up the last QNAME".
>
> This is addressed in the draft, and it is not "non-SVCB connection
establishment":

>
>    - Is it possibly the case that the parenthetical portion of the
>    fourth-last paragraph in section 3 is intended to DEFINE the fall-back
>    mode, rather than the last thing to try before falling back to the
>    currently undefined fall-back mode?
>
> For posterity, that text is:

   SVCB-
   optional clients SHALL append to the priority list an endpoint
   consisting of the final value of $QNAME, the authority endpoint's
   port number, and no SvcParams.  (This endpoint will be attempted
   before falling back to non-SVCB connection modes.  This ensures that
   SVCB-optional clients will make use of an AliasMode record whose
   TargetName has A and/or AAAA records but no SVCB records.)

This is not considered "non-SVCB connection establishment" because SVCB has
still influenced the QNAME.
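
A minimal sketch of the quoted step, for illustration only. The Endpoint
type and build_endpoint_list() are hypothetical names, SvcParams are
reduced to a plain dictionary, and the ServiceMode endpoints are assumed to
be already sorted by priority.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Endpoint:
    """Hypothetical connection candidate."""
    name: str
    port: int
    svc_params: Dict[str, str] = field(default_factory=dict)


def build_endpoint_list(
    service_endpoints: List[Endpoint],
    final_qname: str,
    authority_port: int,
    svcb_optional: bool,
) -> List[Endpoint]:
    """Return the ordered list of endpoints to attempt."""
    endpoints = list(service_endpoints)
    if svcb_optional:
        # Appended last: final $QNAME, the authority endpoint's port,
        # and no SvcParams. Still SVCB-influenced, since the name came
        # from following the alias chain.
        endpoints.append(Endpoint(final_qname, authority_port))
    return endpoints
```

Only after every endpoint in this list has failed would an SVCB-optional
client move on to what the draft calls non-SVCB connection modes.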

>
>    - Clarification on use of SVCB record not existing, in 3.1
>       - I think the intent here is actually "SVCB ServiceMode record",
>       and to treat the result as if the "SVCB ServiceMode record did not exist",
>       and to use the name of any redirections from CNAME and AliasMode records as
>       the service endpoint to use.
>
> For posterity, the text is:

   If the client is unable to complete SVCB resolution due to its chain
   length limit, the client MUST fall back to the authority endpoint, as
   if the origin's SVCB record did not exist.

The intent here is indeed to fall back all the way to the authority
endpoint.  If clients would only fall back to some intermediate point in
the alias chain based on their length limit, operators would become
obligated to offer the service from every intermediate name in the chain.
By falling back all the way to the authority endpoint, we ensure that
operators are only required to offer service at the authority endpoint
(i.e. non-SVCB connection) and the actual SVCB service endpoints.

This is essentially parallel to CNAME: service operators are not obligated
to offer the service at each step of a CNAME chain.
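
The chain-length behavior described here can be sketched as follows. This
is an illustration under simplifying assumptions: the alias data is a
hypothetical in-memory mapping from owner name to AliasMode TargetName, the
origin hostname stands in for the authority endpoint, and
resolve_alias_chain() is an invented helper name.

```python
from typing import Dict


def resolve_alias_chain(origin: str, aliases: Dict[str, str], limit: int) -> str:
    """Follow AliasMode TargetNames starting from the origin name.

    Returns the final name in the chain, or the origin itself when the
    chain is longer than the client's limit (full fallback, never a stop
    at an intermediate name).
    """
    name = origin
    hops = 0
    while name in aliases:
        if hops == limit:
            # Limit reached with more aliases to follow: behave as if the
            # origin's SVCB record did not exist.
            return origin
        name = aliases[name]
        hops += 1
    return name
```

For a two-step chain, a client with a limit of 2 ends at the last target,
while a client with a limit of 1 returns to the origin rather than to the
intermediate name.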

>
>    - Clarification on soft vs hard failures
>       - The "fatal vs non-fatal" should apply only to ServiceMode records
>
> For posterity, the text is:

   If DNS responses are not cryptographically protected, clients MAY
   treat SVCB resolution failure as fatal or nonfatal.

I'm not sure what you're saying here.  When a SVCB DNS query fails, the
client doesn't know whether that query would have returned an AliasMode or
a ServiceMode record.

Regardless, the point of this line is merely to reiterate that the
downgrade protections considered by this section are largely moot if there
is no security between the client and resolver.

>
>    - NXDOMAIN results on AliasMode lookups MUST be treated as hard
>    failures
>
>
>    - Any other resolution failures on AliasMode records MUST be treated
>    as hard failures
>
> As I think is clear from Section 3, failure to resolve an AliasMode
TargetName is indeed a hard failure for SVCB-required clients, but
SVCB-optional clients can tolerate this by abandoning SVCB entirely.

We could change this, by declaring that SVCB-optional clients MUST disable
their fallback in this case, but I see no advantage to this.  These clients
would still have the fallback logic for other cases, and excluding this
case seems like more work than including it, for client implementors.  For
operators, excluding this fallback increases operational fragility in the
event of error, and conveys no obvious benefit.

>
>    - NXDOMAIN handling of parallel queries
>       - When there are parallel queries for (SVCB or HTTPS) records along
>       with A and AAAA records, an NXDOMAIN response for any of them MUST be
>       treated as an NXDOMAIN result for all of them. (This is a tautology, BTW.)
>       - It may be worth adding words to that effect, so that implementers
>       can avoid delays waiting for now-moot queries. This would allow faster
>       progression to alternative ServiceMode records, and/or terminating all
>       queries (if no path forward exists at an AliasMode record).
>
>
I'm not aware of any such rule in Happy Eyeballs, which is the basis for
this kind of parallel querying.  Diverging from the Happy Eyeballs rules
(or overspecifying the behavior here to conflict with Happy Eyeballs) would
prevent client implementors from reusing their Happy Eyeballs
implementation.

This optimization is an interesting observation, but it seems clear to me
that a NXDOMAIN TargetName is always an operator error, so I don't think it
is worth defining performance optimizations for that case.


> Sorry for the late timing of this.
> We (authoritative DNS implementers) considered the spec not terribly clear
> but at least unambiguous.
> It was only when communicating with browser vendors doing implementation
> of the client side (Google Chrome in particular) that the issues surfaced.
>
> I think we all want this to be consistently implemented, and to be
> consistent with the authors' intents.
>
> Please correct me if my overall understanding (AliasMode == CNAME at apex,
> including obeying ALL of the behavior limits and RCODE results associated
> with CNAME) isn't correct.
>

It's certainly not identical to CNAME in general, as it only applies to a
single "scheme" on the hostname, not the entire hostname.  However, it is a
fairly close parallel, including the failure modes, in the SVCB-required
case.  For now, SVCB-optional is the common case, because most protocols
predate SVCB, and here there is a substantial difference because both the
beginning and the end(s) of the chain are considered valid endpoints.