Re: [DNSOP] [Ext] Questions / concerns with draft-ietf-dnsop-svcb-https (in RFC Editor queue)

Brian Dickson <brian.peter.dickson@gmail.com> Sat, 27 August 2022 04:22 UTC

MIME-Version: 1.0
References: <CAHw9_iKZJndu1100LBU3TiuhF9ACb0As2deA1oZWD2eA46tBbA@mail.gmail.com> <CAH1iCiqryY=u6MN2mkf7krHLmc7TQkoDaXe0k=ZZ+0e9uiMb-Q@mail.gmail.com> <YwaQrnoA3hifxCQW@straasha.imrryr.org> <CAMOjQcEcKQSWvb_LqmfkGwZ2dt_561jLZxHTMuMO0pMy2s9mbw@mail.gmail.com> <CAH1iCirnWdDY0p2-grQKN3PQWOM=JLevxbNskFFEzGwHvisGZA@mail.gmail.com> <B024358C-77FD-4E63-8E18-1CBCEA6C6B14@icann.org> <CAH1iCiry3VDS+dM+wEkPH5a_TSt5pEddxPjKOhL9_M20e_dR0A@mail.gmail.com> <8B970775-22CF-403B-9B8A-84DCC0932D76@icann.org> <CAHbrMsC_RO1J6qp_yOWOc3P4zpZ-cOCB6adXRwjoSQP7_yrWug@mail.gmail.com>
In-Reply-To: <CAHbrMsC_RO1J6qp_yOWOc3P4zpZ-cOCB6adXRwjoSQP7_yrWug@mail.gmail.com>
From: Brian Dickson <brian.peter.dickson@gmail.com>
Date: Fri, 26 Aug 2022 21:22:05 -0700
Message-ID: <CAH1iCioA0+zi_0RrmgKiEEUOErkwtj_-_5fqsZkGQHDiYtZ3ag@mail.gmail.com>
To: Ben Schwartz <bemasc@google.com>
Cc: Paul Hoffman <paul.hoffman@icann.org>, "dnsop@ietf.org WG" <dnsop@ietf.org>
Content-Type: multipart/alternative; boundary="0000000000002f54df05e7316076"
Archived-At: <https://mailarchive.ietf.org/arch/msg/dnsop/1qs52PFuNs9tCq-RCAp8Nr7nugM>
Subject: Re: [DNSOP] [Ext] Questions / concerns with draft-ietf-dnsop-svcb-https (in RFC Editor queue)

On Thu, Aug 25, 2022 at 1:35 PM Ben Schwartz <bemasc@google.com> wrote:

>
> Brian proposes a use case of serving only a warning message on the origin
> endpoint, in order to minimize the load on IP addresses that are likely
> hardcoded into a customer's zone.
>

So, the major update to add to this is:

   - We (GoDaddy) have revisited this approach, and are now considering a
   much better design (summary follows below)


The design we are considering is the deployment of web redirect servers
(reached via apex A/AAAA records) that return HTTP 301 (permanent redirect)
responses.
These would respond to connections to the apex domain ("example.com") and
redirect the client to a non-apex name ("www.example.com").
The non-apex name would have a CNAME pointing to the actual delegated
authority.
The CNAME target would be identical to the TargetName of the apex HTTPS
record.
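
As a rough illustration of the redirect step only, here is a minimal
sketch using Python's standard library. The handler name, the fixed "www."
prefix rule, and the default "https" scheme are assumptions made for the
example; none of this is a description of the actual servers.

```python
from http.server import BaseHTTPRequestHandler


def redirect_location(host: str, path: str, scheme: str = "https") -> str:
    """Build the Location value that sends an apex request to the www name.

    Assumption: the non-apex name is always "www." + apex, matching the
    example.com -> www.example.com case described above.
    """
    # Drop any port and trailing dot, and normalize case.
    hostname = host.split(":", 1)[0].rstrip(".").lower()
    return f"{scheme}://www.{hostname}{path}"


class ApexRedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers GET/HEAD with a 301 to www.<apex>."""

    def _redirect(self) -> None:
        host = self.headers.get("Host")
        if not host:
            self.send_error(400, "Missing Host header")
            return
        self.send_response(301)
        self.send_header("Location", redirect_location(host, self.path))
        self.send_header("Content-Length", "0")
        self.end_headers()

    do_GET = _redirect
    do_HEAD = _redirect
```

To experiment locally, the handler can be passed to
http.server.HTTPServer on any free port; the path and query string are
preserved so that deep links survive the redirect.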

Note the following:

   - This will provide legacy clients the same eventual connectivity as the
   HTTPS record, including connecting to the correct (aka "best") target node
   at the CNAME/HTTPS target name, since both are resolved by the client's
   resolver
   - Legacy clients will have a one-time latency penalty for the HTTP 301
   connection and redirect. This penalty is once per domain only, per client.
   - The apex A/AAAA, HTTPS, and www CNAME records are all cacheable, and
   likely to have long TTLs
   - The target name is identical, and client-resolver caching provides
   benefits to both legacy and HTTPS-aware clients.

Note also, the following:

   - The target of both the HTTPS and CNAME records is the same
   - Resolution failures or connection failures will have a shared fate,
   between legacy and HTTPS-aware clients
   - An HTTPS-aware client, which attempts to do the fallback procedure,
   will experience the legacy-mode delay due to the HTTP 301 redirect, but
   will still end up hitting the same issue that triggered the fallback
      - In other words, for this publication scheme, fallback will NEVER
      achieve its desired/expected goal
      - Individual instances of fallback working due to temporary issues,
      would have had the same success achieved by merely retrying the
      connection or resolution (tautologically!)
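
The shared-fate argument above can be sketched as a toy model. The record
table, the names, and the helper functions are all hypothetical, and the
model assumes the redirect server itself is reachable and that a failure
at the target persists for the duration of the attempt.

```python
from typing import Dict, Set

# Hypothetical records for the publication scheme described above.
ZONE: Dict[str, Dict[str, str]] = {
    "example.com.": {
        "HTTPS_ALIAS": "mycdn.example.net.",   # apex AliasMode target
        "REDIRECT": "www.example.com.",        # where the HTTP 301 points
    },
    "www.example.com.": {"CNAME": "mycdn.example.net."},
}


def https_aware_endpoint(apex: str) -> str:
    """Name reached by following the apex AliasMode HTTPS record."""
    return ZONE[apex]["HTTPS_ALIAS"]


def legacy_endpoint(apex: str) -> str:
    """Name reached via apex A/AAAA -> HTTP 301 -> www CNAME."""
    www = ZONE[apex]["REDIRECT"]
    return ZONE[www]["CNAME"]


def connect(endpoint: str, unreachable: Set[str]) -> bool:
    """Toy connection attempt: succeeds unless the endpoint is marked down."""
    return endpoint not in unreachable


def https_client_with_fallback(apex: str, unreachable: Set[str]) -> bool:
    """Try the AliasMode target first, then fall back to the legacy path."""
    if connect(https_aware_endpoint(apex), unreachable):
        return True
    # Fallback: the legacy path ends at the same target name.
    return connect(legacy_endpoint(apex), unreachable)
```

Because both paths end at the same name, marking that name unreachable
makes the fallback attempt fail as well; in this model the fallback only
adds the redirect round trip.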

If we do deploy this, we will do so on all our customer domains using HTTPS.
This means that for those domains (in the millions or tens of millions),
the fallback in the draft will only result in added overhead while never
actually achieving any successful connections (due to shared fate between
legacy and HTTPS).

We know this will be the case for these domains.
The logical approach would be to do one of the following things:

   - Allow the domain owner to signal that fallback will not work, e.g.:
      - An AliasMode SvcParam (e.g.
      example.com HTTPS 0 mycdn.example.net nofallback)
      - or a third HTTPS "mode" record, to signal no fallback (e.g.
      example.com HTTPS 65534 . where 65534 is the "no fallback" mode
      signal, and "." is simply a placeholder domain needed for RDATA
      structural consistency)
      - Both of these would require significant changes to the draft, to
      clients, and to authority servers.
      - Strongly not recommended
   - Allow the domain owner to only supply fallback addresses explicitly,
   and in the absence of those, not do the fallback (e.g. using an attrleaf
   prefix label)
      - This is the "presence/absence is the signal" approach (e.g.
      _https_fallback.example.com A 192.0.2.1 // chosen from one of the
      RFC 5737 blocks)
      - This is also extensible, since the attrleaf prefix would be
      (presumably) SVCB-record specific
      - HTTPS would have its own attrleaf prefix, and each new
      SVCB-compatible record would have its own attrleaf prefix
      - Would require changes to the draft, to clients, and to authority
      server zone publication automation (but not to the authority server
      software)
      - Not recommended
   - Remove fallback from the draft
      - Signaling is only needed if fallback is included in the draft
      - Much less work; only clients would require changes, and only to
      remove code/logic
      - Fail fast
      - Deterministic and reliable behavior
      - Interoperable across client implementations and server
      implementations
      - Still requires changes to the draft
      - Least of three "evils"
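
For the second option, the "presence/absence is the signal" rule might look
roughly like this on the client side. This is only a sketch: the
_https_fallback label comes from the example above and is not registered
anywhere, and the in-memory record table and function name are
hypothetical stand-ins for real DNS lookups.

```python
from typing import Dict, List, Tuple

# (owner name, RR type) -> list of RDATA strings; stands in for DNS lookups.
Records = Dict[Tuple[str, str], List[str]]

# Label taken from the example above; purely illustrative.
FALLBACK_PREFIX = "_https_fallback"


def fallback_addresses(origin: str, records: Records) -> List[str]:
    """Return explicitly published fallback addresses for an origin.

    An empty list means no fallback records exist, which under this
    option is the signal that the client should not fall back at all.
    """
    owner = f"{FALLBACK_PREFIX}.{origin}"
    return records.get((owner, "A"), []) + records.get((owner, "AAAA"), [])
```

Note that the apex A/AAAA records are deliberately not consulted here: under
this option only the prefixed name controls whether fallback happens.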

Removing the language from the draft does not force implementers to not do
their own thing. Individual client implementations could still do the
fallback thing, but would not be required to do so.
It does, however, put more responsibility on the implementers to respond to
issues raised if adverse effects result. It might be advisable to make
fallback a user-configurable option, possibly off by default.
Implementers would not be able to deflect blame for problems via the "it's
what the RFC says" response, if problems do occur.



> Instead, the draft attempts to ensure that deploying and implementing the
> HTTPS record "does no harm", by giving participating clients no worse
> reliability than legacy clients.
>

This is one place where quantitative data would help the conversation
immensely.
Is there data concerning the failures observed (DNS resolution or HTTP
connections) in following CNAME records from authoritative zones to CDNs?
If the failure rates are really low, is that worth the effort in adding
this fallback flow?
If the failure rates are highly variable (by topology, DNS resolver
instance, client machine specs, network environment, etc.), is there any
experimental data to support a statistically significant improvement using
different approaches?
Is the DNS component a major contributor, or not?
If not, perhaps the benefits of ServiceMode actually become more important,
and falling back is actually likely to degrade, rather than improve, the
user experience?

Is the implementation of fallback strictly speculative?
If so, perhaps leaving it out of the draft, presenting results at DNS-OARC
once data is available, and publishing a -bis draft to include fallback (if
the data supports doing so) is a better approach?


> For example, post-deployment data from browsers may show that we could
> eliminate the final fallback without reducing reliability.
>

One problem introduced when HTTPS-aware clients successfully obtain
AliasMode records and then connect via apex A/AAAA records (when fallback
occurs) is that DNS-level observations are adversely affected.
This is true whether observing the authoritative servers for the zone, or
the recursive resolvers that clients are querying.
Looking only at the DNS traffic will yield data that is difficult to
correlate and interpret.
There will not be a clean "signal" identifying legacy-only clients.
There will not be any ability to correlate fallback behavior with client
software (browser "brand" generally, or brand+version).

So, attempting to optimize for failure can actually negatively impact
measurement of failure and root cause analysis.



> Viktor notes with concern that AliasMode is a "non-deterministic
> redirect".  Instead, the draft attempts to model the client behavior as a
> preference ordered stack of endpoints:
>
> 1. Basic: the origin endpoint (status quo ante)
> 2. Better: the endpoint at the end of the AliasMode chain
> 3. Best: the ServiceMode records
>
> I think it's best to think of AliasMode as an alias that is optional when
> SVCB is optional, and mandatory when SVCB is mandatory.
>


> This seems natural enough to me, and allows it to be used in environments
> like the web where "fail fast" is not an appealing option.
>

Fail fast may not be appealing, but in some (probably the majority of)
cases, it may be the most correct option.

The zone owner may also know whether this is the case.
I think it is much more likely that explicitly declaring the situation (if
known) is more useful than having several billion clients independently
attempting to infer whether the first option will even work, let alone
provide a useful alternative to the second or third.

Brian