Re: [DNSOP] Benjamin Kaduk's Discuss on draft-ietf-dnsop-dns-tcp-requirements-13: (with DISCUSS and COMMENT)

Benjamin Kaduk <kaduk@mit.edu> Fri, 24 December 2021 19:28 UTC

Date: Fri, 24 Dec 2021 11:27:51 -0800
From: Benjamin Kaduk <kaduk@mit.edu>
To: "Wessels, Duane" <dwessels@verisign.com>
Cc: The IESG <iesg@ietf.org>, "draft-ietf-dnsop-dns-tcp-requirements@ietf.org" <draft-ietf-dnsop-dns-tcp-requirements@ietf.org>, "dnsop-chairs@ietf.org" <dnsop-chairs@ietf.org>, "dnsop@ietf.org" <dnsop@ietf.org>, Suzanne Woolf <suzworldwide@gmail.com>
Message-ID: <20211224192751.GA11486@mit.edu>
References: <163520226600.2076.6225006958067294469@ietfa.amsl.com> <B779A165-3FB3-49F0-B4BD-65AD68E9A933@verisign.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <B779A165-3FB3-49F0-B4BD-65AD68E9A933@verisign.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/dnsop/3egjnMjdURAcSHlLotxrIsOMob4>

Hi Duane,

Sorry for the very delayed response.
I will go update my ballot position shortly, but please see
https://github.com/jtkristoff/draft-ietf-dnsop-dns-tcp-requirements/pull/11
for some further edits to the text for discuss point (1).

On Mon, Nov 08, 2021 at 10:35:21PM +0000, Wessels, Duane wrote:
> Hi Ben, thank you for the detailed review.  It has taken me a while to work through
> all of your comments and suggestions, but hopefully this addresses them sufficiently.
> 
> 
> 
> > On Oct 25, 2021, at 3:51 PM, Benjamin Kaduk via Datatracker <noreply@ietf.org> wrote:
> > 
> > 
> > ----------------------------------------------------------------------
> > DISCUSS:
> > ----------------------------------------------------------------------
> > 
> > (1) This should be pretty easy to resolve, but this text from §4.4
> > does not seem to match up with the referenced document:
> > 
> >   The use of TLS places even stronger operational burdens on DNS
> >   clients and servers.  Cryptographic functions for authentication and
> >   encryption requires additional processing.  Unoptimized connection
> >   setup takes two additional round-trips compared to TCP, but can be
> >   reduced with TCP Fast Open, TLS session resumption [RFC8446] and TLS
> >   False Start [RFC7918].
> > 
> > Two additional round trips was true of TLS 1.2 and prior versions, but
> > as of TLS 1.3 the application data from the client can be sent after
> > only 1 round trip, accompanying the client Finished (and authentication
> > messages, if in use).  Given the nature of the rest of the sentence, we
> > might want to specifically mention TLS 1.3 as an improvement over TLS
> > 1.2, but there are probably a number of ways that we could fix it.  Note
> > additionally that for TLS 1.3, session resumption is not a reduction in
> > the number of round trips unless 0-RTT data is used (but AFAIK there is
> > not a published application profile specifying acceptable DNS content
> > for TLS 0-RTT data, so use of TLS 0-RTT data for DNS is forbidden), but
> > is still an efficiency gain due to the reduced number of cryptographic
> > operations (including certificate validation).
> 
> Is this better?
> 
>    The use of TLS places even stronger operational burdens on DNS
>    clients and servers.  Cryptographic functions for authentication and
>    encryption require additional processing.  Unoptimized connection
>    setup with TLS 1.3 [RFC8446] takes one additional round-trip compared
>    to TCP.  Connection setup times can be reduced with TCP Fast Open,
>    and TLS False Start [RFC7918].  TLS session resumption does not
>    reduce round-trip latency because no application profile for use of
>    TLS 0-RTT data with DNS has been published at the time of this
>    writing.  However, TLS session resumption can reduce the number of
>    cryptographic operations.

As mentioned above,
https://github.com/jtkristoff/draft-ietf-dnsop-dns-tcp-requirements/pull/11
has a few tweaks to this to account for the differences between TLS 1.2 and
TLS 1.3, and the skew in feature availability between the TLS versions.
(Highlights: RFC 7918 is 1.2-only, and TLS 1.2 session resumption does save
a round-trip (the current text implicitly assumes 1.3).)
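
To make the version skew concrete, here is a rough sketch (not text from the
draft or the pull request) that tabulates the round trips a client waits
before it can send its first DNS query.  The counts are the standard
handshake flows for each TLS version; the dictionary layout and function
names are illustrative assumptions, and TCP Fast Open and 0-RTT data are
deliberately left out.

```python
# Round trips spent on the TCP three-way handshake (no TCP Fast Open).
TCP_HANDSHAKE_RTTS = 1

# Additional round trips added by the TLS handshake before the client
# may send application data.  Keys are (TLS version, handshake mode).
TLS_EXTRA_RTTS = {
    ("1.2", "full"): 2,
    ("1.2", "false-start"): 1,   # RFC 7918 applies to TLS 1.2 only
    ("1.2", "resumption"): 1,
    ("1.3", "full"): 1,
    ("1.3", "resumption"): 1,    # no 0-RTT data, so no latency gain
}


def rtts_before_first_query(version, mode):
    """Total round trips before the first DNS query can be sent."""
    return TCP_HANDSHAKE_RTTS + TLS_EXTRA_RTTS[(version, mode)]


def resumption_saving(version):
    """Round trips saved by session resumption for a given TLS version."""
    return TLS_EXTRA_RTTS[(version, "full")] - TLS_EXTRA_RTTS[(version, "resumption")]


if __name__ == "__main__":
    for (version, mode) in TLS_EXTRA_RTTS:
        print(f"TLS {version} {mode}: {rtts_before_first_query(version, mode)} RTT")
```

With these numbers, resumption saves one round trip under TLS 1.2 and none
under TLS 1.3, which is the asymmetry the pull request tries to capture.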

> 
> > (2) Trivial to address, but the section heading for Appendix A.8
> > references RFC 3326 (The Reason Header Field for the Session Initiation
> > Protocol (SIP)), not RFC 3226 (DNSSEC and IPv6 A6 aware server/resolver
> > message size requirements)
> 
> Thanks, this has been corrected.
> 
> 
> > 
> > 
> > ----------------------------------------------------------------------
> > COMMENT:
> > ----------------------------------------------------------------------
> > 
> > This document targets BCP status, while proposing an Updates:
> > relationship with an Internet Standard and an Informational document.
> > As mentioned in
> > https://mailarchive.ietf.org/arch/msg/last-call/oMhqRigr4nbRSMll5NY4zXpZu20/
> > there is some motivation for having updates to standards-track documents
> > occur via other standards-track documents, but given that this document
> > is mostly about giving operational guidance for DNS-over-TCP usage,
> > there seems to be some argument that BCP is a more appropriate status
> > for it.  However, I do wonder if there is some existing BCP that this
> > document would become part of, or if it would need to be given a new BCP
> > number.  It's not entirely clear how much scope there is for future
> > additions that would become part of that same BCP, and whether a BCP
> > number is truly needed in that scenario.  It would be good to hear
> > others' thoughts on this topic, especially in the form of references to
> > previous WG discussion.
> > 
> > I made a pull request on github with some editorial suggestions, most of
> > which by volume are classifying the "Standards Track" documents in
> > Appendix A properly as Internet Standard or Proposed Standard (there
> > don't seem to be any lingering Draft Standards in the list):
> 
> This has been merged.
> 
> > 
> > Section 2.4
> > 
> >   headers.  Unfortunately, it is quite common for both ICMPv6 and IPv6
> >   extension headers to be blocked by middleboxes.  According to
> >   [HUSTON] some 35% of IPv6-capable recursive resolvers were unable to
> >   receive a fragmented IPv6 packet.  [...]
> > 
> > I looked through [HUSTON] and wasn't able to find this "35%" figure.
> > There is a remark that "37% of endpoints used IPv6-capable DNS resolvers
> > that were incapable of receiving a fragmented IPv6 response", but that
> > seems to be a somewhat different statement in addition to a somewhat
> > different percentage value.  In particular, the sampling domain for the
> > [HUSTON] statement as written appears to be "all endpoints", but the
> > statement in this draft uses the sampling domain of "IPv6-capable
> > recursive resolvers".  However, the corresponding calculation in
> > [HUSTON] looks to be using "IPv6-capable resolvers" as the sampling
> > domain, just as this document states, which suggests that the error in
> > phrasing is in [HUSTON] and not in this document.
> 
> The 35% figure is derived from this statement in the reference:
> 
>    We saw 10,115 individual IPv6 addresses used by IPv6-capable recursive
>    resolvers.  Of this set of resolvers, we saw 3,592 resolvers that
>    consistently behaved in a manner that was consistent with being unable
>    to receive a fragmented IPv6 packet.
> 
> 3592/10115 is 35.51% and I took the liberty of rounding down rather than up.

Ah, that makes perfect sense.  Sorry for not having read the reference more
closely.
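
For anyone who wants to re-check the arithmetic, a minimal sketch using only
the two counts quoted from [HUSTON] above (the variable names are mine):

```python
import math

# Counts quoted from [HUSTON] in the message above.
resolvers_seen = 10115          # IPv6 addresses of IPv6-capable recursive resolvers
unable_to_receive_frags = 3592  # resolvers that could not receive a fragmented packet

share = unable_to_receive_frags / resolvers_seen * 100

print(f"{share:.2f}%")          # prints 35.51%
print(f"{math.floor(share)}%")  # rounded down, as in the draft: prints 35%
```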

> 
> > 
> > Section 3
> > 
> > As the directorate review noted, it's a little surprising to see the
> > OLD/NEW formulation for one of the updates to §6.1.3.2 of RFC 1123 but
> > not the others.
> > 
> > In particular, using OLD/NEW would allow us to implicitly reiterate that
> > the "MUST send a UDP query first" requirement for non-zone-transfers
> > is no longer present (by virtue of the update made by RFC 7766).
> 
> This has been changed as noted in my reply to John Scudder.
> 
> > 
> >   *  Recursive servers (or forwarders) MUST support and service all TCP
> >      queries so that they do not prevent large responses from a TCP-
> >      capable server from reaching its TCP-capable clients.
> > 
> > This might benefit from a bit of unpacking.  I think that "MUST support
> > and service ... queries" refers to the stub/recursive side of things,
> > while "responses from a TCP-capable server" would refer to the
> > recursive/authoritative side.  But the rest of the sentence seems to be
> > assuming that if TCP is used on one side then it is used on the other
> > side, so that limitations of TCP use on one side do not carry over to
> > the other.  However, I didn't think that there was a requirement on
> > recursives to go forward with TCP for queries received over TCP, so I'm
> > not entirely sure what the actual guidance here is intended to be.
> 
> Agreed, hopefully this is better:
> 
>    o  Authoritative servers MUST support and service TCP for receiving
>       queries, so that resolvers can reliably receive responses that are
>       larger than what fits in a single UDP packet.
> 
>    o  Recursive servers (and forwarders) MUST support and service TCP
>       for receiving queries, so their TCP-capable clients can reliably
>       receive responses that are larger than what fits in a single UDP
>       packet.
> 
>    o  Recursive servers (and forwarders) MUST support TCP for sending
>       queries, so that they can retry truncated UDP responses as
>       necessary.

[The version downthread in the exchange with Joe looks good]

> 
> > 
> > Section 4.2
> > 
> >   DNS server software SHOULD provide a configurable limit on the total
> >   number of established TCP connections.  If the limit is reached, the
> >   application is expected to either close existing (idle) connections
> >   or refuse new connections.  Operators SHOULD ensure the limit is
> >   configured appropriately for their particular situation.
> > 
> > I think that one of the directorate reviews touched on this topic, but I
> > wonder if we can give more guidance on what factors of a particular
> > situation might be relevant for determining what is appropriate.  In
> > this case, that might include the number of requests the hardware is
> > capable of serving and the number of requests expected from legitimate
> > clients; we do seem to provide a bit more detail in the following
> > paragraph (not quoted here) regarding "number and diversity of users"
> 
> How about this?
> 
>    DNS server software SHOULD provide a configurable limit on the total
>    number of established TCP connections.  If the limit is reached, the
>    application is expected to either close existing (idle) connections
>    or refuse new connections.  Operators SHOULD ensure the limit is
>    configured appropriately for their particular situation, which
>    includes factors such as the number of users or clients, typical
>    traffic levels, and hardware resource constraints.
> 
>    DNS server software MAY provide a configurable limit on the number of
>    established connections per source IP address or subnet.  This can be
>    used to ensure that a single or small set of users cannot consume all
>    TCP resources and deny service to other users.  Operators SHOULD
>    ensure this limit is configured appropriately, based on their number
>    and diversity of users, and whether users connect from unique IP
>    addresses or through a shared Network Address Translator.
> 
> 
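
The two limits proposed above (a total cap and a per-source cap) can be
sketched roughly as follows.  This is only an illustration of the
bookkeeping, not how any particular DNS server implements it; the class
name, the `source_key` helper, and the choice to refuse new connections
rather than close idle ones are all assumptions made for the example.

```python
import ipaddress
from collections import Counter


def source_key(ip, prefix_len=None):
    """Group clients by address, or by subnet when prefix_len is given."""
    if prefix_len is None:
        return str(ipaddress.ip_address(ip))
    return str(ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False))


class ConnectionLimiter:
    """Tracks established TCP connections against two configurable limits."""

    def __init__(self, max_total, max_per_source):
        self.max_total = max_total
        self.max_per_source = max_per_source
        self.total = 0
        self.per_source = Counter()

    def try_accept(self, key):
        """Return True and count the connection if both limits allow it."""
        if self.total >= self.max_total:
            return False  # global limit reached: refuse the new connection
        if self.per_source[key] >= self.max_per_source:
            return False  # this source already holds its share
        self.per_source[key] += 1
        self.total += 1
        return True

    def release(self, key):
        """Call when a connection from this source is closed."""
        if self.per_source[key] > 0:
            self.per_source[key] -= 1
            self.total -= 1
            if self.per_source[key] == 0:
                del self.per_source[key]
```

Clients behind a shared Network Address Translator all map to the same key,
which is why the per-source limit needs to be set with the user population
in mind.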
> > 
> > Section 8
> > 
> > There's a lot of good advice interspersed in the main body text already;
> > thank you for that!
> > 
> > The discussion in §4.1 suggests ("SHOULD") to share a TFO server key
> > amongst servers in a server farm, but this introduces the usual security
> > considerations for a group-shared symmetric key.  The highlights are
> > that any member of the group can impersonate any other member, and
> > compromise of one machine compromises all members' use of the key.
> > While there's not a great fully generic treatment of these issues in the
> > RFC series that I know of (yet, at least), I've seen RFC 4046 cited for
> > it sometimes, and draft-ietf-core-oscore-groupcomm has a section on
> > "security of the group mode" that also has some overlap with the
> > relevant considerations for sharing TFO keys.
> 
> I feel like this point should’ve been brought up in the TFO RFC (7413),
> rather than this document.  Section 6.3.4 talks about server farms but doesn’t
> mention security concerns about sharing keys.
> 
> Perhaps it would be appropriate in this document to say that server clusters
> should either use the same TFO server key (as recommended by 7413 sec 6.3.4),
> or just disable TFO?

It would have been nice if 7413 covered this topic, yes, but it didn't.
The reasons that 7413 recommends for clusters to share the same key remain
valid, but it comes with a caveat that a compromise of one such server
facilitates attacks on the others.  Your proposal here doesn't mention that
caveat, so it seems like the guidance is incomplete (even if the overall
dichotomy of choice is essentially complete).

To make a concrete (though perhaps straw-man) proposal:

  This document recommends that DNS Servers enable TFO when possible.  RFC
  7413 recommends that a pool of servers behind a load balancer with shared
  server IP address also share the key used to generate Fast Open cookies,
  to prevent inordinate fallback to the 3WHS.  This guidance remains
  accurate, but comes with a caveat: compromise of one server would reveal
  this group-shared key, and allow for attacks involving the other servers
  in the pool by forging invalid Fast Open cookies.

I recognize this is a lot of text for a fairly minor issue, but I didn't
come up with anything shorter in the time allotted.
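
As a rough illustration of why the shared key matters: in the sketch below a
cookie is a keyed MAC over the client address.  The use of HMAC-SHA256
truncated to 8 bytes is purely an assumption for the example (real TCP
stacks choose their own cookie construction), and the function names are
hypothetical.

```python
import hashlib
import hmac
import ipaddress


def make_cookie(shared_key, client_ip):
    """Derive a Fast Open style cookie from the pool key and client address."""
    packed = ipaddress.ip_address(client_ip).packed
    # Assumption for illustration: HMAC-SHA256, truncated to 8 bytes.
    return hmac.new(shared_key, packed, hashlib.sha256).digest()[:8]


def cookie_is_valid(shared_key, client_ip, cookie):
    """Any server holding the pool key can validate (or mint) a cookie."""
    return hmac.compare_digest(make_cookie(shared_key, client_ip), cookie)


if __name__ == "__main__":
    pool_key = b"\x01" * 16  # one key shared by every server in the pool

    # A cookie issued by one server is accepted by the others, which is
    # what avoids falling back to the full three-way handshake.
    cookie = make_cookie(pool_key, "192.0.2.10")
    print(cookie_is_valid(pool_key, "192.0.2.10", cookie))

    # The caveat: whoever extracts pool_key from one compromised server
    # can mint cookies for arbitrary addresses that the rest of the pool
    # will accept.
    forged = make_cookie(pool_key, "198.51.100.99")
    print(cookie_is_valid(pool_key, "198.51.100.99", forged))
```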

> 
> > 
> > In a similar vein, in §6 we again SHOULD-level recommend that
> > applications capturing network packets do TCP segment reassembly in
> > order to defeat obfuscation techniques involving TCP segmentation.  I am
> > happy to see that we go on to caution against resource exhaustion
> > attacks while doing so, but have two related comments: first, that
> > caution might merit mention again here, and second, that we should note
> > (either here or there) that when applying resource limits, there's a
> > tradeoff between allowing service and allowing some attacks to succeed.
> > Giving up on segmentation reassembly due to resource usage means that a
> > potential attack could succeed, but dropping streams where segmentation
> > recovery uses excess resources might deny legitimate service.
> 
> Is this sort of what you had in mind?
> 
>    As mentioned in Section 6, applications that implement TCP stream
>    reassembly need to limit the amount of memory allocated to connection
>    tracking.  A failure to do so could lead to a total failure of the
>    logging or monitoring application.  Imposition of resource limits
>    creates a tradeoff between allowing some stream reassembly to
>    continue and allowing some evasion attacks to succeed. 
> 
> 
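
The tradeoff in the proposed text can be sketched as a memory budget on
buffered segments.  This is a deliberately simplified illustration: it
ignores sequence numbers and ordering, and the class name, the single
global byte budget, and the policy of abandoning the stream that would
exceed it are all assumptions.

```python
class ReassemblyBudget:
    """Buffers TCP payload per stream under one global memory limit."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used = 0
        self.buffers = {}     # stream id -> bytearray of buffered payload
        self.dropped = set()  # streams we have stopped tracking

    def add_segment(self, stream_id, payload):
        """Buffer a segment; return False if the stream is not tracked."""
        if stream_id in self.dropped:
            return False
        if self.used + len(payload) > self.max_bytes:
            # Over budget: give up on this stream so the monitor itself
            # keeps running.  Traffic on the dropped stream is no longer
            # inspected, which is where an evasion attempt could succeed.
            self._drop(stream_id)
            return False
        self.buffers.setdefault(stream_id, bytearray()).extend(payload)
        self.used += len(payload)
        return True

    def _drop(self, stream_id):
        buffered = self.buffers.pop(stream_id, None)
        if buffered is not None:
            self.used -= len(buffered)
        self.dropped.add(stream_id)
```

A lower budget protects the logging or monitoring application but abandons
more streams; a higher one does the reverse.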
> > 
> > We might also consider reiterating that the core DNS over TCP security
> > considerations (RFC 1035, ???) continue to apply.
> 
> 1035 doesn’t have a lot to say, but maybe you are thinking about what’s
> in section 4.2.2?
> 
> Even so, this document is meant to be operational requirements and I suspect
> you are maybe thinking of protocol/implementation requirements, which are
> covered by RFC 7766?

Ah, yes, 7766 looks like a good thing to replace the "???" above.  (I
didn't check what was in 1035 when writing the above comment.)

The rest of the stuff (both above and below) looks good, and thank you
again for the detailed responses.

-Ben

> 
> > 
> > Clients that keep state about whether a given server supports TCP (per
> > discussion in §4.1) might be susceptible to an attacker that is on-path
> > in one location disrupting TCP in that location and causing the client
> > to store state that a given server does not support TCP, when TCP
> > connections from a different location, where the attacker is not on
> > path, would succeed.
> 
> The opening paragraph of section 4.1 has been updated due to other comments
> on this same topic.  It now reads:
> 
>    Resolvers and other DNS clients should be aware that some servers
>    might not be reachable over TCP.  For this reason, clients MAY track
>    and limit the number of TCP connections and connection attempts to a
>    single server.  Reachability problems can be caused by network
>    elements close to the server, close to the client, or anywhere along
>    the path between them.  Mobile clients that cache connection failures
>    MAY do so on a per-network basis, or MAY clear such a cache upon
>    change of network.
> 
> Does that address your concern?
> 
> 
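
A per-network failure cache of the kind described above might look roughly
like this.  It is a sketch under stated assumptions: the network identifier
(for example an SSID or interface name), the expiry time, and all names are
hypothetical and not taken from the draft.

```python
import time


class TcpFailureCache:
    """Remembers TCP connection failures per (network, server) pair."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self.ttl = ttl_seconds  # assumed expiry; the draft gives no value
        self.clock = clock
        self.failures = {}      # (network_id, server) -> expiry timestamp

    def record_failure(self, network_id, server):
        self.failures[(network_id, server)] = self.clock() + self.ttl

    def tcp_recently_failed(self, network_id, server):
        """True only if a failure was seen on this same network recently."""
        key = (network_id, server)
        expiry = self.failures.get(key)
        if expiry is None:
            return False
        if self.clock() >= expiry:
            del self.failures[key]
            return False
        return True

    def clear(self):
        """Alternative policy: forget everything on change of network."""
        self.failures.clear()
```

Keying on the network means that an on-path attacker who disrupts TCP in
one location does not poison the client's view of the server from another.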
> > 
> >   short-lived DNS transactions over TCP may pose challenges.  In fact,
> >   [DAI21] details a class of IP fragmentation attacks on DNS
> >   transactions if the IP ID field can be predicted and a system is
> >   coerced to fragment rather than retransmit messages.  [...]
> > 
> > I suggest more detail on the "IP ID field" (including IPv4/v6
> > differences).
> 
> Thanks, is this better?
> 
>    In fact,
>    [DAI21] details a class of IP fragmentation attacks on DNS
>    transactions if the IP Identifier field (16 bits in IPv4 and 32 bits
>    in IPv6) can be predicted and a system is coerced to fragment rather
>    than retransmit messages.
> 
> 
> > 
> > Section 9
> > 
> >   being queried).  DNS over TLS or DTLS is the recommended way to
> >   achieve DNS privacy.
> > 
> > Is it really the (sole) recommended way?  It certainly suffices, but
> > what is the status of DoH/DoQ?  Perhaps "DNS over TLS or DTLS serves to
> > provide DNS privacy" optionally followed by a note about DoH or "other
> > mechanisms" in general.  (May be superseded by Roman's Discuss.)
> 
> Updated:
> 
>    A number of protocols have recently been developed
>    to provide DNS privacy, including DNS over TLS [RFC7858], DNS over
>    DTLS [RFC8094], DNS over HTTPS [RFC8484], with even more on the way.
> 
> 
> > 
> > Section 11.1
> > 
> > I recommend re-review of the classification of the references.
> > Just because a reference is on the standards-track does not mean that we
> > must reference it normatively -- e.g., RFC 1995/1996 are mentioned only
> > in the listing of "other standards related to DNS transport over TCP",
> > but they are not required reading in order to understand and implement this
> > document.  See
> > https://www.ietf.org/about/groups/iesg/statements/normative-informative-references/
> 
> Thanks, I’ve moved the references only mentioned in the appendix to the Informative section.
> 
> DW
> 
> 
>