Re: [TLS] Transport Issues in DTLS 1.3
Eric Rescorla <ekr@rtfm.com> Fri, 26 March 2021 22:08 UTC
References: <CAM4esxR3YPoWaxU9B--oaT9r2bh_QBNH=tt0FsiUKaAT=M6_fg@mail.gmail.com>
In-Reply-To: <CAM4esxR3YPoWaxU9B--oaT9r2bh_QBNH=tt0FsiUKaAT=M6_fg@mail.gmail.com>
From: Eric Rescorla <ekr@rtfm.com>
Date: Fri, 26 Mar 2021 15:08:01 -0700
Message-ID: <CABcZeBMS5fUej0q5XhbxM5sMLQwAAyCgyAfbkTORQjvMM+jb7A@mail.gmail.com>
To: Martin Duke <martin.h.duke@gmail.com>
Cc: draft-ietf-tls-dtls13.all@ietf.org, Mark Allman <mallman@icsi.berkeley.edu>, Lars Eggert <lars@eggert.org>, Gorry Fairhurst <gorry@erg.abdn.ac.uk>, "<tls@ietf.org>" <tls@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tls/56-FTPW_h3acT1t7aC4d53orTqs>
Subject: Re: [TLS] Transport Issues in DTLS 1.3
Hi folks,

This is a combined response to Martin Duke and to Mark Allman.

Before I respond in detail I'd like to level set a bit.

First, DTLS does not provide a generic reliable bulk data transmission capability. Rather, it provides an unreliable channel (a la UDP). That channel is set up with a handshake protocol and DTLS provides reliability for that protocol. However, that protocol is run infrequently and generally involves relatively small amounts (typically << 10KB) of data being sent. This means that we have rather more latitude in terms of how aggressively we retransmit because it only applies to a small fraction of the traffic.

Second, DTLS 1.2 is already widely deployed. It uses a simple "wait for the timer to expire and retransmit everything" approach, with the timer being doubled on each retransmission. This doesn't always provide ideal results, but also has not caused the network to collapse. I don't know much about how things are deployed in the IoT setting (paging Hannes Tschofenig) but at least in the WebRTC context, we have found the 1000ms guidance to be unduly long (as a practical matter, video conferencing just won't work with delays over 100-200ms). Firefox uses 50ms and AIUI Chrome uses a value derived from the ICE handshake (which is probably better because there are certainly times where 50ms is too short).

Martin Duke's Comments:

> In Sec 5.8.2, it is a significant change from DTLS 1.2 that the
> initial timeout is dropping from 1 sec to 100ms, and this is worthy of
> some discussion. This violation of RFC8961 ought to be explored
> further. For a client first flight of one packet, it seems
> unobjectionable. However, I'm less comfortable with a potentially
> large server first flight, or a client second flight, likely leading
> to a large spurious retransmission.
> With large flights, not only is a
> short timeout more dangerous, but you are more likely to get an ACK in
> the event of some loss that allows you to shortcut the timer anyway
> (i.e. the cost of long timeout is smaller)

You seem to be implicitly assuming that there is individual packet loss rather than burst loss. If the entire flight is lost, you want to just fall back to retransmitting.

> Relatedly, in section 5.8.3 there is no specific recommendation for a
> maximum flight size at all. I would think that applications SHOULD
> have no more than 10 datagrams outstanding unless it has some OOB
> evidence of available bandwidth on the channel, in keeping with de
> facto transport best practice.

I agree that this is a reasonable change.

> Finally, I am somewhat concerned that the lack of any window reduction
> might perform poorly in constrained environments.

I'm skeptical that this is actually the case. As a practical matter, TLS flights rarely exceed 5 packets. For instance, Fastly's data on QUIC [0] indicates that the server's first flight (the biggest flight in the TLS 1.3 handshake) is less than 5 packets for the vast majority of handshakes, even without certificate compression. Given that constrained environments have more incentive to reduce bandwidth, I would expect them to typically be smaller, either via using smaller certificates or using some of the existing techniques for reducing handshake size such as cert compression or cached info.

> Granted, doubling
> the timeout will reduce the rate, but when retransmission is
> ack-driven there is essentially no reduction of sending rate in
> response to loss.

I don't believe this is correct. Recall that unlike TCP, there's generally no buffer of queued packets waiting to be transmitted. Rather, there is a fixed flight of data which must be delivered.
With one exceptional case [1], an ACK will reflect that some but not all of the data was delivered and processed; when retransmitting, the sender will only retransmit the un-ACKed packets, which naturally reduces the sending rate. Given the quite small flights in play here, that reduction is likely to be quite substantial. For instance, if there are three packets and 1 is ACKed, then there will be a reduction of 1/3.

> I want to emphasize that I am not looking to fully recreate TCP here;
> some bounds on this behavior would likely be satisfactory.
>
> Here is an example of something that I think would be workable. It is
> meant to be a starting point for discussion. I've asked for some input
> from the experts in this area who may feel differently.
>
> - In general, the initial timeout is 100ms.
> - The timeout backoff is not reset after successful delivery. This
>   allows the "discovery" in bullet 1 to be safely applied to larger
>   flights.

Note that the timeout is actually only reset after successful loss-free delivery of a flight:

   Implementations SHOULD retain the current timer value until a message is transmitted and acknowledged without having to be retransmitted, at which time the value may be reset to the initial value.

There seems to be some confusion here (perhaps due to bad writing). When the text says "resets the retransmission timer" it means "re-arm it with the current value" not "re-set it to the initial default". For instance, suppose that I send flight 1 with retransmit timer value T. After T seconds, I have not received anything and so I retransmit it, doubling to 2T. After I get a response, I now send a new flight. The timer should be 2T, not T.

With that said, I think it would be reasonable to re-set to whatever the measured RTT was, rather than the initial default.
This would avoid potentially resetting to an overly low default (though it's not clear to me how this could happen because if your RTT estimate is too low you will never get a delivery without retransmission).

> - For a first flight of > 2 packets, the sender MUST either (a) set
>   the initial timeout to 1 second OR (b) retransmit no more than 2
>   packets after timeout.
> - flights SHOULD be limited to 10 packets
> - on timeout or ack-indicated retransmission, no more than half
>   (minimum one) of the flight should be retransmitted
>
> The theory here is that it's responsive to RTTs > 100ms, but small
> flights can be more aggressive, and large flows are likely to have
> ack-driven retransmission.

I think it would be useful to distinguish two sets of concerns here:

1. That timeout-driven retransmission is too aggressive due to too-short timers.
2. That ACK-driven retransmission will be too aggressive (presumably due to the ACK indicating congestion-driven loss; if the loss is due to burst errors, then we want to retransmit aggressively).

On point (1), I think that the fact that we have extensive deployment of timeout-driven retransmission in the field with short timers is fairly strong evidence that it will not destroy the Internet and more generally that the "retransmit the whole flight" design is safe in this case. I certainly agree that there might be settings in which 100ms is too short. Rather than litigate the timer value, which I agree is a judgement call, I suggest we increase the default somewhat (250? 500) and then indicate that if the application has information that a shorter timer is appropriate, it can use one.

As far as point (2) goes, I don't think that any change is indicated here. As I indicated above, there is a finite amount of data to transmit and the design of the ACKs is such that you will continue to make forward progress (and if you're not, you won't be getting ACKs).
Given the small fraction of the network traffic that will be DTLS handshakes, the primary risk here seems to be that on a very constrained network, you will get suboptimal performance for your handshake, but even that should resolve in a small number of round trips, especially if the receiver buffers out of order packets (which you obviously want to do in a constrained network). And if you do have random loss rather than congestion loss, backing off will have a very negative impact on the handshake for minimal reduction in packets transmitted [2].

With that said, given that your concern seems to be large flights, I could maybe live with halving the *window* rather than the size of the flight. In your example, you suggest an initial window of 10, so this would give us 10, 5, 3, ... This would have little practical impact on the vast majority of handshakes, but I suppose might slightly improve things on the edge cases where you have a large flight *and* a high congestion network.

Mark Allman's comments:

> A few specific things (in addition to what Gorry said, which I
> absolutely agree with):
>
>   - "Though timer values are the choice of the implementation,
>     mishandling of the timer can lead to serious congestion
>     problems"
>
>     + Gorry flagged this and I am flagging it again. If this is
>       something that can lead to serious problems, let's not just
>       leave it to "choice of the implementation". Especially if we
>       have some idea how to make it less problematic.

I'm not sure what you'd like here. I think the guidance in this specification is reasonable, so I'd be happy to just remove this text.

>   - "Implementations SHOULD use an initial timer value of 100 msec
>     (the minimum defined in RFC 6298 [RFC6298])"
>
>     + I wrote RFC 6298 and I have no idea where this is coming from!
>
>     + Even if this value of 100msec is OK for DTLS it shouldn't lean
>       on RFC 6298 because RFC 6298 doesn't say that is OK. I.e.,
>       the parenthetical is objectively wrong.
>
>     + RFC 6298 says the INITIAL RTO should be 1sec (point (2.1) in
>       section 2). RFC 8961 affirms this and also says the INITIAL
>       RTO should be 1sec (requirement (1) in section 4).

Yeah, I'm not sure what happened here. I could go track down the PRs but I'll just plead editorial error. I suggest we just remove the parenthetical because it's not helping here.

>   - "Note that a 100 msec timer is recommended rather than the
>     3-second RFC 6298 default in order to improve latency for
>     time-sensitive applications."
>
>     + Again, this mis-states RFC 6298, which says the initial RTO is
>       1sec (not 3sec). (Previous to RFC 6298 the initial RTO was
>       3sec, which is probably where the notion comes from. Most of
>       the purpose of RFC 6298 was to drop the initial RTO to 1sec.)

My bad. I'll fix this.

>     + This is a statement of desire, not any sort of principled
>       justification for using 100msec. At the least this should be
>       much better argued.

See my note to Martin Duke above. What's appropriate in a very low volume handshake protocol is different from what's appropriate in a bulk transport protocol. With that said, as I said to Martin, I don't think litigating the precise value is that helpful, so I propose we just increase it to a somewhat larger value and explicitly acknowledge that specific settings may want to use a shorter value.

>   - "The retransmit timer expires: the implementation transitions to
>     the SENDING state, where it retransmits the flight, resets the
>     retransmit timer, and returns to the WAITING state."
>
>     + Maybe this is spec sloppiness, but boy does it sound like the
>       recipe TCP used before VJCC to collapse the network. I.e.,
>       expire and retransmit the window. Rinse and repeat. It may
>       be the intention is for backoff to be involved. But, that
>       isn't what it says.

It says it elsewhere, in the section you quoted:

   a congested link.
   Implementations SHOULD use an initial timer value of 100 msec (the minimum defined in RFC 6298 {{RFC6298}}) and double the value at each retransmission, up to no less than 60 seconds (the RFC 6298 maximum).

As I said to Martin, I think some of the confusion is that this specification uses "reset" to mean both "re-arm" and "set the value back to the initial" and depends on context to clarify that. Obviously that's not been entirely successful, so I propose to use "re-arm" where I mean "start a timer with the now current value".

As noted above, this piece of the retransmission algorithm is already quite widely deployed (it was in DTLS 1.2) so I think there's a reasonably strong presumption that it is not horribly dangerous, though concededly suboptimal (hence the addition of ACKs in this specification).

>   - “When they have received part of a flight and do not immediately
>     receive the rest of the flight (which may be in the same UDP
>     datagram). A reasonable approach here is to set a timer for 1/4 the
>     current retransmit timer value when the first record in the flight
>     is received and then send an ACK when that timer expires.”
>
>     + Where does 1/4 come from? Why is it "reasonable"? This just
>       feels like a complete WAG that was pulled out of the air.

Yes, it was in fact pulled out of the air (though I did discuss it with Ian Swett a bit). To be honest, any value here is going to be somewhat pulled out of the air, especially because during the handshake the retransmit timer values are incredibly imprecise, consisting as they do of (at most) one set of samples. In general, this value is a compromise between ACKing too aggressively (thus causing spurious retransmission of in-flight packets) and ACKing too conservatively (thus causing spurious retransmission of received packets). If you have a different proposal, I'm certainly open to it. FWIW, QUIC's max_ack_delay is 25ms, and that would certainly be fine with me.
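
To make the "re-arm" versus "reset" distinction concrete, here is a rough Python sketch of the timer behaviour discussed above. It is an illustration only, not text from the draft: the class name, method names, and the choice of exactly 60 seconds as the cap are assumptions made for the example. The 100 msec initial value, the doubling, and the 1/4 ACK delay are the figures quoted in this message.

```python
class RetransmitTimer:
    """Hypothetical handshake retransmit timer (names are illustrative)."""

    def __init__(self, initial=0.1, maximum=60.0):
        self.initial = initial    # 100 msec, the initial value quoted above
        self.maximum = maximum    # assumed cap; the draft says "no less than 60 seconds"
        self.current = initial    # value used the next time the timer is armed

    def on_expiry(self):
        """Timer fired: double the value (up to the cap) before retransmitting."""
        self.current = min(self.current * 2, self.maximum)
        return self.current

    def rearm(self):
        """Re-arm with the now-current value; this does NOT return to the initial value."""
        return self.current

    def on_clean_delivery(self):
        """A flight was acknowledged without retransmission: the value may go back to initial."""
        self.current = self.initial
        return self.current

    def ack_delay(self):
        """Receiver-side ACK delay of 1/4 the current retransmit timer value."""
        return self.current / 4


timer = RetransmitTimer()
timer.on_expiry()            # flight 1 timed out once: T becomes 2T
next_flight = timer.rearm()  # the next flight starts from 2T, not T
```

In this sketch only `on_clean_delivery` ever returns the timer to its initial value, which matches the "retain the current timer value until a message is transmitted and acknowledged without having to be retransmitted" wording.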
-Ekr

[0] https://www.fastly.com/blog/quic-handshake-tls-compression-certificates-extension-study
[1] When SH is lost.
[2] In fact, there will be *more* packets transmitted because you now will have ACKs for each chunk of the flight, though of course they will be transmitted over a longer time scale.

On Thu, Mar 25, 2021 at 9:51 AM Martin Duke <martin.h.duke@gmail.com> wrote:

> Hello all,
>
> The outcome of the telechat was that I agreed to start a thread on how to
> fix the significant transport issues with the DTLS 1.3 draft. If I am
> correct, there was no early TCPM or TSVWG review. A major protocol with
> significant transport-layer functionality would benefit from such review in
> the future.
>
> *Who is in this thread*:
>
> For easy reference, here is my DISCUSS, which goes so far as to express a
> straw man design that would come closer to addressing the concerns:
> https://mailarchive.ietf.org/arch/msg/tls/3g20CQkKWPGX-BAqfuEagR2ppGY/
>
> Besides TLSWG, I've added Lars (RFC8085
> <https://datatracker.ietf.org/doc/rfc8085/>), Mark Allman (RFC8961
> <https://datatracker.ietf.org/doc/rfc8961/>), and Gorry Fairhurst (also
> RFC8085). Mark and Gorry have already sent me private comments that I
> invite them to resend here. To summarize briefly, they amplified my
> DISCUSS, made the new point that 8085 is directly relevant here, and are
> concerned there aren't enough MUSTs
>
> If people think there would be value in advertising this thread to the
> TCPM and TSVWG working groups, I can do so, at the risk of introducing more
> ancillary document churn.
>
> *Suggested plan:*
>
> Anyway, as a first step perhaps we can have Mark, Gorry, and Lars add
> anything they'd like and then invite the draft authors to either make a
> proposal or push back. If there are non-kosher things that DTLS 1.2 has
> done with no observable problems, that would be an interesting data point:
> within limits, introducing a latency regression into DTLS 1.3 would be
> perverse.
>
> DTLS is a very important protocol and it is worth the time to get these
> things right.
>
> Thanks,
> Martin Duke
> Transport AD
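
As an illustration of the ACK-driven retransmission and window-halving points in the reply above, here is a small Python sketch. It is not taken from the draft: the function names are hypothetical, and rounding up when halving is an assumption chosen only because it reproduces the "10, 5, 3, ..." sequence mentioned in the reply.

```python
import math


def unacked(flight, acked):
    """Return the records that still need retransmitting, in original order."""
    return [record for record in flight if record not in acked]


def halved_windows(initial=10, floor=1):
    """Yield successive window sizes, halving (rounding up) down to the floor."""
    window = initial
    while True:
        yield window
        if window <= floor:
            return
        window = max(floor, math.ceil(window / 2))


# Three-record flight with one record ACKed: only two records are resent,
# i.e. the retransmission is 1/3 smaller than the original flight.
to_resend = unacked([0, 1, 2], acked={1})

# Window sequence starting from the suggested initial window of 10.
windows = list(halved_windows())
```

With a fixed, small flight the first function already shrinks each retransmission as ACKs arrive; the second only matters for the edge case of a large flight on a congested network.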