Re: [TLS] Transport Issues in DTLS 1.3

Gorry Fairhurst <gorry@erg.abdn.ac.uk> Wed, 31 March 2021 13:01 UTC

To: Martin Duke <martin.h.duke@gmail.com>, Mark Allman <mallman@icsi.berkeley.edu>
Cc: Eric Rescorla <ekr@rtfm.com>, draft-ietf-tls-dtls13.all@ietf.org, Lars Eggert <lars@eggert.org>, tls@ietf.org
References: <CAM4esxR3YPoWaxU9B--oaT9r2bh_QBNH=tt0FsiUKaAT=M6_fg@mail.gmail.com> <CABcZeBMS5fUej0q5XhbxM5sMLQwAAyCgyAfbkTORQjvMM+jb7A@mail.gmail.com> <E43A7F98-6AE3-402B-B166-077B6D74B97A@icsi.berkeley.edu> <CAM4esxR+4NWHW6PadAVUsnwMZzE+yw75fdk2m2s3jV7V3inuQw@mail.gmail.com>
From: Gorry Fairhurst <gorry@erg.abdn.ac.uk>
Message-ID: <f441bf26-3acc-8bbc-ac46-7b1c2c686ccb@erg.abdn.ac.uk>
Date: Wed, 31 Mar 2021 14:01:12 +0100
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:78.0) Gecko/20100101 Thunderbird/78.9.0
MIME-Version: 1.0
In-Reply-To: <CAM4esxR+4NWHW6PadAVUsnwMZzE+yw75fdk2m2s3jV7V3inuQw@mail.gmail.com>
Content-Type: multipart/alternative; boundary="------------F5969C837458471CCF0DC10B"
Content-Language: en-GB
Archived-At: <https://mailarchive.ietf.org/arch/msg/tls/1Sr3zwqIdiWil6gNNRewhN7Rv5E>
Subject: Re: [TLS] Transport Issues in DTLS 1.3

On 30/03/2021 19:47, Martin Duke wrote:
> Thank you Eric (and Mark).
>
> To reiterate, I believe introducing latency regressions with respect 
> to DTLS 1.2 would be bad for the internet. So what's new in the area 
> under discussion is (a) lowering the timeout from 1s to 100ms, and (b) 
> the introduction of ACKs.
>
> I would characterize ekr's reply as making the following points:
>
> (1) *DTLS practice at Mozilla and elsewhere already uses timeouts << 1 
> sec*.
>
> Thanks for this report about the real world. I have no doubt that for 
> WebRTC and other use cases, a short timeout is fine. However, DTLS is 
> a general-purpose protocol and the standard should be quite 
> conservative about the paths this thing is going to run over. 
> Obviously, people are going to ignore this requirement when they think 
> they can get an advantage no matter what the RFC says.
>
> I see three acceptable ways to proceed:
> (a) stick with 1 second with words saying that given some OOB 
> knowledge you can go lower;
> (b) the same, but having an explicit floor of 100ms or 200ms; or
> (c) having a shorter threshold for small flights, as I proposed in my 
> DISCUSS
>
> (2) *DTLS 1.2 does full retransmissions on each timeout, and there is 
> no window halving.*
>
> This is a good point, but I will note that 1.2 always has an RTO-based 
> timeout, so the sending rate is halved because the timeout doubles 
> each time. With an ACK, there will be no rate halving, unless the ACK 
> clears half the window or more.
>
> That said, Mark doesn't seem to be too concerned about it. The 
> constrained-network problem where these bursts are just too large 
> already exists in DTLS 1.2 so I'm increasingly persuaded that it's OK 
> to drop this issue.
>
> Mark said a lot about RTT measurement in his reply. I gather from the 
> draft that there is no such measurement going on, but including it 
> would be another way to address some of the backoff issues.
>
> (3) *The applicability of this algorithm is at most a few packets, 
> which strictly limits the risk in a way that renders RFC 8085, etc. 
> considerations largely irrelevant.*
>
> The strawman in my DISCUSS was that bursts of <= 2 packets could be 
> more aggressive; that's a negotiable number, and the de jure TCP 4*MSS 
> initial window, for example, is one I can easily be persuaded of. I 
> feel some desire to guard against giant post-quantum certificates, or 
> what have you, but some sufficiently wide guardrails here will 
> probably have little or no short-term real-world impact, and I trust 
> we can reach a mutually agreeable number. The largest flights today in 
> DTLS 1.2 seem like a good number that addresses my concerns while 
> respecting my no-regressions principle.
>
> Thanks,
> Martin
>
> On Tue, Mar 30, 2021 at 10:48 AM Mark Allman 
> <mallman@icsi.berkeley.edu <mailto:mallman@icsi.berkeley.edu>> wrote:
>
>
>     Hi Ekr!
>
>     > This means that we have rather more latitude in terms of how
>     > aggressively we retransmit because it only applies to a small
>     > fraction of the traffic.
>
>     (Strikes me as a bit of a weird formulation.)
>
>     > Firefox uses 50ms and AIUI Chrome
>     > uses a value derived from the ICE handshake (which is probably
>     > better because there are certainly times where 50ms is too short).
>
>     Yes, the best thing to do is to use a measured value instead of
>     assuming one static number will always work.  But, you have to get a
>     measurement to do that, so you have to start somewhere.
>
>     >> Relatedly, in section 5.8.3 there is no specific recommendation
>     >> for a
>     >> maximum flight size at all. I would think that applications SHOULD
>     >> have no more than 10 datagrams outstanding unless it has some OOB
>     >> evidence of available bandwidth on the channel, in keeping with de
>     >> facto transport best practice.
>     >
>     > I agree that this is a reasonable change.
>
>     I like this, too.  I think that limits the impact of any sort of
>     badness.
>
>     >> Granted, doubling the timeout will reduce the rate, but when
>     >> retransmission is ack-driven there is essentially no reduction of
>     >> sending rate in response to loss.
>     >
>     > I don't believe this is correct. Recall that unlike TCP, there's
>     > generally no buffer of queued packets waiting to be transmitted.
>     > Rather, there is a fixed flight of data which must be delivered.
>     > With one exceptional case [1], an ACK will reflect that some but
>     > not all of the data was delivered and processed; when
>     > retransmitting, the sender will only retransmit the un-ACKed
>     > packets, which naturally reduces the sending rate. Given the quite
>     > small flights in play here, that reduction is likely to be quite
>     > substantial. For instance, if there are three packets and 1 is
>     > ACKed, then there will be a reduction of 1/3.
>
>     I tend to agree with ekr here.  This doesn't tend to worry me
>     greatly.
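
[Illustration only, not part of the draft: a minimal Python sketch of
the point above, that retransmitting only the un-ACKed records shrinks
what is sent. The function name and the record numbers are hypothetical.]

```python
def records_to_retransmit(flight, acked):
    """Return the records of a flight that have not been ACKed, in order."""
    acked = set(acked)
    return [record for record in flight if record not in acked]


# Hypothetical three-packet flight in which only record 1 was ACKed.
flight = [0, 1, 2]
pending = records_to_retransmit(flight, acked=[1])

# Fraction by which the retransmission is smaller than the original flight.
reduction = 1 - len(pending) / len(flight)
```

With three packets and one ACKed, two are resent, which is the 1/3
reduction mentioned in the quoted text.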
>
>     > Note that the timeout is actually only reset after successful
>     > loss-free delivery of a flight:
>     >
>     >    Implementations SHOULD retain the current timer value until a
>     >    message is transmitted and acknowledged without having to
>     >    be retransmitted, at which time the value may be
>     >    reset to the initial value.
>     >
>     > There seems to be some confusion here (perhaps due to bad
>     > writing).  When the text says "resets the retransmission timer" it
>     > means "re-arm it with the current value" not "re-set it to the
>     > initial default". For instance, suppose that I send flight 1 with
>     > retransmit timer value
>     > T. After T seconds, I have not received anything and so I retransmit
>     > it, doubling to 2T. After I get a response, I now send a new
>     > flight. The timer should be 2T, not T.
>
>     I agree that is how to manage the timer.
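
[Illustration only: a minimal Python sketch of the "re-arm" versus
"reset to initial" distinction described above. The class and method
names are hypothetical. The 100 ms initial value and the 60 s cap are
taken from the draft text quoted in this thread, and treating 60 s as a
hard cap is an assumption, since the draft says "no less than".]

```python
class RetransmitTimer:
    INITIAL = 0.1   # seconds; initial value from the quoted draft text
    MAX = 60.0      # seconds; assumed cap, see note above

    def __init__(self, initial=INITIAL):
        self.initial = initial
        self.value = initial

    def on_timeout(self):
        """Timer expired: double the value (up to the cap) and re-arm with it."""
        self.value = min(self.value * 2, self.MAX)
        return self.value

    def on_new_flight(self):
        """New flight: re-arm with the current value, not the initial one."""
        return self.value

    def on_clean_delivery(self):
        """Flight acknowledged without retransmission: value may return to initial."""
        self.value = self.initial
        return self.value
```

In the example from the quoted text, one timeout on flight 1 takes the
value from T to 2T, and the next flight then starts with 2T.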
>
>     > With that said, I think it would be reasonable to re-set to whatever
>     > the measured RTT was, rather than the initial default. This would
>     > avoid potentially resetting to an overly low default (though it's
>     > not clear to me how this could happen because if your RTT estimate
>     > is too low you will never get a delivery without retransmission).
>
>     That's one problem with a too-low initial RTT and a reason why RFCs
>     8085 & 8961 use a conservative initial.
>
>     However, I might suggest not setting the timeout to the measured
>     RTT, but to something based on the measured RTT.  The best guidance
>     here (8085 & 8961) is that this value should be based on both the
>     RTT and the variance in the RTT.  With one sample you don't have
>     variance.  TCP handles this by setting the RTO to 3 times the first
>     measured RTT.  That's just old VJCC.  It has always struck me as a
>     bit conservative, but ultimately this is a blip in the TCP context
>     and so I have never thought deeply about it.  But, perhaps if you
>     did something like 1.5 times the measured RTT you'd account for a
>     bit of variance that will no doubt be present.
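
[Illustration only: a minimal Python sketch of where TCP's "3 times the
first measured RTT" comes from, next to the 1.5 times suggested above.
It follows the first-sample rule in RFC 6298 (SRTT = R, RTTVAR = R/2,
RTO = SRTT + K*RTTVAR with K = 4) but, as a simplifying assumption,
leaves out the clock-granularity term and the 1 second floor. The
function names and the 80 ms sample are hypothetical.]

```python
def rfc6298_first_rto(first_rtt, k=4):
    """RTO after the first RTT sample, ignoring granularity and the RTO floor."""
    srtt = first_rtt
    rttvar = first_rtt / 2
    return srtt + k * rttvar


def scaled_rto(first_rtt, multiplier=1.5):
    """Timeout set to a fixed multiple of the first measured RTT."""
    return multiplier * first_rtt


sample = 0.080  # hypothetical first RTT measurement, in seconds
tcp_style = rfc6298_first_rto(sample)   # three times the sample
lighter = scaled_rto(sample)            # 1.5 times the sample
```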
>
>     > On point (1), I think that the fact that we have extensive
>     > deployment of timeout-driven retransmission in the field with
>     > short timers is fairly strong evidence that it will not destroy
>     > the Internet and more generally that the "retransmit the whole
>     > flight" design is safe in this case. I certainly agree that there
>     > might be settings in which 100ms is too short. Rather than
>     > litigate the timer value, which I agree is a judgement call, I
>     > suggest we increase the default somewhat (250? 500) and then
>     > indicate that if the application has information that a shorter
>     > timer is appropriate, it can use one.
>
>     I think that sounds fine.  And, if you could wedge some words about
>     experience into the document that'd seem useful, as well, IMO.
>
>     > With that said, given that your concern seems to be large flights,
>     > I could maybe live with halving the *window* rather than the size
>     > of the flight. In your example, you suggest an initial window of
>     > 10, so this would give us 10, 5, 3, ... This would have little
>     > practical impact on the vast majority of handshakes, but I suppose
>     > might slightly improve things on the edge cases where you have a
>     > large flight *and* a high congestion network.
>
>     I dunno ... I'd be interested in Martin's thought here. But, at
>     these levels I am just not sure if the complexity of tracking a
>     flight size is really worth it.
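
[Illustration only: a minimal Python sketch of the window halving
floated above. Rounding up and never going below one datagram are
assumptions chosen to reproduce the "10, 5, 3, ..." sequence in the
quoted text. The function name is hypothetical.]

```python
import math


def halved_windows(initial, steps):
    """Return the window used for the first `steps` transmissions of a flight."""
    window = initial
    windows = []
    for _ in range(steps):
        windows.append(window)
        # Halve after each retransmission, rounding up, with a floor of 1.
        window = max(1, math.ceil(window / 2))
    return windows
```

Calling halved_windows(10, 3) gives the three values from the quoted
example. The only state this needs is one integer per flight.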
>
>     >>   - "Though timer values are the choice of the implementation,
>     >>     mishandling of the timer can lead to serious congestion
>     >>     problems"
>     >>
>     >>     + Gorry flagged this and I am flagging it again.  If this is
>     >>       something that can lead to serious problems, let's not just
>     >>       leave it to "choice of the implementation". Especially if we
>     >>       have some idea how to make it less problematic.
>     >
>     > I'm not sure what you'd like here. I think the guidance in this
>     > specification is reasonable, so I'd be happy to just remove this
>     > text.
>
>     I don't find the two halves of the sentence consistent with each
>     other and therefore the message seems muddled.
>
>     Removing is fine.
>
>     >>   - "The retransmit timer expires: the implementation
>     >>     transitions to the SENDING state, where it retransmits the
>     >>     flight, resets the retransmit timer, and returns to the
>     >>     WAITING state."
>     >>
>     >>     + Maybe this is spec sloppiness, but boy does it sound like the
>     >>       recipe TCP used before VJCC to collapse the network.  I.e.,
>     >>       expire and retransmit the window.  Rinse and repeat.  It may
>     >>       be the intention is for backoff to be involved.  But, that
>     >>       isn't what it says.
>     >
>     > It says it elsewhere, in the section you quoted:
>     >
>     >    a congested link.  Implementations SHOULD use an initial
>     >    timer value of 100 msec (the minimum defined in RFC 6298
>     >    {{RFC6298}}) and double the value at each retransmission,
>     >    up to no less than 60 seconds (the RFC 6298 maximum).
>     >
>     > As I said to Martin, I think some of the confusion is that this
>     > specification uses "reset" to mean both "re-arm" and "set the
>     > value back to the initial" and depends on context to clarify
>     > that. Obviously that's not been entirely successful, so I propose
>     > to use "re-arm" where I mean "start a timer with the now current
>     > value".
>
>     I agree this is mostly a writing issue.  I would suggest looking for
>     the word "reset" and just using more than one word so it's
>     absolutely clear what you mean.  E.g., something like "double the
>     timeout value and start a new timer" instead of "reset" or "rearm".
>
>     >>   - “When they have received part of a flight and do not
>     >>     immediately receive the rest of the flight (which may be in
>     >>     the same UDP datagram). A reasonable approach here is to set
>     >>     a timer for 1/4 the current retransmit timer value when the
>     >>     first record in the flight is received and then send an ACK
>     >>     when that timer expires.”
>     >>
>     >>     + Where does 1/4 come from?  Why is it "reasonable"?  This just
>     >>       feels like a complete WAG that was pulled out of the air.
>     >
>     > Yes, it was in fact pulled out of the air (though I did discuss it
>     > with Ian Swett a bit). To be honest, any value here is going to be
>     > somewhat pulled out of the air, especially because during the
>     > handshake the retransmit timer values are incredibly imprecise,
>     > consisting as they do of (at most) one set of samples. In
>     > general, this value is a compromise between ACKing too
>     > aggressively (thus causing spurious retransmission of in-flight
>     > packets) and ACKing too conservatively (thus causing spurious
>     > retransmission of received packets).
>
>     Well, perhaps what is needed here is some of the words from your
>     email.  I.e., a bit of an explanation of things instead of simply
>     declaring 1/4 to be reasonable.
>
>     allman
>
Just for the record, I agree with Mark's comments. That's largely 
because Mark and I talked about this a lot while the two BCPs were 
developed.

The only thing I'd like to add is that two things are needed:
explanations, so that the reasons why this is OK are written down, and
a few important but relatively small changes. I do think it is
important that the explanations exist, because applying the same logic
to another protocol might not have the same outcome, and that the
appropriate additions are included.

Gorry