Re: [TLS] Transport Issues in DTLS 1.3

Eric Rescorla <ekr@rtfm.com> Tue, 06 April 2021 00:12 UTC

References: <CAM4esxR3YPoWaxU9B--oaT9r2bh_QBNH=tt0FsiUKaAT=M6_fg@mail.gmail.com> <CABcZeBMS5fUej0q5XhbxM5sMLQwAAyCgyAfbkTORQjvMM+jb7A@mail.gmail.com> <E43A7F98-6AE3-402B-B166-077B6D74B97A@icsi.berkeley.edu> <CAM4esxR+4NWHW6PadAVUsnwMZzE+yw75fdk2m2s3jV7V3inuQw@mail.gmail.com>
In-Reply-To: <CAM4esxR+4NWHW6PadAVUsnwMZzE+yw75fdk2m2s3jV7V3inuQw@mail.gmail.com>
From: Eric Rescorla <ekr@rtfm.com>
Date: Mon, 05 Apr 2021 17:11:37 -0700
Message-ID: <CABcZeBPk1b08tg-KJ_PfC4cGgZtu+yZpHf24PNQ-m7xuWk84gg@mail.gmail.com>
To: Martin Duke <martin.h.duke@gmail.com>
Cc: Mark Allman <mallman@icsi.berkeley.edu>, draft-ietf-tls-dtls13.all@ietf.org, Lars Eggert <lars@eggert.org>, Gorry Fairhurst <gorry@erg.abdn.ac.uk>, "<tls@ietf.org>" <tls@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tls/qK0UXJU40dJG7KIfBf0KWAjbdNY>
Subject: Re: [TLS] Transport Issues in DTLS 1.3

Thanks for the discussion. I have pushed the following PR to address your
comments:

https://github.com/tlswg/dtls13-spec/pull/226

Here is a summary of the changes:
- Change the default retransmission timer to 1s and
  allow implementations to use a different value if they
  have side knowledge.

- Cap any given flight at 10 records.

- After a loss-free flight, don't re-set the timer to the
  initial value but to 1.5 times the measured RTT.

- Add a bunch more clarity about the reliability algorithms
  and timers (including changing "reset" to "re-arm").
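
As a rough illustration of the timer rules summarized above (a sketch only, not the text in the PR): the class and function names are hypothetical, the 60s ceiling is taken from the draft text quoted later in this thread, and simply truncating a flight to 10 records is an assumption about how a cap might be applied.

```python
from dataclasses import dataclass
from typing import List, Optional

INITIAL_TIMEOUT = 1.0     # seconds; the new default from the summary
MAX_TIMEOUT = 60.0        # seconds; ceiling quoted from the draft (assumed unchanged)
MAX_FLIGHT_RECORDS = 10   # flight cap from the summary


@dataclass
class RetransmitTimer:
    """Hypothetical holder for the current retransmission timer value."""
    timeout: float = INITIAL_TIMEOUT

    def on_expiry(self) -> float:
        """Timer fired: double the value (up to the ceiling) and re-arm with it."""
        self.timeout = min(self.timeout * 2, MAX_TIMEOUT)
        return self.timeout

    def on_flight_delivered(self, retransmitted: bool,
                            measured_rtt: Optional[float]) -> float:
        """Flight acknowledged.

        If anything had to be retransmitted, keep the backed-off value.
        Otherwise set the timer to 1.5 times the measured RTT instead of
        going back to the initial default.
        """
        if not retransmitted and measured_rtt is not None:
            self.timeout = 1.5 * measured_rtt
        return self.timeout


def cap_flight(records: List[bytes]) -> List[bytes]:
    """Return at most MAX_FLIGHT_RECORDS records (assumed way of applying the cap)."""
    return records[:MAX_FLIGHT_RECORDS]
```

For example, a timer that expires twice goes 1s, 2s, 4s; if the next flight is then delivered without retransmission at a measured RTT of 200ms, the value becomes 300ms.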

PTAL

-Ekr



On Tue, Mar 30, 2021 at 11:47 AM Martin Duke <martin.h.duke@gmail.com>
wrote:

> Thank you Eric (and Mark).
>
> To reiterate, I believe introducing latency regressions with respect to
> DTLS 1.2 would be bad for the internet. So what's new in the area under
> discussion is (a) lowering the timeout from 1s to 100ms, and (b) the
> introduction of ACKs.
>
> I would characterize ekr's reply as making the following points:
>
> (1) *DTLS practice at Mozilla and elsewhere already uses timeouts << 1
> sec*.
>
> Thanks for this report about the real world. I have no doubt that for
> WebRTC and other use cases, a short timeout is fine. However, DTLS is a
> general-purpose protocol and the standard should be quite conservative
> about the paths this thing is going to run over. Obviously, people are
> going to ignore this requirement when they think they can get an advantage
> no matter what the RFC says.
>
> I see three acceptable ways to proceed:
> (a) stick with 1 second with words saying that given some OOB knowledge
> you can go lower;
> (b) the same, but having an explicit floor of 100ms or 200ms; or
> (c) having a shorter threshold for small flights, as I proposed in my
> DISCUSS
>
> (2) *DTLS 1.2 does full retransmissions on each timeout, and there is no
> window halving.*
>
> This is a good point, but I will note that 1.2 always has an RTO-based
> timeout, so the sending rate is halved because the timeout doubles each
> time. With an ACK, there will be no rate halving, unless the ACK clears
> half the window or more.
>
> That said, Mark doesn't seem to be too concerned about it. The
> constrained-network problem where these bursts are just too large already
> exists in DTLS 1.2 so I'm increasingly persuaded that it's OK to drop this
> issue.
>
> Mark said a lot about RTT measurement in his reply. I gather from the
> draft that there is no such measurement going on, but including it would be
> another way to address some of the backoff issues.
>
> (3) *The applicability of this algorithm is at most a few packets, which
> strictly limits the risk in a way that renders RFC 8085, etc.
> considerations largely irrelevant.*
>
> The strawman in my DISCUSS was that bursts of <= 2 packets could be more
> aggressive; that's a negotiable number, and the de jure TCP 4*MSS initial
> window, for example, is one I can easily be persuaded of. I feel some
> desire to guard against giant post-quantum certificates, or what have you,
> but some sufficiently wide guardrails here will probably have little or no
> short-term real-world impact, and I trust we can reach a mutually agreeable
> number. The largest flights today in DTLS 1.2 seem like a good number that
> addresses my concerns while respecting my no-regressions principle.
>
> Thanks,
> Martin
>
> On Tue, Mar 30, 2021 at 10:48 AM Mark Allman <mallman@icsi.berkeley.edu>
> wrote:
>
>>
>> Hi Ekr!
>>
>> > This means that we have rather more latitude in terms of how
>> > aggressively we retransmit because it only applies to a small
>> > fraction of the traffic.
>>
>> (Strikes me as a bit of a weird formulation.)
>>
>> > Firefox uses 50ms and AIUI Chrome
>> > uses a value derived from the ICE handshake (which is probably
>> > better because there are certainly times where 50ms is too short).
>>
>> Yes, the best thing to do is to use a measured value instead of
>> assuming one static number will always work.  But, you have to get a
>> measurement to do that, so you have to start somewhere.
>>
>> >> Relatedly, in section 5.8.3 there is no specific recommendation for a
>> >> maximum flight size at all. I would think that applications SHOULD
>> >> have no more than 10 datagrams outstanding unless they have some OOB
>> >> evidence of available bandwidth on the channel, in keeping with de
>> >> facto transport best practice.
>> >
>> > I agree that this is a reasonable change.
>>
>> I like this, too.  I think that limits the impact of any sort of
>> badness.
>>
>> >> Granted, doubling the timeout will reduce the rate, but when
>> >> retransmission is ack-driven there is essentially no reduction of
>> >> sending rate in response to loss.
>> >
>> > I don't believe this is correct. Recall that unlike TCP, there's
>> > generally no buffer of queued packets waiting to be transmitted.
>> > Rather, there is a fixed flight of data which must be delivered.
>> > With one exceptional case [1], an ACK will reflect that some but
>> > not all of the data was delivered and processed; when
>> > retransmitting, the sender will only retransmit the un-ACKed
>> > packets, which naturally reduces the sending rate. Given the quite
>> > small flights in play here, that reduction is likely to be quite
>> > substantial. For instance, if there are three packets and 1 is
>> > ACKed, then there will be a reduction of 1/3.
>>
>> I tend to agree with ekr here.  This doesn't tend to worry me
>> greatly.
>>
>> > Note that the timeout is actually only reset after successful loss-free
>> > delivery of a flight:
>> >
>> >    Implementations SHOULD retain the current timer value until a
>> >    message is transmitted and acknowledged without having to
>> >    be retransmitted, at which time the value may be
>> >    reset to the initial value.
>> >
>> > There seems to be some confusion here (perhaps due to bad
>> > writing).  When the text says "resets the retransmission timer" it
>> > means "re-arm it with the current value" not "re-set it to the
>> > initial default". For instance, suppose that I send flight 1 with
>> > retransmit timer value T. After T seconds, I have not received
>> > anything and so I retransmit it, doubling to 2T. After I get a
>> > response, I now send a new flight. The timer should be 2T, not T.
>>
>> I agree that is how to manage the timer.
>>
>> > With that said, I think it would be reasonable to re-set to whatever
>> > the measured RTT was, rather than the initial default. This would
>> > avoid potentially resetting to an overly low default (though it's
>> > not clear to me how this could happen because if your RTT estimate
>> > is too low you will never get a delivery without retransmission).
>>
>> That's one problem with a too-low initial RTT and a reason why RFCs
>> 8085 & 8961 use a conservative initial value.
>>
>> However, I might suggest not setting the timeout to the measured
>> RTT, but to something based on the measured RTT.  The best guidance
>> here (8085 & 8961) is that this value should be based on both the
>> RTT and the variance in the RTT.  With one sample you don't have
>> variance.  TCP handles this by setting the RTO to 3 times the first
>> measured RTT.  That's just old VJCC.  It has always struck me as a
>> bit conservative, but ultimately this is a blip in the TCP context
>> and so I have never thought deeply about it.  But, perhaps if you
>> did something like 1.5 times the measured RTT you'd account for a
>> bit of variance that will no doubt be present.
>>
>> > On point (1), I think that the fact that we have extensive
>> > deployment of timeout-driven retransmission in the field with
>> > short timers is fairly strong evidence that it will not destroy
>> > the Internet and more generally that the "retransmit the whole
>> > flight" design is safe in this case. I certainly agree that there
>> > might be settings in which 100ms is too short. Rather than
>> > litigate the timer value, which I agree is a judgement call, I
>> > suggest we increase the default somewhat (250 ms? 500 ms?) and then
>> > indicate that if the application has information that a shorter
>> > timer is appropriate, it can use one.
>>
>> I think that sounds fine.  And, if you could wedge some words about
>> experience into the document that'd seem useful, as well, IMO.
>>
>> > With that said, given that your concern seems to be large flights,
>> > I could maybe live with halving the *window* rather than the size
>> > of the flight. In your example, you suggest an initial window of
>> > 10, so this would give us 10, 5, 3, ... This would have little
>> > practical impact on the vast majority of handshakes, but I suppose
>> > might slightly improve things on the edge cases where you have a
>> > large flight *and* a high congestion network.
>>
>> I dunno ... I'd be interested in Martin's thought here.  But, at
>> these levels I am just not sure if the complexity of tracking a
>> flight size is really worth it.
>>
>> >>   - "Though timer values are the choice of the implementation,
>> >>     mishandling of the timer can lead to serious congestion
>> >>     problems"
>> >>
>> >>     + Gorry flagged this and I am flagging it again.  If this is
>> >>       something that can lead to serious problems, let's not just
>> >>       leave it to "choice of the implementation".  Especially if we
>> >>       have some idea how to make it less problematic.
>> >
>> > I'm not sure what you'd like here. I think the guidance in this
>> > specification is reasonable, so I'd be happy to just remove this
>> > text.
>>
>> I don't find the two halves of the sentence consistent with each
>> other and therefore the message seems muddled.
>>
>> Removing is fine.
>>
>> >>   - "The retransmit timer expires: the implementation transitions to
>> >>     the SENDING state, where it retransmits the flight, resets the
>> >>     retransmit timer, and returns to the WAITING state."
>> >>
>> >>     + Maybe this is spec sloppiness, but boy does it sound like the
>> >>       recipe TCP used before VJCC to collapse the network.  I.e.,
>> >>       expire and retransmit the window.  Rinse and repeat.  It may
>> >>       be the intention is for backoff to be involved.  But, that
>> >>       isn't what it says.
>> >
>> > It says it elsewhere, in the section you quoted:
>> >
>> >    a congested link.  Implementations SHOULD use an initial timer value
>> >    of 100 msec (the minimum defined in RFC 6298 {{RFC6298}}) and double
>> >    the value at each retransmission, up to no less than 60 seconds
>> >    (the RFC 6298 maximum).
>> >
>> > As I said to Martin, I think some of the confusion is that this
>> > specification uses "reset" to mean both "re-arm" and "set the
>> > value back to the initial" and depends on context to clarify
>> > that. Obviously that's not been entirely successful, so I propose
>> > to use "re-arm" where I mean "start a timer with the now current
>> > value".
>>
>> I agree this is mostly a writing issue.  I would suggest looking for
>> the word "reset" and just using more than one word so it's
>> absolutely clear what you mean.  E.g., something like "double the
>> timeout value and start a new timer" instead of "reset" or "rearm".
>>
>> >>   - “When they have received part of a flight and do not immediately
>> >>     receive the rest of the flight (which may be in the same UDP
>> >>     datagram). A reasonable approach here is to set a timer for 1/4 the
>> >>     current retransmit timer value when the first record in the flight
>> >>     is received and then send an ACK when that timer expires.”
>> >>
>> >>     + Where does 1/4 come from?  Why is it "reasonable"?  This just
>> >>       feels like a complete WAG that was pulled out of the air.
>> >
>> > Yes, it was in fact pulled out of the air (though I did discuss it
>> > with Ian Swett a bit). To be honest, any value here is going to be
>> > somewhat pulled out of the air, especially because during the
>> > handshake the retransmit timer values are incredibly imprecise,
>> > consisting as they do of (at most) one set of samples.  In
>> > general, this value is a compromise between ACKing too
>> > aggressively (thus causing spurious retransmission of in-flight
>> > packets) and ACKing too conservatively (thus causing spurious
>> > retransmission of received packets).
>>
>> Well, perhaps what is needed here is some of the words from your
>> email.  I.e., a bit of an explanation of things instead of simply
>> declaring 1/4 to be reasonable.
>>
>> allman
>>
>
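
For reference, two of the rules discussed in the quoted text above can be sketched as follows: retransmitting only the un-ACKed records of a flight, and delaying an ACK for a partial flight by 1/4 of the current retransmit timer value. This is a minimal sketch with hypothetical function names and records identified by plain integers, not an excerpt from any implementation.

```python
from typing import Iterable, List, Set


def ack_delay(current_retransmit_timeout: float) -> float:
    """Delay before ACKing a partially received flight.

    Uses 1/4 of the current retransmit timer value, the figure quoted
    from the draft; the thread itself calls this value a compromise.
    """
    return current_retransmit_timeout / 4.0


def records_to_retransmit(flight: Iterable[int], acked: Set[int]) -> List[int]:
    """Return the records of the flight that have not been ACKed, in order."""
    return [record for record in flight if record not in acked]
```

With a flight of three records where one has been ACKed, only the other two are retransmitted, which is the reduction of 1/3 mentioned in the thread.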
