[TLS] Re: DTLS 1.3 ACK sorting, and when to clear the ACK buffer

Ilari Liusvaara <ilariliusvaara@welho.com> Sun, 03 November 2024 15:49 UTC

Date: Sun, 03 Nov 2024 17:44:13 +0200
From: Ilari Liusvaara <ilariliusvaara@welho.com>
To: "<tls@ietf.org>" <tls@ietf.org>
Message-ID: <ZyeaTZctp5mX-ck3@LK-Perkele-VII2.locald>
References: <CAF8qwaCKtjis=h6npApRAtT=AbHtDHO333zAXKPMowH_V5K8NQ@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Disposition: inline
In-Reply-To: <CAF8qwaCKtjis=h6npApRAtT=AbHtDHO333zAXKPMowH_V5K8NQ@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tls/ey8ddIWDbKnY4F1wKq79babzlkA>

On Sun, Nov 03, 2024 at 12:49:59PM +0000, David Benjamin wrote:
> Hi all,
> 
> So, Section 7 says the ACK contains:
> > A list of the records containing handshake messages in the current flight
> > which the endpoint has received and either processed or buffered, in
> > numerically increasing order.
> https://www.rfc-editor.org/rfc/rfc9147.html#name-ack-message
> 
> First, it is ambiguous what "numerically increasing order" means when there
> are two integers in a packet number, not one. 

I would interpret "numerically increasing order" to mean primarily
sorted by epoch, secondarily by record sequence number.
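
A minimal sketch of that ordering, assuming a record number is modelled
as an (epoch, sequence number) pair. The names RecordNumber and
sort_for_ack are made up for illustration and do not come from RFC 9147:

```python
from typing import NamedTuple

class RecordNumber(NamedTuple):
    epoch: int
    sequence_number: int

def sort_for_ack(records):
    # Tuples compare lexicographically, so this sorts primarily by epoch
    # and secondarily by record sequence number.
    return sorted(records)

received = [RecordNumber(2, 0), RecordNumber(0, 5),
            RecordNumber(2, 1), RecordNumber(0, 1)]
ordered = sort_for_ack(received)
```

Here all epoch 0 records end up ahead of the epoch 2 ones, regardless of
the order in which they were received.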

However, I do not think there are any other flights that should span
epochs besides the ServerHello-ServerFinished one. But I think this is
one of those reasonable-but-not-required-by-spec things.

And that flight seems pretty special in terms of ACKing. For example,
any epoch 2+ ACK implicitly ACKs the complete ServerHello message (it is
impossible to enter epoch 2+ without it). And one should be very
careful about epoch 0 (unencrypted) ACKs.

(The DTLS 1.3 spec allows all sorts of stuff that seems pretty
unreasonable, like fragment sequence in a record jumping backwards.)

 
> In particular, it seems a natural implementation will result in receive
> order, not numerical order. Implementations should bound their ACK buffers
> to avoid DoS, and are expected to preferentially ACK more recent records:
> 
> > Implementations MAY acknowledge the records corresponding to each
> > transmission of each flight or simply acknowledge the most recent one. In
> > general, implementations SHOULD ACK as many received packets as can fit
> > into the ACK record, as this provides the most complete information and
> > thus reduces the chance of spurious retransmission; if space is limited,
> > implementations SHOULD favor including records which have not yet been
> > acknowledged.

I think it is important to prioritize ACKing the highest record numbers,
because senders should bound their outstanding-record buffers to avoid
DoS, those entries are required to handle ACKs, and there is no backup
whole-message ACK (with the exception of ServerHello).

There is a subtle edge case where this can cause outright failures:

- Sender that implements only linear tracking.
- Very large flight that gets split into lots of records.
- Some of those records get evicted from ACK buffer before being ACKed.
- Flight has no response, or response is blocked.

In this scenario, no data will get through, and the sender will just
re-transmit the flight forever.

To counter this, if the ACK buffer fills with unacknowledged records, one
should immediately send an ACK. If the first record in the transmission
was received and that ACK makes it through, it will cause forward progress.
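
A rough sketch of that "flush when full" rule, assuming the receiver
keeps a small bounded list of record numbers it has not ACKed yet. The
AckBuffer class, its capacity and the method names are hypothetical:

```python
class AckBuffer:
    """Bounded list of received-but-not-yet-ACKed record numbers."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pending = []  # (epoch, sequence_number) tuples

    def on_record(self, record_number):
        """Note a received record; return True if an ACK should go out now."""
        if record_number not in self.pending:
            self.pending.append(record_number)
        # Signal an immediate ACK once the buffer is full, so that nothing
        # has to be evicted before it has been acknowledged.
        return len(self.pending) >= self.capacity

    def take_ack(self):
        """Return the pending record numbers in increasing order and clear them."""
        acked = sorted(self.pending)
        self.pending.clear()
        return acked

buf = AckBuffer(capacity=3)
sent_acks = []
for seq in range(7):
    if buf.on_record((2, seq)):
        sent_acks.append(buf.take_ack())
```

With a capacity of 3 and seven records arriving in epoch 2, two ACKs go
out early and the seventh record stays pending for the next one.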


> Given that, the natural implementation is some kind of bounded MRU queue of
> records, where old ones fall off the end. (I'm planning to use a ring
> buffer for our implementation.) To get numerical order, you'd need to
> re-sort when sending an ACK. That is not hard, but it's unclear to me
> what's the point.

The above is one case where one wants the last records sent (highest
RSN), not the last records received.
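
To make the difference concrete, here is a sketch that keeps the
highest record numbers seen instead of the most recently received ones.
HighestRecordBuffer is an invented name, and the deque is only there to
show what a plain receive-order queue would keep for the same input:

```python
import heapq
from collections import deque

class HighestRecordBuffer:
    """Keeps the `capacity` highest (epoch, sequence_number) pairs seen."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []  # min-heap, so the lowest kept record is at index 0

    def add(self, record_number):
        if record_number in self._heap:
            return
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, record_number)
        elif record_number > self._heap[0]:
            # Evict the lowest record number, not the oldest arrival.
            heapq.heapreplace(self._heap, record_number)

    def contents(self):
        return sorted(self._heap)

arrival_order = [(2, 5), (2, 1), (2, 7), (2, 3), (2, 6)]
by_number = HighestRecordBuffer(capacity=3)
by_arrival = deque(maxlen=3)  # receive-order queue, for comparison
for rn in arrival_order:
    by_number.add(rn)
    by_arrival.append(rn)
```

After these five arrivals the receive-order queue still holds the
reordered record (2, 3), while the other buffer holds (2, 5), (2, 6)
and (2, 7).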


> Next, the spec's guidance on when to clear the ACK buffer seems odd to me.
> Section 7 also says:
> 
> > During the handshake, ACKs only cover the current outstanding flight
> > (this is possible because DTLS is generally a lock-step protocol). In
> > particular, receiving a message from a handshake flight implicitly
> > acknowledges all messages from the previous flight(s). Accordingly, an ACK
> > from the server would not cover both the ClientHello and the client's
> > Certificate message, because the ClientHello and client Certificate are in
> > different flights. Implementations can accomplish this by clearing their
> > ACK list upon receiving the start of the next flight.

One thing to note: while DTLS is usually a lock-step protocol, there are
post-handshake messages that are not lock-step.

If a flight has a reply, then the start of that reply implicitly ACKs
the flight. However, a crossing flight may block that response. In that
case, the flight must be ACKed to avoid a deadlock.


> The claim that clearing this ACK list accomplishes this is not true, for
> several reasons:
> 
> First, there's nothing stopping you from receiving a (redundant) portion of
> the previous flight while you're receiving the new one. You'll notice all
> the sequence numbers are old and ignore them when processing, but that
> still keeps the record eligible for an ACK. Moreover, it's still important
> to ACK old fragments. When the old fragment is in the *current* flight, the
> peer may have lost an earlier ACK and not realize they can stop
> retransmitting. It's only old fragments in *previous* flights that are
> unnecessary to ACK, but the specification does not suggest to distinguish
> them. (Distinguishing them would require extra state in the record layer to
> store a low watermark for the flight, and that seems a waste. There's no
> real harm in adding that record to the ACK buffer.)

There is in fact another subtle edge case which requires ACKing stuff
from the previous flight:

- Sender sends flight that has no response or response is blocked.
- The complete flight comes through, but the last ACK is lost.
- Sender re-transmits the (possibly partial) flight.

The receiver considers the flight complete, but the sender does not.
Getting things unstuck requires ACKing stuff from the previous flight.

However, this does not require keeping the records from the previous
flight in the list.
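
One possible way to do that, sketched under the assumption that the
receiver tracks the next handshake message_seq it has not yet fully
processed. The Receiver class and the next_receive_seq name are
hypothetical, and a real implementation has more cases to handle
(fragments, several messages per record):

```python
class Receiver:
    def __init__(self):
        # Lowest handshake message_seq that has not been fully processed yet.
        self.next_receive_seq = 0
        # Record numbers waiting to go into the next regular ACK.
        self.pending_acks = []

    def on_handshake_record(self, record_number, message_seq):
        """Return the record numbers to ACK right away, if any."""
        if message_seq < self.next_receive_seq:
            # Retransmission of a message that was already processed:
            # ACK this one record directly, without storing anything.
            return [record_number]
        self.pending_acks.append(record_number)
        return []

r = Receiver()
r.next_receive_seq = 3  # messages 0..2 are complete
stale = r.on_handshake_record((3, 10), message_seq=2)
fresh = r.on_handshake_record((3, 11), message_seq=3)
```

The retransmitted record is ACKed straight from its own record number,
so the list only ever holds records from the flight still in progress.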


> If the peer sent flight N-1, you sent N, and now you're in the middle of
> receiving flight N+1, you can stop ACKing flight N-1 as soon as you start
> *sending* N. You don't need to wait to receive N+1. *Every* fragment of N
> implicitly ACKs all of N-1, so as soon as you're ready to send any part of
> N, you may as well send that instead of ACKing individual records because
> then you also make progress in the connection. The spec instead says to
> wait until receiving part of N+1, which seems later than needed and may not
> even exist.

Yes, if one starts transmitting the reply flight, one should immediately
stop sending ACKs for the previous flight. The reply flight will
implicitly ACK the complete previous flight.

However, if the reply is blocked (which can happen with non-lockstep
post-handshake messages), one needs to continue transmitting ACKs until
unblocked in order to avoid possible deadlock (the other side might be
blocked as well!).
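
The two rules above can be written down as a small decision function.
This is only a sketch; the ReplyState enum and the should_send_ack name
are assumptions made for the example:

```python
from enum import Enum

class ReplyState(Enum):
    NONE = "flight has no reply"
    BLOCKED = "reply exists but cannot be sent yet"
    SENDING = "reply transmission has started"

def should_send_ack(pending_records, reply_state):
    """Decide whether explicit ACKs are still needed for the peer's flight."""
    if reply_state is ReplyState.SENDING:
        # The reply implicitly ACKs the whole previous flight.
        return False
    # No reply, or the reply is blocked: keep ACKing what was received,
    # otherwise both sides may end up waiting on each other.
    return len(pending_records) > 0

pending = [(3, 4), (3, 5)]
decisions = {state: should_send_ack(pending, state) for state in ReplyState}
```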

 
> (Neither version achieves the stated goal in the spec. The stated goal
> seems to require tracking extra state.)
> 
> With all that said, it seems odd to be clearing the ACK buffer at all. I've
> gathered the reason to ACK by record number instead of message ranges (and
> thus require the implementation keep around some state) was so that RTT
> measurements could work despite retransmits. Is that right? But if the
> happy path doesn't ACK most records in the first place, you won't actually
> get an estimate out of it.

Furthermore, RTT estimation does not seem that useful. There are no
RTT estimates available when one would want those the most.




-Ilari