Re: [TLS] Issues with buffered, ACKed KeyUpdates in DTLS 1.3

Watson Ladd <watsonbladd@gmail.com> Sat, 27 April 2024 15:11 UTC

From: Watson Ladd <watsonbladd@gmail.com>
Date: Sat, 27 Apr 2024 08:11:29 -0700
Message-ID: <CACsn0cke87UCyweqitng0Wvi5AV7_J68MiM11is6P=Vhibt6MA@mail.gmail.com>
To: David Benjamin <davidben@chromium.org>
Cc: Marco Oliverio <marco@wolfssl.com>, Nick Harper <nharper@chromium.org>, "<tls@ietf.org>" <tls@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tls/RifuawWyFGVXybxQN52T2IADly4>
Subject: Re: [TLS] Issues with buffered, ACKed KeyUpdates in DTLS 1.3

On Sat, Apr 27, 2024, 8:03 AM David Benjamin <davidben@chromium.org> wrote:

> What should the next steps be here? Is this a bunch of errata, or
> something else?
>

Errata at a minimum, but this might be big enough for a small RFC describing
the fix.

>
> On Wed, Apr 17, 2024 at 10:08 AM David Benjamin <davidben@chromium.org>
> wrote:
>
>> > Sender implementations should already be able to retransmit messages
>> with older epochs due to the "duplicated" post-auth state machine
>>
>> The nice thing about option 7 is that the older-epoch retransmit problem
>> becomes moot in updated senders, I think. If the sender doesn't activate
>> epoch N+1 until KeyUpdate *and prior messages* are ACKed, and if KeyUpdate
>> is required to be the last handshake message in epoch N, then the previous
>> epoch is guaranteed to be empty by the time you activate the new one.
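In rough pseudocode, the activation rule I have in mind looks something like this (a Python sketch; the class and field names are made up for illustration, not taken from any real DTLS stack):

```python
# Sketch of the option-7 sender rule: don't activate epoch N+1 until the
# KeyUpdate AND every earlier handshake message in epoch N has been ACKed.
# All names here are illustrative.

class Sender:
    def __init__(self):
        self.epoch = 3                 # current sending epoch
        self.unacked = {}              # message_seq -> message name
        self.key_update_seq = None     # seq of the pending KeyUpdate, if any

    def send(self, seq, name):
        # KeyUpdate terminates the post-handshake stream in this epoch, so
        # nothing may be queued after it.
        assert self.key_update_seq is None, "stream terminated by KeyUpdate"
        self.unacked[seq] = name
        if name == "KeyUpdate":
            self.key_update_seq = seq

    def on_ack(self, seq):
        self.unacked.pop(seq, None)
        # Activate the new epoch only once nothing from the old one is
        # outstanding, so the old epoch's retransmit queue is empty.
        if self.key_update_seq is not None and not self.unacked:
            self.epoch += 1
            self.key_update_seq = None
```

With NewSessionTicket(5) and KeyUpdate(6) in flight, ACKing only the KeyUpdate does not advance the epoch; ACKing both does, so there is never a message left to retransmit at the old epoch.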
>>
>> On Wed, Apr 17, 2024, 09:27 Marco Oliverio <marco@wolfssl.com> wrote:
>>
>>> Hi David,
>>>
>>> Thanks for pointing this out. I also favor solution 7 as it's the
>>> simpler approach and it doesn't require too much effort to add in current
>>> implementations.
>>> Sender implementations should already be able to retransmit messages
>>> with older epochs due to the "duplicated" post-auth state machine.
>>>
>>> Marco
>>>
>>> On Tue, Apr 16, 2024 at 3:48 PM David Benjamin <davidben@chromium.org>
>>> wrote:
>>>
>>>> Thanks, Hannes!
>>>>
>>>> Since it was buried in there (my understanding of the issue evolved as
>>>> I described it), I currently favor option 7. I.e. the sender-only fix to
>>>> the KeyUpdate criteria.
>>>>
>>>> At first I thought we should also change the receiver to mitigate
>>>> unfixed senders, but this situation should be pretty rare (most senders
>>>> will send NewSessionTicket well before they KeyUpdate), DTLS 1.3 isn't very
>>>> widely deployed yet, and ultimately, it's on the sender implementation to
>>>> make sure all states they can get into are coherent.
>>>>
>>>> If the sender crashed, that's unambiguously on the sender to fix. If
>>>> the sender still correctly retransmits the missing messages, the connection
>>>> will perform suboptimally for a blip but still recover.
>>>>
>>>> David
>>>>
>>>>
>>>> On Tue, Apr 16, 2024, 05:19 Tschofenig, Hannes <
>>>> hannes.tschofenig@siemens.com> wrote:
>>>>
>>>>> Hi David,
>>>>>
>>>>>
>>>>>
>>>>> this is great feedback. Give me a few days to respond to this issue
>>>>> with my suggestion for moving forward.
>>>>>
>>>>>
>>>>>
>>>>> Ciao
>>>>>
>>>>> Hannes
>>>>>
>>>>>
>>>>>
>>>>> *From:* TLS <tls-bounces@ietf.org> *On Behalf Of *David Benjamin
>>>>> *Sent:* Saturday, April 13, 2024 7:59 PM
>>>>> *To:* <tls@ietf.org> <tls@ietf.org>
>>>>> *Cc:* Nick Harper <nharper@chromium.org>
>>>>> *Subject:* Re: [TLS] Issues with buffered, ACKed KeyUpdates in DTLS
>>>>> 1.3
>>>>>
>>>>>
>>>>>
>>>>> Another issue with DTLS 1.3's state machine duplication scheme:
>>>>>
>>>>>
>>>>>
>>>>> Section 8 says implementations must not send a new KeyUpdate until the
>>>>> previous KeyUpdate is ACKed, but it says nothing about other post-handshake
>>>>> messages. Suppose KeyUpdate(5) is in flight and the implementation decides to
>>>>> send NewSessionTicket. (E.g. the application called some
>>>>> "send NewSessionTicket" API.) The new epoch doesn't exist yet, so naively
>>>>> one would start sending NewSessionTicket(6) in the current epoch. Now the
>>>>> peer ACKs KeyUpdate(5), so we transition to the new epoch. But
>>>>> retransmissions must retain their original epoch:
>>>>>
>>>>>
>>>>>
>>>>> > Implementations MUST send retransmissions of lost messages using the
>>>>> same epoch and keying material as the original transmission.
>>>>>
>>>>> https://www.rfc-editor.org/rfc/rfc9147.html#section-4.2.1-3
>>>>>
>>>>>
>>>>>
>>>>> This means we must keep sending the NST at the old epoch. But the peer
>>>>> may have no idea there's a message at that epoch due to packet loss!
>>>>> Section 8 does ask the peer to keep the old epoch around for a spell, but
>>>>> eventually the peer will discard the old epoch. If NST(6) didn't get
>>>>> through before then, the entire post-handshake stream is now wedged!
>>>>>
>>>>>
>>>>>
>>>>> I think this means we need to amend Section 8 to forbid sending *any*
>>>>> post-handshake message after KeyUpdate. That is, rather than saying you
>>>>> cannot send a new KeyUpdate, a KeyUpdate terminates the post-handshake
>>>>> stream at that epoch and all new post-handshake messages, be they KeyUpdate
>>>>> or anything else, must be enqueued for the new epoch. This is a little
>>>>> unfortunate because a TLS library which transparently KeyUpdates will then
>>>>> inadvertently introduce hiccups where post-handshake messages triggered by
>>>>> the application, like post-handshake auth, are blocked.
>>>>>
>>>>>
>>>>>
>>>>> That then suggests some more options for fixing the original problem.
>>>>>
>>>>>
>>>>>
>>>>> *7. Fix the sender's KeyUpdate criteria*
>>>>>
>>>>>
>>>>>
>>>>> We tell the sender to wait for all previous messages to be ACKed too.
>>>>> Fix the first paragraph of section 8 to say:
>>>>>
>>>>>
>>>>>
>>>>> > As with other handshake messages with no built-in response,
>>>>> KeyUpdates MUST be acknowledged. Acknowledgements are used to both control
>>>>> retransmission and transition to the next epoch. Implementations MUST NOT
>>>>> send records with the new keys until the KeyUpdate *and all preceding
>>>>> messages* have been acknowledged. This facilitates epoch
>>>>> reconstruction (Section 4.2.2) and avoids too many epochs in active use, by
>>>>> ensuring the peer has processed the KeyUpdate and started receiving at the
>>>>> new epoch.
>>>>>
>>>>> >
>>>>>
>>>>> > A KeyUpdate message terminates the post-handshake stream in an
>>>>> epoch. After sending KeyUpdate in an epoch, implementations MUST NOT send
>>>>> any new post-handshake messages in that epoch. Note that, if the
>>>>> implementation has sent KeyUpdate but is waiting for an ACK, the next epoch
>>>>> is not yet active. In this case, subsequent post-handshake messages may not
>>>>> be sent until receiving the ACK.
>>>>>
>>>>>
>>>>>
>>>>> And then on the receiver side, we leave things as-is. If the sender
>>>>> implemented the old semantics AND had multiple post-handshake transactions
>>>>> in parallel, it might update keys too early and then we get into the
>>>>> situation described in (1). We then declare that, if this happens, and the
>>>>> sender gets confused as a result, that's the sender's fault. Hopefully this
>>>>> is rare enough (did anyone even implement 5.8.4, or does everyone just
>>>>> serialize their post-handshake transactions?) not to be a serious protocol
>>>>> break? That risk aside, this option seems the most in spirit with the
>>>>> current design to me.
>>>>>
>>>>>
>>>>>
>>>>> *8. Decouple post-handshake retransmissions from epochs*
>>>>>
>>>>>
>>>>>
>>>>> If we instead say that the same epoch rule only applies for the
>>>>> handshake, and not post-handshake messages, I think option 5 (process
>>>>> KeyUpdate out of order) might become viable? I'm not sure. Either way, this
>>>>> seems like a significant protocol break, so I don't think this is an option
>>>>> until some hypothetical DTLS 1.4.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Apr 12, 2024 at 6:59 PM David Benjamin <davidben@chromium.org>
>>>>> wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>>
>>>>>
>>>>> This is going to be a bit long. In short, DTLS 1.3 KeyUpdates seem to
>>>>> conflate the peer *receiving* the KeyUpdate with the peer *processing* the
>>>>> KeyUpdate, in ways that appear to break some assumptions made by the
>>>>> protocol design.
>>>>>
>>>>>
>>>>>
>>>>> *When to switch keys in KeyUpdate*
>>>>>
>>>>>
>>>>>
>>>>> So, first, DTLS 1.3, unlike TLS 1.3, applies the KeyUpdate on the ACK,
>>>>> not when the KeyUpdate is sent. This makes sense because KeyUpdate records
>>>>> are not intrinsically ordered with app data records sent after them:
>>>>>
>>>>>
>>>>>
>>>>> > As with other handshake messages with no built-in response,
>>>>> KeyUpdates MUST be acknowledged. In order to facilitate epoch
>>>>> reconstruction (Section 4.2.2), implementations MUST NOT send records with
>>>>> the new keys or send a new KeyUpdate until the previous KeyUpdate has been
>>>>> acknowledged (this avoids having too many epochs in active use).
>>>>>
>>>>> https://www.rfc-editor.org/rfc/rfc9147.html#section-8-1
>>>>>
>>>>>
>>>>>
>>>>> Now, the parenthetical says this is to avoid having too many epochs in
>>>>> active use, but it appears that there are stronger assumptions on this:
>>>>>
>>>>>
>>>>>
>>>>> > After the handshake is complete, if the epoch bits do not match
>>>>> those from the current epoch, implementations SHOULD use the most recent
>>>>> *past* epoch which has matching bits, and then reconstruct the
>>>>> sequence number for that epoch as described above.
>>>>>
>>>>> https://www.rfc-editor.org/rfc/rfc9147.html#section-4.2.2-3
>>>>>
>>>>> (emphasis mine)
>>>>>
>>>>>
>>>>>
>>>>> > After the handshake, implementations MUST use the highest available
>>>>> sending epoch [to send ACKs]
>>>>>
>>>>> https://www.rfc-editor.org/rfc/rfc9147.html#section-7-7
>>>>>
>>>>>
>>>>>
>>>>> These two snippets imply the protocol wants the peer to definitely
>>>>> have installed the new keys before you start using them. This makes sense
>>>>> because sending stuff the peer can't decrypt is pretty silly. As an aside,
>>>>> DTLS 1.3 retains this text from DTLS 1.2:
>>>>>
>>>>>
>>>>>
>>>>> > Conversely, it is possible for records that are protected with the
>>>>> new epoch to be received prior to the completion of a handshake. For
>>>>> instance, the server may send its Finished message and then start
>>>>> transmitting data. Implementations MAY either buffer or discard such
>>>>> records, though when DTLS is used over reliable transports (e.g., SCTP
>>>>> [RFC4960]), they SHOULD be buffered and processed once the handshake
>>>>> completes.
>>>>>
>>>>> https://www.rfc-editor.org/rfc/rfc9147.html#section-4.2.1-2
>>>>>
>>>>>
>>>>> The text from DTLS 1.2 talks about *a* handshake, which presumably
>>>>> refers to rekeying via renegotiation. But in DTLS 1.3, the epoch
>>>>> reconstruction rule and the KeyUpdate rule mean this is only possible
>>>>> during the handshake, when you see epoch 4 and expect epoch 0-3. The steady
>>>>> state rekeying mechanism never hits this case. (This is a reasonable change
>>>>> because there's no sense in unnecessarily introducing blips where the
>>>>> connection is less tolerant of reordering.)
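For concreteness, the Section 4.2.2 reconstruction rule boils down to something like this (a hypothetical sketch; the function name is mine):

```python
# The unified header carries only the low two bits of the epoch. After the
# handshake, the receiver resolves them to the current epoch if the bits
# match, else to the most recent *past* epoch whose low bits match.

def reconstruct_epoch(current_epoch, epoch_bits):
    """Most recent epoch <= current_epoch whose low 2 bits are epoch_bits."""
    candidate = current_epoch
    while candidate & 0b11 != epoch_bits:
        candidate -= 1
    return candidate
```

Note what happens if a peer jumps ahead anyway: a record from epoch N+1 received while we are still on N has the same low bits as N-3, so it reconstructs to N-3. That is why a too-early key switch gets misread as a stale epoch rather than a new one.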
>>>>>
>>>>>
>>>>>
>>>>> *Buffered handshake messages*
>>>>>
>>>>>
>>>>>
>>>>> Okay, so KeyUpdates want to wait for the recipient to install keys,
>>>>> except we don't seem to actually achieve this! Section 5.2 says:
>>>>>
>>>>>
>>>>>
>>>>> > DTLS implementations maintain (at least notionally) a
>>>>> next_receive_seq counter. This counter is initially set to zero. When a
>>>>> handshake message is received, if its message_seq value matches
>>>>> next_receive_seq, next_receive_seq is incremented and the message is
>>>>> processed. If the sequence number is less than next_receive_seq, the
>>>>> message MUST be discarded. If the sequence number is greater than
>>>>> next_receive_seq, the implementation SHOULD queue the message but MAY
>>>>> discard it. (This is a simple space/bandwidth trade-off).
>>>>>
>>>>> https://www.rfc-editor.org/rfc/rfc9147.html#section-5.2-7
>>>>>
>>>>>
>>>>>
>>>>> I assume this is intended to apply to post-handshake messages too.
>>>>> (See below for a discussion of the alternative.) But that means that, when
>>>>> you receive a KeyUpdate, you might not immediately process it. Suppose
>>>>> next_receive_seq is 5, and the peer sends NewSessionTicket(5),
>>>>> NewSessionTicket(6), and KeyUpdate(7). 5 is lost, but 6 and 7 come in,
>>>>> perhaps even in the same record which means that you're forced to ACK both
>>>>> or neither. But suppose the implementation is willing to buffer 3 messages
>>>>> ahead, so it ACKs the 6+7 record under the rules in section 7, which
>>>>> permit ACKing fragments that were buffered and not yet processed.
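The Section 5.2 counter rules in that scenario can be sketched as follows (illustrative Python, not from any implementation):

```python
# next_receive_seq handling with a small reorder buffer. The point: a
# future KeyUpdate can be buffered, and hence ACKable, without being
# processed.

class Receiver:
    def __init__(self, max_buffer=3):
        self.next_receive_seq = 0
        self.buffer = {}             # seq -> message queued for the future
        self.max_buffer = max_buffer
        self.processed = []

    def on_message(self, seq, msg):
        """Return True if this message may be ACKed (processed or buffered)."""
        if seq < self.next_receive_seq:
            return True              # duplicate; already processed
        if seq > self.next_receive_seq:
            if len(self.buffer) >= self.max_buffer:
                return False         # discarded, so its record MUST NOT be ACKed
            self.buffer[seq] = msg   # queued but NOT processed
            return True
        # seq == next_receive_seq: process it, then drain the buffer.
        self.processed.append(msg)
        self.next_receive_seq += 1
        while self.next_receive_seq in self.buffer:
            self.processed.append(self.buffer.pop(self.next_receive_seq))
            self.next_receive_seq += 1
        return True
```

Starting at next_receive_seq = 5, messages 6 and 7 both come back ACKable while the processed list stays empty: KeyUpdate(7) has been ACKed, but its keys have not been installed.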
>>>>>
>>>>>
>>>>>
>>>>> That means the peer will switch keys and now all subsequent records
>>>>> from them will come from epoch N+1. But the receiver is not ready for N+1
>>>>> yet, so we contradict everything above. We also contradict this
>>>>> parenthetical in section 8:
>>>>>
>>>>>
>>>>>
>>>>> > Due to loss and/or reordering, DTLS 1.3 implementations may receive
>>>>> a record with an older epoch than the current one (the requirements above
>>>>> preclude receiving a newer record).
>>>>>
>>>>> https://www.rfc-editor.org/rfc/rfc9147.html#section-8-2
>>>>>
>>>>>
>>>>>
>>>>> I assume then that this was not actually what was intended.
>>>>>
>>>>>
>>>>>
>>>>> *Options (and non-options)*
>>>>>
>>>>>
>>>>>
>>>>> Assuming I'm reading this right, we seem to have made a mess of
>>>>> things. The sender could avoid this by only allowing one active
>>>>> post-handshake transaction at a time and serializing them, at the cost of
>>>>> taking a round-trip for each. But the receiver needs to account for all
>>>>> possible senders, so that doesn't help. Some options that come to mind:
>>>>>
>>>>>
>>>>>
>>>>> *1. Accept that the sender updates its keys too early*
>>>>>
>>>>>
>>>>>
>>>>> Apart from contradicting most of the specification text, the protocol
>>>>> doesn't *break* per se if you just allow the peer to switch keys
>>>>> early in this buffered KeyUpdate case. We *merely* contradict all of
>>>>> the explanatory text and introduce a bunch of cases that the specification
>>>>> suggests are impossible. :-) Also the connection quality is poor.
>>>>>
>>>>>
>>>>>
>>>>> The sender will use epoch N+1 at a point when the peer is on N. But
>>>>> epoch reconstruction will misread it as N-3 instead of N+1, and either way
>>>>> you won't have the keys to decrypt it yet! The connection is interrupted
>>>>> (and with all packets discarded because epoch reconstruction fails!) until
>>>>> the peer retransmits 5 and you catch up. Until then, not only will you not
>>>>> receive application data, but you also won't receive ACKs. This also adds a
>>>>> subtle corner case on the sender side: the sender cannot discard the old
>>>>> sending keys because it still has unACKed messages from the previous epoch
>>>>> to retransmit, but this is not called out in section 8. Section 8 only
>>>>> discusses the receiver needing to retain the old epoch.
>>>>>
>>>>>
>>>>> This seems not great. Also it contradicts much of the text in the
>>>>> spec, including section 8 explicitly saying this case cannot happen.
>>>>>
>>>>>
>>>>>
>>>>> *2. Never ACK buffered KeyUpdates*
>>>>>
>>>>>
>>>>>
>>>>> We can say that KeyUpdates are special and, unless you're willing to
>>>>> process them immediately, you must not ACK the records containing them.
>>>>> This means you might under-ACK and the peer might over-retransmit, but
>>>>> seems not fatal. This also seems a little hairy to implement if you want to
>>>>> avoid under-ACKing unnecessarily. You might have message
>>>>> NewSessionTicket(6) buffered and then receive a record with
>>>>> NewSessionTicket(5) and KeyUpdate(7). That record may appear unACKable, but
>>>>> it's fine because you'll immediately process 5 then 6 then 7... unless your
>>>>> NewSessionTicket processing is asynchronous, in which case it might not be?
>>>>>
>>>>>
>>>>>
>>>>> Despite all that mess, this seems the most viable option?
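The ACK decision under option 2 might look roughly like this (hypothetical sketch; the buffered-NST wrinkle above is why a simple "is the KeyUpdate next?" check isn't quite enough):

```python
# Option-2 receiver rule sketch: ACK a record only if no KeyUpdate in it
# would end up sitting in the reorder buffer. We simulate processing,
# drawing on both this record's messages and already-buffered ones.

def record_ackable(next_receive_seq, buffered, record_msgs):
    """record_msgs: list of (seq, name) pairs carried in one record."""
    available = dict(record_msgs)
    expected = next_receive_seq
    while True:
        if expected in available:
            available.pop(expected)
        elif expected in buffered:
            pass                     # drains from the existing reorder buffer
        else:
            break
        expected += 1
    # Any KeyUpdate not reached would have to be buffered: don't ACK.
    return not any(name == "KeyUpdate" for name in available.values())
```

As noted above, this simulation assumes processing is synchronous; if NewSessionTicket processing can suspend, even a KeyUpdate that looks immediately processable might end up buffered after all.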
>>>>>
>>>>>
>>>>>
>>>>> *3. Declare this situation a sender error*
>>>>>
>>>>>
>>>>>
>>>>> We could say this is not allowed and senders MUST NOT send KeyUpdate
>>>>> if there are any outstanding post-handshake messages. And then the receiver
>>>>> should fail with unexpected_message if it ever receives KeyUpdate at a
>>>>> future message_seq. But as the RFC is already published, I don't know if
>>>>> this is compatible with existing implementations.
>>>>>
>>>>>
>>>>>
>>>>> *4. Explicit KeyUpdateAck message*
>>>>>
>>>>>
>>>>>
>>>>> We could have made a KeyUpdateAck message to signal that you've
>>>>> processed a KeyUpdate, not just sent it. But that's a protocol change and
>>>>> the RFC is stamped, so it's too late now.
>>>>>
>>>>>
>>>>>
>>>>> *5. Process KeyUpdate out of order*
>>>>>
>>>>>
>>>>>
>>>>> We could say that the receiver doesn't buffer KeyUpdate. It just goes
>>>>> ahead and processes it immediately to install epoch N+1. This seems like it
>>>>> would address the issue but opens more cans of worms. Now the receiver
>>>>> needs to keep the old epoch around for more than packet reorder, but also
>>>>> to pick up the retransmissions of the missing handshake messages. Also, by
>>>>> activating the new epoch, the receiver now allows the sender to KeyUpdate
>>>>> again, and again, and again. But, several epochs later, the holes in the
>>>>> message stream may remain unfilled, so we still need the old keys. Without
>>>>> further protocol rules, a sender could force the receiver to keep keys
>>>>> arbitrarily many records back. All this is, at best, a difficult case that
>>>>> is unlikely to be well-tested, and, at worst, gets the implementation into
>>>>> some broken state where it then misbehaves badly.
>>>>>
>>>>>
>>>>>
>>>>> *6. Post-handshake transactions aren't ordered at all*
>>>>>
>>>>>
>>>>>
>>>>> It could be that my assumption above was wrong and the
>>>>> next_receive_seq discussion in 5.2 only applies to the handshake. After
>>>>> all, section 5.8.4 discusses how every post-handshake transaction
>>>>> duplicates the "state machine". Except it only says to duplicate the 5.8.1
>>>>> state machine, and it's ambiguous whether that includes the
>>>>> message_seq logic.
>>>>>
>>>>>
>>>>>
>>>>> However, going this direction seems to very quickly make a mess. If
>>>>> each post-handshake transaction handles message_seq independently, you
>>>>> cannot distinguish a retransmission from a new transaction. That seems
>>>>> quite bad, so presumably the intent was to use message_seq to distinguish
>>>>> those. (I.e. the intent can't have been to duplicate the message_seq
>>>>> state.) Indeed, we have:
>>>>>
>>>>>
>>>>>
>>>>> > However, in DTLS 1.3 the message_seq is not reset, to allow
>>>>> distinguishing a retransmission from a previously sent post-handshake
>>>>> message from a newly sent post-handshake message.
>>>>>
>>>>> https://www.rfc-editor.org/rfc/rfc9147.html#section-5.2-6
>>>>>
>>>>>
>>>>>
>>>>> But if we distinguish with message_seq AND process transactions out of
>>>>> order, now receivers need to keep track of fairly complex state in case
>>>>> they process messages 5, 7, 9, 11, 13, 15, 17, ... but then only get the
>>>>> even ones later. And we'd need to define some kind of sliding window for
>>>>> what happens if you receive message_seq 9000 all of a sudden. And we import
>>>>> all the cross-epoch problems in option 5 above. None of that is in the
>>>>> text, so I assume this was not the intended reading, and I don't think we
>>>>> want to go that direction. :-)
>>>>>
>>>>>
>>>>> *Digression: ACK fate-sharing and flow control*
>>>>>
>>>>>
>>>>>
>>>>> All this alludes to another quirk that isn't a problem, but is a
>>>>> little non-obvious and warrants some discussion in the spec. Multiple
>>>>> handshake fragments may be packed into the same record, but ACKs apply to
>>>>> the whole record. If you receive a fragment for a message sequence too far
>>>>> into the future, you are permitted to discard the fragment. But if you
>>>>> discard *any* fragment, you cannot ACK the record, *even if there
>>>>> were fragments which you did process*. During the handshake, an
>>>>> implementation could avoid needing to make this decision by knowing the
>>>>> maximum size of a handshake flight. After the handshake, there is no
>>>>> inherent limit on how many NewSessionTickets the peer may choose to send in
>>>>> a row, and no flow control.
>>>>>
>>>>>
>>>>>
>>>>> QUIC ran into a similar issue here and said an implementation can
>>>>> choose an ad-hoc limit, after which it can choose to either wedge the
>>>>> post-handshake stream or return an error.
>>>>>
>>>>> https://github.com/quicwg/base-drafts/issues/1834
>>>>> https://github.com/quicwg/base-drafts/pull/2524
>>>>>
>>>>>
>>>>>
>>>>> I suspect the most practical outcome for DTLS (and arguably already
>>>>> supported by the existing text, but not very obviously), is to instead say
>>>>> the receiver just refuses to ACK stuff and, okay, maybe in some weird edge
>>>>> cases the receiver under-ACKs and then the sender over-retransmits, until
>>>>> things settle down. Whereas ACKs are a bit more tightly integrated with
>>>>> QUIC, so refusing to ACK a packet due to one bad frame is less of an
>>>>> option. Still, I think this would have been worth calling out in the text.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> So... did I read all this right? Did we indeed make a mess of this, or
>>>>> did I miss something?
>>>>>
>>>>>
>>>>>
>>>>> David
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>> TLS mailing list
>>>> TLS@ietf.org
>>>> https://www.ietf.org/mailman/listinfo/tls
>>>>