From watsonbladd@gmail.com  Sat Apr 27 08:11:46 2024
Return-Path: <watsonbladd@gmail.com>
X-Original-To: tls@ietfa.amsl.com
Delivered-To: tls@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1])
 by ietfa.amsl.com (Postfix) with ESMTP id 3CB5AC14F69E
 for <tls@ietfa.amsl.com>; Sat, 27 Apr 2024 08:11:46 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -7.094
X-Spam-Level: 
X-Spam-Status: No, score=-7.094 tagged_above=-999 required=5
 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1,
 DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001,
 HTML_MESSAGE=0.001, RCVD_IN_DNSWL_HI=-5,
 RCVD_IN_ZEN_BLOCKED_OPENDNS=0.001, SPF_HELO_NONE=0.001,
 SPF_PASS=-0.001, URIBL_BLOCKED=0.001, URIBL_DBL_BLOCKED_OPENDNS=0.001,
 URIBL_ZEN_BLOCKED_OPENDNS=0.001] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (2048-bit key)
 header.d=gmail.com
Received: from mail.ietf.org ([50.223.129.194])
 by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id JZVUzt0FJshm for <tls@ietfa.amsl.com>;
 Sat, 27 Apr 2024 08:11:41 -0700 (PDT)
Received: from mail-wr1-x42d.google.com (mail-wr1-x42d.google.com
 [IPv6:2a00:1450:4864:20::42d])
 (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256)
 (No client certificate requested)
 by ietfa.amsl.com (Postfix) with ESMTPS id 8D737C14F5F5
 for <tls@ietf.org>; Sat, 27 Apr 2024 08:11:41 -0700 (PDT)
Received: by mail-wr1-x42d.google.com with SMTP id
 ffacd0b85a97d-343d2b20c4bso2190099f8f.2
 for <tls@ietf.org>; Sat, 27 Apr 2024 08:11:41 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=gmail.com; s=20230601; t=1714230700; x=1714835500; darn=ietf.org;
 h=cc:to:subject:message-id:date:from:in-reply-to:references
 :mime-version:from:to:cc:subject:date:message-id:reply-to;
 bh=DIF/GBdDc+wuR0I6h1zV+aPhwxtkA8C8NdWAY4fHbkg=;
 b=Jr6+Y2xL9hkesv1uhwPw1AVfijbZu9vcKIfkuPNcimxXRiko6Vvoy/Ej8PeObrrpi3
 Nhcf1bB/K1ff2HkRtq9Z1mbSTXm4BjVb6oKVLyIyqE2GnoqhPVsb3VBFgzzrordH3AWN
 sb+zw9nwKYaKDtzlWx7sYmyBbSGLJQaVD1lorFbdUWa3bU4oEXd83FsBS62UJVwLjHpu
 WRBcvmdD7gzc4LleF0o04m1jb/C45FXocpYfz0rZLa/CkspSeNdZUiF/iU7Kz8RrBcWk
 z3d68mq1BRKq1io4Kac+u3l7JsTo9VVWnSq6MHxSvOKetCTI8Qqnt9gsEstsOyvpfmnQ
 z/6Q==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20230601; t=1714230700; x=1714835500;
 h=cc:to:subject:message-id:date:from:in-reply-to:references
 :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id
 :reply-to;
 bh=DIF/GBdDc+wuR0I6h1zV+aPhwxtkA8C8NdWAY4fHbkg=;
 b=wSwogdmqaZAU5XyT8jmusJol3Lsn7R9P0IedtiExO2MkpJoleLBLqCbZd++VbP+1CX
 8wTOMHoEUzi/840OzCefnakKBkRaov72QjWKmkxVQ+DW/rJF1nTPUPWkrlLE2ie877j9
 45R3BaZPyd9Sl1bvDMiANDgpoKJbzhM7wQZ8KE+qULutwltoIGrf2wTvvDQwmvoOLFog
 LapfbkU88lVB+HPxWotxAZasFjeqr1EcQCachmiIoFuEnp150xZvmS3o0zMfS8ygixUI
 /Z6ZSVTobwKLrZcsLxI8ag+XRjqMy1Yo6C9g4Ax71XRIfeR8oMBGjTitVWHYjm1tcrLn
 7LZg==
X-Forwarded-Encrypted: i=1;
 AJvYcCXR6/OiqsjzQVCM1Fs3V/5WsvCRWTHthqLKZXZ+whB8XnOdA0WdnT76udWRVh1gzgIg10Xn0AhYKbPAjrE=
X-Gm-Message-State: AOJu0YwDX3b6PzMsafoQ9CKWtwpBOl5MEzbSqI9BThazJuv1kiJsU9l1
 ZXlvSCLf3Lb8tIigzCSDUB9ILeezLs4RvBVLVzpC9sy1+q952niin7AjEcrCBargpp7l/NoKw7i
 27w7XxT8b9K2ah3LFdWYxL6ylIoE=
X-Google-Smtp-Source: AGHT+IFdrYGvx04umD7PFOG7Dvge2jT6u/0Qbl/0H5gDcvFvQGKuzBx/VmLWSiKUZprGcZAKm84t4V22pWywIZikEGk=
X-Received: by 2002:a5d:6742:0:b0:34b:4d2e:47d4 with SMTP id
 l2-20020a5d6742000000b0034b4d2e47d4mr3705524wrw.24.1714230699503; Sat, 27 Apr
 2024 08:11:39 -0700 (PDT)
MIME-Version: 1.0
References: <CAF8qwaCAJif0SA+uyZ=vGUZ29bwrFNL2jrS9wTOxjxaA2JLOaw@mail.gmail.com>
 <CAF8qwaDnvOAeQWMCprs=2xqaFFn7saBQg9mAVwXTSo1MGhfWcA@mail.gmail.com>
 <AS8PR10MB7427BA4C60115664B4571C0BEE082@AS8PR10MB7427.EURPRD10.PROD.OUTLOOK.COM>
 <CAF8qwaBdr_Y6nbG+6kTHhOM4KdvKKUGaBn2YLAKNYcRu5b85Lw@mail.gmail.com>
 <CAEGZyHUvwPixW8qmk73Q2S=MqmjiNsOPNrhESg1zu85p0pEbCA@mail.gmail.com>
 <CAF8qwaCiaJgfxvXTRPp0SvpinWjJ5LnhvjNb1EsWsy1RHWCahw@mail.gmail.com>
 <CAF8qwaBF_p-D7sM6g2rkuD0P4MbayXx6ava37X7g-P0U6y5vmg@mail.gmail.com>
In-Reply-To: <CAF8qwaBF_p-D7sM6g2rkuD0P4MbayXx6ava37X7g-P0U6y5vmg@mail.gmail.com>
From: Watson Ladd <watsonbladd@gmail.com>
Date: Sat, 27 Apr 2024 08:11:29 -0700
Message-ID: <CACsn0cke87UCyweqitng0Wvi5AV7_J68MiM11is6P=Vhibt6MA@mail.gmail.com>
To: David Benjamin <davidben@chromium.org>
Cc: Marco Oliverio <marco@wolfssl.com>, Nick Harper <nharper@chromium.org>, 
 "<tls@ietf.org>" <tls@ietf.org>
Content-Type: multipart/alternative; boundary="000000000000e839890617156f99"
Archived-At: <https://mailarchive.ietf.org/arch/msg/tls/RifuawWyFGVXybxQN52T2IADly4>
Subject: Re: [TLS] Issues with buffered, ACKed KeyUpdates in DTLS 1.3
X-BeenThere: tls@ietf.org
X-Mailman-Version: 2.1.39
Precedence: list
List-Id: "This is the mailing list for the Transport Layer Security working
 group of the IETF." <tls.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tls>,
 <mailto:tls-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tls/>
List-Post: <mailto:tls@ietf.org>
List-Help: <mailto:tls-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tls>,
 <mailto:tls-request@ietf.org?subject=subscribe>
X-List-Received-Date: Sat, 27 Apr 2024 15:11:46 -0000

--000000000000e839890617156f99
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Sat, Apr 27, 2024, 8:03=E2=80=AFAM David Benjamin <davidben@chromium.org=
> wrote:

> What should the next steps be here? Is this a bunch of errata, or
> something else?
>

Errata at a minimum but this might be big enough for a small RFC describing
the fix.

>
> On Wed, Apr 17, 2024 at 10:08=E2=80=AFAM David Benjamin <davidben@chromiu=
m.org>
> wrote:
>
>> > Sender implementations should already be able to retransmit messages
>> with older epochs due to the "duplicated" post-auth state machine
>>
>> The nice thing about option 7 is that the older epochs retransmit proble=
m
>> becomes moot in updated senders, I think. If the sender doesn't activate
>> epoch N+1 until KeyUpdate *and prior messages* are ACKed and if KeyUpdat=
e
>> is required to be the last handshake message in epoch N, then the previo=
us
>> epoch is guaranteed to be empty by the time you activate it.
>>
>> On Wed, Apr 17, 2024, 09:27 Marco Oliverio <marco@wolfssl.com> wrote:
>>
>>> Hi David,
>>>
>>> Thanks for pointing this out. I also favor solution 7 as it's the
>>> simpler approach and it doesn't require too much effort to add in curre=
nt
>>> implementations.
>>> Sender implementations should already be able to retransmit messages
>>> with older epochs due to the "duplicated" post-auth state machine.
>>>
>>> Marco
>>>
>>> On Tue, Apr 16, 2024 at 3:48=E2=80=AFPM David Benjamin <davidben@chromi=
um.org>
>>> wrote:
>>>
>>>> Thanks, Hannes!
>>>>
>>>> Since it was buried in there (my understanding of the issue evolved as
>>>> I described it), I currently favor option 7. I.e. the sender-only fix =
to
>>>> the KeyUpdate criteria.
>>>>
>>>> At first I thought we should also change the receiver to mitigate
>>>> unfixed senders, but this situation should be pretty rare (most sender=
s
>>>> will send NewSessionTicket well before they KeyUpdate), DTLS 1.3 isn't=
 very
>>>> widely deployed yet, and ultimately, it's on the sender implementation=
 to
>>>> make sure all states they can get into are coherent.
>>>>
>>>> If the sender crashed, that's unambiguously on the sender to fix. If
>>>> the sender still correctly retransmits the missing messages, the conne=
ction
>>>> will perform suboptimally for a blip but still recover.
>>>>
>>>> David
>>>>
>>>>
>>>> On Tue, Apr 16, 2024, 05:19 Tschofenig, Hannes <
>>>> hannes.tschofenig@siemens.com> wrote:
>>>>
>>>>> Hi David,
>>>>>
>>>>>
>>>>>
>>>>> this is great feedback. Give me a few days to respond to this issue
>>>>> with my suggestion for moving forward.
>>>>>
>>>>>
>>>>>
>>>>> Ciao
>>>>>
>>>>> Hannes
>>>>>
>>>>>
>>>>>
>>>>> *From:* TLS <tls-bounces@ietf.org> *On Behalf Of *David Benjamin
>>>>> *Sent:* Saturday, April 13, 2024 7:59 PM
>>>>> *To:* <tls@ietf.org> <tls@ietf.org>
>>>>> *Cc:* Nick Harper <nharper@chromium.org>
>>>>> *Subject:* Re: [TLS] Issues with buffered, ACKed KeyUpdates in DTLS
>>>>> 1.3
>>>>>
>>>>>
>>>>>
>>>>> Another issue with DTLS 1.3's state machine duplication scheme:
>>>>>
>>>>>
>>>>>
>>>>> Section 8 says an implementation must not send a new KeyUpdate until
>>>>> the previous KeyUpdate is ACKed, but it says nothing about other
>>>>> post-handshake messages. Suppose KeyUpdate(5) is in flight and the
>>>>> implementation decides to
>>>>> send NewSessionTicket. (E.g. the application called some
>>>>> "send NewSessionTicket" API.) The new epoch doesn't exist yet, so nai=
vely
>>>>> one would start sending NewSessionTicket(6) in the current epoch. Now=
 the
>>>>> peer ACKs KeyUpdate(5), so we transition to the new epoch. But
>>>>> retransmissions must retain their original epoch:
>>>>>
>>>>>
>>>>>
>>>>> > Implementations MUST send retransmissions of lost messages using th=
e
>>>>> same epoch and keying material as the original transmission.
>>>>>
>>>>> https://www.rfc-editor.org/rfc/rfc9147.html#section-4.2.1-3
>>>>>
>>>>>
>>>>>
>>>>> This means we must keep sending the NST at the old epoch. But the pee=
r
>>>>> may have no idea there's a message at that epoch due to packet loss!
>>>>> Section 8 does ask the peer to keep the old epoch around for a spell,=
 but
>>>>> eventually the peer will discard the old epoch. If NST(6) didn't get
>>>>> through before then, the entire post-handshake stream is now wedged!
>>>>>
>>>>>
>>>>>
>>>>> I think this means we need to amend Section 8 to forbid sending *any*
>>>>> post-handshake message after KeyUpdate. That is, rather than saying y=
ou
>>>>> cannot send a new KeyUpdate, a KeyUpdate terminates the post-handshak=
e
>>>>> stream at that epoch and all new post-handshake messages, be they Key=
Update
>>>>> or anything else, must be enqueued for the new epoch. This is a littl=
e
>>>>> unfortunate because a TLS library which transparently KeyUpdates will=
 then
>>>>> inadvertently introduce hiccups where post-handshake messages trigger=
ed by
>>>>> the application, like post-handshake auth, are blocked.
>>>>>
>>>>>
>>>>>
>>>>> That then suggests some more options for fixing the original problem.
>>>>>
>>>>>
>>>>>
>>>>> *7. Fix the sender's KeyUpdate criteria*
>>>>>
>>>>>
>>>>>
>>>>> We tell the sender to wait for all previous messages to be ACKed too.
>>>>> Fix the first paragraph of section 8 to say:
>>>>>
>>>>>
>>>>>
>>>>> > As with other handshake messages with no built-in response,
>>>>> KeyUpdates MUST be acknowledged. Acknowledgements are used to both co=
ntrol
>>>>> retransmission and transition to the next epoch. Implementations MUST=
 NOT
>>>>> send records with the new keys until the KeyUpdate *and all preceding
>>>>> messages* have been acknowledged. This facilitates epoch
>>>>> reconstruction (Section 4.2.2) and avoids too many epochs in active u=
se, by
>>>>> ensuring the peer has processed the KeyUpdate and started receiving a=
t the
>>>>> new epoch.
>>>>>
>>>>> >
>>>>>
>>>>> > A KeyUpdate message terminates the post-handshake stream in an
>>>>> epoch. After sending KeyUpdate in an epoch, implementations MUST NOT =
send
>>>>> any new post-handshake messages in that epoch. Note that, if the
>>>>> implementation has sent KeyUpdate but is waiting for an ACK, the next=
 epoch
>>>>> is not yet active. In this case, subsequent post-handshake messages m=
ay not
>>>>> be sent until receiving the ACK.
>>>>>
>>>>>
>>>>>
>>>>> And then on the receiver side, we leave things as-is. If the sender
>>>>> implemented the old semantics AND had multiple post-handshake transac=
tions
>>>>> in parallel, it might update keys too early and then we get into the
>>>>> situation described in (1). We then declare that, if this happens, an=
d the
>>>>> sender gets confused as a result, that's the sender's fault. Hopefull=
y this
>>>>> is rare enough (did anyone even implement 5.8.4, or does everyone just
>>>>> serialize their post-handshake transitions?) to not be a serious prot=
ocol
>>>>> break? That risk aside, this option seems the most in spirit with the
>>>>> current design to me.
>>>>>
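
To make the option 7 rule concrete, here is a minimal Python sketch of
the sender-side bookkeeping. The class and method names are made up for
illustration and come from neither RFC 9147 nor any library. It assumes
whole messages (no fragments or records) and ignores retransmission
timers.

```python
from collections import deque


class PostHandshakeSender:
    """Option 7 sender bookkeeping (hypothetical names)."""

    def __init__(self, epoch, next_seq):
        self.epoch = epoch            # epoch currently used for sending
        self.next_seq = next_seq      # next message_seq to assign
        self.unacked = set()          # message_seqs sent, not yet ACKed
        self.key_update_sent = False  # KeyUpdate in flight in this epoch
        self.pending = deque()        # messages held for the next epoch

    def send(self, kind):
        """Send now, or queue and return None after a KeyUpdate."""
        if self.key_update_sent:
            # KeyUpdate terminated the stream in this epoch.
            self.pending.append(kind)
            return None
        seq = self.next_seq
        self.next_seq += 1
        self.unacked.add(seq)
        if kind == "KeyUpdate":
            self.key_update_sent = True
        return (self.epoch, seq, kind)

    def on_ack(self, seq):
        """Record an ACK; return queued messages that it released."""
        self.unacked.discard(seq)
        if not self.key_update_sent or self.unacked:
            return []
        # KeyUpdate and all preceding messages are ACKed: switch epoch.
        self.epoch += 1
        self.key_update_sent = False
        queued = list(self.pending)
        self.pending.clear()
        sent = [self.send(kind) for kind in queued]
        return [m for m in sent if m is not None]
```

With NewSessionTicket(5) and KeyUpdate(6) in flight, a later
NewSessionTicket is queued. An ACK for 6 alone does not switch epochs;
only once 5 is ACKed as well does the sender move to the next epoch and
send the queued message there, so the old epoch is empty by then.
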
>>>>>
>>>>>
>>>>> *8. Decouple post-handshake retransmissions from epochs*
>>>>>
>>>>>
>>>>>
>>>>> If we instead say that the same epoch rule only applies for the
>>>>> handshake, and not post-handshake messages, I think option 5 (process
>>>>> KeyUpdate out of order) might become viable? I'm not sure. Either way=
, this
>>>>> seems like a significant protocol break, so I don't think this is an =
option
>>>>> until some hypothetical DTLS 1.4.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Apr 12, 2024 at 6:59=E2=80=AFPM David Benjamin <davidben@chro=
mium.org>
>>>>> wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>>
>>>>>
>>>>> This is going to be a bit long. In short, DTLS 1.3 KeyUpdates seem to
>>>>> conflate the peer *receiving* the KeyUpdate with the peer *processing=
* the
>>>>> KeyUpdate, in ways that appear to break some assumptions made by the
>>>>> protocol design.
>>>>>
>>>>>
>>>>>
>>>>> *When to switch keys in KeyUpdate*
>>>>>
>>>>>
>>>>>
>>>>> So, first, DTLS 1.3, unlike TLS 1.3, applies the KeyUpdate on the ACK=
,
>>>>> not when the KeyUpdate is sent. This makes sense because KeyUpdate re=
cords
>>>>> are not intrinsically ordered with app data records sent after them:
>>>>>
>>>>>
>>>>>
>>>>> > As with other handshake messages with no built-in response,
>>>>> KeyUpdates MUST be acknowledged. In order to facilitate epoch
>>>>> reconstruction (Section 4.2.2), implementations MUST NOT send records=
 with
>>>>> the new keys or send a new KeyUpdate until the previous KeyUpdate has=
 been
>>>>> acknowledged (this avoids having too many epochs in active use).
>>>>>
>>>>> https://www.rfc-editor.org/rfc/rfc9147.html#section-8-1
>>>>>
>>>>>
>>>>>
>>>>> Now, the parenthetical says this is to avoid having too many epochs i=
n
>>>>> active use, but it appears that there are stronger assumptions on thi=
s:
>>>>>
>>>>>
>>>>>
>>>>> > After the handshake is complete, if the epoch bits do not match
>>>>> those from the current epoch, implementations SHOULD use the most
>>>>> recent *past* epoch which has matching bits, and then reconstruct
>>>>> the sequence number for that epoch as described above.
>>>>>
>>>>> https://www.rfc-editor.org/rfc/rfc9147.html#section-4.2.2-3
>>>>>
>>>>> (emphasis mine)
>>>>>
>>>>>
>>>>>
>>>>> > After the handshake, implementations MUST use the highest available
>>>>> sending epoch [to send ACKs]
>>>>>
>>>>> https://www.rfc-editor.org/rfc/rfc9147.html#section-7-7
>>>>>
>>>>>
>>>>>
>>>>> These two snippets imply the protocol wants the peer to definitely
>>>>> have installed the new keys before you start using them. This makes s=
ense
>>>>> because sending stuff the peer can't decrypt is pretty silly. As an a=
side,
>>>>> DTLS 1.3 retains this text from DTLS 1.2:
>>>>>
>>>>>
>>>>>
>>>>> > Conversely, it is possible for records that are protected with the
>>>>> new epoch to be received prior to the completion of a handshake. For
>>>>> instance, the server may send its Finished message and then start
>>>>> transmitting data. Implementations MAY either buffer or discard such
>>>>> records, though when DTLS is used over reliable transports (e.g., SCT=
P
>>>>> [RFC4960]), they SHOULD be buffered and processed once the handshake
>>>>> completes.
>>>>>
>>>>> https://www.rfc-editor.org/rfc/rfc9147.html#section-4.2.1-2
>>>>>
>>>>>
>>>>> The text from DTLS 1.2 talks about *a* handshake, which presumably
>>>>> refers to rekeying via renegotiation. But in DTLS 1.3, the epoch
>>>>> reconstruction rule and the KeyUpdate rule mean this is only possible
>>>>> during the handshake, when you see epoch 4 and expect epoch 0-3. The =
steady
>>>>> state rekeying mechanism never hits this case. (This is a reasonable =
change
>>>>> because there's no sense in unnecessarily introducing blips where the
>>>>> connection is less tolerant of reordering.)
>>>>>
>>>>>
>>>>>
>>>>> *Buffered handshake messages*
>>>>>
>>>>>
>>>>>
>>>>> Okay, so KeyUpdates want to wait for the recipient to install keys,
>>>>> except we don't seem to actually achieve this! Section 5.2 says:
>>>>>
>>>>>
>>>>>
>>>>> > DTLS implementations maintain (at least notionally) a
>>>>> next_receive_seq counter. This counter is initially set to zero. When=
 a
>>>>> handshake message is received, if its message_seq value matches
>>>>> next_receive_seq, next_receive_seq is incremented and the message is
>>>>> processed. If the sequence number is less than next_receive_seq, the
>>>>> message MUST be discarded. If the sequence number is greater than
>>>>> next_receive_seq, the implementation SHOULD queue the message but MAY
>>>>> discard it. (This is a simple space/bandwidth trade-off).
>>>>>
>>>>> https://www.rfc-editor.org/rfc/rfc9147.html#section-5.2-7
>>>>>
>>>>>
>>>>>
>>>>> I assume this is intended to apply to post-handshake messages too.
>>>>> (See below for a discussion of the alternative.) But that means that,=
 when
>>>>> you receive a KeyUpdate, you might not immediately process it. Suppos=
e
>>>>> next_receive_seq is 5, and the peer sends NewSessionTicket(5),
>>>>> NewSessionTicket(6), and KeyUpdate(7). 5 is lost, but 6 and 7 come in=
,
>>>>> perhaps even in the same record which means that you're forced to ACK=
 both
>>>>> or neither. But suppose the implementation is willing to buffer 3 mes=
sages
>>>>> ahead, so it ACKs the 6+7 record, by the rules in section 7, which pe=
rmits
>>>>> ACKing fragments that were buffered and not yet processed.
>>>>>
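
A minimal Python sketch of the next_receive_seq rule quoted above, to
make the 5/6/7 scenario concrete. The class, its method names, and the
max_ahead limit are assumptions for illustration; it handles whole
messages and says nothing about ACKs or fragments.

```python
class HandshakeReceiver:
    """Receive-side message_seq handling (hypothetical names)."""

    def __init__(self, next_receive_seq=0, max_ahead=3):
        self.next_receive_seq = next_receive_seq
        self.max_ahead = max_ahead  # assumed local buffering limit
        self.buffered = {}          # message_seq -> queued message
        self.processed = []         # messages in processing order

    def receive(self, seq, message):
        """Return 'processed', 'buffered', or 'discarded'."""
        if seq < self.next_receive_seq:
            return "discarded"      # already processed earlier
        if seq > self.next_receive_seq:
            if seq - self.next_receive_seq >= self.max_ahead:
                return "discarded"  # too far ahead to queue
            self.buffered[seq] = message
            return "buffered"
        self.processed.append(message)
        self.next_receive_seq += 1
        # Drain queued messages that are now in order.
        while self.next_receive_seq in self.buffered:
            ready = self.buffered.pop(self.next_receive_seq)
            self.processed.append(ready)
            self.next_receive_seq += 1
        return "processed"
```

Starting from next_receive_seq 5, receiving 6 and then 7 leaves both
buffered and nothing processed, which is exactly the window in which an
ACK for the 6+7 record would tell the peer more than is true.
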
>>>>>
>>>>>
>>>>> That means the peer will switch keys and now all subsequent records
>>>>> from them will come from epoch N+1. But the receiver is not ready for
>>>>> N+1
>>>>> yet, so we contradict everything above. We also contradict this
>>>>> parenthetical in section 8:
>>>>>
>>>>>
>>>>>
>>>>> > Due to loss and/or reordering, DTLS 1.3 implementations may receive
>>>>> a record with an older epoch than the current one (the requirements a=
bove
>>>>> preclude receiving a newer record).
>>>>>
>>>>> https://www.rfc-editor.org/rfc/rfc9147.html#section-8-2
>>>>>
>>>>>
>>>>>
>>>>> I assume then that this was not actually what was intended.
>>>>>
>>>>>
>>>>>
>>>>> *Options (and non-options)*
>>>>>
>>>>>
>>>>>
>>>>> Assuming I'm reading this right, we seem to have made a mess of
>>>>> things. The sender could avoid this by only allowing one active
>>>>> post-handshake transaction at a time and serializing them, at the cos=
t of
>>>>> taking a round-trip for each. But the receiver needs to account for a=
ll
>>>>> possible senders, so that doesn't help. Some options that come to min=
d:
>>>>>
>>>>>
>>>>>
>>>>> *1. Accept that the sender updates its keys too early*
>>>>>
>>>>>
>>>>>
>>>>> Apart from contradicting most of the specification text, the protocol
>>>>> doesn't *break* per se if you just allow the peer to switch keys
>>>>> early in this buffered KeyUpdate case. We *merely* contradict all of
>>>>> the explanatory text and introduce a bunch of cases that the specific=
ation
>>>>> suggests are impossible. :-) Also the connection quality is poor.
>>>>>
>>>>>
>>>>>
>>>>> The sender will use epoch N+1 at a point when the peer is on N. But
>>>>> epoch reconstruction will misread it as N-3 instead of N+1, and eithe=
r way
>>>>> you won't have the keys to decrypt it yet! The connection is interrup=
ted
>>>>> (and with all packets discarded because epoch reconstruction fails!) =
until
>>>>> the peer retransmits 5 and you catch up. Until then, not only will yo=
u not
>>>>> receive application data, but you also won't receive ACKs. This also =
adds a
>>>>> subtle corner case on the sender side: the sender cannot discard the =
old
>>>>> sending keys because it still has unACKed messages from the previous =
epoch
>>>>> to retransmit, but this is not called out in section 8. Section 8 onl=
y
>>>>> discusses the receiver needing to retain the old epoch.
>>>>>
>>>>>
>>>>> This seems not great. Also it contradicts much of the text in the
>>>>> spec, including section 8 explicitly saying this case cannot happen.
>>>>>
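
The "N-3 instead of N+1" misread can be shown with a small Python
sketch of the post-handshake reconstruction rule. The function name is
hypothetical, and it assumes the record header carries only the low two
bits of the epoch and that the current epoch is at least 3.

```python
def reconstruct_epoch(current_epoch, epoch_bits):
    """Most recent epoch <= current_epoch with matching low 2 bits.

    Assumes current_epoch >= 3 (post-handshake), so the result is
    never negative.
    """
    candidate = (current_epoch & ~0b11) | (epoch_bits & 0b11)
    if candidate > current_epoch:
        candidate -= 4  # matching bits only exist in a past epoch
    return candidate
```

For a receiver on epoch 7, a record sent at epoch 8 carries bits
8 & 3 == 0, and reconstruct_epoch(7, 0) yields 4, i.e. N-3.
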
>>>>>
>>>>>
>>>>> *2. Never ACK buffered KeyUpdates*
>>>>>
>>>>>
>>>>>
>>>>> We can say that KeyUpdates are special and, unless you're willing to
>>>>> process them immediately, you must not ACK the records containing the=
m.
>>>>> This means you might under-ACK and the peer might over-retransmit, bu=
t
>>>>> seems not fatal. This also seems a little hairy to implement if you w=
ant to
>>>>> avoid under-ACKing unnecessarily. You might have message
>>>>> NewSessionTicket(6) buffered and then receive a record with
>>>>> NewSessionTicket(5) and KeyUpdate(7). That record may appear unACKabl=
e, but
>>>>> it's fine because you'll immediately process 5 then 6 then 7... unles=
s your
>>>>> NewSessionTicket process is asynchronous, in which case it might not =
be?
>>>>>
>>>>>
>>>>>
>>>>> Despite all that mess, this seems the most viable option?
>>>>>
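
A rough Python sketch of the option 2 check, under loud assumptions:
the function name is invented, messages are whole (no fragments), and
processing is synchronous, so the asynchronous NewSessionTicket case
mentioned above is not covered.

```python
def key_update_ackable(record, next_receive_seq, buffered):
    """Decide whether a record may be ACKed under option 2.

    record: list of (message_seq, kind) pairs carried in one record.
    buffered: set of message_seqs the receiver has already queued.
    Returns False if the record holds a KeyUpdate that cannot be
    processed right away because an earlier message is missing.
    """
    available = set(buffered) | {seq for seq, _ in record}
    for seq, kind in record:
        if kind != "KeyUpdate" or seq < next_receive_seq:
            continue
        needed = range(next_receive_seq, seq)
        if any(s not in available for s in needed):
            return False
    return True
```

With 6 already buffered, a record carrying 5 and KeyUpdate(7) is
ACKable because 5, 6, 7 can all be processed in order. A record
carrying 6 and KeyUpdate(7) while 5 is still missing is not.
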
>>>>>
>>>>>
>>>>> *3. Declare this situation a sender error*
>>>>>
>>>>>
>>>>>
>>>>> We could say this is not allowed and senders MUST NOT send KeyUpdate
>>>>> if there are any outstanding post-handshake messages. And then the re=
ceiver
>>>>> should fail with unexpected_message if it ever receives KeyUpdate at =
a
>>>>> future message_seq. But as the RFC is already published, I don't know=
 if
>>>>> this is compatible with existing implementations.
>>>>>
>>>>>
>>>>>
>>>>> *4. Explicit KeyUpdateAck message*
>>>>>
>>>>>
>>>>>
>>>>> We could have made a KeyUpdateAck message to signal that you've
>>>>> processed a KeyUpdate, not just received it. But that's a protocol
>>>>> change and
>>>>> the RFC is stamped, so it's too late now.
>>>>>
>>>>>
>>>>>
>>>>> *5. Process KeyUpdate out of order*
>>>>>
>>>>>
>>>>>
>>>>> We could say that the receiver doesn't buffer KeyUpdate. It just goes
>>>>> ahead and processes it immediately to install epoch N+1. This seems l=
ike it
>>>>> would address the issue but opens more cans of worms. Now the receive=
r
>>>>> needs to keep the old epoch around for more than packet reorder, but =
also
>>>>> to pick up the retransmissions of the missing handshake messages. Als=
o, by
>>>>> activating the new epoch, the receiver now allows the sender to KeyUp=
date
>>>>> again, and again, and again. But, several epochs later, the holes in =
the
>>>>> message stream may remain unfilled, so we still need the old keys. Wi=
thout
>>>>> further protocol rules, a sender could force the receiver to keep key=
s
>>>>> arbitrarily many records back. All this is, at best, a difficult case=
 that
>>>>> is unlikely to be well-tested, and at worst get the implementation in=
to
>>>>> some broken state and then misbehave badly.
>>>>>
>>>>>
>>>>>
>>>>> *6. Post-handshake transactions aren't ordered at all*
>>>>>
>>>>>
>>>>>
>>>>> It could be that my assumption above was wrong and the
>>>>> next_receive_seq discussion in 5.2 only applies to the handshake. Aft=
er
>>>>> all, section 5.8.4 discusses how every post-handshake transaction
>>>>> duplicates the "state machine". Except it only says to duplicate the =
5.8.1
>>>>> state machine, and it's ambiguous whether that includes the
>>>>> message_seq logic.
>>>>>
>>>>>
>>>>>
>>>>> However, going this direction seems to very quickly make a mess. If
>>>>> each post-handshake transaction handles message_seq independently, yo=
u
>>>>> cannot distinguish a retransmission from a new transaction. That seem=
s
>>>>> quite bad, so presumably the intent was to use message_seq to disting=
uish
>>>>> those. (I.e. the intent can't have been to duplicate the message_seq
>>>>> state.) Indeed, we have:
>>>>>
>>>>>
>>>>>
>>>>> > However, in DTLS 1.3 the message_seq is not reset, to allow
>>>>> distinguishing a retransmission from a previously sent post-handshake
>>>>> message from a newly sent post-handshake message.
>>>>>
>>>>> https://www.rfc-editor.org/rfc/rfc9147.html#section-5.2-6
>>>>>
>>>>>
>>>>>
>>>>> But if we distinguish with message_seq AND process transactions out o=
f
>>>>> order, now receivers need to keep track of fairly complex state in ca=
se
>>>>> they process messages 5, 7, 9, 11, 13, 15, 17, ... but then only get =
the
>>>>> even ones later. And we'd need to define some kind of sliding window =
for
>>>>> what happens if you receive message_seq 9000 all of a sudden. And we =
import
>>>>> all the cross-epoch problems in option 5 above. None of that is in th=
e
>>>>> text, so I assume this was not the intended reading, and I don't thin=
k we
>>>>> want to go that direction. :-)
>>>>>
>>>>>
>>>>> *Digression: ACK fate-sharing and flow control*
>>>>>
>>>>>
>>>>>
>>>>> All this alludes to another quirk that isn't a problem, but is a
>>>>> little non-obvious and warrants some discussion in the spec. Multiple
>>>>> handshake fragments may be packed into the same record, but ACKs appl=
y to
>>>>> the whole record. If you receive a fragment for a message sequence to=
o far
>>>>> into the future, you are permitted to discard the fragment. But if yo=
u
>>>>> discard *any* fragment, you cannot ACK the record, *even if there
>>>>> were fragments which you did process*. During the handshake, an
>>>>> implementation could avoid needing to make this decision by knowing t=
he
>>>>> maximum size of a handshake flight. After the handshake, there is no
>>>>> inherent limit on how many NewSessionTickets the peer may choose to s=
end in
>>>>> a row, and no flow control.
>>>>>
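
The fate-sharing point can be sketched in a few lines of Python. The
function name and the max_ahead limit are assumptions for illustration,
and each entry stands for a whole message rather than a fragment.

```python
def classify_record(record_seqs, next_receive_seq, max_ahead):
    """Return (kept, ackable) for the message_seqs in one record.

    A message at or beyond next_receive_seq + max_ahead is discarded.
    Discarding anything makes the whole record unACKable, even if
    other messages in it were kept.
    """
    kept = []
    ackable = True
    for seq in record_seqs:
        if seq < next_receive_seq:
            continue            # duplicate of a processed message
        if seq - next_receive_seq >= max_ahead:
            ackable = False     # discarded: too far in the future
        else:
            kept.append(seq)    # processed now or buffered
    return kept, ackable
```

With next_receive_seq 5 and max_ahead 3, a record carrying 5, 6 and 9
keeps 5 and 6 but cannot be ACKed, so the sender will retransmit all
three.
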
>>>>>
>>>>>
>>>>> QUIC ran into a similar issue here and said an implementation can
>>>>> choose an ad-hoc limit, after which it can choose to either wedge the
>>>>> post-handshake stream or return an error.
>>>>>
>>>>> https://github.com/quicwg/base-drafts/issues/1834
>>>>> https://github.com/quicwg/base-drafts/pull/2524
>>>>>
>>>>>
>>>>>
>>>>> I suspect the most practical outcome for DTLS (and arguably already
>>>>> supported by the existing text, but not very obviously), is to instea=
d say
>>>>> the receiver just refuses to ACK stuff and, okay, maybe in some weird=
 edge
>>>>> cases the receiver under-ACKs and then the sender over-retransmits, u=
ntil
>>>>> things settle down. Whereas ACKs are a bit more tightly integrated wi=
th
>>>>> QUIC, so refusing to ACK a packet due to one bad frame is less of an
>>>>> option. Still, I think this would have been worth calling out in the =
text.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> So... did I read all this right? Did we indeed make a mess of this, o=
r
>>>>> did I miss something?
>>>>>
>>>>>
>>>>>
>>>>> David
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>> TLS mailing list
>>>> TLS@ietf.org
>>>> https://www.ietf.org/mailman/listinfo/tls
>>>>

--000000000000e839890617156f99
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"auto"><div><br><br><div class=3D"gmail_quote"><div dir=3D"ltr" =
class=3D"gmail_attr">On Sat, Apr 27, 2024, 8:03=E2=80=AFAM David Benjamin &=
lt;<a href=3D"mailto:davidben@chromium.org">davidben@chromium.org</a>&gt; w=
rote:<br></div><blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex=
;border-left:1px #ccc solid;padding-left:1ex"><div dir=3D"ltr">What should =
the next steps be here? Is this a bunch of errata, or something else?</div>=
</blockquote></div></div><div dir=3D"auto"><br></div><div dir=3D"auto">Erra=
ta at a minimum but this might be big enough for a small RFC describing the=
 fix.</div><div dir=3D"auto"><div class=3D"gmail_quote"><blockquote class=
=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padd=
ing-left:1ex"><br><div class=3D"gmail_quote"><div dir=3D"ltr" class=3D"gmai=
l_attr">On Wed, Apr 17, 2024 at 10:08=E2=80=AFAM David Benjamin &lt;<a href=
=3D"mailto:davidben@chromium.org" target=3D"_blank" rel=3D"noreferrer">davi=
dben@chromium.org</a>&gt; wrote:<br></div><blockquote class=3D"gmail_quote"=
 style=3D"margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);p=
adding-left:1ex"><div dir=3D"auto"><div dir=3D"auto">&gt;=C2=A0Sender imple=
mentations should already be able to retransmit messages with older epochs =
due to the &quot;duplicated&quot; post-auth state machine</div><div dir=3D"=
auto"><br></div><div>The nice thing about option 7 is that the older epochs=
 retransmit problem becomes moot in updated senders, I think. If the sender=
 doesn&#39;t activate epoch N+1 until KeyUpdate *and prior messages* are AC=
Ked and if KeyUpdate is required to be the last handshake message in epoch =
N, then the previous epoch is guaranteed to be empty by the time you activa=
te it.</div><div dir=3D"auto"><br><div class=3D"gmail_quote" dir=3D"auto"><=
div dir=3D"ltr" class=3D"gmail_attr">On Wed, Apr 17, 2024, 09:27 Marco Oliv=
erio &lt;<a href=3D"mailto:marco@wolfssl.com" rel=3D"noreferrer noreferrer"=
 target=3D"_blank">marco@wolfssl.com</a>&gt; wrote:<br></div><blockquote cl=
ass=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;border-left:1px solid=
 rgb(204,204,204);padding-left:1ex"><div dir=3D"ltr"><div>Hi David,<br><br>=
</div><div>Thanks for pointing this out. I also favor solution 7 as it&#39;=
s the simpler approach and it doesn&#39;t require too much effort to add in=
 current implementations.</div><div>Sender implementations should already b=
e able to retransmit messages with older epochs due to the &quot;duplicated=
&quot; post-auth state machine.</div><div><br></div><div>Marco<br></div></d=
iv><br><div class=3D"gmail_quote"><div dir=3D"ltr" class=3D"gmail_attr">On =
Tue, Apr 16, 2024 at 3:48=E2=80=AFPM David Benjamin &lt;<a href=3D"mailto:d=
avidben@chromium.org" rel=3D"noreferrer noreferrer noreferrer" target=3D"_b=
lank">davidben@chromium.org</a>&gt; wrote:<br></div><blockquote class=3D"gm=
ail_quote" style=3D"margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,=
204,204);padding-left:1ex"><div dir=3D"auto"><div>Thanks, Hannes!<div dir=
=3D"auto"><br></div><div dir=3D"auto">Since it was buried in there (my unde=
rstanding of the issue evolved as I described it), I currently favor option=
 7. I.e. the sender-only fix to the KeyUpdate criteria.</div><div dir=3D"au=
to"><br></div><div dir=3D"auto">At first I thought we should also change th=
e receiver to mitigate unfixed senders, but this situation should be pretty=
 rare (most senders will send NewSessionTicket well before they KeyUpdate),=
 DTLS 1.3 isn&#39;t very widely deployed yet, and ultimately, it&#39;s on t=
he sender implementation to make sure all states they can get into are cohe=
rent.</div><div dir=3D"auto"><br></div><div dir=3D"auto">If the sender cras=
hed, that&#39;s unambiguously on the sender to fix. If the sender still cor=
rectly retransmits the missing messages, the connection will perform subopt=
imally for a blip but still recover.</div><div dir=3D"auto"><br></div><div =
dir=3D"auto">David</div><br><br><div class=3D"gmail_quote"><div dir=3D"ltr"=
 class=3D"gmail_attr">On Tue, Apr 16, 2024, 05:19 Tschofenig, Hannes &lt;<a=
 href=3D"mailto:hannes.tschofenig@siemens.com" rel=3D"noreferrer noreferrer=
 noreferrer" target=3D"_blank">hannes.tschofenig@siemens.com</a>&gt; wrote:=
<br></div><blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8=
ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">





<div>
<div>
<p class=3D"MsoNormal"><span lang=3D"EN-US">Hi David,<u></u><u></u></span><=
/p>
<p class=3D"MsoNormal"><span lang=3D"EN-US"><u></u>=C2=A0<u></u></span></p>
<p class=3D"MsoNormal"><span lang=3D"EN-US">this is great feedback. Give me=
 a few days to respond to this issue with my suggestion for moving forward.=
<u></u><u></u></span></p>
<p class=3D"MsoNormal"><span lang=3D"EN-US"><u></u>=C2=A0<u></u></span></p>
<p class=3D"MsoNormal"><span lang=3D"EN-US">Ciao<u></u><u></u></span></p>
<p class=3D"MsoNormal"><span lang=3D"EN-US">Hannes<u></u><u></u></span></p>
<p class=3D"MsoNormal"><span lang=3D"EN-US"><u></u>=C2=A0<u></u></span></p>
<div style=3D"border-width:1pt medium medium;border-style:solid none none;b=
order-color:rgb(225,225,225) currentcolor currentcolor;padding:3pt 0in 0in"=
>
<p class=3D"MsoNormal"><b><span lang=3D"EN-US">From:</span></b><span lang=
=3D"EN-US"> TLS &lt;<a href=3D"mailto:tls-bounces@ietf.org" rel=3D"noreferr=
er noreferrer noreferrer noreferrer" target=3D"_blank">tls-bounces@ietf.org=
</a>&gt;
<b>On Behalf Of </b>David Benjamin<br>
<b>Sent:</b> Saturday, April 13, 2024 7:59 PM<br>
<b>To:</b> &lt;<a href=3D"mailto:tls@ietf.org" rel=3D"noreferrer noreferrer=
 noreferrer noreferrer" target=3D"_blank">tls@ietf.org</a>&gt; &lt;<a href=
=3D"mailto:tls@ietf.org" rel=3D"noreferrer noreferrer noreferrer noreferrer=
" target=3D"_blank">tls@ietf.org</a>&gt;<br>
<b>Cc:</b> Nick Harper &lt;<a href=3D"mailto:nharper@chromium.org" rel=3D"n=
oreferrer noreferrer noreferrer noreferrer" target=3D"_blank">nharper@chrom=
ium.org</a>&gt;<br>
<b>Subject:</b> Re: [TLS] Issues with buffered, ACKed KeyUpdates in DTLS 1.=
3<u></u><u></u></span></p>
</div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
<div>
<div>
<p class=3D"MsoNormal">Another issue with DTLS 1.3&#39;s state machine dupl=
ication scheme:<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal">Section 8 says implementations must not send a new K=
eyUpdate until the previous KeyUpdate is ACKed, but it says nothing about o=
ther post-handshake messages. Suppose KeyUpdate(5) is in flight and the imp=
lementation decides to send NewSessionTicket. (E.g.
 the application called some &quot;send=C2=A0NewSessionTicket&quot; API.) T=
he new epoch doesn&#39;t exist yet, so naively one would start sending NewS=
essionTicket(6) in the current epoch. Now the peer ACKs KeyUpdate(5), so we=
 transition to the new epoch. But retransmissions must
 retain their original epoch:<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal">&gt; Implementations MUST send retransmissions of lo=
st messages using the same epoch and keying material as the original transm=
ission.<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><a href=3D"https://www.rfc-editor.org/rfc/rfc9147.ht=
ml#section-4.2.1-3" rel=3D"noreferrer noreferrer noreferrer noreferrer" tar=
get=3D"_blank">https://www.rfc-editor.org/rfc/rfc9147.html#section-4.2.1-3<=
/a><u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal">This means we must keep sending the NST at the old e=
poch. But the peer may have no idea there&#39;s a message at that epoch due=
 to packet=C2=A0loss! Section 8 does ask the peer to keep the old epoch aro=
und for a spell, but eventually the peer will
 discard the old epoch. If NST(6) didn&#39;t get through before then, the e=
ntire post-handshake stream is now wedged!<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal">I think this means we need to amend Section 8 to for=
bid sending *any* post-handshake message after KeyUpdate. That is, rather t=
han saying you cannot send a new KeyUpdate, a KeyUpdate terminates the post=
-handshake stream at that epoch and
 all new post-handshake messages, be they KeyUpdate or anything else, must =
be enqueued for the new epoch. This is a little unfortunate because a TLS l=
ibrary which transparently KeyUpdates will then inadvertently introduce hic=
cups where post-handshake messages
 triggered by the application, like post-handshake auth, are blocked.<u></u=
><u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
<p class=3D"MsoNormal">That then suggests some more options for fixing the =
original problem.<u></u><u></u></p>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><i>7. Fix the=C2=A0sender&#39;s=C2=A0KeyUpdate crite=
ria</i><u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal">We tell the sender to wait for all previous messages=
 to be ACKed too. Fix the first paragraph of section 8 to say:<u></u><u></u=
></p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal">&gt; As with other handshake messages with no built-=
in response, KeyUpdates MUST be acknowledged. Acknowledgements are used to =
both control retransmission and transition to the next epoch. Implementatio=
ns MUST NOT send records with the new
 keys until the KeyUpdate <b>and all preceding messages</b> have been ackno=
wledged. This=C2=A0facilitates epoch reconstruction (Section 4.2.2) and avo=
ids too many epochs in active use, by ensuring the peer has processed the K=
eyUpdate and started receiving at the
 new epoch.<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal">&gt;<u></u>=C2=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal">&gt; A KeyUpdate message terminates the post-handsha=
ke stream in an epoch. After sending KeyUpdate in an epoch, implementations=
 MUST NOT send any new post-handshake messages in that epoch. Note that, if=
 the implementation has sent KeyUpdate
 but is waiting for an ACK, the next epoch is not yet active. In this case,=
 subsequent post-handshake messages may not be sent until receiving the ACK=
.<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal">And then on the receiver side, we leave things as-is=
. If the sender implemented the old semantics AND had multiple post-handsha=
ke transactions in parallel, it might update keys too early and then we get=
 into the situation described in (1).
 We then declare that, if this happens, and the sender gets confused as a r=
esult, that&#39;s the sender&#39;s fault. Hopefully this is rare enough=
 (did anyone even implement 5.8.4, or does everyone just serialize their po=
st-handshake transitions?) to not be a serious
 protocol break? That risk aside, this option seems the most in spirit with=
 the current design to me.<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><i>8. Decouple post-handshake retransmissions from e=
pochs</i><u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal">If we instead say that the same epoch rule only appl=
ies for the handshake, and not post-handshake messages, I think option 5 (p=
rocess KeyUpdate out of order) might become viable? I&#39;m not sure. Eithe=
r way, this seems like a significant protocol
 break, so I don&#39;t think this is an option until some hypothetical DTLS=
 1.4.<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
</div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
<div>
<div>
<p class=3D"MsoNormal">On Fri, Apr 12, 2024 at 6:59=E2=80=AFPM David Benjam=
in &lt;<a href=3D"mailto:davidben@chromium.org" rel=3D"noreferrer noreferre=
r noreferrer noreferrer" target=3D"_blank">davidben@chromium.org</a>&gt; wr=
ote:<u></u><u></u></p>
</div>
<blockquote style=3D"border-width:medium medium medium 1pt;border-style:non=
e none none solid;border-color:currentcolor currentcolor currentcolor rgb(2=
04,204,204);padding:0in 0in 0in 6pt;margin:5pt 0in 5pt 4.8pt">
<div>
<div>
<p class=3D"MsoNormal">Hi all,<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal">This is going to be a bit long. In short, DTLS 1.3 K=
eyUpdates seem to conflate the peer
<i>receiving</i>=C2=A0the KeyUpdate with the peer <i>processing</i>=C2=A0th=
e KeyUpdate, in ways that appear to break some assumptions made by the prot=
ocol design.<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><b>When to switch keys in KeyUpdate</b><u></u><u></u=
></p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal">So, first, DTLS 1.3, unlike TLS 1.3, applies the Key=
Update on the ACK, not when the KeyUpdate is sent. This makes sense because=
 KeyUpdate records are not intrinsically ordered with app data records sent=
 after them:<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal">&gt;=C2=A0As with other handshake messages with no b=
uilt-in response, KeyUpdates MUST be acknowledged. In order to facilitate e=
poch reconstruction (Section 4.2.2), implementations MUST NOT send records =
with the new keys or send a new KeyUpdate
 until the previous KeyUpdate has been acknowledged (this avoids having too=
 many epochs in active use).<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><a href=3D"https://www.rfc-editor.org/rfc/rfc9147.ht=
ml#section-8-1" rel=3D"noreferrer noreferrer noreferrer noreferrer" target=
=3D"_blank">https://www.rfc-editor.org/rfc/rfc9147.html#section-8-1</a><u><=
/u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal">Now, the parenthetical says this is to avoid having =
too many epochs in active use, but it appears that there are stronger assum=
ptions on this:<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal">&gt;=C2=A0After the handshake is complete, if the ep=
och bits do not match those from the current epoch, implementations SHOULD =
use the most recent *<b>past*</b> epoch which has matching bits, and then r=
econstruct the sequence number for that epoch
 as described above.<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><a href=3D"https://www.rfc-editor.org/rfc/rfc9147.ht=
ml#section-4.2.2-3" rel=3D"noreferrer noreferrer noreferrer noreferrer" tar=
get=3D"_blank">https://www.rfc-editor.org/rfc/rfc9147.html#section-4.2.2-3<=
/a><u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal">(emphasis mine)<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal">&gt;=C2=A0After the handshake, implementations MUST =
use the highest available sending epoch [to send ACKs]<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><a href=3D"https://www.rfc-editor.org/rfc/rfc9147.ht=
ml#section-7-7" rel=3D"noreferrer noreferrer noreferrer noreferrer" target=
=3D"_blank">https://www.rfc-editor.org/rfc/rfc9147.html#section-7-7</a><u><=
/u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal">These two snippets imply the protocol wants the peer=
 to definitely have installed the new keys before you start using them. Thi=
s makes sense because sending stuff the peer can&#39;t decrypt is pretty si=
lly. As an aside, DTLS 1.3 retains this
 text from DTLS 1.2:<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal">&gt;=C2=A0Conversely, it is possible for records tha=
t are protected with the new epoch to be received prior to the completion o=
f a handshake. For instance, the server may send its Finished message and t=
hen start transmitting data. Implementations
 MAY either buffer or discard such records, though when DTLS is used over r=
eliable transports (e.g., SCTP [RFC4960]), they SHOULD be buffered and proc=
essed once the handshake completes.<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><a href=3D"https://www.rfc-editor.org/rfc/rfc9147.ht=
ml#section-4.2.1-2" rel=3D"noreferrer noreferrer noreferrer noreferrer" tar=
get=3D"_blank">https://www.rfc-editor.org/rfc/rfc9147.html#section-4.2.1-2<=
/a><u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><br>
The text from DTLS 1.2 talks about *a* handshake, which presumably refers t=
o rekeying via renegotiation. But in DTLS 1.3, the epoch reconstruction rul=
e and the KeyUpdate rule mean this is only possible during the handshake, w=
hen you see epoch 4 and expect epoch
 0-3. The steady state rekeying mechanism never hits this case. (This is a =
reasonable change because there&#39;s no sense in unnecessarily introducing=
 blips where the connection is less tolerant of reordering.)<u></u><u></u><=
/p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><b>Buffered handshake messages</b><u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal">Okay, so KeyUpdates want to wait for the recipient t=
o install keys, except we don&#39;t seem to actually achieve this! Section =
5.2 says:<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal">&gt;=C2=A0DTLS implementations maintain (at least no=
tionally) a next_receive_seq counter. This counter is initially set to zero=
. When a handshake message is received, if its message_seq value matches ne=
xt_receive_seq, next_receive_seq is incremented
 and the message is processed. If the sequence number is less than next_rec=
eive_seq, the message MUST be discarded. If the sequence number is greater =
than next_receive_seq, the implementation SHOULD queue the message but MAY =
discard it. (This is a simple space/bandwidth
 trade-off).<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><a href=3D"https://www.rfc-editor.org/rfc/rfc9147.ht=
ml#section-5.2-7" rel=3D"noreferrer noreferrer noreferrer noreferrer" targe=
t=3D"_blank">https://www.rfc-editor.org/rfc/rfc9147.html#section-5.2-7</a><=
u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal">I assume this is intended to apply to post-handshake=
 messages too. (See below for a discussion of the alternative.) But that me=
ans that, when you receive a KeyUpdate, you might not immediately process i=
t. Suppose next_receive_seq is 5,
 and the peer sends NewSessionTicket(5), NewSessionTicket(6), and KeyUpdate=
(7). 5 is lost, but 6 and 7 come in, perhaps even in the same record which =
means that you&#39;re forced to ACK both or neither. But suppose the implem=
entation is willing to buffer 3 messages
 ahead, so it ACKs the 6+7 record, by the rules in section 7, which permits=
 ACKing fragments that were buffered and not yet processed.<u></u><u></u></=
p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal">That means the peer will switch keys and now all sub=
sequent records from them will come from epoch N+1. But the receiver is not=
 ready for N+1 yet, so we contradict everything above. We also contradict t=
his parenthetical in section 8:<u></u><u></u></p>
</div>
<div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal">&gt;=C2=A0Due to loss and/or reordering, DTLS 1.3 im=
plementations may receive a record with an older epoch than the current one=
 (the requirements above preclude receiving a newer record).<u></u><u></u><=
/p>
</div>
<div>
<p class=3D"MsoNormal"><a href=3D"https://www.rfc-editor.org/rfc/rfc9147.ht=
ml#section-8-2" rel=3D"noreferrer noreferrer noreferrer noreferrer" target=
=3D"_blank">https://www.rfc-editor.org/rfc/rfc9147.html#section-8-2</a><u><=
/u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
</div>
<div>
<p class=3D"MsoNormal">I assume then that this was not actually what was in=
tended.<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><b>Options (and non-options)</b><u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal">Assuming I&#39;m reading this right, we seem to have=
 made a mess of things. The sender could avoid this by only allowing one ac=
tive post-handshake transaction at a time and serializing them, at the cost=
 of taking a round-trip for each. But
 the receiver needs to account for all possible senders, so that doesn&#39;=
t help. Some options that come to mind:<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><i>1. Accept that the sender updates its keys too ea=
rly</i><u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal">Apart from contradicting most of the specification t=
ext, the protocol doesn&#39;t
<i>break</i>=C2=A0per se if you just allow the peer to switch keys early in=
 this buffered KeyUpdate case. We
<i>merely</i>=C2=A0contradict all of the explanatory text and introduce a b=
unch of cases that the specification suggests are impossible. :-) Also the =
connection quality is poor.<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal">The sender will use epoch N+1 at a point when the pe=
er is on N. But epoch reconstruction will misread it as N-3 instead of N+1,=
 and either way you won&#39;t have the keys to decrypt it yet! The connecti=
on is interrupted (and with all packets
 discarded because epoch reconstruction fails!) until the peer retransmits =
5 and you catch up. Until then, not only will you not receive application d=
ata, but you also won&#39;t receive ACKs. This also adds a subtle corner ca=
se on the sender side: the sender cannot
 discard the old sending keys because it still has unACKed messages from th=
e previous epoch to retransmit, but this is not called out in section 8. Se=
ction 8 only discusses the receiver needing to retain the old epoch.<u></u>=
<u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><br>
This seems not great. Also it contradicts much of the text in the spec, inc=
luding section 8 explicitly saying this case cannot happen.<u></u><u></u></=
p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><i>2. Never ACK buffered KeyUpdates</i><u></u><u></u=
></p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal">We can say that KeyUpdates are special and, unless y=
ou&#39;re willing to process them immediately, you must not ACK the records=
 containing them. This means you might under-ACK and the peer might over-re=
transmit, but seems not fatal. This also
 seems a little hairy to implement if you want to avoid under-ACKing unnece=
ssarily. You might have message NewSessionTicket(6) buffered and then recei=
ve a record with NewSessionTicket(5) and KeyUpdate(7). That record may appe=
ar unACKable, but it&#39;s fine because
 you&#39;ll immediately process 5 then 6 then 7... unless your NewSessionTi=
cket process is asynchronous, in which case it might not be?<u></u><u></u><=
/p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal">Despite all that mess, this seems the most viable op=
tion?<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><i>3. Declare this situation a sender error</i><u></=
u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal">We could say this is not allowed and senders MUST NO=
T send KeyUpdate if there are any outstanding post-handshake messages. And =
then the receiver should fail with unexpected_message if it ever receives K=
eyUpdate at a future message_seq.
 But as the RFC is already published, I don&#39;t know if this is compatibl=
e with existing implementations.<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><i>4. Explicit KeyUpdateAck message</i><u></u><u></u=
></p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal">We could have made a KeyUpdateAck message to signal =
that you&#39;ve processed a KeyUpdate, not just received it. But that&#39;s=
 a protocol change and the RFC is stamped, so it&#39;s too late now.<u></u>=
<u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><i>5. Process KeyUpdate out of order</i><u></u><u></=
u></p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal">We could say that the receiver doesn&#39;t buffer Ke=
yUpdate. It just goes ahead and processes it immediately to install epoch N=
+1. This seems like it would address the issue but opens more cans of worms=
. Now the receiver needs to keep the old
 epoch around for more than packet reorder, but also to pick up the retrans=
missions of the missing handshake messages. Also, by activating the new epo=
ch, the receiver now allows the sender to KeyUpdate again, and again, and a=
gain. But, several epochs later,
 the holes in the message stream may remain unfilled, so we still need the =
old keys. Without further protocol rules, a=C2=A0sender could force the rec=
eiver to keep keys arbitrarily many records back. All this is, at best, a d=
ifficult case that is unlikely to be
 well-tested, and at worst get the implementation into some broken state an=
d then misbehave badly.<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><i>6. Post-handshake transactions aren&#39;t ordered=
 at all</i><u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal">It could be that my assumption above was wrong and t=
he next_receive_seq discussion in 5.2 only applies to the handshake. After =
all, section 5.8.4 discusses how every post-handshake transaction duplicate=
s the &quot;state machine&quot;. Except it only
 says to duplicate the 5.8.1 state machine, and it&#39;s ambiguous =
whether that includes the message_seq logic.<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal">However, going this direction seems to very quickly =
make=C2=A0a mess. If each post-handshake transaction handles message_seq in=
dependently, you cannot distinguish a retransmission from a new transaction=
. That seems quite bad, so presumably the
 intent was to use message_seq to distinguish those. (I.e. the intent can&#=
39;t have been to duplicate the message_seq state.) Indeed, we have:<u></u>=
<u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal">&gt;=C2=A0However, in DTLS 1.3 the message_seq is no=
t reset, to allow distinguishing a retransmission from a previously sent po=
st-handshake message from a newly sent post-handshake message.<u></u><u></u=
></p>
</div>
<div>
<p class=3D"MsoNormal"><a href=3D"https://www.rfc-editor.org/rfc/rfc9147.ht=
ml#section-5.2-6" rel=3D"noreferrer noreferrer noreferrer noreferrer" targe=
t=3D"_blank">https://www.rfc-editor.org/rfc/rfc9147.html#section-5.2-6</a><=
u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal">But if we distinguish with message_seq AND process=
=C2=A0transactions out of order, now receivers need to keep track of fairly=
 complex state in case they process messages 5, 7, 9, 11, 13, 15, 17, ... b=
ut then only get the even ones later. And
 we&#39;d need to define some kind of sliding window for what happens if yo=
u receive message_seq 9000 all of a sudden. And we import all the cross-epo=
ch problems in option 5 above. None of that is in the text, so I assume thi=
s was not the intended reading, and
 I don&#39;t think we want to go that direction. :-)<u></u><u></u></p>
</div>
<div>
<div>
<p class=3D"MsoNormal"><b><br>
Digression: ACK fate-sharing and flow control</b><u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal">All this alludes to another=C2=A0quirk that isn&#39;=
t a problem, but is a little non-obvious and warrants some discussion in th=
e spec. Multiple handshake fragments may be packed into the same record, bu=
t ACKs apply to the whole record. If you receive
 a fragment for a message sequence too far into the future, you are permitt=
ed to discard the fragment. But if you discard=C2=A0<i>any</i>=C2=A0fragmen=
t, you cannot ACK the record,=C2=A0<i>even if there were fragments which yo=
u did process</i>. During the handshake, an implementation
 could avoid needing to make this decision by knowing the maximum size of a=
 handshake flight. After the handshake, there is no inherent limit on how m=
any NewSessionTickets the peer may choose to send in a row, and no flow con=
trol.<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal">QUIC ran into a similar issue here and said an imple=
mentation can choose an ad-hoc limit, after which it can choose to either w=
edge the post-handshake stream or return an error.<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><a href=3D"https://github.com/quicwg/base-drafts/iss=
ues/1834" rel=3D"noreferrer noreferrer noreferrer noreferrer" target=3D"_bl=
ank">https://github.com/quicwg/base-drafts/issues/1834</a><br>
<a href=3D"https://github.com/quicwg/base-drafts/pull/2524" rel=3D"noreferr=
er noreferrer noreferrer noreferrer" target=3D"_blank">https://github.com/q=
uicwg/base-drafts/pull/2524</a><u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal">I suspect the most practical outcome for DTLS (and a=
rguably already supported by the existing text, but not very obviously), is=
 to instead say the receiver just refuses to ACK stuff and, okay, maybe in =
some weird edge cases the receiver
 under-ACKs and then the sender over-retransmits, until things settle down.=
 Whereas ACKs are a bit more tightly integrated with QUIC, so refusing to A=
CK a packet due to one bad frame is less of an option. Still, I think this =
would have been worth calling out
 in the text.<u></u><u></u></p>
</div>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal">So... did I read all this right? Did we indeed make =
a mess of this, or did I miss something?<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal">David<u></u><u></u></p>
</div>
</div>
</blockquote>
</div>
</div>
</div>

</blockquote></div></div></div>
_______________________________________________<br>
TLS mailing list<br>
<a href=3D"mailto:TLS@ietf.org" rel=3D"noreferrer noreferrer noreferrer" ta=
rget=3D"_blank">TLS@ietf.org</a><br>
<a href=3D"https://www.ietf.org/mailman/listinfo/tls" rel=3D"noreferrer nor=
eferrer noreferrer noreferrer" target=3D"_blank">https://www.ietf.org/mailm=
an/listinfo/tls</a><br>
</blockquote></div>
</blockquote></div></div></div>
</blockquote></div>
</blockquote></div></div></div>

--000000000000e839890617156f99--

