[TLS] Issues with buffered, ACKed KeyUpdates in DTLS 1.3
David Benjamin <davidben@chromium.org> Fri, 12 April 2024 23:00 UTC
From: David Benjamin <davidben@chromium.org>
Date: Fri, 12 Apr 2024 18:59:57 -0400
Message-ID: <CAF8qwaCAJif0SA+uyZ=vGUZ29bwrFNL2jrS9wTOxjxaA2JLOaw@mail.gmail.com>
To: "<tls@ietf.org>" <tls@ietf.org>
Cc: bbe@chromium.org, Nick Harper <nharper@chromium.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tls/_ku3-YDcroNmG_QKZsYTtqYzC0M>
Hi all,

This is going to be a bit long. In short, DTLS 1.3 KeyUpdates seem to conflate the peer *receiving* the KeyUpdate with the peer *processing* the KeyUpdate, in ways that appear to break some assumptions made by the protocol design.

*When to switch keys in KeyUpdate*

So, first, DTLS 1.3, unlike TLS 1.3, applies the KeyUpdate on the ACK, not when the KeyUpdate is sent. This makes sense because KeyUpdate records are not intrinsically ordered with app data records sent after them:

> As with other handshake messages with no built-in response, KeyUpdates MUST be acknowledged. In order to facilitate epoch reconstruction (Section 4.2.2), implementations MUST NOT send records with the new keys or send a new KeyUpdate until the previous KeyUpdate has been acknowledged (this avoids having too many epochs in active use).

https://www.rfc-editor.org/rfc/rfc9147.html#section-8-1

Now, the parenthetical says this is to avoid having too many epochs in active use, but it appears that there are stronger assumptions on this:

> After the handshake is complete, if the epoch bits do not match those from the current epoch, implementations SHOULD use the most recent **past** epoch which has matching bits, and then reconstruct the sequence number for that epoch as described above.

https://www.rfc-editor.org/rfc/rfc9147.html#section-4.2.2-3 (emphasis mine)

> After the handshake, implementations MUST use the highest available sending epoch [to send ACKs]

https://www.rfc-editor.org/rfc/rfc9147.html#section-7-7

These two snippets imply the protocol wants the peer to definitely have installed the new keys before you start using them. This makes sense because sending stuff the peer can't decrypt is pretty silly.

As an aside, DTLS 1.3 retains this text from DTLS 1.2:

> Conversely, it is possible for records that are protected with the new epoch to be received prior to the completion of a handshake. For instance, the server may send its Finished message and then start transmitting data.
> Implementations MAY either buffer or discard such records, though when DTLS is used over reliable transports (e.g., SCTP [RFC4960]), they SHOULD be buffered and processed once the handshake completes.

https://www.rfc-editor.org/rfc/rfc9147.html#section-4.2.1-2

The text from DTLS 1.2 talks about *a* handshake, which presumably refers to rekeying via renegotiation. But in DTLS 1.3, the epoch reconstruction rule and the KeyUpdate rule mean this is only possible during the handshake, when you see epoch 4 and expect epoch 0-3. The steady-state rekeying mechanism never hits this case. (This is a reasonable change because there's no sense in unnecessarily introducing blips where the connection is less tolerant of reordering.)

*Buffered handshake messages*

Okay, so KeyUpdates want to wait for the recipient to install keys, except we don't seem to actually achieve this! Section 5.2 says:

> DTLS implementations maintain (at least notionally) a next_receive_seq counter. This counter is initially set to zero. When a handshake message is received, if its message_seq value matches next_receive_seq, next_receive_seq is incremented and the message is processed. If the sequence number is less than next_receive_seq, the message MUST be discarded. If the sequence number is greater than next_receive_seq, the implementation SHOULD queue the message but MAY discard it. (This is a simple space/bandwidth trade-off).

https://www.rfc-editor.org/rfc/rfc9147.html#section-5.2-7

I assume this is intended to apply to post-handshake messages too. (See below for a discussion of the alternative.) But that means that, when you receive a KeyUpdate, you might not immediately process it. Suppose next_receive_seq is 5, and the peer sends NewSessionTicket(5), NewSessionTicket(6), and KeyUpdate(7). 5 is lost, but 6 and 7 come in, perhaps even in the same record, which means you're forced to ACK both or neither.
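For concreteness, here is a minimal sketch of that Section 5.2 next_receive_seq logic (the class and method names are my own, not from the RFC). The point is that a future KeyUpdate can sit in the queue, unprocessed, while the record carrying it has already been ACKed:

```python
from dataclasses import dataclass, field

@dataclass
class HandshakeReceiver:
    """Sketch of RFC 9147 Section 5.2 message_seq handling."""
    next_receive_seq: int = 0
    queued: dict = field(default_factory=dict)  # msg_seq -> message

    def receive(self, msg_seq: int, msg: str) -> list:
        """Returns the messages that become processable, in order."""
        if msg_seq < self.next_receive_seq:
            return []                   # old retransmission: MUST discard
        if msg_seq > self.next_receive_seq:
            self.queued[msg_seq] = msg  # future: SHOULD queue, MAY discard
            return []
        # In-order message: process it, then drain any queued successors.
        ready = [msg]
        self.next_receive_seq += 1
        while self.next_receive_seq in self.queued:
            ready.append(self.queued.pop(self.next_receive_seq))
            self.next_receive_seq += 1
        return ready
```

In the example above, with next_receive_seq at 5, receiving 6 and 7 yields nothing (they just sit in the queue), and only the retransmission of 5 releases all three for processing.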
But suppose the implementation is willing to buffer 3 messages ahead, so it ACKs the 6+7 record, per the rules in section 7, which permit ACKing fragments that were buffered and not yet processed. That means the peer will switch keys, and now all subsequent records from them will come from epoch N+1. But the receiver has not actually processed the KeyUpdate yet, so we contradict everything above. We also contradict this parenthetical in section 8:

> Due to loss and/or reordering, DTLS 1.3 implementations may receive a record with an older epoch than the current one (the requirements above preclude receiving a newer record).

https://www.rfc-editor.org/rfc/rfc9147.html#section-8-2

I assume then that this was not actually what was intended.

*Options (and non-options)*

Assuming I'm reading this right, we seem to have made a mess of things. The sender could avoid this by only allowing one active post-handshake transaction at a time and serializing them, at the cost of taking a round-trip for each. But the receiver needs to account for all possible senders, so that doesn't help. Some options that come to mind:

*1. Accept that the sender updates its keys too early*

Apart from contradicting most of the specification text, the protocol doesn't *break* per se if you just allow the peer to switch keys early in this buffered-KeyUpdate case. We *merely* contradict all of the explanatory text and introduce a bunch of cases that the specification suggests are impossible. :-) Also, the connection quality is poor. The sender will use epoch N+1 at a point when the peer is on N. But epoch reconstruction will misread it as N-3 instead of N+1, and either way you won't have the keys to decrypt it yet! The connection is interrupted (with all packets discarded because epoch reconstruction fails!) until the peer retransmits 5 and you catch up. Until then, not only will you not receive application data, but you also won't receive ACKs.
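The N-3 misread falls directly out of the Section 4.2.2 rule. Here is a small sketch (the helper name is mine): the unified header carries only the low two bits of the epoch, and the receiver SHOULD resolve them to the most recent *past* epoch with matching bits, so a prematurely newer epoch reconstructs four epochs too low:

```python
def reconstruct_epoch(epoch_bits: int, current_epoch: int) -> int:
    """Sketch of RFC 9147 Section 4.2.2 epoch reconstruction: choose
    the most recent epoch <= current_epoch whose low two bits match
    the two epoch bits from the unified header."""
    assert 0 <= epoch_bits <= 3
    candidate = (current_epoch & ~3) | epoch_bits
    if candidate > current_epoch:
        candidate -= 4  # only past epochs are considered
    return candidate
```

So if the receiver is still on epoch 4 and the sender has already moved to epoch 5, the incoming bits `5 & 3 == 1` reconstruct to epoch 1, not 5, and decryption fails for every such record.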
This also adds a subtle corner case on the sender side: the sender cannot discard the old sending keys, because it still has unACKed messages from the previous epoch to retransmit, but this is not called out in section 8. Section 8 only discusses the receiver needing to retain the old epoch. This seems not great. Also, it contradicts much of the text in the spec, including section 8 explicitly saying this case cannot happen.

*2. Never ACK buffered KeyUpdates*

We can say that KeyUpdates are special and, unless you're willing to process them immediately, you must not ACK the records containing them. This means you might under-ACK and the peer might over-retransmit, but that seems not fatal. This also seems a little hairy to implement if you want to avoid under-ACKing unnecessarily. You might have message NewSessionTicket(6) buffered and then receive a record with NewSessionTicket(5) and KeyUpdate(7). That record may appear unACKable, but it's fine because you'll immediately process 5, then 6, then 7... unless your NewSessionTicket processing is asynchronous, in which case it might not be? Despite all that mess, this seems the most viable option?

*3. Declare this situation a sender error*

We could say this is not allowed and senders MUST NOT send KeyUpdate if there are any outstanding post-handshake messages. And then the receiver should fail with unexpected_message if it ever receives KeyUpdate at a future message_seq. But as the RFC is already published, I don't know if this is compatible with existing implementations.

*4. Explicit KeyUpdateAck message*

We could have made a KeyUpdateAck message to signal that you've processed a KeyUpdate, not just received it. But that's a protocol change, and the RFC is stamped, so it's too late now.

*5. Process KeyUpdate out of order*

We could say that the receiver doesn't buffer KeyUpdate. It just goes ahead and processes it immediately to install epoch N+1. This seems like it would address the issue but opens more cans of worms.
Now the receiver needs to keep the old epoch around not just to tolerate packet reordering, but also to pick up the retransmissions of the missing handshake messages. Also, by activating the new epoch, the receiver now allows the sender to KeyUpdate again, and again, and again. But, several epochs later, the holes in the message stream may remain unfilled, so we still need the old keys. Without further protocol rules, a sender could force the receiver to retain arbitrarily old keys. All this is, at best, a difficult case that is unlikely to be well-tested, and at worst gets the implementation into some broken state where it then misbehaves badly.

*6. Post-handshake transactions aren't ordered at all*

It could be that my assumption above was wrong and the next_receive_seq discussion in 5.2 only applies to the handshake. After all, section 5.8.4 discusses how every post-handshake transaction duplicates the "state machine". Except it only says to duplicate the 5.8.1 state machine, and it's ambiguous whether that includes the message_seq logic. However, going this direction seems to very quickly make a mess. If each post-handshake transaction handles message_seq independently, you cannot distinguish a retransmission from a new transaction. That seems quite bad, so presumably the intent was to use message_seq to distinguish those. (I.e., the intent can't have been to duplicate the message_seq state.) Indeed, we have:

> However, in DTLS 1.3 the message_seq is not reset, to allow distinguishing a retransmission from a previously sent post-handshake message from a newly sent post-handshake message.

https://www.rfc-editor.org/rfc/rfc9147.html#section-5.2-6

But if we distinguish with message_seq AND process transactions out of order, now receivers need to keep track of fairly complex state in case they process messages 5, 7, 9, 11, 13, 15, 17, ... but then only get the even ones later.
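To make the "fairly complex state" concrete, a sketch of the bookkeeping option 6 would force on receivers (the names are hypothetical, not from the RFC): out-of-order processing leaves holes below the highest processed message_seq, so a simple counter no longer suffices and the receiver must track a low-water mark plus a sparse set above it:

```python
class OutOfOrderState:
    """Sketch of per-receiver state if post-handshake messages were
    processed out of order: holes may be filled arbitrarily later."""

    def __init__(self, start: int = 0):
        self.low = start   # every msg_seq below this has been processed
        self.done = set()  # processed msg_seqs >= low, with holes between

    def mark_processed(self, seq: int) -> None:
        if seq < self.low:
            return  # retransmission of something already handled
        self.done.add(seq)
        while self.low in self.done:  # advance past filled-in holes
            self.done.remove(self.low)
            self.low += 1

    def holes_below(self, seq: int) -> list:
        """Message_seqs still outstanding below seq; each hole pins
        whatever old state is needed to process it when it arrives."""
        return [s for s in range(self.low, seq) if s not in self.done]
```

Processing 5, 7, 9, 11 leaves holes at 6, 8, and 10, and nothing in the protocol bounds how long they stay open.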
And we'd need to define some kind of sliding window for what happens if you receive message_seq 9000 all of a sudden. And we import all the cross-epoch problems in option 5 above. None of that is in the text, so I assume this was not the intended reading, and I don't think we want to go that direction. :-)

*Digression: ACK fate-sharing and flow control*

All this alludes to another quirk that isn't a problem, but is a little non-obvious and warrants some discussion in the spec. Multiple handshake fragments may be packed into the same record, but ACKs apply to the whole record. If you receive a fragment for a message sequence too far into the future, you are permitted to discard the fragment. But if you discard *any* fragment, you cannot ACK the record, *even if there were fragments which you did process*. During the handshake, an implementation could avoid needing to make this decision by knowing the maximum size of a handshake flight. After the handshake, there is no inherent limit on how many NewSessionTickets the peer may choose to send in a row, and no flow control.

QUIC ran into a similar issue here and said an implementation can choose an ad-hoc limit, after which it can choose to either wedge the post-handshake stream or return an error.

https://github.com/quicwg/base-drafts/issues/1834
https://github.com/quicwg/base-drafts/pull/2524

I suspect the most practical outcome for DTLS (and arguably already supported by the existing text, but not very obviously) is to instead say the receiver just refuses to ACK stuff and, okay, maybe in some weird edge cases the receiver under-ACKs and then the sender over-retransmits, until things settle down. Whereas ACKs are a bit more tightly integrated with QUIC, so refusing to ACK a packet due to one bad frame is less of an option. Still, I think this would have been worth calling out in the text.

So... did I read all this right? Did we indeed make a mess of this, or did I miss something?

David