[TLS] draft-ietf-tls-dtls13-42 responses to feedback

Eric Rescorla <ekr@rtfm.com> Thu, 22 April 2021 17:32 UTC

MIME-Version: 1.0
From: Eric Rescorla <ekr@rtfm.com>
Date: Thu, 22 Apr 2021 10:31:42 -0700
Message-ID: <CABcZeBO2L1gp-hoh983THrXaUwtbOX_nyfhZVZNrHnXPSZWBmQ@mail.gmail.com>
To: "<tls@ietf.org>" <tls@ietf.org>, IESG <iesg@ietf.org>
Content-Type: multipart/alternative; boundary="000000000000969a8605c0930f37"
Archived-At: <https://mailarchive.ietf.org/arch/msg/tls/d1osAkPh_DmVLxAsYF99y5IT0nA>
Subject: [TLS] draft-ietf-tls-dtls13-42 responses to feedback
Precedence: list

I have posted draft-ietf-tls-dtls13-42, addressing the IESG
feedback. Thanks to everyone who provided reviews. Here is a
description of how I handled the comments. If there is anybody whose
feedback I missed, please let me know.

-Ekr

**** Erik Kline

> [ section 4.4 ]
>
> * "respectively" -> "respectively."
>
> * Could a DTLS implementation packetize to a min-MTU for an IP version
>   and avoid all pMTU issues?  Such a strategy would probably be poor for
>   IPv4 but might be acceptable for IPv6 communications.

Maybe, but I think we probably don't need to say much.


> [ section 4.5.3 ]
>
> * "MUST NOT used" -> "MUST NOT be used"

Fixed.

> [ section 5.8.4 ]
>
> * "NOT have send" -> "NOT send", I think

Fixed.

> [ section 6 ]
>
> * "which are needed to encrypt to decrypt"?

Fixed.


**** Francesca Palombini

> section 2. Conventions and Terminology
>
> FP: Please spell out that network byte order (most significant byte
first) is used throughout the document.

Done.
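
For illustration, most-significant-byte-first encoding of a 16-bit value
can be sketched in Python as follows. The helper names are hypothetical
and only demonstrate the byte ordering, not any particular DTLS structure.

```python
def encode_uint16(value: int) -> bytes:
    """Encode a 16-bit unsigned integer in network byte order (big-endian)."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("value must fit in 16 bits")
    return value.to_bytes(2, "big")


def decode_uint16(data: bytes) -> int:
    """Decode exactly two bytes in network byte order."""
    if len(data) != 2:
        raise ValueError("expected exactly 2 bytes")
    return int.from_bytes(data, "big")
```

With this ordering, the value 0x0102 goes on the wire as the byte 0x01
followed by the byte 0x02.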

>
>     Once the client has transmitted the ClientHello message, it expects
>     to see a HelloRetryRequest or a ServerHello from the server.
>     However, if the server's message is lost, the client knows that
>     either the ClientHello or the response from the server has been lost
>     and retransmits. When the server receives the retransmission, it
>     knows to retransmit.
>
> FP: It would be good to mention retransmission max times here.

DTLS actually doesn't have an overall timeout. This is left to
the discretion of the implementation. It does have a maximum
backoff, but backoff isn't mentioned at all.


>              |                |   /+----------------+\
>              | 31 < OCT < 64 -+--> |DTLS Ciphertext |
>              |                |    |(header bits    |
>              |      else      |    | start with 001)|
>              |       |        |   /+-------+--------+\
>
>         The value for the "DTLS-OK" column is "Y". IANA is requested to
>         reserve the content type range 32-63 so that content types in this
>         range are not allocated.
>
> FP: IANA is asked to reserve 32-63, but I could not see any explanation
for that. I would like to see it justified in section 4.1 or in the
respective IANA section.

Done.
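
As a sketch of the demultiplexing rule quoted above: a first octet
strictly between 31 and 64 is exactly one whose top three bits are 001,
which is why the 32-63 content type range collides with the ciphertext
header. The function name below is hypothetical, and only this one branch
of the figure is modeled.

```python
def looks_like_dtls_ciphertext(first_octet: int) -> bool:
    """Return True when the first octet is in 32..63 (top three bits 001)."""
    if not 0 <= first_octet <= 255:
        raise ValueError("first_octet must be in 0..255")
    # Shifting away the low five bits leaves the three header bits.
    return (first_octet >> 5) == 0b001
```

The range test `31 < first_octet < 64` and the bit test are equivalent
for every possible octet value.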

>
>
>     fragmentation, clients of the DTLS record layer SHOULD attempt to
>     size records so that they fit within any PMTU estimates obtained from
>     the record layer.
>
> FP: First time PMTU appears, please expand and add reference.

Done.

>     performing PMTU discovery, whether via [RFC1191] or [RFC4821]
>     mechanisms. In particular:
>
> FP: I think this is missing a reference to RFC 8201 since IPv6 is
mentioned below.

Done.

>     Any TLS cipher suite that is specified for use with DTLS MUST define
>     limits on the use of the associated AEAD function that preserves
>     margins for both confidentiality and integrity. That is, limits MUST
>     be specified for the number of packets that can be authenticated and
>     for the number of packets that can fail authentication before a key
>     update is required. Providing a reference to any analysis upon which
>     values are based - and any assumptions used in that analysis - allows
>     limits to be adapted to varying usage conditions.
>
> FP: This seems important enough that it should be highlighted for the
experts reviewing the registration. I see that
https://www.iana.org/assignments/tls-parameters/tls-parameters.xhtml#tls-parameters-4
has a number of notes, maybe that would be enough, or maybe add it (as an
update?) to RFC 8447?

Done.

> zero
> length vector (i.e., a single zero byte length field).
>
> FP: I suggest using TLS 1.3 terminology of "zero-length vector (i.e., a
zero-valued single byte length field)"

Done.
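
To make the suggested wording concrete, here is a minimal Python sketch
of a vector with a single-byte length prefix. The function name is
hypothetical; the point is only that an empty vector serializes to one
zero-valued length byte.

```python
def encode_vector_u8(payload: bytes) -> bytes:
    """Prefix payload with a one-byte length field (0..255 bytes of content)."""
    if len(payload) > 255:
        raise ValueError("payload too long for a one-byte length field")
    return bytes([len(payload)]) + payload
```

An empty payload therefore encodes as the single byte 0x00.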

>     flow shown in Figure 11 if the client does not send the ACK message
>
> FP: s/11/12

Done.


**** Martin Duke

> COMMENTS:
> Sec 2. It might be useful to introduce the term "epoch" in the glossary,
for those who read this front to back.

Done.


> Sec 4.2.3: "The encrypted sequence number is computed by XORing the
leading bytes of the Mask with the sequence number. Decryption is
accomplished by the same process."
>
> The text is unclear if the XOR is applied to the expanded sequence number
or merely the 1-2 octets on the wire. I presume it's the latter, but this
should be clarified.

Fixed.
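
A minimal Python sketch of the XOR step as read in the comment above,
assuming it is applied only to the 1-2 sequence number octets that appear
on the wire. How the Mask is derived is not shown here; `mask` is simply
taken as an input, and the function name is hypothetical.

```python
def xor_sequence_number(wire_seq: bytes, mask: bytes) -> bytes:
    """XOR the on-the-wire sequence number with the leading bytes of the mask."""
    if len(wire_seq) not in (1, 2):
        raise ValueError("on-the-wire sequence number must be 1 or 2 bytes")
    if len(mask) < len(wire_seq):
        raise ValueError("mask is shorter than the sequence number")
    # zip() stops at the shorter input, so only the leading mask bytes are used.
    return bytes(s ^ m for s, m in zip(wire_seq, mask))
```

Because XOR is its own inverse, applying the same function with the same
mask to the encrypted bytes recovers the original sequence number bytes.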


> Sec 4.2.3: It's implied here that the sn_key rotates with the epoch. As
this is different from QUIC, it's probably worth spelling out.

Fixed.


> Sec 5.1 is a bit vague about the amplification limit; why not at least
RECOMMEND 3, as we've converged on this elsewhere?

Added.
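
One way to picture an amplification limit of 3 is the following Python
sketch. It assumes the limit is expressed as bytes sent versus bytes
received from an address that has not yet been validated; the class and
method names are hypothetical and not taken from the draft.

```python
class AmplificationBudget:
    """Track how much may be sent to a not-yet-validated peer address."""

    def __init__(self, factor: int = 3) -> None:
        self.factor = factor      # assumed ratio of bytes sent to bytes received
        self.received = 0
        self.sent = 0
        self.validated = False    # set to True once the address is verified

    def on_receive(self, nbytes: int) -> None:
        self.received += nbytes

    def can_send(self, nbytes: int) -> bool:
        if self.validated:
            return True
        return self.sent + nbytes <= self.factor * self.received

    def on_send(self, nbytes: int) -> None:
        if not self.can_send(nbytes):
            raise RuntimeError("would exceed the amplification limit")
        self.sent += nbytes
```

After receiving 100 bytes, such a budget would permit up to 300 bytes in
response until the address is validated.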


> Sec 5.1. Reading between the lines, it's clear that the cookie can't be
used as address verification across connections in the way that a NEW_TOKEN
token is. It would be good to spell this out for clients -- use the
resumption token or whatever instead.

Added some text.


> Sec 7.2 "As noted above, the receipt of any record responding to a given
flight MUST be taken as an implicit acknowledgement for the entire flight."
I think this should be s/entire flight/entire previous flight?

Added some text.


> Sec 7.2 "Upon receipt of an ACK that leaves it with only some messages
from a flight having been acknowledged an implementation SHOULD retransmit
the unacknowledged messages or fragments."
>
> This language appears inconsistent with Figure 12, where Record 1 has not
been acknowledged but is also not retransmitted. It appears there is an
implied handling of empty ACKs that isn't written down anywhere in Sec 7.2

This is just a bug in the diagram. Good catch. Fixed.
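
The partial-ACK rule quoted above can be sketched in Python as follows.
Identifying records by an (epoch, sequence number) pair is an assumption
made for the example, and the function name is hypothetical.

```python
from typing import Iterable, List, Tuple

RecordNumber = Tuple[int, int]  # assumed (epoch, sequence number) identifier


def records_to_retransmit(
    flight: Iterable[RecordNumber], acked: Iterable[RecordNumber]
) -> List[RecordNumber]:
    """Return the records of a flight not covered by the ACK, in flight order."""
    acked_set = set(acked)
    return [record for record in flight if record not in acked_set]
```

Under this sketch an empty ACK acknowledges nothing, so every record in
the flight remains eligible for retransmission.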



> Sec 9. Should there be any flow control limits on new_connection_id? Or
should receivers be free to simply drop CIDs they can't handle? It might be
good to specify.

Added some text.


> Finally, a really weird one. Reading this document and references to
connection ID prompted to me to think how QUIC-LB could apply to DTLS. The
result is here: https://github.com/quicwg/load-balancers/pull/106/files.
Please note the rather unfortunate third-to-last paragraph. I'm happy to
take the answer that this use case doesn't matter, since I made it up
today. But if it does, it would be very helpful if (1) DTLS 1.3 clients
MUST include a connection_id extension in their ClientHello, even if zero
length, and/or (2) this draft updated 4.1.4 of 8446 to allow the server to
include connection_id in HelloRetryRequest even if the client didn't offer
it. Thoughts?

Addressed.



> NITS:
> 5.2 s/select(HandshakeType)/select(msg_type). Though with pseudocode your
mileage may vary as to what's clearer.

Agreed.


> 5.7 s/consitute/constitute

Fixed.


> Sec 5.7 In table 1, why include one ACK in the diagram but not the other?
It's clear from the note, but the figure is a weird omission.

I don't think I understand why we did this, so I just removed it.



**** Lars Eggert

Indicated minor changes made.


**** Zaheduzzaman Sarker

> This was very well written document. Thanks for this.
>
> Minor observations below-
>
> * Section 3.1 :
>   - Once the client has transmitted the ClientHello message, it expects
to see a HelloRetryRequest or a ServerHello from the server. However, if
the server's message is lost, the client knows that either the ClientHello
or the response from the server has been lost and retransmits.
>
> is this supposed to mean when the timer expires the client knows either
the ClientHello or the response from the server has been lost? the current
text does not imply that - the server's message is lost is an
interpretation of timer expired event.
>

Fixed.


>   -  The server also maintains a retransmission timer and retransmits
when that timer expires.
>
> The way it is written following the previous paragraph, almost made me
feel that the server is also maintaining a timer for the client hello. It
would be nicer if some text explains the usage of timers at the server to
break the continuous read from previous paragraph.

Fixed.


> * Section 3.3: I would add a reference to section 4.4.

Done.


> * Section 4.5.2: I assume the silent discard of invalid records will not
impact the timers, is that a valid assumption? if yes, then it would be
good if this is clarified in the text.

This is correct, but I don't quite see why one would think it does, as they
don't even get to the point where it would impact the timer. Anyway,
added some text.



> * Section 5.8.1:
>
>     Because DTLS clients send the first message (ClientHello), they start
in the PREPARING state. DTLS servers start in the WAITING state, but with
empty buffers and no retransmit timer
>
> This is repeated twice in the section, is there any reason for that?

Fixed.


**** John Scudder

> COMMENTS:

> Section 3.1:
>
> I found the explanatory text to be confusing. You start with a figure
illustrating a lost HelloRetryRequest. Then you tell me the server
maintains a rexmit timer:
>
>    The server also maintains a retransmission timer and retransmits when
>    that timer expires.
>
> But then you immediately tell me that it actually doesn’t:
>
>    Note that timeout and retransmission do not apply to the
>    HelloRetryRequest since this would require creating state on the
>    server.  The HelloRetryRequest is designed to be small enough that it
>    will not itself be fragmented, thus avoiding concerns about
>    interleaving multiple HelloRetryRequests.
>
> I presume that if I added some more words to this, your intent is that
the server maintains a retransmission timer *for messages other than
HelloRetryRequest*. As written, it gave me some whiplash.

Fixed.


> Section 4.2.1:
>
>    In general,
>    implementations SHOULD discard records from earlier epochs, but if
>    packet loss causes noticeable problems implementations MAY choose to
>    retain keying material from previous epochs for up to the default MSL
>    specified for TCP [RFC0793] to allow for packet reordering.
>
> It seems to me as though “if packet loss causes noticeable problems” is
saying either too much, or not enough. Not enough: problems for whom?
Noticeable by whom? How is this determined? Do you really mean I’m supposed
to work this out dynamically as the text sort-of implies? Too much: if
you’re not going to answer the foregoing, maybe don’t taunt me, and omit
the clause entirely? Or, possibly a less vague rewrite could be in the
nature of “if providing service to an application that is especially
sensitive to packet loss”.

Removed.


> NITS:
>
> Section 2:
>
> “The reader is also as to be familiar” s/as/assumed/

Fixed.

> Section 11:
>
>    Although the cookie must allow the server to produce the right
>    handshake transcript, they
>
> “It” not “they” (agreement in number)

Fixed.

> and
>
>    DTLS with connection IDs allow for endpoint addresses to
>    change during the association;
>
> “allows” not “allow” (agreement in number)

Fixed.



**** Eric Vyncke
>
>
> -- Section 3 --
> s/TLS cannot be used directly in datagram environments/TLS cannot be used
directly over a datagram transport/ ?
>
> Bullet 2) s/to enable reassembly in the correct order/to enable
reordering/ ?

Fixed.


> -- Section 3.1 --
> Should there be a hint to a maximum retry count ?

I'm not sure what we would put here given the diversity of environments,
so we opted not to.



> -- Section 3.3 --
> I understand the motivation (and no need to reply), but, sigh...
implementing frag/reassembly above the transport layer...

Indeed. If it helps, think of DTLS as the transport layer :)


**** Robert Wilton
> 1) Although it is clear from the metadata, it might be helpful if the
> introduction also stated that it obsoletes DTLS 1.2.

Done.

> 2) This document is a set of deltas against TLS 1.3.  Given that it
> talks about the DTLS 1.1/1.2 documents being deltas in the
> introduction, I would have also included that information for this
> document in the introduction rather than in the Terminology and
> Considerations section.  Initially, having read the introduction I had
> assumed that it was not going to be deltas.

Done.


**** Bernard Aboba

> Summary: The timeout and retransmission scheme looks workable for common
cases, but could use some refinement to make it more robust.
>
> Technical Comments
>
> 4.5.2. Handling Invalid Records
>
> Unlike TLS, DTLS is resilient in the face of invalid records (e.g.,
> invalid formatting, length, MAC, etc.). In general, invalid records
> SHOULD be silently discarded, thus preserving the association;
> however, an error MAY be logged for diagnostic purposes.
>
> [BA] How does silent discard of invalid records interact with
retransmission timers?

It doesn't. How could it? But I added some text anyway.


> Implementations which choose to generate an alert instead, MUST
> generate error alerts to avoid attacks where the attacker repeatedly
> probes the implementation to see how it responds to various types of
> error. Note that if DTLS is run over UDP, then any implementation
> which does this will be extremely susceptible to denial-of-service
> (DoS) attacks because UDP forgery is so easy. Thus, this practice is
> NOT RECOMMENDED for such transports, both to increase the reliability
> of DTLS service and to avoid the risk of spoofing attacks sending
> traffic to unrelated third parties.
>
> [BA] "this practice" refers to "generate an alert instead", correct?

Yes. Addressed.


> 5.8.2. Timer Values
>
> Though timer values are the choice of the implementation, mishandling
> of the timer can lead to serious congestion problems, for example if
>
> [BA] Saying "timer values are the choice of the implementation" seems
> odd, because it is followed by normative language. I would delete this
> and start the sentence with "Mishandling...".

It has been deleted.


> many instances of a DTLS time out early and retransmit too quickly on
> a congested link. Implementations SHOULD use an initial timer value
> of 100 msec (the minimum defined in RFC 6298 [RFC6298]) and double
> the value at each retransmission, up to no less than 60 seconds (the
> RFC 6298 maximum). Application specific profiles, such as those used
> for the Internet of Things environment, may recommend longer timer
> values. Note that a 100 msec timer is recommended rather than the
> 3-second RFC 6298 default in order to improve latency for time-
> sensitive applications. Because DTLS only uses retransmission for
> handshake and not dataflow, the effect on congestion should be
> minimal.
>
> Implementations SHOULD retain the current timer value until a message
> is transmitted and acknowledged without having to be retransmitted,
> at which time the value may be reset to the initial value.
>
> [BA] Is it always possible to distinguish a retransmission from a late
> arrival of an original packet? This seems like it could result in
> wrongly resetting the timer in some situations.

The intent of this text is that you didn't retransmit at all.
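
A minimal Python sketch of the timer behaviour discussed above: start at
100 msec, double on each retransmission, and reset only after a flight
is acknowledged without any retransmission. The quoted text says the
value grows "up to no less than 60 seconds"; capping at exactly 60
seconds is an assumption here, as are all class and method names.

```python
class RetransmitTimer:
    INITIAL_MS = 100       # initial timer value from the quoted text
    CAP_MS = 60_000        # assumed cap; the text only requires at least 60 s

    def __init__(self) -> None:
        self.value_ms = self.INITIAL_MS
        self._retransmitted = False

    def start_flight(self) -> None:
        """Call when a new flight is transmitted for the first time."""
        self._retransmitted = False

    def on_timeout(self) -> int:
        """Timer expired: the flight is retransmitted and the value doubles."""
        self._retransmitted = True
        self.value_ms = min(self.value_ms * 2, self.CAP_MS)
        return self.value_ms

    def on_acknowledged(self) -> int:
        """Reset only if this flight never had to be retransmitted."""
        if not self._retransmitted:
            self.value_ms = self.INITIAL_MS
        return self.value_ms
```

Because the reset depends on whether the sender itself retransmitted, the
late arrival of an original packet does not by itself trigger a reset in
this sketch.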


> 5.8.3. Large Flight Sizes
>
> DTLS does not have any built-in congestion control or rate control;
> in general this is not an issue because messages tend to be small.
> However, in principle, some messages - especially Certificate - can
> be quite large. If all the messages in a large flight are sent at
> once, this can result in network congestion. A better strategy is to
> send out only part of the flight, sending more when messages are
> acknowledged. DTLS offers a number of mechanisms for minimizing the
> size of the certificate message, including the cached information
> extension [RFC7924] and certificate compression [RFC8879].
>
> [BA] How does the implementation know how much of the flight to send?
> Not sure how prevalent large certs are for DTLS (e.g. compared with the
self-signed certs of WebRTC),
> but in EAP-TLS deployments, large certs have caused problems.
> The EAP-TLS cert document draft-ietf-emu-eaptlscert cites some additional
> mechanisms for reducing certificate sizes, such as draft-ietf-tls-ctls
> and [RFC6066] which defines the "client_certificate_url"
> extension which allows TLS clients to send a sequence of Uniform
> Resource Locators (URLs) instead of the client certificate.

We added some text.


> 5.11. Alert Messages
>
> Note that Alert messages are not retransmitted at all, even when they
> occur in the context of a handshake. However, a DTLS implementation
> which would ordinarily issue an alert SHOULD generate a new alert
> message if the offending record is received again (e.g., as a
> retransmitted handshake message). Implementations SHOULD detect when
> a peer is persistently sending bad messages and terminate the local
> connection state after such misbehavior is detected. Note that
> alerts are not reliably transmitted; implementation SHOULD NOT depend
> on receiving alerts in order to signal errors or connection closure.
>
> [BA] For the fatal alert case, it does seem like retransmission would
> be a good idea; otherwise the peer can be left hanging.

This has been the practice since DTLS 1.0, and there's no way to
ack them, so I don't think we should change this now.


> Section 7.1
> "Disruptions" such as reordering do not affect timers, correct?

No, they do not. The timers are only on the sender side, so they
kind of can't.



> ACKs SHOULD NOT be sent for these flights unless generating the
> responding flight takes significant time.
>
> What is "significant time"?

Rewritten.


> Editorial Comments (NITs)
>
> Section 2
>
> The reader is also as to be familiar with
>
> [BA] "as" -> "assumed"

Fixed.

> Section 3
>
> The basic design philosophy of DTLS is to construct "TLS over
> datagram transport". Datagram transport does not require nor provide
> reliable or in-order delivery of data. The DTLS protocol preserves
> this property for application data. Applications such as media
> streaming, Internet telephony, and online gaming use datagram
> transport for communication due to the delay-sensitive nature of
> transported data. The behavior of such applications is unchanged
> when the DTLS protocol is used to secure communication, since the
> DTLS protocol does not compensate for lost or reordered data traffic.
>
> [BA] While low-latency streaming and gaming does use DTLS to protect data
(e.g. for
> protection of WebRTC data channel), telephony and RTC Audio/Video uses
DTLS/SRTP for
> key derivation only, and SRTP for protection of data. So you might want
to make a
> distinction.

Done.

> Section 3.1
>
> Note that timeout and retransmission do not apply to the
> HelloRetryRequest since this would require creating state on the
> server. The HelloRetryRequest is designed to be small enough that it
> will not itself be fragmented, thus avoiding concerns about
> interleaving multiple HelloRetryRequests.
>
> [BA] I would add "For more detail on timeouts and retransmission,
> see Section 5.8."

Done.

> 4.3. Transport Layer Mapping
>
> DTLS messages MAY be fragmented into multiple DTLS records. Each
> DTLS record MUST fit within a single datagram. In order to avoid IP
> fragmentation, clients of the DTLS record layer SHOULD attempt to
> size records so that they fit within any PMTU estimates obtained from
> the record layer.
>
> [BA] You might reference PMTU considerations described in Section 4.4.

Done.

>     Post-handshake client authentication
>
> Messages of each category can be sent independently, and reliability
> is established via independent state machines each of which behaves
> as described in Section 5.8.1. For example, if a server sends a
> NewSessionTicket and a CertificateRequest message, two independent
> state machines will be created.
>
> As explained in the corresponding sections, sending multiple
> instances of messages of a given category without having completed
> earlier transmissions is allowed for some categories, but not for
> others. Specifically, a server MAY send multiple NewSessionTicket
> messages at once without awaiting ACKs for earlier NewSessionTicket
> first. Likewise, a server MAY send multiple CertificateRequest
> messages at once without having completed earlier client
> authentication requests before. In contrast, implementations MUST
> NOT have send KeyUpdate, NewConnectionId or RequestConnectionId
>
> [BA] "send" -> "sent"

Changed.

>     Example of Handshake with Timeout and Retransmission
>
> The following is an example of a handshake with lost packets and
> retransmissions. Note that the client sends an empty ACK message
> because it can only acknowledge Record 1 sent by the server once it
> has processed messages in Record 0 needed to establish epoch 2 keys,
> which are needed to encrypt to decrypt messages found in Record 1.
>
> [BA] "encrypt to decrypt" -> "encrypt or decrypt"?

Changed.

> Section 7.3
>
> In the first case the use of the ACK message is optional because the
> peer will retransmit in any case and therefore the ACK just allows
> for selective retransmission, as opposed to the whole flight
> retransmission in previous versions of DTLS. For instance in the
> flow shown in Figure 11 if the client does not send the ACK message
>
> [BA] Figure 11 is the DTLS State Machine. Are you referring to another
figure?

Fixed.

> The use of the ACK for the second case is mandatory for the proper
> functioning of the protocol. For instance, the ACK message sent by
> the client in Figure 13, acknowledges receipt and processing of
> record 4 (containing the NewSessionTicket message) and if it is not
> sent the server will continue retransmission of the NewSessionTicket
> indefinitely until its transmission cap is reached.
>
> [BA] Do you mean "maximum retransmission timeout value"?

Fixed.
