[Tsv-art] Tsvart telechat review of draft-ietf-tls-dtls13-41

Bernard Aboba via Datatracker <noreply@ietf.org> Sat, 20 March 2021 14:46 UTC

From: Bernard Aboba via Datatracker <noreply@ietf.org>
To: tsv-art@ietf.org
Cc: draft-ietf-tls-dtls13.all@ietf.org, last-call@ietf.org, tls@ietf.org
Message-ID: <161625159004.8810.14102809716247880517@ietfa.amsl.com>
Reply-To: Bernard Aboba <bernard.aboba@gmail.com>
Date: Sat, 20 Mar 2021 07:46:30 -0700
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsv-art/SkrNF1BMHHziQABC4fiUVe-VhEY>
Subject: [Tsv-art] Tsvart telechat review of draft-ietf-tls-dtls13-41

Reviewer: Bernard Aboba
Review result: Ready with Issues

This document has been reviewed as part of the transport area review team's
ongoing effort to review key IETF documents. These comments were written
primarily for the transport area directors, but are copied to the document's
authors and WG to allow them to address any issues raised and also to the IETF
discussion list for information.

When done at the time of IETF Last Call, the authors should consider this
review as part of the last-call comments they receive. Please always CC
tsv-art@ietf.org if you reply to or forward this review.

TSV-ART Review of draft-ietf-tls-dtls13-41
Reviewer: Bernard Aboba

Summary: The timeout and retransmission scheme looks workable for common cases,
but could use some refinement to make it more robust.

Technical Comments

4.5.2.  Handling Invalid Records

   Unlike TLS, DTLS is resilient in the face of invalid records (e.g.,
   invalid formatting, length, MAC, etc.).  In general, invalid records
   SHOULD be silently discarded, thus preserving the association;
   however, an error MAY be logged for diagnostic purposes.

[BA] How does silent discard of invalid records interact with retransmission
timers?

   Implementations which choose to generate an alert instead, MUST
   generate error alerts to avoid attacks where the attacker repeatedly
   probes the implementation to see how it responds to various types of
   error.  Note that if DTLS is run over UDP, then any implementation
   which does this will be extremely susceptible to denial-of-service
   (DoS) attacks because UDP forgery is so easy.  Thus, this practice is
   NOT RECOMMENDED for such transports, both to increase the reliability
   of DTLS service and to avoid the risk of spoofing attacks sending
   traffic to unrelated third parties.

[BA] "this practice" refers to "generate an alert instead", correct?

5.8.2.  Timer Values

   Though timer values are the choice of the implementation, mishandling
   of the timer can lead to serious congestion problems, for example if

[BA] Saying "timer values are the choice of the implementation" seems
odd, because it is followed by normative language. I would delete this
and start the sentence with "Mishandling...".

   many instances of a DTLS time out early and retransmit too quickly on
   a congested link.  Implementations SHOULD use an initial timer value
   of 100 msec (the minimum defined in RFC 6298 [RFC6298]) and double
   the value at each retransmission, up to no less than 60 seconds (the
   RFC 6298 maximum).  Application specific profiles, such as those used
   for the Internet of Things environment, may recommend longer timer
   values.  Note that a 100 msec timer is recommended rather than the
   3-second RFC 6298 default in order to improve latency for time-
   sensitive applications.  Because DTLS only uses retransmission for
   handshake and not dataflow, the effect on congestion should be
   minimal.

   Implementations SHOULD retain the current timer value until a message
   is transmitted and acknowledged without having to be retransmitted,
   at which time the value may be reset to the initial value.

[BA] Is it always possible to distinguish a retransmission from a late
arrival of an original packet? This seems like it could result in
wrongly resetting the timer in some situations.
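
As an illustration of the schedule quoted above, here is a minimal Python
sketch of the backoff and reset rule. The class and method names are
hypothetical and are not taken from the draft or from any DTLS library. The
cap is assumed to be exactly 60 seconds, although the quoted text only says
"no less than 60 seconds". The sketch also assumes the caller already knows
whether a message was retransmitted, which is the very point questioned above.

```python
INITIAL_TIMEOUT = 0.1   # seconds; the 100 msec initial value quoted above
MAX_TIMEOUT = 60.0      # seconds; assumed cap (the draft says "no less than 60")


class RetransmitTimer:
    """Hypothetical helper, not taken from the draft or any DTLS library."""

    def __init__(self):
        self.timeout = INITIAL_TIMEOUT

    def on_timeout(self):
        """Timer expired: double the value, capped at MAX_TIMEOUT."""
        self.timeout = min(self.timeout * 2, MAX_TIMEOUT)
        return self.timeout

    def on_acked(self, was_retransmitted):
        """Reset only if the message was acknowledged without retransmission."""
        if not was_retransmitted:
            self.timeout = INITIAL_TIMEOUT
        return self.timeout
```

If a late-arriving original is mistaken for a clean delivery, on_acked() would
be called with was_retransmitted=False and the timer would drop back to 100
msec, which is the situation the comment above asks about.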

5.8.3.  Large Flight Sizes

   DTLS does not have any built-in congestion control or rate control;
   in general this is not an issue because messages tend to be small.
   However, in principle, some messages - especially Certificate - can
   be quite large.  If all the messages in a large flight are sent at
   once, this can result in network congestion.  A better strategy is to
   send out only part of the flight, sending more when messages are
   acknowledged.  DTLS offers a number of mechanisms for minimizing the
   size of the certificate message, including the cached information
   extension [RFC7924] and certificate compression [RFC8879].

[BA] How does the implementation know how much of the flight to send?
I am not sure how prevalent large certs are for DTLS (e.g., compared with the
self-signed certs of WebRTC), but in EAP-TLS deployments, large certs have
caused problems. The EAP-TLS cert document draft-ietf-emu-eaptlscert cites
some additional mechanisms for reducing certificate sizes, such as
draft-ietf-tls-ctls and [RFC6066], which defines the "client_certificate_url"
extension that allows TLS clients to send a sequence of Uniform Resource
Locators (URLs) instead of the client certificate.
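
The "send out only part of the flight" strategy quoted above could be sketched
as follows. This is a sketch under stated assumptions, not the draft's
mechanism: the FlightSender class and its window parameter are hypothetical,
and the window size of 2 is arbitrary, since the draft text quoted here gives
no guidance on how much to send.

```python
from collections import deque


class FlightSender:
    """Hypothetical sender keeping at most `window` records unacknowledged."""

    def __init__(self, records, window=2):
        self.pending = deque(records)   # records not yet sent
        self.in_flight = set()          # records sent and awaiting an ACK
        self.window = window            # assumed limit; not from the draft

    def next_to_send(self):
        """Return the records that may be sent now without exceeding the window."""
        batch = []
        while self.pending and len(self.in_flight) < self.window:
            record = self.pending.popleft()
            self.in_flight.add(record)
            batch.append(record)
        return batch

    def on_ack(self, record):
        """An ACK frees a slot in the window, so more of the flight can go out."""
        self.in_flight.discard(record)
        return self.next_to_send()
```

With five records and a window of 2, the first call to next_to_send() releases
records 0 and 1, and each later ACK releases one more record.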

5.11.  Alert Messages

   Note that Alert messages are not retransmitted at all, even when they
   occur in the context of a handshake.  However, a DTLS implementation
   which would ordinarily issue an alert SHOULD generate a new alert
   message if the offending record is received again (e.g., as a
   retransmitted handshake message).  Implementations SHOULD detect when
   a peer is persistently sending bad messages and terminate the local
   connection state after such misbehavior is detected.  Note that
   alerts are not reliably transmitted; implementation SHOULD NOT depend
   on receiving alerts in order to signal errors or connection closure.

[BA] For the fatal alert case, it does seem like retransmission would
be a good idea; otherwise the peer can be left hanging.

Section 7.1

[BA] "Disruptions" such as reordering do not affect timers, correct?

   ACKs SHOULD NOT be sent for these flights unless generating the
   responding flight takes significant time.

[BA] What is "significant time"?

Editorial Comments (NITs)

Section 2

   The reader is also as to be familiar with

[BA] "as" -> "assumed"

Section 3

   The basic design philosophy of DTLS is to construct "TLS over
   datagram transport".  Datagram transport does not require nor provide
   reliable or in-order delivery of data.  The DTLS protocol preserves
   this property for application data.  Applications such as media
   streaming, Internet telephony, and online gaming use datagram
   transport for communication due to the delay-sensitive nature of
   transported data.  The behavior of such applications is unchanged
   when the DTLS protocol is used to secure communication, since the
   DTLS protocol does not compensate for lost or reordered data traffic.

[BA] While low-latency streaming and gaming do use DTLS to protect data (e.g.,
for protection of the WebRTC data channel), telephony and RTC audio/video use
DTLS/SRTP for key derivation only, and SRTP for protection of data. So you
might want to make a distinction.

Section 3.1

   Note that timeout and retransmission do not apply to the
   HelloRetryRequest since this would require creating state on the
   server.  The HelloRetryRequest is designed to be small enough that it
   will not itself be fragmented, thus avoiding concerns about
   interleaving multiple HelloRetryRequests.

[BA] I would add "For more detail on timeouts and retransmission,
see Section 5.8."

4.3.  Transport Layer Mapping

   DTLS messages MAY be fragmented into multiple DTLS records.  Each
   DTLS record MUST fit within a single datagram.  In order to avoid IP
   fragmentation, clients of the DTLS record layer SHOULD attempt to
   size records so that they fit within any PMTU estimates obtained from
   the record layer.

[BA] You might reference PMTU considerations described in Section 4.4.

   5.  Post-handshake client authentication

   Messages of each category can be sent independently, and reliability
   is established via independent state machines each of which behaves
   as described in Section 5.8.1.  For example, if a server sends a
   NewSessionTicket and a CertificateRequest message, two independent
   state machines will be created.

   As explained in the corresponding sections, sending multiple
   instances of messages of a given category without having completed
   earlier transmissions is allowed for some categories, but not for
   others.  Specifically, a server MAY send multiple NewSessionTicket
   messages at once without awaiting ACKs for earlier NewSessionTicket
   first.  Likewise, a server MAY send multiple CertificateRequest
   messages at once without having completed earlier client
   authentication requests before.  In contrast, implementations MUST
   NOT have send KeyUpdate, NewConnectionId or RequestConnectionId

[BA] "send" -> "sent"

6.  Example of Handshake with Timeout and Retransmission

   The following is an example of a handshake with lost packets and
   retransmissions.  Note that the client sends an empty ACK message
   because it can only acknowledge Record 1 sent by the server once it
   has processed messages in Record 0 needed to establish epoch 2 keys,
   which are needed to encrypt to decrypt messages found in Record 1.

[BA] "encrypt to decrypt" -> "encrypt or decrypt"?

Section 7.3

   In the first case the use of the ACK message is optional because the
   peer will retransmit in any case and therefore the ACK just allows
   for selective retransmission, as opposed to the whole flight
   retransmission in previous versions of DTLS.  For instance in the
   flow shown in Figure 11 if the client does not send the ACK message

[BA] Figure 11 is the DTLS State Machine. Are you referring to another figure?

   The use of the ACK for the second case is mandatory for the proper
   functioning of the protocol.  For instance, the ACK message sent by
   the client in Figure 13, acknowledges receipt and processing of
   record 4 (containing the NewSessionTicket message) and if it is not
   sent the server will continue retransmission of the NewSessionTicket
   indefinitely until its transmission cap is reached.

[BA] Do you mean "maximum retransmission timeout value"?
