[dtn] Benjamin Kaduk's Discuss on draft-ietf-dtn-tcpclv4-18: (with DISCUSS and COMMENT)

Benjamin Kaduk via Datatracker <noreply@ietf.org> Wed, 19 February 2020 22:45 UTC

From: Benjamin Kaduk via Datatracker <noreply@ietf.org>
To: The IESG <iesg@ietf.org>
Cc: draft-ietf-dtn-tcpclv4@ietf.org, Edward Birrane <edward.birrane@jhuapl.edu>, dtn-chairs@ietf.org, edward.birrane@jhuapl.edu, dtn@ietf.org
Reply-To: Benjamin Kaduk <kaduk@mit.edu>
Message-ID: <158215235500.17580.7759757155303566523.idtracker@ietfa.amsl.com>
Date: Wed, 19 Feb 2020 14:45:55 -0800
Archived-At: <https://mailarchive.ietf.org/arch/msg/dtn/xUWTb8zyZyM74jzjUGnUvMpCa4Y>

Benjamin Kaduk has entered the following ballot position for
draft-ietf-dtn-tcpclv4-18: Discuss

When responding, please keep the subject line intact and reply to all
email addresses included in the To and CC lines. (Feel free to cut this
introductory paragraph, however.)


Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
for more information about IESG DISCUSS and COMMENT positions.


The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-dtn-tcpclv4/



----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------

In Section 4.2 we say that "If an entity is capable of [...] TLS 1.2 or
any successors [...], the CAN_TLS flag within its contanct [sic] header
SHALL be set to 1."  I don't understand why we should allow in the spec
for an entity to not be capable of TLS 1.2+.  Can you give me some
examples of situations when you would want to use a TCPCL but cannot use
TLS with it?  A new major version of TCPCL would be the least-bad time
to make a clean break and mandate TLS...

There's some good discussion in Section 4.4.2 of the mechanics of TLS
X.509 certificate authentication; thanks for that!  I do think that
there's a fairly critical omission, though, namely that the BP agent
needs to provide to the TCPCL Entity the Node ID of the expected next
hop from the routing decision (in addition to the hostname/IP address to
which to initiate a TCP connection), and this Node ID must also be
validated against the TLS certificate and the SESS_INIT from the peer.
Otherwise we are in effect falling back to an authorization policy of
"anyone with a cert with a uniformResourceIdentifier SAN of the expected
scheme is authorized to do anything", which is a pretty weak policy.
(In some sense, if we require this, then the Node ID in the SESS_INIT
becomes redundant, though I think there are some edge cases where it
would still be needed in order for both endpoints to agree on the
communicating identities.)
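
As an illustration only, the three-way agreement described above could
be sketched as follows (Python; the function and parameter names are
hypothetical and not taken from the draft, and exact string comparison
of URIs is an assumption):

```python
from typing import Iterable


def peer_node_id_ok(expected_node_id: str,
                    cert_uri_sans: Iterable[str],
                    sess_init_node_id: str) -> bool:
    """Check that all three sources agree on the peer's Node ID.

    expected_node_id  -- next-hop Node ID from the BP agent's routing
                         decision (hypothetical input to the TCPCL entity)
    cert_uri_sans     -- uniformResourceIdentifier SAN values taken from
                         the peer's TLS certificate
    sess_init_node_id -- Node ID the peer sent in its SESS_INIT
    """
    # Assumption: plain string equality; a real implementation would
    # need to settle on a URI comparison rule.
    if sess_init_node_id != expected_node_id:
        return False
    return expected_node_id in set(cert_uri_sans)
```

With only the certificate check and no expected Node ID, any peer
holding a certificate with a URI SAN of the right scheme would pass,
which is the weak policy described above.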

I also think we need to discuss the TLS X.509 authentication model that
will be used, i.e., "what PKI is being used?".  (To be clear, I don't
know that any changes to the text will be required, but do think we
should be sure we have a clear picture of what the expected deployment
strategies are.) The usage of SNI to pick a cert and the DNS-ID (RFC
6125) to authenticate a hostname might imply that the typical "Web PKI"
(that deals in hostnames) is intended, but the URIs we need to
authenticate Node IDs are not commonly certified by that PKI.  Since the
server has to present a single certificate even if it is attempting to
authenticate as both DNS-ID and the NodeID URI, it seems like it would
be challenging to use this scheme in practice against the Web PKI roots.
This hybrid of hostname and Node-ID authentication also suffers from an
awkward ordering issue when the TLS handshake occurs before the
SESS_INIT messages that convey what Node ID is intended to be
authenticated -- this requires implementations to use a TLS stack that
preserves the peer's certificate and perform name validation after a
completed TLS handshake, which is moving more of the complications out
of the TLS stack and into the application logic (which introduces risk
of security-relevant bugs).  It also means that certificate selection
must be based solely on SNI hostname and cannot involve the requested
Node ID.  [There is in theory the selectable name_type field in the TLS
server_name extension, but in practice that joint has rusted shut and it
seems unlikely that there would be much implementation traction to
define a name type for DTN Node ID; RFC 6066 also fails to give a clear
indication of the intended semantics when multiple name types are
present.]

Please double-check for lingering text that assumes the RFC 7242
behaviors where all parameters are in the contact header (vs. SESS_INIT)
and use of SDNV encoding vs. fixed-lengths.  I noted several instances
in the COMMENT section, but do not claim to have made an exhaustive
review.


----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

Please consider the comments from the secdir telechat review.

Abstract

   This document describes a TCP-based convergence layer (TCPCL) for
   Delay-Tolerant Networking (DTN).  This version of the TCPCL protocol
   is based on implementation issues in the earlier TCPCL Version 3 of
   RFC7242 and updates to the Bundle Protocol (BP) contents, encodings,
   and convergence layer requirements in BP Version 7.  Specifically,

nit: I would hope that it's the changes in this version that are
intended to resolve implementation issues from the previous version, and
not some explicitly-operations-unfriendly protocol that seeks to
maximize implementation issues in deployment ;)

Section 1.1

   o  Mechanisms for locating or identifying other bundle entities
      (peers) within a network or across an internet.  The mapping of
      Node ID to potential CL protocol and network address is left to

nit: I don't think draft-ietf-dtn-bpbis even really defines "CL" as
an acronym (just "CLA"), so we should probably expand it somewhere.

   o  Policies or mechanisms for assigning X.509 certificates,
      provisioning, deploying, or accessing certificates and private
      keys, deploying or accessing certificate revocation lists (CRLs),
      or configuring security parameters on an individual entity or
      across a network.

In a similar vein to my Discuss points, I think I'm going to need some
more convincing that (some of) these are best left out of scope.  To
help me understand the current ecosystem, could you give me some
pointers to what has actually been used for deployments?  The security
of the system as a whole rests on the criteria used by CAs on whether to
certify a given public key as being associated with a Node ID (or host
name) and the trustworthiness of the CAs to apply those rules accurately
and without error.  (Perhaps this is not what you meant by "assigning
X.509 certificates", though.)

   o  Uses of TLS which are not based on X.509 certificate
      authentication (see Section 8.10.2) or in which authentication is
      not available (see Section 8.10.1).

Section 4.2 seems to describe "opportunistic TLS" and reference RFC
7435, so I don't think we need to say "or in which authentication is not
available" here.

Section 2.1

         A TCPCL Entity MAY support zero or more passive listening
         elements that listen for connection requests from other TCPCL
         Entities operating on other entitys in the network.

nit: "entities"

         A TCPCL Entity MAY passivley initiate any number of TCPCL
         Sessions from requests received by its passive listening
         element(s) if the entity uses such elements.

I'm not sure what "passively initiate" is supposed to mean.  Is this a
session initiation that's triggered by an incoming message in some
fashion?
Also, nit: "passively"

   TCPCL Session:  A TCPCL session (as opposed to a TCP connection) is a
      TCPCL communication relationship between two TCPCL entities.
      Within a single TCPCL session there are two possible transfer
      streams; one in each direction, with one stream from each entity
      being the outbound stream and the other being the inbound stream.
      The lifetime of a TCPCL session is bound to the lifetime of an
      underlying TCP connection.  A TCPCL session is terminated when the
      TCP connection ends, due either to one or both entities actively
      closing the TCP connection or due to network errors causing a
      failure of the TCP connection.  For the remainder of this
      document, the term "session" without the prefix "TCPCL" refers to
      a TCPCL session.

This leaves me confused as to whether there is one TCP connection or two
between interacting entities ("two possible transfer streams" vs. "the
TCP connection ends") -- are the two transfer streams just the two
directions of the single TCP connection?

   Idle Session:  A TCPCL session is idle while the only messages being
      transmitted or received are KEEPALIVE messages.

   Live Session:  A TCPCL session is live while any messages are being
      transmitted or received.

nit: any messages other than KEEPALIVE, I presume.


In Figure 2, are any of the respective sessions 1 through n targeting
the same "other TCPCL Entity's Listener" or is the idea that they're 1:1
session:other-entity?  If they're 1:1, it might be worth trying to make
the arrow from #n go to the box in the background instead of the one in
the foreground.  (This potentially holds true whether the Acks are TCP
ACKs or XFER_ACKs.)

In Figure 3, if there's actually only one (bidirectional) TCP
connection, we might want to think about indicating the sequencing
between Acks and Segments in the same direction.

Section 3.1

   Session State Changed:  The TCPCL supports indication when the
      session state changes.  The top-level session states indicated
      are:

Is this an indication from the TCPCL Entity to the BP agent?  (Similarly
for "Idle Changed", etc.)

      ongoing transfer.  Because TCPCL transmits serially over a TCP
      connection, it suffers from "head of queue blocking" this
      indication provides information about when a session is available
      for immediate transfer start.

nit: run-on sentence.

   Begin Transmission:  The principal purpose of the TCPCL is to allow a
      BP agent to transmit bundle data over an established TCPCL
      session.  Transmission request is on a per-session basis, the CL
      does not necessarily perform any per-session or inter-session
      queueing.  Any queueing of transmissions is the obligation of the

nit: comma splice.

   Transmission Failure:  The TCPCL supports positive indication of
      certain reasons for bundle transmission failure, notably when the
      peer entity rejects the bundle or when a TCPCL session ends before
      transfer success.  The TCPCL itself does not have a notion of
      transfer timeout.

I'm sure there is a subtle distinction here between a "TCPCL notion of
transfer timeout" and the underlying TCP connection timing out on
retransmissions, but I'm not sure what it is, yet.

Section 3.2

   negotiate the use of TLS security (as described in Section 4).  Once
   contact negotiation is complete, TCPCL messaging is available and the
   session negotiation is used to set parameters of the TCPCL session.
   One of these parameters is a Node ID of each TCPCL Entity.  This is
   used to assist in routing and forwarding messages by the BP Agent and
   is part of the authentication capability provided by TLS.

I might phrase this as "the Node ID that each TCPCL Entity is acting
as".

   Once negotiated, the parameters of a TCPCL session cannot change and
   if there is a desire by either peer to transfer data under different
   parameters then a new session must be established.  This makes CL
   logic simpler but relies on the assumption that establishing a TCP
   connection is lightweight enough that TCP connection overhead is
   negligable compared to TCPCL data sizes.

(I assume this assumption holds true ~universally for DTN TCPCL
consumers?)

   There is no fundamental limit on the number of TCPCL sessions which a
   single node can establish beyond the limit imposed by the number of
   available (ephemeral) TCP ports of the passive entity.

"ephemeral TCP ports" on the passive entity seems like a typo, as
Section 4.1 suggests that the passive entity usually uses the assigned
port 4556 and the source port on the active entity is an ephemeral one.

   Section 4.3).  Regardless of the reason, session termination is
   initiated by one of the entities and responded-to by the other as
   illustrated by Figure 13 and Figure 14.  Even when there are no
   transfers queued or in-progress, the session termination procedure
   allows each entity to distinguish between a clean end to a session
   and the TCP connection being closed because of some underlying
   network issue.

[This is also useful for TLS connections, as proper usage of
bidirectional "close_notify" alerts is far from universal.]

Section 3.3

It would probably be worth expanding "PCH" and "PSI" on first use rather
than last use.

   Contact negotiation involves exchanging a Contact Header (CH) in both
   directions and deriving a negotiated state from the two headers.  The
   contact negotiation sequencing is performed either as the active or
   passive entity, and is illustrated in Figure 5 and Figure 6
   respectively which both share the data validation and analyze final
   states of the "[PCH]" activity of Figure 7 and the "[TCPCLOSE]"
   activity which indicates TCP connection close.  Successful

I'm not sure what "analyze final states" is intended to mean.  (It shows
up again later, in discussion of Figure 10.)

Where does [PCH] occur in Figure 6?  (Should the "[PSI]" be changed to
"[PCH]"?)

Several of the figures have "negotiate <X>" steps that do not have
possible paths for failure of negotiation; only Figure 4 seems to have a
disclaimer that it only shows the "nominal" case.

   Session termination involves one entity initiating the termination of
   the session and the other entity acknowledging the termination.  For
   either entity, it is the sending of the SESS_TERM message which
   transitions the session to the ending substate.  While a session is
   being terminated only in-progress transfers can be completed and no
   new transfers can be started.

Would it be more clear to say "is in the ending substate" than the
current "is being terminated"?

Section 3.4

   Each TCPCL session allows a negotiated transfer segmentation polcy to

nit: "policy"

   be applied in each transfer direction.  A receiving node can set the
   Segment MRU in its contact header to determine the largest acceptable

nit: this is in SESS_INIT now, not the contact header.

   attempt, and it SHOULD use a (binary) exponential back-off mechanism

Thank you for specifying the base of the exponent!
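
For illustration, a binary exponential back-off simply doubles the wait
before each new attempt, up to some cap.  A minimal sketch (Python; the
initial delay and the cap are assumed parameters, since the quoted text
does not give values):

```python
from typing import List


def backoff_delays(initial: float, maximum: float, attempts: int) -> List[float]:
    """Delay before each reconnection attempt, doubling up to a cap."""
    delays = []
    delay = initial
    for _ in range(attempts):
        delays.append(min(delay, maximum))
        delay *= 2  # "binary": the base of the exponent is 2
    return delays
```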

Section 4.1

   established, the entity SHALL close the TCP connection.  The ordering
   of the contact header exchange allows the passive entity to avoid
   allocating resources to a potential TCPCL session until after a valid
   contact header has been received from the passive entity.  This

I'm pretty sure that the passive entity is not going to get a contact
header from itself.

Section 4.2

It might be worth discussing the invariants of the contact
header/protocol, akin to https://tools.ietf.org/html/rfc8446#section-9.3
(though presumably less complicated!), since we are changing how things
work between TCPCLv3 and TCPCLv4.

   If an entity is capable of exchanging messages according to TLS 1.2
   [RFC5246] or any successors [RFC8446] that are compatible with TLS
   1.2, the CAN_TLS flag within its contanct header SHALL be set to 1.
   This behavor prefers the use of TLS when possible, even if security
   policy does not allow or require authentication.  This follows the
   opportunistic security model of [RFC7435].

When possible and there is not an active attack, which is an important
difference!
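
To illustrate the difference, a minimal sketch (Python), assuming the
negotiated result is the logical AND of the two CAN_TLS flags and that
the contact header travels in cleartext; both helper names are
hypothetical:

```python
def tls_negotiated(active_can_tls: bool, passive_can_tls: bool) -> bool:
    """TLS is used only if both contact headers advertise CAN_TLS."""
    return active_can_tls and passive_can_tls


def strip_can_tls(can_tls: bool) -> bool:
    """Hypothetical on-path attacker: clears the flag in transit."""
    return False
```

Two TLS-capable entities negotiate TLS when left alone, but an attacker
who rewrites one header makes the same computation yield non-TLS
operation unless local policy requires TLS.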

Section 4.3

   The first negotiation is on the TCPCL protocol version to use.  The
   active entity always sends its Contact Header first and waits for a
   response from the passive entity.  The active entity can repeatedly
   attempt different protocol versions in descending order until the
   passive entity accepts one with a corresponding Contact Header reply.
   Only upon response of a Contact Header from the passive entity is the
   TCPCL protocol version established and parameter negotiation begun.

Is this on the same TCP connection or successive new ones?  (The next
paragraph implies the same one, but it would be good to be explicit
about this.)

Section 4.4

   session to non-TLS operation.  If this is desired, the entire TCPCL
   session MUST be terminated and a new non-TLS-negotiated session
   established.

Absent some reason why this might be desired, I don't think we need to
have this last sentence.

   Once established, the lifetime of a TLS session SHALL be bound to the
   lifetime of the underlying TCP connection.  Immediately prior to

As the secdir reviewer notes, "TLS session" is an existing term of art
for TLS-related state that can be persisted across underlying TCP
connections by means of "resumption".  Given my current understanding of
TCPCL (possibly flawed), I suggest avoiding the phrase and sticking with
the current TCPCL session semantics that are bound to the TCP
connection.

   actively ending a TLS session after TCPCL session termination, the
   peer which sent the original (non-reply) SESS_TERM message SHOULD
   follow the Closure Alert procedure of [RFC5246] to cleanly terminate
   the TLS session.  Because each TCPCL message is either fixed-length

RFC 8446 is the current reference
(https://tools.ietf.org/html/rfc8446#section-6.1).

Section 4.4.1

   session entities SHALL begin a TLS handshake in accordance with TLS
   1.2 [RFC5246] or any successors that are compatible with TLS 1.2.  By

I think we can just say "begin a TLS handshake", citing RFC 8446, drop
the "in accordance with..." text, and say that version 1.2 or higher
MUST be negotiated.

   convention, this protocol uses the node which initiated the
   underlying TCP connection as the "client" role of the TLS handshake
   request.

That is to say, the "active" endpoint is the TLS client?

   Server Certificate:  The passive entity SHALL supply a certificate
      within the TLS handshake to allow authentication of its side of
      the session.  When assigned a stable host name or address, the
      passive entity certificate SHOULD contain a subjectAltName entry
      which authenticates that host name or address.  The passive entity

The current best practice is to reference RFC 6125 and talk about the
"DNS-ID" of the certificate.

      certificate SHOULD contain a subjectAltName entry of type
      uniformResourceIdentifier which authenticates the Node ID of the

Is this "SHOULD" a "MUST (unless using opportunistic TLS)"?  I might
suggest a different phrasing if so.

      peer.  The passive entity MAY use the SNI host name to choose an
      appropriate server-side certificate which authenticates that host
      name and corresponding Node ID.

I'm not sure on how often there will be a unique "corresponding Node ID"
for a given hostname.  This seems like a new requirement that might be
worth documenting in a management considerations section.

   Client Certificate:  The active entity SHALL supply a certificate

[ditto about RFC 6125, even though technically it only claims to apply
to server certificates -- since the situation here is symmetric, I think
it's okay to reuse the terminology.]

   All certificates supplied during TLS handshake SHALL conform with the
   profile of [RFC5280], or any updates or successors to that profile.

nit: "profile of [RFC5280]" sounds like it's RFC 5280 that's being
profiled, not that RFC 5280 is itself a profile of X.509.  So, I'd say
either just "conform with [RFC5280]" or "conform with the X.509 profile
of [RFC5280]".  (Well, I'd probably s/conform with/conform to/, too...)

   When a certificate is supplied during TLS handshake, the full
   certification chain SHOULD be included unless security policy
   indicates that is unnecessary.

Is this intended to say that the root (trust anchor) certificate should
also be sent, even when the peer is assumed to already have a copy?
(This is not typical in TLS usage in other scenarios.)

   If a TLS handshake cannot negotiate a TLS session, both entities of
   the TCPCL session SHALL close the TCP connection.  At this point the
   TCPCL session has not yet been established so there is no TCPCL
   session to terminate.  This also avoids any potential security issues
   assoicated with further TCP communication with an untrusted peer.

I'm not entirely sure what's intended by "potential security issues"
here -- how is this different from any other TCP connection not using
TLS?

Section 4.4.2

The procedure that we describe here has a fairly convoluted description
and may be a little hard to follow (roughly, it seems to be "try really
hard to validate Node ID, and try somewhat less hard to validate
hostname/IP, but if you can do the former but not the latter it's still
okay").  It might be worth restructuring to have something of a
prioritized list/procedure, along the lines of "attempt to validate the
Node ID; if that succeeds, you're done and all is good.  If all fields
for validation are present but validation still fails, abort.  Otherwise
(not all fields for Node-ID validation are present), if policy allows,
validate the hostname (possibly limited to some set known from
out-of-band information to be "trusted" to a higher extent than just
having a certificate might indicate).  Fallback to opportunistic TLS can
be allowed by local policy."
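
The prioritized procedure suggested above might be sketched like this
(Python; the names, the three-valued inputs, and the policy flags are
all hypothetical, and how to treat a failed host name check is a policy
choice that this sketch resolves by falling through to the opportunistic
policy):

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    AUTHENTICATED = "authenticated"
    OPPORTUNISTIC = "opportunistic"
    ABORT = "abort"


def authenticate_peer(node_id_valid: Optional[bool],
                      host_name_valid: Optional[bool],
                      allow_host_name_only: bool,
                      allow_opportunistic: bool) -> Outcome:
    """Apply the checks in priority order.

    For both *_valid inputs: True means validation succeeded, False
    means all fields were present but validation failed, and None
    means the fields needed for validation were missing.
    """
    if node_id_valid is True:
        return Outcome.AUTHENTICATED      # done, all is good
    if node_id_valid is False:
        return Outcome.ABORT              # fields present, check failed
    # Node ID validation could not be attempted at all.
    if allow_host_name_only and host_name_valid is True:
        return Outcome.AUTHENTICATED
    if allow_opportunistic:
        return Outcome.OPPORTUNISTIC      # local policy permits fallback
    return Outcome.ABORT
```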

   Using X.509 certificates exchanged during the TLS handshake, each of
   the entities can attempt to authenticate its peer at the network
   layer (host name and address) and at the application layer (BP Node
   ID).  The Node ID exchanged in the Session Initialization is likely

I think it's perhaps subjective or controversial to say that the network
address is authenticated by the X.509 validation process (barring
iPAddress certificates) and would suggest just using "host name" if
there is no specific reason to also list address.

   By using the SNI host name (see Section 4.4.1) a single passive
   entity can act as a convergence layer for multiple BP agents with
   distinct Node IDs.  When this "virtual host" behavior is used, the
   host name is used as the indication of which BP Node the passive
   entity is attempting to communicate with.  A virtual host CL entity

nit: either "communicate as" or "active entity is attempting".

   entity is attempting to communicate with.  A virtual host CL entity
   can be authenticated by a certificate containing all of the host
   names and/or Node IDs being hosted or by several certificates each
   authenticating a single host name and/or Node ID.

I suggest appending ", using the SNI value from the client to select
which certificate to use".

   of the TCP connection.  The passive entity SHALL attempt to
   authenticate the IP address of the other side of the TCP connection.
   The passive entity MAY use the IP address to resolve one or more host
   names of the active entity and attempt to authenticate those.  If

Er, is this saying that we basically expect everyone to have IP-address
certificates?
Also, using reverse DNS like this is pretty risky security posture in
the absence of DNSSEC on the reverse zone.

I might note in the caption for Figure 17 that the closure procedures
can be initiated by either entity.

Section 4.5

   | SESS_TERM    | 0x05 |              Indicates that one of the      |
   |              |      | entities participating in the session       |
   |              |      | wishes to cleanly terminate the session, as |
   |              |      | described in             Section 6.         |

Is section 6.1 a better reference?

Section 4.6

   Keepalive Interval:  A 16-bit unsigned integer indicating the
      interval, in seconds, between any subsequent messages being
      transmitted by the peer.  The peer receiving this contact header
      uses this interval to determine how long to wait after any last-
      message transmission and a necessary subsequent KEEPALIVE message
      transmission.

It might be worth giving more clarity as to whether this is an
indication of "this is what I will do" vs. "this is what you should do
when talking to me".

   Node ID Length and Node ID Data:  Together these fields represent a
      [...]
      message.  Every Node ID SHALL be a URI consistent with the
      requirements of [RFC3986] and the URI schemes of
      [I-D.ietf-dtn-bpbis].  The Node ID itself can be authenticated as

It may be better to refer to the "Bundle Protocol URI scheme types"
registry established by bpbis than the hardcoded list of URI schemes
contained within bpbis.

Section 4.7

   An entity calculates the parameters for a TCPCL session by
   negotiating the values from its own preferences (conveyed by the
   contact header it sent to the peer) with the preferences of the peer
   node (expressed in the contact header that it received from the
   peer).  The negotiated parameters defined by this specification are

Aren't these from the SESS_INIT message rather than the contact header?

   Session Keepalive:  Negotiation of the Session Keepalive parameter is
      performed by taking the minimum of this two contact headers'
      Keepalive Interval.  The Session Keepalive interval is a parameter
      for the behavior described in Section 5.1.1.  If the Session
      Keepalive interval is unacceptable, the node SHALL terminate the
      session with a reason code of "Contact Failure".

It's probably worth mentioning that a value of zero means "keepalives
are disabled".  The procedure given here does seem consistent with the
discussion in Section 5.1.1 that indicates that either peer can
unilaterally disable keepalives (as opposed to the behavior one might
naively expect by having the encoded value zero reflect an effective
interval of "infinity").
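
A minimal sketch of that behavior (Python; the function names are
hypothetical): because the negotiated value is the minimum of the two
advertised intervals, a zero from either side wins.

```python
def negotiate_keepalive(own_interval: int, peer_interval: int) -> int:
    """Session Keepalive: minimum of the two advertised intervals (seconds)."""
    return min(own_interval, peer_interval)


def keepalives_enabled(session_keepalive: int) -> bool:
    """Zero means keepalives are disabled, not an infinite interval."""
    return session_keepalive != 0
```

For example, intervals of 0 and 60 negotiate to 0, so keepalives are
off for the session even though one peer asked for them.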

Section 4.8

   Item Type:  A 16-bit unsigned integer field containing the type of
      the extension item.  This specification does not define any
      extension types directly, but does allocate an IANA registry for
      such codes (see Section 9.3).

nit: I think we typically say that we "allocate" codepoints from
registries but "create" registries themselves.

   choose a keepalive interval no shorter than 30 seconds.  There is no
   logical maximum value for the keepalive interval, but an idle TCP

It's a fixed-width field; there's a maximum.

   Note: The Keepalive Interval SHOULD NOT be chosen too short as TCP
   retransmissions MAY occur in case of packet loss.  Those will have to
   be triggered by a timeout (TCP retransmission timeout (RTO)), which
   is dependent on the measured RTT for the TCP connection so that
   KEEPALIVE messages MAY experience noticeable latency.

nit: s/MAY/may/; this statement is describing a property of TCP and not
granting implementations permission to engage in a particular behavior.

Section 5.2.1

   Each transfer entails the sending of a sequence of some number of
   XFER_SEGMENT and XFER_ACK messages; all are correlated by the same
   Transfer ID.

   Transfer IDs from each node SHALL be unique within a single TCPCL
   session.  The initial Transfer ID from each node SHALL have value
   zero.  Subsequent Transfer ID values SHALL be incremented from the
   prior Transfer ID value by one.  Upon exhaustion of the entire 64-bit
   Transfer ID space, the sending node SHALL terminate the session with
   SESS_TERM reason code "Resource Exhaustion".

I'd suggest adding some discussion that the sender of the XFER_SEGMENTs
allocates the Transfer ID and the XFER_ACKs respond to it; the current
text is potentially confusing since the ID in an XFER_ACK could be
considered to be a "transfer ID from [each] node" but is really in the
peer node's namespace.

Also, it's not entirely clear that there's a need for predictable IDs
starting from zero;
https://tools.ietf.org/html/draft-gont-numeric-ids-sec-considerations-04
has some further discussion about potential pitfalls of such predictable
identifiers.
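
To make the namespace point concrete, a sketch of the allocation rule as
quoted (Python; the class and function names are hypothetical).  Each
sender owns its own counter, and an XFER_ACK only echoes the ID chosen
by the sender of the XFER_SEGMENT:

```python
class TransferIdAllocator:
    """Per-session counter owned by the sender of XFER_SEGMENT messages."""

    MAX_ID = 2**64 - 1  # Transfer ID is a 64-bit unsigned integer

    def __init__(self) -> None:
        self._next = 0  # the initial Transfer ID is zero

    def allocate(self) -> int:
        if self._next > self.MAX_ID:
            # The caller would terminate the session with SESS_TERM
            # reason code "Resource Exhaustion".
            raise OverflowError("Transfer ID space exhausted")
        value = self._next
        self._next += 1
        return value


def ack_transfer_id(segment_transfer_id: int) -> int:
    """An XFER_ACK carries the peer's ID; the receiver allocates nothing."""
    return segment_transfer_id
```

Both entities in a session start at zero, so the same numeric value can
be in flight in each direction without conflict.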

Section 5.2.2

   The flags portion of the message contains two optional values in the
   two low-order bits, denoted 'START' and 'END' in Table 5.  The

I'm always a little wary of using "optional" for situations like these,
since "optional" can imply that they appear subject to the whims of the
implementation/higher-layer, but when they do/don't appear here is
actually tightly constrained by the protocol specification.  Perhaps
it's better to talk of information being conveyed by the value of these
bits than them containing optional values.

Section 5.2.3

   A receiving TCPCL node SHALL send an XFER_ACK message in response to
   each received XFER_SEGMENT message.  The flags portion of the
   XFER_ACK header SHALL be set to match the corresponding DATA_SEGMENT
   message being acknowledged.  The acknowledged length of each XFER_ACK

I don't think there is a DATA_SEGMENT (vs. XFER_SEGMENT) message
defined.

Also, is the receiver expected to echo flags from the XFER_SEGMENT that
it does not comprehend?

Section 5.2.4

   | Extension  | 0x01 | A failure processing the Transfer Extension   |
   | Failure    |      | Items ha occurred.                            |

nit: "has"

   Note: If a bundle transmission is aborted in this way, the receiver
   MAY not receive a segment with the 'END' flag set to 1 for the
   aborted bundle.  The beginning of the next bundle is identified by

nit: I think "might not" is more appropriate here, as this is describing
expectations and not some active choice the receiver can make.

Section 5.2.5.1

   Total Length:  A 64-bit unsigned integer indicating the size of the
      data-to-be-transferred.  The Total Length field SHALL be treated
      as authoritative by the receiver.  If, for whatever reason, the
      actual total length of bundle data received differs from the value
      indicated by the Total Length value, the receiver SHALL treat the
      transmitted data as invalid.

It seems like this might be setting things up for information skew
between endpoints, where the receiver has discarded a bundle but the
sender thinks it was successfully transmitted.  Would it make sense to
require an XFER_REFUSE in such a case?
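
As an illustration of the suggested behavior (Python; the function name
and the returned labels are hypothetical, and sending XFER_REFUSE here
is the suggestion above rather than something the quoted text requires):

```python
def check_total_length(declared_total: int, received_total: int) -> str:
    """Receiver's reaction once the segment with the END flag has arrived."""
    if received_total == declared_total:
        return "accept"
    # Mismatch: the data is invalid.  Telling the sender explicitly keeps
    # the two endpoints from disagreeing about whether the transfer
    # succeeded.
    return "XFER_REFUSE"
```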

Section 6.1

   Instead of following a clean shutdown sequence, after transmitting a
   SESS_TERM message an entity MAY immediately close the associated TCP
   connection.  When performing an unclean shutdown, a receiving node
   SHOULD acknowledge all received data segments before closing the TCP
   connection.  Not acknowledging received segments can result in
   unnecessary retransmission.  When performing an unclean shutodwn, a

I'm not sure I understand the nature of this indicated "unnecessary
retransmission".  My first thought was that there would be a way to
reestablish a new session between the same endpoints and reuse the
Transaction ID to "start in the middle" of the bundle, but I don't see
any mechanisms designed to allow that or to assign semantics to
transaction IDs across sessions.  So the only thing I can come up with
would be ordinary TCP-layer retransmissions, but those are within the
TCP stack's abstraction boundary, so I'm not sure if it makes sense to
talk about them.

   | Contact      | 0x04 | The node cannot interpret or negotiate      |
   | Failure      |      | contact header option.                      |

nit: I think there's a singular/plural mismatch here.

   connection without sending a SESS_TERM message.  If the content of
   the Session Extension Items data disagrees with the Session Extension
   Length (i.e. the last Item claims to use more octets than are present
   in the Session Extension Length), the reception of the contact header
   is considered to have failed.

nit: is it reception of the contact header or of the SESS_INIT that is
considered to have failed?
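
For reference, the consistency check described in the quoted text can be
sketched as below. The item layout (1-octet flags, 16-bit type, 16-bit length,
network byte order) and the function name are assumptions made for
illustration; the draft's own encoding is authoritative.

```python
import struct
from typing import List, Tuple

HEADER = struct.Struct("!BHH")  # assumed layout: flags, item type, item length


def parse_extension_items(buf: bytes) -> List[Tuple[int, int, bytes]]:
    """Split an extension-items blob into (flags, type, value) tuples.

    Raises ValueError when an item claims more octets than are present.
    """
    items = []
    offset = 0
    while offset < len(buf):
        if len(buf) - offset < HEADER.size:
            raise ValueError("truncated item header")
        flags, item_type, item_len = HEADER.unpack_from(buf, offset)
        offset += HEADER.size
        if item_len > len(buf) - offset:
            raise ValueError("item claims more octets than are present")
        items.append((flags, item_type, buf[offset:offset + item_len]))
        offset += item_len
    return items
```

Whichever message the failure is attributed to, this is the point at which
the parser would detect it.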

Section 8

Thanks for the thorough security considerations; I really appreciate
the work that went into the discussion with the secdir reviewer.

Section 8.2

I think this text implies that a malicious or compromised node can view
bundle contents (which to some extent is a consideration for the core
bundle protocol and not the convergence layer anyway), so it's probably
okay to leave it as-is.

Section 8.3

I suggest noting that an active on-path attacker can also cause use of
an older/different TCPCL version.

   of TCPCL which does not support transport security.  It is up to
   security policies within each TCPCL node to ensure that the TCPCL
   version in use meets transport security requirements.

I'm not sure whether it's worth noting this in the document, but a
consequence of this is that if the security policy allows more than one
TCPCL version, the on-path attacker has control over which of those
versions gets used, even if that protocol itself nominally operates in a
secure fashion.

Section 8.4

   The purpose of the CAN_TLS flag is to allow the use of TCPCL on
   entities which simply do not have a TLS implementation available.
   When TLS is available on an entity, it is strongly encouraged that
   the security policy disallow non-TLS sessions.  This requires that

Is it only "strongly encouraged" or a "SHALL-level requirement" (per
Section 4.2)?

Section 8.6

   or using it after it has been revoked by its CA.  Validating a
   certificate is a complex task and may require network connectivity if
   a mechanism such as the Online Certificate Status Protocol (OCSP) is
   used by the CA.  The configuration and use of particular certificate

nit: I suggest "external network connectivity"; some level of network
connectivity has to be present in order for TCPCL to be used at all...

Section 8.7

   When permitted by the negotiated TLS version (see [RFC8446]), it is
   advisable to take advantage of session key updates to avoid those
   limits.  When key updates are not possible, establishing new TCPCL/
   TLS session is an alternative to limit session key use.

This is not necessarily a question that needs to be answered in the RFC,
but do you consider TLS 1.2 renegotiation to be a "session key update
mechanism"?

Section 8.8

   certificates can be guaranteed to validate the peer's Node ID.  In
   circumstances where certificates can only be issued to network host
   names, Node ID validation is not possible but it could be reasonable
   to assume that a trusted host is not going to present an invalid Node
   ID.

I think we should also say that determination of whether a given host is
trusted in this fashion is out of scope for this document (since it's a
very important consideration but we cannot give universal guidance on
it).

Section 8.9

TCP RST injection is also a DoS vector (potentially from off-path), as
boring as it is...

Section 8.10
   This specification makes use of public key infrastructure (PKI)
   certificate validation and authentication within TLS.  There are
   alternate uses of TLS which are not necessarily incompatible with the
   security goals of this specification, but are outside of the scope of
   this document.

There are PKIs other than X.509 PKIs, so I'd suggest "X.509 public key
infrastructure".

Also, I think it would be reasonable to give some examples of other
modes of using TLS ("e.g., non-X.509 certificates or raw public keys")
if you want.

Section 9

Should we make registries for the various flag words (vs. requiring an
RFC updating this one in order to allocate new values)?

Section 9.1

The IESG should perhaps talk (again) about updating the port
registration made by an IRTF-stream document.

Section 11.2

One could perhaps argue that [AEAD-LIMITS] is more properly normative,
but I don't feel very strongly about it.