Re: [tsvwg] Benjamin Kaduk's Discuss on draft-ietf-tsvwg-rfc4960-bis-17: (with DISCUSS and COMMENT)

tuexen@fh-muenster.de Sun, 26 December 2021 12:00 UTC

Content-Type: multipart/signed; boundary="Apple-Mail=_F0B81022-CB47-4FC5-9DDB-09CC3E6B7441"; protocol="application/pkcs7-signature"; micalg="sha-256"
Mime-Version: 1.0 (Mac OS X Mail 15.0 \(3693.40.0.1.81\))
From: tuexen@fh-muenster.de
In-Reply-To: <163966609848.27749.5437634889896180266@ietfa.amsl.com>
Date: Sun, 26 Dec 2021 13:00:08 +0100
Cc: The IESG <iesg@ietf.org>, draft-ietf-tsvwg-rfc4960-bis@ietf.org, tsvwg-chairs@ietf.org, tsvwg@ietf.org, Gorry Fairhurst <gorry@erg.abdn.ac.uk>
Content-Transfer-Encoding: quoted-printable
Message-Id: <7EC2B47C-63BA-481A-B8A4-415BE9048767@fh-muenster.de>
References: <163966609848.27749.5437634889896180266@ietfa.amsl.com>
To: Benjamin Kaduk <kaduk@mit.edu>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/IFYcQ8lc7cCL88RkK3UctIHgQPA>
Subject: Re: [tsvwg] Benjamin Kaduk's Discuss on draft-ietf-tsvwg-rfc4960-bis-17: (with DISCUSS and COMMENT)
Precedence: list

> On 16. Dec 2021, at 15:48, Benjamin Kaduk via Datatracker <noreply@ietf.org> wrote:
> 
> Benjamin Kaduk has entered the following ballot position for
> draft-ietf-tsvwg-rfc4960-bis-17: Discuss
> 
> When responding, please keep the subject line intact and reply to all
> email addresses included in the To and CC lines. (Feel free to cut this
> introductory paragraph, however.)
> 
> 
> Please refer to https://www.ietf.org/blog/handling-iesg-ballot-positions/
> for more information about how to handle DISCUSS and COMMENT positions.
> 
> 
> The document, along with other ballot positions, can be found here:
> https://datatracker.ietf.org/doc/draft-ietf-tsvwg-rfc4960-bis/
> 
> 
> 
> ----------------------------------------------------------------------
> DISCUSS:
> ----------------------------------------------------------------------
> 
> My apologies for placing a Discuss so close to the telechat.  I believe
> that both of these topics are comparatively minor and should be easy
> to resolve, but that it's important for the document to have a clear
> answer for them.
No problem. Whatever improves the document is appreciated.
> 
> I ask this with nospecific answer in mind that I need to hear -- per my
> comments on §5.1.3, what are the actual requirements on the
> (cryptographic) protection of the State Cookie?  I feel like I got
> different signals from different parts of the document, and it would be
> good to have consistent messaging throughout.
There are two things relevant here:
(1) The sender of the state cookie considers the IP address, it send the
    state cookie to, belonging to the peer.
(2) The sender of the state cookie uses the information in it, when
    receiving the state cookie, to establish the association and wants
    to be sure that it has not been modified.
There is no requirement for privacy, since the information in the cookie
has been on the wire as part of the INIT and/or INIT ACK.

Does that make sense to you and is consistent with the signals you got?
> 
> Section 15.5 establishes a registry for payload protocol identifiers,
> but I am not sure how this registry is supposed to be able to
> effectively avoid collisions when we do not specify the endianness in
> which the value is represented on the wire (per §3.3.1).
Section 3 contains, just before Subsection 3.1:
All integer fields in an SCTP packet MUST be transmitted in network byte order,
unless otherwise stated.

So the PPID is transmitted always in network byte order, since its description
does not make any other statement. The only statement that is made is
that any byte order conversions must be done by the application, this is
not automatically done by the SCTP implementation.
> 
> 
> ----------------------------------------------------------------------
> COMMENT:
> ----------------------------------------------------------------------
> 
> Peeking in the github repo for purposes of preparing an editorial PR, I
> see that the original RFC 4960 was prepared in nroff -- my thanks to the
> authors for the thankless work of converting it to XML!
> 
> I'm also glad to see the robustness changes made to the procedures for
> handling INIT (and INIT ACK) chunks with invalid parameters.
> 
> I am also sympathetic to Alvaro's point about obsoleting additional
> documents.
This document now also obsoletes RFC 4460 and RFC 8540.
> 
> I only got a chance to review the diff from RFC 4960 and a small
> selection of portions of this document, so this review is not as
> comprehensive as I would typically perform.  That said, I trust that the
> document has received sufficient review and therefore am comfortable
> reballoting No Objection once my discuss points are resolved.
> 
> I put some editorial suggestions in a PR on github:
> https://github.com/sctplab/rfc4960bis/pull/79
Since I guess you want the discussion of these proposed changes in the
GitHub infrastructure, please see my comments for the pull request
at Github.
> 
> Abstract
> 
> I see that Warren already remarked about the length of the Abstract.
> I have in my bookmarks
> https://www.rfc-editor.org/policy.html#policy.abstract that indicates
> "[a]n Abstract of more than 20 lines is generally not acceptable."
The abstract is based on what is in RFC 4960 and then a first paragraph was
added to discuss which documents are obsoleted.

The document you are linking states at the beginning:
Note: This page has been replaced by the RFC Style Guide. Please see that page for current information.
Looking at
https://www.rfc-editor.org/rfc/rfc7322.html#section-4.3
I don't see this section mentioning the limitation of number of lines.

I'm also not sure which format should be used for counting lines
(isn't the .xml format the binding one), and if empty lines count.

I'm happy to remove the paragraph about the documents being obsoleted,
tweak the .xml to remove the number of empty lines (when rendered as
ASCII) or whatever you suggest. Please advise.
> 
> Section 3.3.4
> 
>   Cumulative TSN Ack: 32 bits (unsigned integer)
>      The largest TSN, such that all TSNs smaller than or equal to it
>      have been received and the next one has not been received.
> 
> (nit) The "and the next one has not been received" is implied by
> "largest ... such that", since if the next one had been received, this
> one would not be the largest such that the condition holds.
> (This phrasing also seemse to appear in §3.3.8.)  I didn't remove this
> in my editorial PR because it is plausible that the redundancy is
> desired for emphasis.
The text in 3.3.4 and 3.3.8 should be consistent.
I agree, that the text you cite is redundant, but this redundancy is
there on purpose as a result of feedback got via WG LC. If you prefer
not to have the redundancy, please let me know and I will remove the
"and the next one has not been received" part.
> 
> 
> Section 5.1.3
> 
> When would it be permissible to not include a MAC on the State Cookie
> data?  It seems like we refer to it as essentially mandatory from other
You need some protection against changes by others. So encryption
would also be a possibility with some checks. That it why we did not
specify a MUST here. But we put in the spec how to handle the case
when using a MAC.
> parts of the spec, so I was surprised to see it only as a "SHOULD" here.
> E.g., in §5.1.5 the first step in processing COOKIE ECHO is "compute a
> MAC [and compare it to the one in the cookie]" -- there's no point in
> doing that if there's no MAC in the cookie.  Similarly, in §14.1
> "Parameters Necessary for the SCTP Instance" we list a secret key for
> computing the MAC.
Section 14 contains in the first paragraph:

This section is for illustrative purposes and is not considered to
be requirements on an implementation...

So if you think we should make the MAC mandatory using MUST, please
let me know and I do the change. All SCTP implementations I know of,
use a MAC and don't encrypt the cookie. But there are implementation
I don't know (but I'm aware that they exists).
> 
>   Since the State Cookie is not encrypted, it MUST NOT contain
>   information which is not being envisioned to be shared.
> 
> What prevents the state cookie from being encrypted (other than
> performance concerns)?  Perhaps we just need to require that *if* the
> state cookie is not encrypted, it doesn't contain such information.
OK. Changed to using "If the State Cookie..."
> 
> Section 6.2
> 
>   An SCTP receiver MUST NOT generate more than one SACK chunk for every
>   incoming packet, other than to update the offered window as the
>   receiving application consumes new data.  When the window opens up,
>   an SCTP receiver SHOULD send additional SACK chunks to update the
>   window even if no new data is received.  The receiver MUST avoid
>   sending a large number of window updates -- in particular, large
>   bursts of them.  [...]
> 
> "a large number" feels somewhat subjective, such that the power of the
> "MUST"-level requirement is weakened.
I understand. But I don't want to force any particular implementation
here. That is why this text is generic. It is hard to specify a number
or rate...
> 
>   If an endpoint receives a DATA chunk with no user data (i.e., the
>   Length field is set to 16), it MUST send an ABORT chunk with a "No
>   User Data" error cause.
> 
>   An endpoint SHOULD NOT send a DATA chunk with no user data part.
>   This avoids the need to be able to return a zero-length user message
>   in the API, especially in the socket API as specified in [RFC6458]
>   for details.
> 
> This is just a COMMENT since the behavior was present in 4960, but why
> allow sending an empty data chunk when the receiver is required to abort
> because of it?  This seems like the reverse of the Postel principle.
I agree.
I think both should be the same level. If you API does allow handling of
empty messages, the protocol could do it. So should we change the first
text to:

  If an endpoint receives a DATA chunk with no user data (i.e., the
  Length field is set to 16), it SHOULD send an ABORT chunk with a "No
  User Data" error cause.

This improves consistency, still describes what is implemented and
provides a way for implementations handling empty messages in the API
(the RECEIVE primitive would allow this).
> 
> Section 6.5
> 
>   Every DATA chunk MUST carry a valid stream identifier.  If an
>   endpoint receives a DATA chunk with an invalid stream identifier, it
>   SHOULD acknowledge the reception of the DATA chunk following the
>   normal procedure, immediately send an ERROR chunk with cause set to
>   "Invalid Stream Identifier" (see Section 3.3.10), and discard the
>   DATA chunk.  [...]
> 
> What other behaviors are permitted (i.e., why is this SHOULD vs MUST)?
An implementation might decide that sending a SACK, but not being able
to deliver the user data to the application, is something which triggers
an ABORT. 
> 
>   The Stream Sequence Number in all the outgoing streams MUST start
>   from 0 when the association is established.  [...]
> 
> If we were starting from scratch this would probably be undesirable (a
> la draft-gont-numeric-ids-sec-considerations), but I guess it's not
> really changeable now, and we have to rely on the random initial TSN and
> verifiers to prevent off-path guessing.
You can't change this without breaking compatibility with existing
implementations, which is what is not in scope for a -bis document in
my mind.
However, you would need to negotiate the initial values which would add
up to 128 KB to the handshake.

Looking at
https://www.rfc-editor.org/rfc/rfc9000.html#section-19.8
it seems that QUIC does not randomise the initial Offset (i). It
states "The first byte in the stream has an offset of 0.".
> 
> Section 11.2.5
> 
> Looking at the diff from RFC 4960, it seems that we no longer claim that
> there is a "data retrieval id" passed with the communication-lost
> notification.  I don't have a great sense for whether this makes sense,
> e.g., due to an expectation that SEND-FAILURE events will be generated
> separately for all affected messages.
This was a mistake in the specification. The data retrieval id is used
by RECEIVE_UNSENT and RECEIVE_UNACKED and is an identifier for a
user message. If the communication is lost, there can be multiple messages
to be retrieved. That is why you can't provide a single data retrieval id
in a communication lost notification. If the communication is lost, the
upper layer gets a single communication lost notification and a send failure
notification for each message. That way the mechanism can work.

This mistake was not detected earlier, since this is a conceptual API.
Most kernel implementations use the Socket API specified in RFC 6458,
which does it a bit differently (due to Socket API constraints).
20 years ago we did a userland stack with an API based on the one given
in RFC 2960, but we did not implement the data retrieval ID part.
So we did not catch it...
> 
> Section 12
> 
> My apologies if I failed to find some relevant content in my
> somewhat-piecemeal review, but I note that we require each HEARTBEAT
> chunk to receive a corresponding HEARTBEAT ACK, so that (according to
> the protocol rules) an endpoint can force the peer to do work.  Since
> the heartbeat is expected to only occur once every RTO or so (or less
> often), should we allow an endpoint to abort the association on receipt
> of too many heartbeats?
I would prefer not to specify anything. The sending of the HEARTBEAT
can also be triggered by the application. And the network could also
buffer several HEARTBEATs that were sent using an appropriate timing,
but arrive (due to network buffering) in a bulk.

Any endpoint is free to abort an association anytime to protect itself.
But I find it hard to specify appropriate thresholds and specifying
is not needed for interoperability.
> 
> It also looks like it's allowed for a sender to leave gaps in the TSN
> space when sending (non-fragmented) DATA chunks.  This is also the case
No, that is not true. The sender needs to assign TSN consecutive.
Section 3.3.1 contains:

The TSNs of DATA chunks sent SHOULD be strictly sequential.

> for QUIC (for the analogous numer space), and in QUIC's security
> considerations there is a note that a sender who suspects the peer of
> misbehaving (i.e., sending false/speculative ACKs) can deliberately
> leave such a gap and abort the association if the non-existent TSN is
> ACKed.  Do we want to make such a statement here as well?
Section 12.4 contains:

An SCTP implementation SHOULD abort the association if it receives
a SACK chunk acknowledging a TSN that has not been sent.

I think this is the best we can state...
> 
> Section 12.2.3
> 
>   Whenever ESP is in use, application-level encryption is not generally
>   required.
> 
> I suggest removing this statement; we now have plenty of examples where
> double-encryption is actually the right solution, and advances in
> computing power have removed many reasons to desire avoiding it.
I removed that statement.
> 
> Section 12.2.4.1
> 
>   The design of SCTP is resistant to flooding attacks, particularly in
>   its use of a four-way startup handshake, its use of a cookie to defer
>   commitment of resources at the responding SCTP node until the
>   handshake is completed, and its use of a Verification Tag to prevent
>   insertion of extraneous packets into the flow of an established
>   association.
> 
> I suggest mentioning here that using distinct initial TSN and
> initiate tag values improves robustness against such attacks, by making
> attackers guess 32 more bits of values.  (§3.2.2 and 3.2.3 do allow for
> the values to be the same, with concomitant reduced protection.)
There are a couple of points here:
1. An endpoint can select the Initial TSN in whatever way it wants.
   It will be able to interoperate.
2. Using a randomized Initial TSN makes it harder for an off-path
   attacker to inject DATA chunks, but not other chunks. So the
   impact of the selection of the Initial TSN is limited (to user data).
3. Selecting the Initial TSN to be the same value as the verification
   tag is somewhat strange. It is random, but the same value as another
   one. So it does not really give you increased security, possibly it
   helps in debugging (since you see in a trace which does not contain
   the initial hand shake the Initial TSN).
4. I am aware of an (old) implementation, which selected the Initial
   TSN to be 0.
5. Major implementations select the Initial TSN as a random number.

So contemplating about it, I think we could remove the text that
the verification tag and the initial TSN MAY be set tot the same value and
add a text stating that the value SHOULD be a random number.

That reflects reality for a lot of implementations, provides a more
consistent selection, but still allows other choices (for whatever
reason).
> 
>   The IP Authentication Header and Encapsulating Security Payload might
>   be useful in reducing the risk of certain kinds of denial-of-service
>   attacks.
> 
> Current IPSECME WG guidance is to prefer ESP with NULL encryption over
> AH, so I'd recommend not mentioning AH here.
Text is changed to:

<t>ESP might be useful in reducing the risk of certain kinds of
denial-of-service attacks.</t>
> 
> Section 12.4
> 
>   INIT ACK chunk will be larger than the original INIT chunk.  An SCTP
>   implementation SHOULD attempt to make the INIT ACK chunk as small as
>   possible to reduce the possibility of byte amplification attacks.
> 
> The emerging consensus in QUIC and (D)TLS seems to be to limit the
> amplification factor to at most three times the received data; do we
> want to mention that heuristic here?
I'm not sure...

From a high level, a factor of 3 is OK. Since you can use the concatenation
of the INIT chunk and the INIT ACK chunk (without the state cookie parameter)
as the state cookie and you can assume that size of the INIT ACK chunk
without the state cookie is about the size of the INIT chunk.

However, the state cookie also has some overhead, like the MAC, some
address information, a timestamp, tie tags, some other implementation
specific stuff.

Testing with the shortest possible INIT chunk of 20 bytes, FreeBSD responds
with an INIT ACK chunk of 196 bytes and Linux responds with an INIT ACK chunk
of 252 bytes. This can be optimised a bit (at least on FreeBSD), but already
the HMAC-SHA1 gives 20 bytes overhead, which is as large as the input...

For larger INIT chunks (more supported extensions), the ratio would go against 3...

So I would keep the text as is.
> 
> Section 15.6
> 
>   SCTP services can use contact port numbers to provide service to
>   unknown callers, as in TCP and UDP.  IANA is requested to open the
>   existing "Service Name and Transport Protocol Port Number Registry"
>   for SCTP using the following rules, which we intend to mesh well with
>   existing port-number registration procedures.  [...]
> 
> I don't really see anything following that looks like rules for IANA to
> follow.  Is that phrase stale text from 4960?  (Deferring the actual
> procedures to RFC 6335 seems like the right approach.)
What is following is the deferral to RFC 6335. But let us make this clearer.
What about just using:

<t>SCTP services can use contact port numbers to provide service to unknown
callers, as in TCP and UDP.
An IESG-appointed expert reviewer supports IANA in evaluating SCTP port
allocation requests, according to the procedure defined in
<xref target='RFC8126'/>.
The details of this process are defined in <xref target='RFC6335'/>.</t>
> 
> Section 18
> 
> Some of these references probably don't need to be classified as
> normative.  e.g., RFC 3873 is the MIB for SCTP, but we don't require its
> implementation, and RFC 4301 is only referenced for "operators might
> consult [RFC4301]".
Done.
> 
> Appendix A
> 
> I'd consider updating the note about the modifications made to the
> sample code; noteworthy changes appear to include the use of fixed-width
> types and some general cleanup for modern C best practices.
I changed the sentence to:

The code is somewhat modified from <xref target='WILLIAMS93'/>, to ensure
portability between big-endian and little-endian architectures, use fixed
sized types to allow portability between 32-bit and 64-bit platforms, and
general C code improvements.
> 
> Also, it might be worth checking for undefined behavior -- "1 << (31 -
> i)" will be UB for i==1 on systems with 32-bit int, since 2**31 is not
> representable in (signed) int, which is the default type in which the
> expression will be evaluated, even if the result is going to be assigned
> to an unsigned type.
I guess you mean for i == 0. I changed the line to
      rw |= 1UL << (31 - i);
> 
> 
>

Attachment: smime.p7s

[tsvwg] Benjamin Kaduk's Discuss on draft-ietf-ts… Benjamin Kaduk via Datatracker
Re: [tsvwg] Benjamin Kaduk's Discuss on draft-iet… tuexen
Re: [tsvwg] Benjamin Kaduk's Discuss on draft-iet… tuexen
Re: [tsvwg] Benjamin Kaduk's Discuss on draft-iet… Benjamin Kaduk
Re: [tsvwg] Benjamin Kaduk's Discuss on draft-iet… tuexen
Re: [tsvwg] Benjamin Kaduk's Discuss on draft-iet… Benjamin Kaduk
Re: [tsvwg] Benjamin Kaduk's Discuss on draft-iet… tuexen

Re: [tsvwg] Benjamin Kaduk's Discuss on draft-ietf-tsvwg-rfc4960-bis-17: (with DISCUSS and COMMENT)

Attachment: smime.p7s