Re: [tsvwg] Reminder: WGLC for SCTP.bis at midnight UTC on 14th April 2021

Randall Stewart <randall@lakerest.net> Sun, 11 July 2021 10:50 UTC

To: Michael Tuexen <michael.tuexen@lurchi.franken.de>, Magnus Westerlund <magnus.westerlund=40ericsson.com@dmarc.ietf.org>
Cc: "gorry@erg.abdn.ac.uk" <gorry@erg.abdn.ac.uk>, "tsvwg@ietf.org" <tsvwg@ietf.org>
From: Randall Stewart <randall@lakerest.net>
Date: Sun, 11 Jul 2021 06:50:05 -0400

+1


On 7/10/21 7:19 PM, Michael Tuexen wrote:
>> On 5. Jul 2021, at 17:33, Magnus Westerlund <magnus.westerlund=40ericsson.com@dmarc.ietf.org> wrote:
>>
>> Hi,
>>
>> Responses inline.
>>
>> I think my main remaining issue is the question of whether SCTP should really continue to be oblivious, at the specification level, to the source address, rather than defining paths as source-destination pairs where possible. I would appreciate other feedback on this issue. My main proposal would be to recommend that implementations use source-destination pairs as identifiers for paths, rather than just the destination, when able to. That way some mistakes can be avoided.
> Hi Magnus,
>
> some comments regarding using src/dst address pairs instead of dst addresses.
>
> * Considering all src/dst address pairs or only the dst addresses was a design
>    decision made very early in the process of defining SCTP.
>
> * Consider the setup where two nodes can communicate via n separate networks, which are
>    independent; in particular, no cross routing is possible.
>    Considering only the dst address, each node has n remote addresses to supervise and use.
>    All of them should be working in case of no failures.
>    If you consider src/dst pairs instead, each node has to supervise n^2 paths, n of them
>    potentially working, n * (n - 1) permanently failing. For n > 2 this means that
>    the dominant part of the paths is permanently not working.
>
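The scaling argument above can be put in numbers. An illustrative sketch (hypothetical helper functions, not from any SCTP stack) counting supervised path states under each model, assuming n independent networks with no cross routing:

```python
# Illustrative sketch (not from any SCTP stack): compare how many path
# states a node must supervise when paths are keyed by destination address
# only versus by (source, destination) address pairs, for n independent
# networks with no cross routing between them.

def path_states_dst_only(n):
    # One state per remote destination address.
    return n

def path_states_src_dst(n):
    total = n * n            # all (src, dst) combinations
    working = n              # only same-network pairs can work
    failing = n * (n - 1)    # cross-network pairs fail permanently
    return total, working, failing

for n in (2, 3, 4):
    total, working, failing = path_states_src_dst(n)
    print(f"n={n}: dst-only={path_states_dst_only(n)}, "
          f"pairs={total} (working={working}, permanently failing={failing})")
```

For n = 4 this gives 16 pairs of which 12 can never work, versus 4 destination-keyed states.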
> * I would consider an SCTP implementation using all src/dst address pairs to be realising a
>    different flavour of SCTP. One can specify such a flavour and implement it, but it
>    requires a careful analysis of what needs to be done per address pair and what not (for
>    example, address validation does not need to be done per src/dst address pair); detection
>    of association failures might need to be done differently; the API has to be changed (a
>    lot of calls only allow the dst address to be specified); ...
>    I think getting this working is not trivial, since I have seen stacks trying to
>    do that and not getting it right (whatever "right" would be, but it could be determined
>    that what they did was not the right thing).
>
> * I don't think changing from dst addresses to src/dst address pair is something in the
>    scope of RFC 8540. This is not an errata or an issue. It is specifying a different
>    flavour of SCTP, which changes one of the initial design decisions.
>
> So if you think it is important to have such a flavour of SCTP, I would suggest writing
> a specification for it that describes what has to be done differently from what is described
> in RFC 4960bis and RFC 6458 and possibly RFC 5061 (not sure).
>
> Other comments in-line.
>
> Best regards
> Michael
>>
>>> -----Original Message-----
>>> From: Michael Tuexen <michael.tuexen@lurchi.franken.de>
>>> Sent: den 5 juli 2021 02:31
>>> To: Magnus Westerlund <magnus.westerlund@ericsson.com>
>>> Cc: gorry@erg.abdn.ac.uk; tsvwg@ietf.org; randall@lakerest.net
>>> Subject: Re: [tsvwg] Reminder: WGLC for SCTP.bis at midnight UTC on 14th
>>> April 2021
>>>
>>>> On 7. May 2021, at 18:16, Magnus Westerlund
>>> <magnus.westerlund=40ericsson.com@dmarc.ietf.org> wrote:
>>>> Hi,
>>>>
>>>> Sorry for the delay in finishing this review. However, I have now completed
>>> it. I don't think I have reviewed SCTP completely since maybe 2006. Thus, this
>>> review covers just about everything that I reacted to.
>>> For some of these comments the answer will simply be that we can't do anything
>>> without affecting backwards compatibility. However, I think there are some
>>> clarifications and improvements that can be made to help a first-time reader
>>> make fewer mistakes.
>>>> I did review the HTML version:
>>>> https://www.ietf.org/archive/id/draft-ietf-tsvwg-rfc4960-bis-11.html
>>> Hi Magnus,
>>>
>>> thank you very much for the review. See my comments regarding the issues
>>> you brought up in-line.
>>>
>>> Best regards
>>> Michael
>>>> High level comments.
>>>>
>>>> First of all I think the concept behind assignment of TSN should be
>>>> improved in Section 2. I struggled with several aspects that made
>>>> understanding things harder. I think the TSN requires strict increase
>>>> by one for each data chunk transmitted, independently of whether they are
>>>> bundled or not, and in order of inclusion in the packet. I would also
>>>> include a statement that makes
>>> Well, what you actually want is to assign the TSNs in a way such that they are
>>> received in sequence.
>>> One way to do that is to assign them in sequence and assume that they are
>>> received in the sequence they were sent. I think it was left open to allow
>>> other ways in case of sending on multiple paths, use some sort of hardware
>>> offloading or something else.
>>>
>>> I can add a statement with a SHOULD if you want. Just let me know.
>>>> it clear that retransmissions are of DATA chunks, not of SCTP packets. Thus,
>>> when retransmitting a DATA chunk the same TSN is used.
>> I think a SHOULD is a good solution. I hadn't thought about the potential for inconsistencies on path changes as well as with hardware offloading. I would suggest that a note is added why inconsistencies may occur to motivate the SHOULD.
> I added at the end of Section 3.3.1:
> <t>The TSNs of DATA chunks sent out SHOULD be strictly sequential.</t>
>> Also, it appears that any case which could result in a TSN never being sent is highly problematic for the protocol and should not occur.
>>
>>> That is correct. This is the reason why "assign" was used.
>>>
>>> I added to 2.5.4:
>>> If a user data fragment or unfragmented message needs to be
>>> retransmitted, the TSN assigned to it is used.
>> Looks good.
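The two rules settled above (TSNs assigned strictly sequentially, a retransmitted fragment keeping its TSN) can be sketched as simple sender-side bookkeeping. This is a hypothetical illustration, not any stack's actual code:

```python
# Sketch of sender-side TSN bookkeeping: TSNs are assigned strictly
# sequentially (mod 2**32) when a DATA chunk is first created, and a
# retransmission reuses the TSN originally assigned to the chunk.

class TsnAssigner:
    def __init__(self, initial_tsn):
        self.next_tsn = initial_tsn

    def assign(self):
        tsn = self.next_tsn
        self.next_tsn = (self.next_tsn + 1) % 2**32  # wraps after 2**32 - 1
        return tsn

assigner = TsnAssigner(initial_tsn=4294967295)  # wrap-around case
chunk_a = {"tsn": assigner.assign(), "data": b"frag-1"}
chunk_b = {"tsn": assigner.assign(), "data": b"frag-2"}

def retransmit(chunk):
    # Retransmission resends the same chunk, hence the same TSN.
    return chunk["tsn"]

print(chunk_a["tsn"], chunk_b["tsn"], retransmit(chunk_a))
# 4294967295 0 4294967295
```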
>>
>>>> Multipath and the destination dependency only: There are several
>>> transport functions that appear to be significantly impacted by which
>>> source-destination pair is used, but which are only tracked here on a destination basis.
>>> These include congestion window, path MTU, and path liveness.
>>> There are at least two reasons why things have not been laid out as
>>> source/destination pairs:
>>> * An SCTP implementation cannot assume that it can control the source
>>>   address. So what it can do is select the destination address and let the
>>>   IP layer determine the source address. That way the source address is a
>>>   function of the destination address.
>>> * Having n local addresses and m remote addresses would require managing
>>>   n * m states instead of m states. In case of a lot of addresses, this
>>>   might not scale well.
>> Sure, there might be some platforms where controlling or even just tracking the source IP address used is problematic. However, on many platforms this is not an issue. Also, I don't see that n*m instances of the per-path state is particularly problematic, especially for SCTP where you don't use the paths concurrently but only track whether they work. The amount of state per pair is not particularly much: at the path level you are keeping keep-alive state, the last seen RTT, congestion window state, and maybe a little more. Or do you really see much state here?
> On FreeBSD (64-bit arm) the size of the corresponding structure is 736 bytes. So it is not small...
>> I would note that the keep-alive state of a path is highly dependent on the source address. So if it works, an endpoint should know the source address for accurate tracking of that state. I would also note that firewall and NAT state will definitely depend on the source address.
>>
>> From my perspective, I think it would be highly relevant to recommend (not mandate) that an endpoint track source-destination pairs with regard to keep-alives, so that when a path change needs to happen it can select from working sets of source-destination pairs. Tracking of transport functions like congestion control state etc. would thus also be kept at the src-dst pair level rather than solely per destination.
>> Are these really massive changes to the implementations that you know of?
> Not sure what massive means. But they are substantial in my view, especially requiring
> changes to the API and the parametrisation. See my comments above.
>>
>>>>
>>>> Section 2.3:
>>>> Path: The route taken by the SCTP packets sent by one SCTP endpoint to a
>>> specific destination transport address of its peer SCTP endpoint. Sending to
>>> different destination transport addresses does not necessarily guarantee
>>> getting separate paths.
>>> Sure. If you want to avoid single points of failure, you need to take your
>>> network architecture into account.
>>> In case of signalling for example, you often use two networks which don't
>>> share paths (or even equipment).
>> My point was that if the endpoint's side is the cause of the failure and one doesn't switch source address (source network), then no path will work. I guess the peer endpoint's keep-alive traffic or switching of its destination might result in the endpoint on the failing side switching source address. But frankly, I don't think that is a given as currently described. Aren't implementations already doing better control of source addresses than what the spec hints at?
> The implementation I know (the FreeBSD kernel implementation) controls the destination address and leaves
> the source address selection up to the IP layer, which knows the configured routes. SCTP only
> ensures that the selected source address actually belongs to the endpoint.
>>>> I am a bit surprised that there is no mention of the potential impact on the
>>> path of sending from another source IP/interface.
>>> This is because an implementation is not required to even be able to control
>>> the source address or the emitting interface. It is assumed that this decision
>>> is done by the IP layer based on the destination address.
>> Understood, and my point is that one really should recommend better control of source address and tracking src-dest pairs to avoid some failure cases.
> See above.
>>>> Missing Term: State Cookie. State Cookie is used extensively in Section 3,
>>> but not really explained until one reads through Section 5. So I think it would
>>> be good to have a high-level explanation of this term.
>>> I added the following text to Section 2.3 Key Terms:
>>>
>>> State Cookie: A container of all information needed to establish an
>>> association.
>> Thanks.
>>
>>>>
>>>> Section 2.5.5:
>>>> When bundling chunks for retransmission, is any refragmentation
>>> done?
>>> No. I added to Section 2.5.3 User Data Fragmentation:
>>>
>>> Once a user message has been fragmented, this fragmentation cannot be
>>> changed anymore.
>> Thanks
>>
>>>> This is later answered in Section 6.10 but could maybe be included as a
>>> fundamental principle that simplifies but limits the protocol. I would in fact
>>> dare to state that this principle can stall an SCTP association
>>> completely. Because if one has committed to a particular fragmentation,
>>> and the underlying path's PMTU changes, then this user message may never be
>>> delivered if IP fragmentation doesn't work on the path.
>>> This is true, except this has to be true on all paths.
>>>
>>> It is already stated:
>>>
>>> Instead, if the PMTU has been reduced, then IP fragmentation MUST be
>>> used.
>>>
>>> So you mean adding a statement that this means in particular that if IP
>>> fragmentation doesn't work, the association will fail? For me this is implied by
>>> the above sentence, but I'm happy to add another note.
>> Yes, I think an additional sentence would be good, something like: "In cases where IP fragmentation is not working on the used paths this can lead to SCTP association failure."
> I added:
> Therefore, an SCTP association can fail if IP fragmentation is not working on any path.
>>>> Section 2.3, and 2.5.6: Endpoint looking up association state based on VTAG
>>> rather than just on transport address pair. The SCTP association is defined to
>>> be on port. However, the SCTPNAT work has shown that this is limiting. Has
>>> there been discussion on making this change in rfc4960bis rather than having
>>> the SCTPNAT apply that update?
>>> I think adding this to the base stack is not appropriate, especially given the
>>> current state of the NAT document.
>>>
>>> We defined the scope of changes to the base specification in RFC 8540, which
>>> was open for a couple of years.
>>> We will still address mistakes or minor issues which come up after RFC 8540
>>> was finalised but before RFC 4960bis is.
>> Understood, I guess the best way forward is simply to have SCTP-NAT make an update to this document's functionality.
>>
>>>> Section 3.3.1
>>>> "When a user message is fragmented into multiple chunks, the TSNs are
>>> used by the receiver to reassemble the message. This means that the TSNs
>>> for each fragment of a fragmented user message MUST be strictly
>>> sequential."
>>>> Shouldn't the HOL blocking issue that arises when a large user message blocks
>>> other streams, and an informational reference to the I-DATA chunk in RFC
>>> 8260, be added here?
>>> Sure. I rearranged the text a bit and added:
>>>
>>> <t>Note: The extension described in <xref target='RFC8260'/> can be used to
>>> mitigate the head of line blocking when transferring large user
>>> messages.</t>
>>>
>> Thanks.
>>
>>>> Section 3.3.1:
>>>>
>>>> Stream Sequence Number n: 16 bits (unsigned integer) This value
>>>> represents the Stream Sequence Number of the following user data within
>>> the stream S. Valid range is 0 to 65535.
>>>> For clarity here. I would recommend changing "user data" to "user
>>> message".
>>> But the user data following might not be a user message in case of
>>> fragmentation. So I am keeping the more generic term "user data", which applies
>>> also to parts of user messages.
>> Ok.
>>
>>>> Section 3.3.4
>>>>
>>>> Cumulative TSN Ack: 32 bits (unsigned integer) This parameter contains
>>>> the TSN of the last DATA chunk received in sequence before a gap. In the
>>> case where no DATA chunk has been received, this value is set to the peer's
>>> Initial TSN minus one.
>>>> I initially interpreted this definition to always point to the highest TSN that was
>>> received in sequence, i.e. no losses from the initial TSN up to it. I think maybe
>>> one should make it clear that this indicates all the TSNs received, including
>>> after retransmissions.
>>> OK. Changed to
>>>
>>> The largest TSN, such that all TSNs smaller than or equal to it have been
>>> received and the next one has not been received.
>> Thanks, that is clear.
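The reworded definition can be expressed directly in code. A sketch (illustrative only; 32-bit wrap-around is ignored for brevity) computing the Cumulative TSN Ack from a set of received TSNs:

```python
def cumulative_tsn_ack(initial_tsn, received):
    # The largest TSN such that all TSNs smaller than or equal to it have
    # been received and the next one has not. If no DATA chunk has been
    # received, this yields the peer's Initial TSN minus one.
    # (Illustrative only: 32-bit wrap-around is ignored for brevity.)
    cum = initial_tsn - 1
    while cum + 1 in received:
        cum += 1
    return cum

# TSNs 10..12 received in order, 13 lost, 14 received after the gap:
print(cumulative_tsn_ack(10, {10, 11, 12, 14}))  # 12
```

Note that TSN 14 does not advance the cumulative ack; it would be reported via a Gap Ack Block instead.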
>>
>>>> Section 3.3.5:
>>>> When a HEARTBEAT chunk is being used for path verification purposes, it
>>> MUST hold a 64-bit random nonce.
>>>> Shouldn't it say a random nonce at least 64 bits in length? Is there more text
>>> that defines requirements on the nonce? I assume this is a cryptographically
>>> random nonce, i.e. something that fulfills the requirements in RFC 4086.
>>> OK. Changed to
>>> When a HEARTBEAT chunk is being used for path verification purposes, it
>>> MUST hold a random nonce of length 64-bit or longer (<xref
>>> target='RFC4086'/> provides some information on randomness
>>> guidelines).</t>
>> Looks good.
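Generating such a nonce is straightforward with any cryptographically strong randomness source in the spirit of RFC 4086. A sketch using Python's standard `secrets` module (hypothetical helper, not from any SCTP stack):

```python
import secrets

def heartbeat_nonce(length_bytes=8):
    # At least 64 bits (8 bytes) of cryptographically strong randomness,
    # as the agreed wording requires for HEARTBEAT path verification.
    if length_bytes < 8:
        raise ValueError("nonce must be at least 64 bits")
    return secrets.token_bytes(length_bytes)

nonce = heartbeat_nonce()
print(len(nonce))  # 8
```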
>>
>>>> Heartbeat Info field: I would think that there would be a benefit to being
>>> clearer here on a couple of things:
>>>> * That it is variable size and sender implementation specific. There is
>>> no discussion about sizes. The TLV field supports the whole TLV being 2^16-1
>>> bytes. But I assume that there actually should be a recommendation that
>>> the complete heartbeat should not be larger than one SCTP packet, as it
>>> otherwise has to rely on IP fragmentation.
>>> The HEARTBEAT chunk is left as generic as possible to allow the sender to use
>>> it in various scenarios. For example, it might make sense to send packets
>>> containing a HEARTBEAT chunk to test whether IP fragmentation actually works or
>>> not. So I would prefer to keep the flexibility...
>> I guess what is there is sufficient for the purpose of the heartbeat info.
>>
>>
>>>> * That it is recommended that the sender actually encrypts and integrity
>>> protects the Info with a private key. This for two reasons: first, to prevent
>>> information leakage to third parties on-path; second, to prevent any type of
>>> manipulation of this information. And by encrypting it, the 64-bit random
>>> nonce can be replaced by any mechanism that ensures each HB info is
>>> unique, i.e. a global sequence number would be sufficient as no other party
>>> can read or manipulate it.
>>> SCTP doesn't use encryption at all (there were ideas about encryption chunks,
>>> like the auth chunks, but there was never enough support to progress that work);
>>> only some information is integrity protected (the cookie).
>>> In general, there is not much protection against on-path attackers (if not
>>> using SCTP-AUTH), since an on-path attacker can always drop packets and
>>> therefore kill the association.
>>> The level of protection of the heartbeat information is up to the
>>> implementation on the sender side. It is possible to do integrity protection or
>>> just sanitise the input. This is not relevant for interoperability, since the
>>> receiver just reflects it.
>>> An implementation should only put information in a HEARTBEAT chunk
>>> which may be read by on-path attackers (or the peer).
>> Ok, if you have a crypto library available anyway, then one way of meeting all the requirements is to simply encrypt and integrity protect the send timestamp and ensure that the result is longer than 64 bits, which should be trivial.
>>
>> So to conclude no change.
> Just to be clear: Any implementation can encrypt the HEARTBEAT chunks or (possibly more important if you
> want to go down that path) the cookie. But it is not required.
>>>>
>>>> Section 3.3.8:
>>>>
>>>> Cumulative TSN Ack: 32 bits (unsigned integer) This parameter contains
>>>> the TSN of the last chunk received in sequence before any gaps.
>>>>
>>>> Once more the "in sequence before any gaps" which I think needs a proper
>>> definition that makes it actually represent what it should.
>>> Changed to
>>>
>>> <t>The largest TSN, such that all TSNs smaller than or equal to it have been
>>> received and the next one has not been received.</t>
>> Works for me.
>>
>>>> "Since SHUTDOWN does not contain Gap Ack Blocks, the receiver of the
>>> SHUTDOWN MUST NOT interpret the lack of a Gap Ack Block as a renege.
>>> (See Section 6.2 for information on reneging.)"
>>>> Section 9.2 says:
>>>> To indicate any gaps in TSN, the endpoint MAY also bundle a SACK with the
>>> SHUTDOWN chunk in the same SCTP packet.
>>>> Considering the comment here regarding Gap Ack Blocks, shouldn't it be explicitly
>>> stated that one can use SACK chunks together with the SHUTDOWN to
>>> indicate receptions beyond the Cumulative TSN Ack value?
>>> OK, I added:
>>>
>>> <t>The sender of the SHUTDOWN chunk MAY bundle a SACK chunk to
>>> indicate any gaps in the received TSNs.</t>
>> Works for me.
>>
>>>> Section 3.3.9
>>>>
>>>> I don't find anything in the document about what one does if one receives
>>> a SHUTDOWN ACK without having sent a SHUTDOWN. I assume it should be
>>> dropped, but I don't find anything on this. I find a lot said about SHUTDOWN
>>> ACK handling for OOTB packets, and in cases when the SHUTDOWN process is
>>> beyond having sent the SHUTDOWN ACK once.
>>> The reason is that this can't happen in valid scenarios. The reason you find a
>>> lot about SHUTDOWN ACK chunk handling is that this can happen if the
>>> SHUTDOWN COMPLETE chunk was sent and lost. For a graceful teardown,
>>> the described handling of an OOTB SHUTDOWN-ACK is required.
>>>
>>> An unexpected SHUTDOWN-ACK is a message which can be handled by just
>>> ignoring it or by terminating the association ungracefully. That is up to the
>>> implementation.
>>>
>>> This is actually tested in SCTP_DM_O_4_10 of the ETSI test suite specified in
>>> https://www.etsi.org/deliver/etsi_ts/102300_102399/102369/01.01.01_60/ts_102369v010101p.pdf
>> Ok, I guess that this is sufficient explanation.
>>
>>>> Section 3.3.10
>>>>
>>>> What does one do with unknown cause codes? Are there codes whose recognition
>>> is required, versus errors that are mostly informative and let the
>>> association continue? In any case, some text about what to do if one doesn't
>>> know the cause code is needed here.
>>> The only error cause requiring an action is the stale cookie error. Section 5.2.6
>>> describes the handling.
>>> Most of the other error causes are informative. (FreeBSD's implementation
>>> disables features which were negotiated but whose chunks/parameters are
>>> later reported as unknown; this is mostly to deal with broken
>>> implementations.)
>>>
>>> However, received ERROR chunks are reported via the API described in
>>> Section 11.2.6 or
>>> https://datatracker.ietf.org/doc/html/rfc6458#section-6.1.3
>>>
>>> So the handling is mostly left up to the implementer, as long as it is not
>>> important for interoperability (as in the stale cookie case).
>> Okay, if this were a new protocol I would have argued for ranges and/or some "comprehension required" flag. I think the question remains whether there is any consideration for future values that should be defined here. Will it always be safe to ignore these, or is that resolved through the extension using the INIT to indicate that you are required to understand the option?
> I think it is clear that future extensions cannot expect any specific processing for unknown error causes.
> Extensions are normally negotiated during the handshake.
>>>> Section 3.3.10.13:
>>>>
>>>> “An implementation MAY provide additional information specifying what
>>> kind of protocol violation has been detected.”
>>>> In what format is this information? Is it UTF-8 text, or binary, or what?
>>> It is left open... more or less intentionally. Some people thought providing
>>> information helps; some people thought providing information only helps
>>> attackers...
>>>
>>> The FreeBSD stack puts in an ASCII string describing the protocol violation,
>>> such that you can look up the code which generated this ABORT
>>> chunk. This helps at least while debugging issues when you only have a
>>> tracefile of the communication.
>> Ok, but shouldn't this information have a defined format, like a UTF-8 string? That would at least allow implementations or debugging tools to know how to present the information.
> I don't think it is a good idea to require some format after the causes are defined and deployed.
> No issues have been reported to me in the 20 years or so I've been looking at SCTP tracefiles.
> In the case of FreeBSD, we are sending out ASCII strings which at least allow us to figure
> out which code path was taken.
> In general, I agree. But I can't remember why it was left open. Possibly because specifying it is
> not required for interoperability, and whether it is a good idea to provide any information
> at all or not wasn't clear...
>>>> Section 4:
>>>> In the SHUTDOWN-SENT state, the endpoint MUST acknowledge any
>>> received DATA chunks without delay.
>>>> Does this formulation create the potential for abnormal traffic patterns
>>> from an SCTP endpoint? If an attacker initiates an association, then sends
>>> some DATA packets or user protocol messages that will result in the peer
>>> shutting down the association, but before that creates a situation where a
>>> large number of DATA chunks appear to be outstanding, will that create a
>>> situation where the endpoint ACKs every incoming packet?
>>> Sure, a peer can do that. But to trigger the sending of a packet containing
>>> SACK chunks for each received packet, it can just leave a gap in the TSN
>>> space. So I guess this does not increase the attack surface substantially, if at
>>> all.
>> Ok. I guess that is acceptable.
>>
>>>>
>>>> Section 5.1, B) :
>>>> The INIT ACK response is explicit about what to set the destination IP
>>> address to. However, it is not explicit about the need to send the packet back
>>> with the source and destination ports flipped from the INIT.
>>> The text you are referring to just focusses on the destination address, not on
>>> the source address or the port numbers.
>>> For consistency with other text in the document, I would like to keep it.
>>> I'm not aware of any implementation, which was not using the correct port
>>> numbers.
>> Yes, I can understand that this is unlikely to be a practical problem. However, if one can't send it back using the same source address as it arrived on, in certain networks it will not make it, due to NATs/firewalls. I think this needs to be taken into the larger context of
> Don't you need to setup the routes correctly? But, as you say, this is an issue of the source address selection.
> So see my comments above.
>> source address handling in general.
>>
>>>>
>>>> Section 5.1.3:
>>>>
>>>> An implementation SHOULD make the cookie as small as possible to ensure
>>> interoperability.
>>>> I find this statement strange. I can understand performance issues etc. with
>>> too large cookies, but interoperability? And if issues exist, should
>>> there be any size requirements here?
>>> The point is that the cookie is put into the INIT-ACK chunk and
>>> * this INIT-ACK must make it to the peer, possibly relying on IP
>>> fragmentation, and
>>> * the receiver of the INIT-ACK must be able to process the INIT-ACK chunk
>>> and reflect the cookie, and
>>> * the COOKIE chunk must make it to the sender of the INIT-ACK, possibly
>>> relying on IP fragmentation.
>>> In case of using IP fragmentation, the end-points must be willing to
>>> reassemble the corresponding packets.
>>>
>>> To increase the probability to have all of the above work, it is best that the
>>> cookie is as small as possible.
>>>
>>> It is hard to make any size requirements, since, for example, the number of
>>> addresses of both endpoints can be large...
>> Very reasonable arguments. But is the potential need to rely on IP fragmentation an interoperability issue? I fully agree with the need to keep the cookie small, but should the quoted sentence end with "interoperability" or with "functionality"?
> For me interoperability describes it better, since both sides are involved. But I'm
> willing to change the wording, if others also think that the current wording is bad.
>>
>>>> Section 6
>>>>
>>>> When converting user messages into DATA chunks, an endpoint MUST
>>> fragment user messages into multiple DATA chunks.
>>>> If read as written, this formulation implies that even a user message that
>>> can be sent in a single DATA chunk fitting within a single SCTP packet MUST be
>>> fragmented. I don't think that is intended, and the purpose of the MUST is a
>>> bit strange. Is this intended to say that user messages that cannot be sent
>>> within a single DATA chunk, i.e. that are larger than the AMDCS, MUST be
>>> fragmented into multiple DATA chunks, so that each DATA chunk SHOULD be
>>> no larger than the AMDCS?
>>>> Or if the intention is to say that each user message MUST be sent as one or
>>> more DATA chunks, then some reformulation is needed. This latter comment
>>> relates to the plural of "user messages".
>>> Yepp, I screwed up this text. Thanks a lot for spotting it!
>>>
>>> I changed it to:
>>>
>>> When converting large user messages into DATA chunks, an endpoint MUST
>>> fragment user messages into multiple DATA chunks.
>>>
>>> The size restriction is given in the next sentence:
>>> The size of each DATA chunk SHOULD be smaller than or equal to the
>>> Association Maximum DATA Chunk Size (AMDCS).
>> Okay, that works.
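The corrected rule (fragment only messages larger than the AMDCS, each fragment no larger than the AMDCS, and no refragmentation afterwards) can be sketched with a hypothetical helper:

```python
def fragment_user_message(message, amdcs):
    # A user message larger than the AMDCS is split into multiple DATA
    # chunk payloads; each fragment is no larger than the AMDCS. A message
    # that fits in one chunk stays whole. Once chosen, this fragmentation
    # is never changed (no refragmentation on retransmission).
    return [message[i:i + amdcs] for i in range(0, len(message), amdcs)] or [b""]

frags = fragment_user_message(b"x" * 2500, amdcs=1000)
print([len(f) for f in frags])  # [1000, 1000, 500]
```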
>>
>>>> Section 6.1:
>>>> When the receiver has no buffer space, a probe being sent is called a zero
>>> window probe. A zero window probe SHOULD only be sent when all
>>> outstanding DATA chunks have been cumulatively acknowledged and no
>>> DATA chunks are in flight. Zero window probing MUST be supported.
>>>> I struggled with this paragraph. My understanding after having read the
>>> whole section is that a zero window probe is a packet that is sent when there
>>> is less rwnd than one PMDCS available, and no outstanding packets needing
>>> retransmission. And in that case the sender can send a zero window probe.
>>> That probe can carry up to a PMDCS worth of new data to probe whether the window
>>> has been increased. I think this section could benefit from some very high
>>> level description of how the zero window probe relates to this.
>>> The point is:
>>> * The sender has no outstanding data. Everything is cum-acked.
>>> * The sender can not send the next DATA chunk, because the rwnd is not
>>> large enough.
>>> Then the sender can send a packet containing this DATA chunk as a zero
>>> window probe.
>>> It is required to compensate for a lost window update.
>>>
>>> Zero window probes are only one aspect of Section 6.1, so I'm not sure why
>>> we should focus on them.
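[The two bullet conditions above can be summarised in a short sketch (Python; the names are illustrative, not from the draft): a zero window probe may be sent only when nothing is in flight and the advertised rwnd cannot hold the next DATA chunk.]

```python
def may_send_zero_window_probe(outstanding_bytes: int, rwnd: int,
                               next_chunk_size: int) -> bool:
    """True when a sender may probe: nothing is outstanding (everything
    has been cumulatively acked) and the peer's advertised rwnd is too
    small for the next DATA chunk. Illustrative names only."""
    return outstanding_bytes == 0 and rwnd < next_chunk_size
```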
>> So, this comment was in some sense related to two things.
>>
>> First, that the first part of 6.1 dives straight into normative text without any explanation of what it is for.
> This is also done in various other subsections.
>> Secondly, it introduces the new concept of a zero window probe in the middle of it and it is hard to interpret.
> OK, I added to the beginning of 6.1:
>
> <t>This section specifies the rules for sending DATA chunks.
> In particular, it defines zero window probing, which is required to
> avoid the indefinite stalling of an association in case of a loss of
> packets containing SACK chunks performing window updates.</t>
>   
>> Therefore my suggestion was an introductory high level description of what this section does, which hopefully can include a better introduction of the zero window probe among the other things this section does.
>>
>>
>>>> Section 6.1:
>>>>
>>>> The data sender SHOULD NOT use a TSN that is more than 2**31 - 1 above
>>> the beginning TSN of the current send window.
>>>> Is this another way of saying that a sender SHOULD NOT attempt to have
>>> more than 2**31-1 packets outstanding counted between cumulative
>>> acknowledged and highest TSN sent mod 2**32?
>>> Except that it is not about packets, but DATA chunks....
>> Ok, you are right it is DATA chunks. I guess no change necessary.
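[The 2**31 - 1 limit discussed above amounts to a serial-number-arithmetic check modulo 2**32; a minimal sketch (Python, illustrative names):]

```python
TSN_MOD = 2 ** 32

def tsn_within_limit(tsn: int, window_base: int) -> bool:
    """True if tsn is no more than 2**31 - 1 above the beginning TSN of
    the current send window, using mod-2**32 serial-number arithmetic.
    Illustrative helper, not from the draft."""
    return (tsn - window_base) % TSN_MOD <= 2 ** 31 - 1
```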
>>
>>>> Section 6.2:
>>>>
>>>> If the new incoming DATA chunk holds a TSN value less than the largest TSN
>>> received so far, then the receiver SHOULD drop the largest TSN held for
>>> reordering and accept the new incoming DATA chunk.
>>>> I find this sentence confusing. There is no qualification of this
>>>> sentence that the largest TSN actually contains data beyond the
>>>> available receiver window, i.e. is a zero window probe. If a sender
>>> is behaving and keeping within the window earlier
>>> The condition from the first sentence still applies also to the second
>>> sentence.
>>>> advertised. Then providing a rwnd to the sender of 0 is basically saying to it
>>> stop sending new data, and even if there was some window left from earlier
>>> I withdraw that. But, this appears to indicate that the receiver should throw
>>> away data that has been received within limits that was available when the
>>> sender transmitted. Is this how it is intended, or is this primarily to deal with
>>> received zero window probes needing to be thrown away when
>>> retransmissions occur?
>>> The point is that we want to make sure that the association can make
>>> progress. So if there is no receiver space anymore, drop higher TSNs for
>>> lower ones. That way the association can make progress. This is one example
>>> of renegging.
>> Okay, I understand this. Maybe the quoted sentence could be tweaked to make it clear that it applies in the situation of the first sentence, because currently that sentence stands on its own.
>>
>>    When the receiver's advertised window is 0, the receiver MUST drop
>>    any new incoming DATA chunk with a TSN larger than the largest TSN
>>    received so far. Also, if the new incoming DATA chunk holds a TSN value
>>    less than the largest TSN received so far, then the receiver SHOULD
>>    drop the largest TSN held for reordering and accept the new incoming
>>    DATA chunk.
>>
>> So I think a better linking to the first was needed. Secondly, on re-reading it, I assume this is a SHOULD because the receiver may in
> Done.
>> fact have the space available without dropping the largest received TSN for a TSN that fills in a gap. Maybe that should be clarified with some addition to the second sentence. For example: ", in case it is necessary to be able to receive the new incoming DATA chunk."
> What is described is a way of avoiding that the data transfer stalls indefinitely by using renegging.
> It is a SHOULD, because you can implement a different strategy.
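[The renegging behaviour described above can be sketched as follows (Python; an illustrative structure, not the draft's normative algorithm): when rwnd is 0, a gap-filling DATA chunk with a smaller TSN may displace the largest TSN held for reordering.]

```python
def on_new_data_chunk(reorder_buf: set[int], rwnd: int, new_tsn: int) -> bool:
    """reorder_buf holds TSNs queued for reordering above the cumulative
    ack point. Returns True if the new chunk is accepted. Illustrative
    sketch of one possible strategy."""
    largest = max(reorder_buf, default=None)
    if rwnd > 0:
        reorder_buf.add(new_tsn)       # room available: just accept
        return True
    if largest is not None and new_tsn < largest:
        reorder_buf.discard(largest)   # renege the highest TSN ...
        reorder_buf.add(new_tsn)       # ... to accept the gap filler
        return True
    return False                       # rwnd == 0 and TSN not smaller: drop
```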
>>
>>>>
>>>> Section 3.3.1 and 6.1: I think the assignment of TSNs to DATA chunks is
>>> underspecified. It is not clear that each DATA chunk needs its TSN increased for
>>> each DATA chunk that is bundled. Also, that they need to be strictly
>>> sequential is not mandated anywhere that I can find. Sections 6.9 and
>>> 6.10 imply things but it is not generally specified.
>>> See my response at the beginning of the e-mail.
>>>> Section 6.2.1 c)
>>>>
>>>> Any time a DATA chunk is marked for retransmission, either via T3-rtx timer
>>> expiration (Section 6.3.3) or via Fast Retransmit (Section 7.2.4), add the data
>>> size of those chunks to the rwnd.
>>>> So is this C) option to re-add the data that was subtracted in B) by the
>>> scheduling of the retransmission? I would assume that retransmissions are
>>> done using already committed RWND space? It is not clear that this is
>>> intended to negate execution of B).
>>> If you transmit or retransmit a DATA chunk, you subtract it, if you declare it
>>> lost you add it.
>>> That way transmitting and retransmitting a DATA chunk is counted only once.
>> Ok, that works I guess and maybe makes it simpler when it comes to PR-SCTP also.
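[The charge-once bookkeeping described above can be sketched as follows (Python; an illustrative model, not the draft's exact algorithm): subtract on every (re)transmission, add back when a chunk is marked for retransmission, so each outstanding chunk is counted against rwnd exactly once.]

```python
class SenderRwnd:
    """Sketch of the sender-side rwnd bookkeeping discussed above."""

    def __init__(self, rwnd: int):
        self.rwnd = rwnd

    def on_transmit(self, size: int):
        # Subtract on every transmission or retransmission of the chunk.
        self.rwnd -= size

    def on_marked_for_retransmission(self, size: int):
        # Add back when T3-rtx expiration or Fast Retransmit marks the
        # chunk for retransmission; the later retransmit subtracts again.
        self.rwnd += size
```

Since every mark-for-retransmission is followed by a retransmission, the add/subtract pairs cancel and each in-flight chunk stays charged once.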
>>
>>>> Section 6.3.1:
>>>>
>>>> Shouldn't there be a reference here to Section 16, as there are so many
>>> parameters used?
>>> I added:
>>>
>>> <t>See <xref target='sec_parameter_values'/> for suggested parameter
>>> values.</t>
>> Thanks.
>>
>>>> Section 6.5:
>>>> The Stream Sequence Number in all the streams MUST start from 0 when
>>> the association is established. Also, when the Stream Sequence Number
>>> reaches the value 65535 the next Stream Sequence Number MUST be set to
>>> 0.
>>>> This is not stating that each new User Message transmitted on a
>>>> particular streamID with ordered delivery increases the Stream Sequence
>>> Number by 1. Nor is it explicit that unordered ones MUST NOT increase the
>>> sequence number, which should follow if the ordered ones are required to
>>> increase it by only 1.
>>> I added:
>>>
>>> <t>The Stream Sequence Number in all the outgoing streams MUST start
>>> from 0 when the association is established.
>>> The Stream Sequence Number of an outgoing stream MUST be incremented
>>> by 1 for each ordered user message sent on that outgoing stream.
>>> In particular, when the Stream Sequence Number reaches the value 65535
>>> the next Stream Sequence Number MUST be set to 0.
>>> For unordered user messages the Stream Sequence Number MUST NOT be
>>> changed.</t>
>> That looks very clear.
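[The SSN rule quoted above can be expressed as a one-line sketch (Python, illustrative helper): ordered user messages advance the per-stream SSN by 1 modulo 65536 (so 65535 wraps to 0), unordered ones leave it unchanged.]

```python
def next_ssn(ssn: int, ordered: bool) -> int:
    """Per-stream Stream Sequence Number update after sending one user
    message. Illustrative helper, not defined by the draft."""
    return (ssn + 1) % 65536 if ordered else ssn
```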
>>
>>>> Section 7.1:
>>>>
>>>> The sender keeps a separate congestion control parameter set for each of
>>> the destination addresses it can send to (not each source-destination pair
>>> but for each destination).
>>>> I am quite surprised by this, because the network path may be significantly
>>> different depending on the source address. So what is the motivation behind
>>> keeping SCTP like this? Partial path sharing may occur in both cases, but
>>> assuming that the paths are the same independent of the source address
>>> appears strange.
>>> It is not assumed that the SCTP stack can select the source address (or
>>> interface). So the source address is a function of the destination address.
>> Understood. I still think this is problematic. For any endpoint that is properly multihomed, for example via two different providers, the endpoint can change the path significantly without changing the destination address.
>>
>> I would really like to hear other views on this.
>>
>>>> Section 7.3:
>>>>
>>>> An endpoint SHOULD apply these techniques, and SHOULD do so on a per-
>>> destination-address basis.
>>>> How can this not be done based on source and destination pairs? If the first
>>> hop(s) from the source are the MTU limiting part, then the source address will
>>> be the most important factor in the PMTU value that is possible to use.
>>> As said above: the source address is a function of the destination address. At
>>> least that is the model used.
>>>> Section 11.1.14:
>>>>
>>>> “This primitive reads a user message, which has never been sent, into the
>>> buffer specified by ULP.”
>>>> I am uncertain what this does. Is it removing a message from the
>>> transmission buffer prior to having been sent?
>>> In case the message is not sent (the association failed or partial reliability
>>> kicks in), the stack will trigger a SEND FAILURE notification (See 11.2.2) and
>>> the application can retrieve the message using RECEIVE_UNACKED and
>>> RECEIVE_UNSENT. That way the application can decide to re-route the
>>> messages.
>> Ok. I guess based on that I understood it correctly, so that is sufficient.
>>
>>>> Also, I don't see how the message can have a stream sequence number
>>> associated with it. As it wasn't sent, it was not committed, and the next
>>> in-sequence user message on this stream will use the stream sequence
>>> number of this removed message.
>>> It is an optional parameter and the socket API actually doesn't provide the
>>> SSN.
>>> See https://datatracker.ietf.org/doc/html/rfc6458#section-6.1.11
>>>
>>> But if a stack does an early assignment and the association fails, it is clear
>>> which SSN would have been used.
>> Ok, for that latter case it makes sense.
>>
>>>> Section 12.2.3: Shouldn't RFC 6083 be noted as an alternative solution for
>>> confidentiality here?
>>> OK. The new text reads:
>>>
>>> If that is true, encryption of the SCTP user data only might be considered.
>>> As with the supplementary checksum service, user data encryption MAY be
>>> performed by the SCTP user application.
>>> <xref target='RFC6083'/> MAY be used for this.
>>>
>> Yes, that works.
>>
>>
>>>> Section 16: Looking at a number of parameters here, I find that they appear
>>> to be far from the values I would expect to provide good performance on
>>> today's Internet. Shouldn't these values have been updated a bit?
>>> I actually updated RTO.Initial from 3 seconds to 1 second based on RFC 6298.
>>> This is still the value suggested by RFC 8961. According to RFC 8961, RTO.Max
>>> can't be less than
>>> 60 seconds. RTO.Min is still based on RFC 6298, whereas RFC 8961 does not
>>> provide a value.
>>> So I'm not sure which RFC we can use to specify a value for RTO.Min...
>>>
>>> On the other hand, SCTP allows changing these values on a per-association
>>> granularity, so everyone can change them.
>>>
>>> From a performance perspective, it might make more sense to adopt RACK as
>>> the loss recovery algorithm for SCTP than to find better values for
>>> RTO.Min...
>> I guess you are correct. I would think that quite a lot of QUIC's adaptations could be used in SCTP.
> Which QUIC features do you have in mind which could be integrated into
> SCTP to improve it?
>>>> For example, an RTO.Min of 1 s looks ridiculous. Aren't the limiting factors here
>>> related to clock resolution? The smallest RTTs are very short today, while
>>> one still needs to take Internet usage into account; therefore an RTO.Initial of
>>> 1 second is not particularly problematic. So aren't there more modern
>>> recommendations to be made here?
>> I guess actually looking into an alternative congestion controller for SCTP is bigger work, which clearly should be done separately.
> I guess most of the CC algorithms available for TCP could be implemented for SCTP...
>> Cheers
>>
>> Magnus Westerlund
>>